<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>exortech.com &#187; weekly release</title>
	<atom:link href="http://exortech.com/blog/tag/weekly-release/feed/" rel="self" type="application/rss+xml" />
	<link>http://exortech.com/blog</link>
	<description>Peripatetic thinking</description>
	<lastBuildDate>Tue, 01 Dec 2009 05:56:13 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Weekly Release #53 &#8211; Environment bug bites</title>
		<link>http://exortech.com/blog/2009/11/30/weekly-release-53-environment-bug-bites/</link>
		<comments>http://exortech.com/blog/2009/11/30/weekly-release-53-environment-bug-bites/#comments</comments>
		<pubDate>Tue, 01 Dec 2009 05:47:33 +0000</pubDate>
		<dc:creator>exortech</dc:creator>
				<category><![CDATA[release blog]]></category>
		<category><![CDATA[technology]]></category>
		<category><![CDATA[weekly release]]></category>

		<guid isPermaLink="false">http://exortech.com/blog/?p=239</guid>
		<description><![CDATA[Last week I encountered one of the more bizarre bugs of my career. Following the release, gaps started appearing on charts. The strange thing was that the data was all in the database; it just wasn&#8217;t coming through the user interface. Unfortunately, for this release, rolling back wasn&#8217;t really an option, so we needed to [...]]]></description>
			<content:encoded><![CDATA[<p>Last week I encountered one of the more bizarre bugs of my career. Following the release, gaps started appearing on charts. The strange thing was that the data was all in the database; it just wasn&#8217;t coming through the user interface. Unfortunately, for this release, rolling back wasn&#8217;t really an option, so we needed to quickly identify and correct the source of the problem.</p>
<p>After spending the better part of a day banging our heads against the problem, we had isolated it to a few statements. The code had changed in this area in the last release (one of the advantages of weekly releases is that it is easier to pinpoint the source of a problem as each release contains only one week&#8217;s worth of changes) but not in a way that should have caused the problem we were seeing. And everything ran correctly in the test environment. It seemed like the problem could be environment-related.</p>
<p>We didn&#8217;t have the necessary hooks and logging to properly exercise the problem area in isolation, but after a quick patch release, we did (another benefit of zero-downtime deployment is that it allows greater flexibility with the time and frequency of deployment). After the patch deploy, we were able to see that the production system returned one fewer result than the corresponding functionality run against the test environment. If the system was supposed to return only one result, the production system was returning none &#8211; which explained the gaps on the chart. The problem was clearly at a SQL driver or database level.</p>
<p>Just prior to the release, the MySQL driver had been upgraded to the latest version (5.1.10). We had been running this version in the test environment for several weeks without issue, so it seemed an unlikely source of the problem. The version of the database, however, was inconsistent between the two environments (MySQL 5.1.39 in production vs 5.1.36 in test). The newer database version had been running fine in the production environment for several days and hadn&#8217;t been touched with the release, so it seemed an equally unlikely culprit. This was enough to go on, however, and we reverted the version of the MySQL driver on the web servers, which ended up fixing the problem.</p>
<p>Later, while trawling the MySQL release logs, I <a href="http://bugs.mysql.com/bug.php?id=47963">came across the nasty bug that bit us</a>. Evidently, the 5.1.10 driver had changed the format of dates in a way that triggered this bug in 5.1.39. So it was the combination of these two versions of the software that caused the problem &#8211; each in isolation worked fine. The issue has been fixed in MySQL 5.1.41, but it&#8217;s a pretty serious bug in core SQL functionality to come out of a sanctioned release.</p>
<p>Coming out of our 5 Whys analysis, the experience points to several ways that we need to tighten up our process as well.</p>
<ol>
<li>We need to separate environment changes from software releases. Even seemingly innocuous changes can have repercussions when combined with other systems. Keeping environment changes separate will make it easier and faster to pinpoint the source of problems (i.e. is it an environment problem or a software problem?).</li>
<li>We need to narrow the discrepancies between the test and production environments. Environment discrepancies are a common source of unexpected risk. While it&#8217;s not feasible to keep the environments perfectly in sync, this could be better. Ironically, it was an attempt to make the environments more consistent that caused this problem.</li>
<li>We need a more comprehensive set of automated validation tests that we can run against the system after deployment to verify the integrity of the release. This is one of the tasks for this week.</li>
</ol>
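<p>To make the second point concrete, here is a minimal sketch of a check that lists version discrepancies between two environments. It is only an illustration: the function name and the inventory dictionaries are hypothetical, and how the versions are collected from each environment is left out. The MySQL version numbers are the ones from this incident.</p>

```python
def diff_environments(test, prod):
    """Return {component: (test_version, prod_version)} for every mismatch."""
    mismatches = {}
    for component in sorted(set(test) | set(prod)):
        test_version = test.get(component)  # None if the component is missing
        prod_version = prod.get(component)
        if test_version != prod_version:
            mismatches[component] = (test_version, prod_version)
    return mismatches


# Hypothetical inventories; only the version numbers come from the incident.
test_env = {"mysql-server": "5.1.36", "mysql-driver": "5.1.10"}
prod_env = {"mysql-server": "5.1.39", "mysql-driver": "5.1.10"}

for component, (in_test, in_prod) in diff_environments(test_env, prod_env).items():
    print(f"{component}: test={in_test} prod={in_prod}")
```

<p>Run before a release, a report like this would have flagged the database mismatch. It would not have explained the bug, but it would have narrowed the search.</p>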
<p>While these actions may seem fairly obvious, it often (unfortunately) takes getting bitten by these types of bugs to illuminate areas where we need to improve. Have you faced something similar? What additional actions do you take to prevent these types of environment problems?</p>
]]></content:encoded>
			<wfw:commentRss>http://exortech.com/blog/2009/11/30/weekly-release-53-environment-bug-bites/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Back from Bangalore (and Hyderabad and Mumbai)</title>
		<link>http://exortech.com/blog/2009/09/16/back-from-bangalore-and-hyderabad-and-mumbai/</link>
		<comments>http://exortech.com/blog/2009/09/16/back-from-bangalore-and-hyderabad-and-mumbai/#comments</comments>
		<pubDate>Thu, 17 Sep 2009 05:56:38 +0000</pubDate>
		<dc:creator>exortech</dc:creator>
				<category><![CDATA[agile]]></category>
		<category><![CDATA[speaking]]></category>
		<category><![CDATA[weekly release]]></category>

		<guid isPermaLink="false">http://exortech.com/blog/?p=227</guid>
		<description><![CDATA[I arrived home last night after a quick whirlwind trip to India for the Codechef conference. With three tech talks in three cities in three days, it didn&#8217;t leave much time for sightseeing. But I did have a few days at the end in Bangalore to catch up with friends and former colleagues. Here are [...]]]></description>
			<content:encoded><![CDATA[<p>I arrived home last night after a quick whirlwind trip to India for the <a href="http://www.codechef.com/techtalks">Codechef conference</a>. With three tech talks in three cities in three days, it didn&#8217;t leave much time for sightseeing. But I did have a few days at the end in Bangalore to catch up with friends and <a href="http://www.thoughtworks.co.in/">former colleagues</a>.</p>
<p>Here are the slides from the presentation. They did evolve a bit over the course of the tech talks (and hopefully improve).</p>
<div style="width:425px;text-align:left" id="__ss_2010044"><a style="font:14px Helvetica,Arial,Sans-serif;display:block;margin:12px 0 3px 0;text-decoration:underline;" href="http://www.slideshare.net/exortech/releasing-to-production-every-week-india" title="Releasing To Production Every Week   India">Releasing To Production Every Week   India</a><object style="margin:0px" width="425" height="355"><param name="movie" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=releasingtoproductioneveryweek-india-090917003935-phpapp02&#038;stripped_title=releasing-to-production-every-week-india" /><param name="allowFullScreen" value="true"/><param name="allowScriptAccess" value="always"/><embed src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=releasingtoproductioneveryweek-india-090917003935-phpapp02&#038;stripped_title=releasing-to-production-every-week-india" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="355"></embed></object>
<div style="font-size:11px;font-family:tahoma,arial;height:26px;padding-top:2px;">View more <a style="text-decoration:underline;" href="http://www.slideshare.net/">documents</a> from <a style="text-decoration:underline;" href="http://www.slideshare.net/exortech">exortech</a>.</div>
</div>
<p>The presentation also stimulated <a href="http://twitter.com/#search?q=exortech">some good side chatter on twitter</a>. In general, <a href="http://exortech.com/blog/2009/02/01/weekly-release-blog-11-zero-downtime-database-deployment/">zero-downtime database deployment</a>, <a href="http://exortech.com/blog/2009/01/18/weekly-release-blog-9-production-monitoring/">continuous monitoring</a> and WAGMI seemed to be popular topics. Thanks to everyone who made it out and contributed. Your feedback has been very helpful in refining the presentation.</p>
<p>Also, a big thanks to <a href="http://blogs.agilefaqs.com/">Naresh</a> and <a href="http://amitklein.com/">Amit</a> for organizing the event and ensuring that we were well taken care of.</p>
]]></content:encoded>
			<wfw:commentRss>http://exortech.com/blog/2009/09/16/back-from-bangalore-and-hyderabad-and-mumbai/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Weekly Release Blog #40: Evolving Site Design Using CSS</title>
		<link>http://exortech.com/blog/2009/08/31/weekly-release-blog-40-evolving-site-design-using-css/</link>
		<comments>http://exortech.com/blog/2009/08/31/weekly-release-blog-40-evolving-site-design-using-css/#comments</comments>
		<pubDate>Tue, 01 Sep 2009 05:10:45 +0000</pubDate>
		<dc:creator>exortech</dc:creator>
				<category><![CDATA[agile]]></category>
		<category><![CDATA[release blog]]></category>
		<category><![CDATA[css]]></category>
		<category><![CDATA[weekly release]]></category>

		<guid isPermaLink="false">http://exortech.com/blog/?p=209</guid>
		<description><![CDATA[I&#8217;m a big fan of CSS &#8211; it keeps things looking consistent, it separates structure and design, and it keeps markup clean, simple and maintainable. The greatest strength and weakness of stylesheets is their scope. A large number of pages are typically styled by a single stylesheet. This is great for consistency and reuse, but [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m a big fan of <a href="http://en.wikipedia.org/wiki/Cascading_Style_Sheets">CSS</a> &#8211; it keeps things looking consistent, it separates structure and design, and it keeps markup clean, simple and maintainable. The greatest strength and weakness of stylesheets is their scope. A large number of pages are typically styled by a single stylesheet. This is great for consistency and reuse, but it means that it can be difficult to assess the impact of changing a style without verifying every page that uses it &#8211; in every supported browser! Changing a global style could break the layout somewhere in the site in ways that could easily go unnoticed.</p>
<p>Hence, when releasing software to production every week, the cost of making style changes can be prohibitive. It is difficult to regression test all impacted pages in all browser combinations within a reasonable amount of time. Another challenge is that in CSS there are many ways to achieve the same thing, though each approach could render differently in different browsers. And in a site that is changing rapidly, existing designs need to evolve to incorporate new features, usability improvements, user feedback and better ways of doing things. </p>
<p>Stylesheets can&#8217;t be seen as being too risky to change. Otherwise they will be subverted through local inline styles and other workarounds (the same goes for common javascript libraries or pretty much any shared component for that matter). While local styles may seem expedient for that particular page or feature, they only introduce inconsistencies and make the site more difficult to maintain in the long term. So what&#8217;s a web dev to do?</p>
<p>We&#8217;ve developed an approach to help deal with this problem. I accept that there are probably smarter ways to achieve this &#8211; if you have a better idea, please let me know. Here goes:</p>
<ol>
<li>All styles should be relative to some top-level class. For example:<br />
	<code>form.standard { margin: 0px 150px; }</code><br />
The class describes the type of component that is being styled &#8211; the type of table, form, or component we are designing here. In the example, we are defining a &#8220;standard&#8221; form, which is a particular type of form. Using a top-level class allows for other types of forms to be styled as required. </li>
<li>All sub-elements are defined relative to the top-level class:<br />
<code>form.standard label { margin-left: -150px; }</code><br />
All labels for a &#8220;standard&#8221; form have a negative left margin. It is acceptable (even preferable) to style bare tags as long as the style is relative to a top-level class. This keeps the markup simple and consistent. We only add CSS classes to child elements as required.</li>
<li>The corollary to the above is that it is not acceptable to style bare tags on their own, outside the scope of a top-level class (unless we are doing a style reset). The problem with styling bare tags directly is that it locks us in to one specific style and that makes it difficult to evolve the design of the site. For example:<br />
<code>label { font-weight: bold }</code><br />
means that all form labels will be in bold. If we introduce a form later that shouldn&#8217;t have bold labels, we will have to explicitly override this style (which tends to be brittle and limiting).</li>
<li>Now, if we need to evolve an existing style, we have a few options. Say we want to replace the &#8220;standard&#8221; form with a new design. Rather than change the styles for the &#8220;standard&#8221; form class directly, which would immediately impact every page where this style is used, we can incrementally and selectively roll out the new design on a page-by-page basis. If the new style is significantly different from the initial style, we can fork the original style by defining a new top-level class:<br />
<code>form.danger { background-color: red; }</code><br />
Simply by changing the class attribute for each form element from &#8220;standard&#8221; to &#8220;danger&#8221;, we can then roll out the new style to select pages, testing the design in all supported browsers as we go. Think of this as continuous integration for site design.</li>
<li>If the style changes are relatively minimal, we can override the style for specific elements. One way to achieve this is to use multiple top-level CSS classes. For example, we could incrementally apply both the &#8220;standard&#8221; and the &#8220;danger&#8221; classes to the form elements on each page, testing as we go. The &#8220;danger&#8221; class could override styles as required &#8211; though dealing with precedence can be tricky. Alternatively, the new class could be defined relative to a top-level identifier. This solves the precedence problem as styles defined relative to an identifier take precedence over styles that are relative to a class. Another option is to define specific classes for the child elements to be restyled &#8211; but this means changing a lot more markup during the rollout.</li>
</ol>
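<p>The precedence point in the last item can be checked with a small specificity calculation. The sketch below is a deliberate simplification that only understands tag names, classes and identifiers separated by spaces; attribute selectors, pseudo-classes and other combinators are ignored. The <code>#signup</code> identifier is a made-up example.</p>

```python
import re


def specificity(selector):
    """Return (ids, classes, tags) for a simple descendant selector.

    Simplification: only tag names, .classes and #ids separated by spaces
    are counted. A higher tuple wins; on a tie, the later rule wins.
    """
    ids = classes = tags = 0
    for compound in selector.split():
        ids += len(re.findall(r"#[\w-]+", compound))
        classes += len(re.findall(r"\.[\w-]+", compound))
        if re.match(r"[A-Za-z]", compound):  # compound starts with a tag name
            tags += 1
    return (ids, classes, tags)


standard = specificity("form.standard label")
danger = specificity("form.danger label")
by_id = specificity("#signup label")  # hypothetical top-level identifier

print(standard, danger, by_id)
print("class rules tie:", standard == danger)
print("identifier rule wins:", by_id > standard)
```

<p>The two class-based rules have equal specificity, so whichever appears later in the stylesheet applies, which is why precedence gets tricky. The identifier-based rule outranks both regardless of order.</p>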
<p>That&#8217;s it. It seems pretty simple &#8211; intuitive even. But I haven&#8217;t found many references to how others tackle this problem. Again, if you have a better idea, please let me know.</p>
]]></content:encoded>
			<wfw:commentRss>http://exortech.com/blog/2009/08/31/weekly-release-blog-40-evolving-site-design-using-css/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Frequent Releases Reduce Risk: Talk at VanDev</title>
		<link>http://exortech.com/blog/2009/07/22/frequent-releases-reduce-risk-talk-at-vandev/</link>
		<comments>http://exortech.com/blog/2009/07/22/frequent-releases-reduce-risk-talk-at-vandev/#comments</comments>
		<pubDate>Thu, 23 Jul 2009 06:01:12 +0000</pubDate>
		<dc:creator>exortech</dc:creator>
				<category><![CDATA[agile]]></category>
		<category><![CDATA[speaking]]></category>
		<category><![CDATA[weekly release]]></category>

		<guid isPermaLink="false">http://exortech.com/blog/?p=193</guid>
		<description><![CDATA[Last week I delivered a presentation at the Vancouver Software Developer Network meetup on the relationship between risk and frequent releases. In the presentation I proposed that building the capability to release software frequently (daily or weekly) actually reduces risk and that concerns about frequent releases are founded on a localized understanding of risk. You [...]]]></description>
			<content:encoded><![CDATA[<p>Last week I delivered a presentation at the <a href="http://www.meetup.com/VanDev/">Vancouver Software Developer Network</a> meetup on the relationship between risk and frequent releases. In the presentation I proposed that building the capability to release software frequently (daily or weekly) actually reduces risk and that concerns about frequent releases are founded on a localized understanding of risk. You can find the slidecast for the presentation below.</p>
<div style="width:425px;text-align:left" id="__ss_1748981"><a style="font:14px Helvetica,Arial,Sans-serif;display:block;margin:12px 0 3px 0;text-decoration:underline;" href="http://www.slideshare.net/exortech/frequent-releases-reduce-risk-1748981" title="Frequent Releases Reduce Risk">Frequent Releases Reduce Risk</a><object style="margin:0px" width="425" height="355"><param name="movie" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=frequentreleasesreducerisk-090721101327-phpapp01&#038;rel=0&#038;stripped_title=frequent-releases-reduce-risk-1748981" /><param name="allowFullScreen" value="true"/><param name="allowScriptAccess" value="always"/><embed src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=frequentreleasesreducerisk-090721101327-phpapp01&#038;rel=0&#038;stripped_title=frequent-releases-reduce-risk-1748981" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="355"></embed></object>
<div style="font-size:11px;font-family:tahoma,arial;height:26px;padding-top:2px;">View more <a style="text-decoration:underline;" href="http://www.slideshare.net/">documents</a> from <a style="text-decoration:underline;" href="http://www.slideshare.net/exortech">exortech</a>.</div>
</div>
<p>My intention with the presentation was to lay out the basis for an argument that would be subsequently debated. This proved to be more challenging than my <a href="http://exortech.com/blog/2009/06/10/speaking-at-devteach-vancouver/">last presentation on frequent releases</a>, which was more of an experience report. It&#8217;s difficult to clearly convey the layers of a logical argument through a presentation. I&#8217;m impressed by how lawyers manage to do it.</p>
<p>One of the things that I asked attendees to do was to list the top three things that are preventing them from releasing software every week. I&#8217;ve compiled their responses in the chart below:<br />
<img src="http://exortech.com/blog/wp-content/uploads/2009/07/frequent-releases.png" alt="frequent-releases" title="frequent-releases" width="434" height="222" class="alignnone size-full wp-image-198" /></p>
<p>The results are hardly scientific, but I was heartened to see substantial overlap between the concerns that I addressed in the presentation and those identified by the audience. It turned out that quite a large contingent of the audience are rich client developers, which brings its own share of deployment headaches. Many are also working in regulated industries that require their software to be submitted to third party certification agencies for review. To those who attended the session, thanks for participating and for sharing your concerns with me.</p>
]]></content:encoded>
			<wfw:commentRss>http://exortech.com/blog/2009/07/22/frequent-releases-reduce-risk-talk-at-vandev/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Weekly Release Blog #32: When things go down that shouldn&#8217;t</title>
		<link>http://exortech.com/blog/2009/07/07/weekly-release-blog-32-when-things-go-down-that-shouldnt/</link>
		<comments>http://exortech.com/blog/2009/07/07/weekly-release-blog-32-when-things-go-down-that-shouldnt/#comments</comments>
		<pubDate>Tue, 07 Jul 2009 14:44:54 +0000</pubDate>
		<dc:creator>exortech</dc:creator>
				<category><![CDATA[release blog]]></category>
		<category><![CDATA[weekly release]]></category>

		<guid isPermaLink="false">http://exortech.com/blog/?p=186</guid>
		<description><![CDATA[Last week, our site sustained a prolonged outage during core business hours. While testing their backup power systems, our data centre provider tripped a breaker leading to a cascade of failures that, among other things, produced a power surge that fried our hardware firewall&#8217;s power supply. The hardware firewall is one of those standard pieces [...]]]></description>
			<content:encoded><![CDATA[<p>Last week, our site sustained a prolonged outage during core business hours. While testing their backup power systems, our data centre provider tripped a breaker leading to a cascade of failures that, among other things, produced a power surge that fried our hardware firewall&#8217;s power supply. The hardware firewall is one of those standard pieces of system hardware that are so simple that they are assumed to be failure resistant &#8211; one of the pieces of an infrastructure least likely to fail. The reality is that they are antiquated, commodity hardware that the host provider has long ago paid off and that have sustained the load of numerous sites before ours. The question is not <em>if</em> they&#8217;re going to fail, but <em>when</em>. And the implications of their failure are quite severe.</p>
<p>By design, the firewall serves as the single access point into and out of the site. Even though we had taken redundancy and failover measures in the web server, application server and database clusters behind the firewall, it doesn&#8217;t matter much if the traffic can&#8217;t get through. Essentially the hardware firewall is a big old <a href="http://en.wikipedia.org/wiki/Single_Point_of_Failure">SPOF</a>.</p>
<p>Normally a hardware power supply is one of those things that a data centre can very quickly replace. But when the data centre itself is in turmoil because of a significant outage, replacing a power supply for some small customer is the last thing on their mind. When it comes down to it, the only ones who care about your site are you and your customers. We, of course, knew about the failure immediately because of the <a href="http://exortech.com/blog/2009/01/18/weekly-release-blog-9-production-monitoring/">monitoring we have in place</a>. But when managed hardware fails, there&#8217;s not much you can do except log a ticket (assuming that the ticketing system is up &#8211; which in this case it wasn&#8217;t), sit back and wait. Of course there are SLAs in place (there&#8217;s a one hour replacement window on these types of things), but they don&#8217;t keep your site up, and going through the negotiations to sort out the ramifications of a failure is a big waste of everyone&#8217;s time. The bottom line is that we need to eliminate this SPOF from our infrastructure by obtaining a secondary firewall that we can fail over to.</p>
]]></content:encoded>
			<wfw:commentRss>http://exortech.com/blog/2009/07/07/weekly-release-blog-32-when-things-go-down-that-shouldnt/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Speaking at DevTeach Vancouver</title>
		<link>http://exortech.com/blog/2009/06/10/speaking-at-devteach-vancouver/</link>
		<comments>http://exortech.com/blog/2009/06/10/speaking-at-devteach-vancouver/#comments</comments>
		<pubDate>Wed, 10 Jun 2009 15:25:20 +0000</pubDate>
		<dc:creator>exortech</dc:creator>
				<category><![CDATA[agile]]></category>
		<category><![CDATA[event]]></category>
		<category><![CDATA[speaking]]></category>
		<category><![CDATA[weekly release]]></category>

		<guid isPermaLink="false">http://exortech.com/blog/?p=183</guid>
		<description><![CDATA[I&#8217;ll be speaking this Thursday at DevTeach Vancouver about our experiences doing weekly production deployments. Some topics that I will cover: Creating a weekly release process Zero down-time deployment Continuous Monitoring 5 whys Automated deployment As I haven&#8217;t been doing .NET development in over a year, I feel like a bit of an impostor at [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ll be speaking this Thursday at <a href="http://www.devteach.com/">DevTeach Vancouver</a> about our experiences doing <a href="http://exortech.com/blog/category/release-blog/">weekly production deployments</a>. Some topics that I will cover:</p>
<ul>
<li><a href="http://exortech.com/blog/2008/12/16/weekly-release-blog-5-rotating-release-responsibility/">Creating a weekly release process</a></li>
<li><a href="http://exortech.com/blog/2009/02/01/weekly-release-blog-11-zero-downtime-database-deployment/">Zero down-time deployment</a></li>
<li><a href="http://exortech.com/blog/2009/01/18/weekly-release-blog-9-production-monitoring/">Continuous Monitoring</a></li>
<li>5 whys</li>
<li><a href="http://exortech.com/blog/2008/12/23/weekly-release-blog-5-release-scripting/">Automated deployment</a></li>
</ul>
<p>As I haven&#8217;t been doing .NET development in over a year, I feel like a bit of an impostor at the conference. However, I think that the ideas and experiences of short release cycles transcend technology. I also think that there&#8217;s a lot that the Java and .NET communities can learn from each other.</p>
<p>Here are the slides:</p>
<div style="width:425px;text-align:left" id="__ss_1608974"><a style="font:14px Helvetica,Arial,Sans-serif;display:block;margin:12px 0 3px 0;text-decoration:underline;" href="http://www.slideshare.net/exortech/releasing-to-production-every-week?type=presentation" title="Releasing To Production Every Week">Releasing To Production Every Week</a><object style="margin:0px" width="425" height="355"><param name="movie" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=releasingtoproductioneveryweek-090619094100-phpapp01&#038;rel=0&#038;stripped_title=releasing-to-production-every-week" /><param name="allowFullScreen" value="true"/><param name="allowScriptAccess" value="always"/><embed src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=releasingtoproductioneveryweek-090619094100-phpapp01&#038;rel=0&#038;stripped_title=releasing-to-production-every-week" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="355"></embed></object>
<div style="font-size:11px;font-family:tahoma,arial;height:26px;padding-top:2px;">View more <a style="text-decoration:underline;" href="http://www.slideshare.net/">Microsoft Word documents</a> from <a style="text-decoration:underline;" href="http://www.slideshare.net/exortech">exortech</a>.</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://exortech.com/blog/2009/06/10/speaking-at-devteach-vancouver/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Weekly Release Blog #25 &#8211; Improving the signal-to-noise ratio</title>
		<link>http://exortech.com/blog/2009/05/13/weekly-release-blog-25-improving-the-signal-to-noise-ratio/</link>
		<comments>http://exortech.com/blog/2009/05/13/weekly-release-blog-25-improving-the-signal-to-noise-ratio/#comments</comments>
		<pubDate>Thu, 14 May 2009 06:21:05 +0000</pubDate>
		<dc:creator>exortech</dc:creator>
				<category><![CDATA[agile]]></category>
		<category><![CDATA[release blog]]></category>
		<category><![CDATA[continuous monitoring]]></category>
		<category><![CDATA[weekly release]]></category>

		<guid isPermaLink="false">http://exortech.com/blog/?p=161</guid>
		<description><![CDATA[At my company, we use a form of Continuous Monitoring: every time our system logs a warning or an error we immediately receive an email identifying the source and nature of the problem. This allows us to respond rapidly to problems as they arise and gives us good visibility into the health of our system. [...]]]></description>
			<content:encoded><![CDATA[<p>At my company, we use a form of <a href="http://exortech.com/blog/2008/08/14/continuous-monitoring-tutorial-at-agile-2008/">Continuous Monitoring</a>: every time our system logs a warning or an error we immediately receive an email identifying the source and nature of the problem. This allows us to respond rapidly to problems as they arise and gives us good visibility into the health of our system. Following the mantra of &#8220;do in test as is done in prod&#8221;, we have the same monitoring system set up in both environments to help us find issues in test before they find their way into production.</p>
<p>The downside to this level of monitoring is that it can amount to <strong>a lot</strong> of messages. Our challenge is to manage the signal-to-noise ratio so that:</p>
<ul>
<li>we are only notified about things that require immediate action,</li>
<li>we don&#8217;t suffer from information overload; and</li>
<li>emails that matter aren&#8217;t buried under a bunch of emails that don&#8217;t.</li>
</ul>
<p>As part of our <a href="http://startuplessonslearned.blogspot.com/2008/11/five-whys.html">5 Whys</a> activity for production issues, we have found that most production issues actually occurred first in test, but just went unnoticed. This provides a compelling reason to keep the signal-to-noise ratio high in all environments. Any time we find ourselves automatically archiving or filtering an alert, it indicates an opportunity for improvement.</p>
<p>We have found that refining and tuning these alert messages is an ongoing maintenance activity. As part of our weekly meeting, we try to select one message to clarify or dispatch each week. We have a script that trawls the support emails received in the past week and builds a pareto distribution of the number of messages by logger. This helps us decide where to focus our efforts and to quantify the impact of our actions on the volume of messages we receive.</p>
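<p>The counting step behind such a report can be sketched in a few lines. This is an assumption-laden illustration, not the actual script: it takes a list of logger names (one per email, already extracted from the messages) and returns each logger with its count and cumulative share. The logger names are invented.</p>

```python
from collections import Counter


def pareto_by_logger(logger_names):
    """Return [(logger, count, cumulative_percent)] sorted by descending count."""
    counts = Counter(logger_names)
    total = sum(counts.values())
    rows, running = [], 0
    for logger, count in counts.most_common():
        running += count
        rows.append((logger, count, round(100.0 * running / total, 1)))
    return rows


# Hypothetical week of alert emails, reduced to the logger that raised each one.
week = ["billing"] * 6 + ["charts"] * 3 + ["auth"] * 1

for logger, count, cumulative in pareto_by_logger(week):
    print(f"{logger:10} {count:3d} {cumulative:6.1f}%")
```

<p>The cumulative column shows how few loggers account for most of the volume, which is what makes it easy to pick one message to fix each week.</p>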
<p>Determining what kinds of things we need to be alerted about is difficult to assess in advance. Often things that we are concerned about when building a feature turn out to be less important in production, and conversely, we miss things in development that turn out to be very important once real customers start using them. Fortunately, deploying every week gives plenty of opportunity for improvement. Also, if a message is logged more frequently than intended, we only have to put up with it for a week before it can be rectified.</p>
<p>I should mention that we have a <a href="http://www.amazon.com/Release-Production-Ready-Software-Pragmatic-Programmers/dp/0978739213">circuit breaker</a> in place in the log monitor. We do not allow duplicate messages to be sent any more frequently than once per hour. (Relatively early on we managed to get temporarily blacklisted by a mail provider when an errant message was generated much too frequently).</p>
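<p>The once-per-hour rule for duplicates can be sketched as a small throttle keyed by message. This is a minimal illustration, assuming a hypothetical <code>DuplicateThrottle</code> class sitting in front of the mail sender; what counts as a duplicate (the key) is left to the caller.</p>

```python
import time


class DuplicateThrottle:
    """Allow each distinct message through at most once per interval."""

    def __init__(self, min_interval_seconds=3600, clock=time.monotonic):
        self.min_interval = min_interval_seconds
        self.clock = clock      # injectable so the behaviour can be tested
        self.last_sent = {}     # message key -> time it was last let through

    def should_send(self, message_key):
        now = self.clock()
        last = self.last_sent.get(message_key)
        if last is not None and now - last < self.min_interval:
            return False        # duplicate inside the window: suppress it
        self.last_sent[message_key] = now
        return True


throttle = DuplicateThrottle()
for attempt in range(3):
    if throttle.should_send("ChartService: query returned no rows"):
        print("sending alert email")
    else:
        print("suppressed duplicate")
```

<p>Only the first of the three attempts is sent. A production version would also want to expire old keys so the dictionary does not grow without bound.</p>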
<p>In terms of managing the signal-to-noise ratio, I&#8217;ve found that there are a few broad categories of messages to deal with:</p>
<ul>
<li>Message source: did the message originate in our code or in one of the libraries that we depend on? Clearly, warnings coming from our code are easier to deal with than those from outside. I&#8217;ve been frustrated by the laissez-faire attitude that various open source Java frameworks take to logging errors and warnings. We use <a href="http://cxf.apache.org/">Apache CXF</a>, and it generates over 10 severe messages with lengthy stacktraces every time the application starts up to inform us that JMS integration through JNDI is not enabled. WTF?!? Sometimes these messages can be controlled by setting custom log levels for specific loggers, but not always. And it typically feels a bit disconcerting to shut down logging just in case something important is missed.</li>
<li>System conditions: was the message generated during normal operations, during a shut down or a crash? I&#8217;ve found that systems tend to be very noisy during shutdown, but (perversely) pretty quiet during a crash. In the world of Java app servers where memory leaks across deployments are common, trying to quietly quiesce a server is a real challenge.</li>
</ul>
<p>In the (enterprise) environments where I&#8217;ve worked in the past, there was very little interaction between development and operations. Logs were used only for analyzing severe production problems &#8211; generally after a crash or after a user had reported a problem. The log files were poorly tuned for diagnosing problems and they tended to be full of junk &#8211; problems that no one had noticed or reported that may have been going on for months (or longer).</p>
<p>In contrast, the approach that we follow at my current company means we are able to use logs to proactively find and remedy problems. It requires effort to maintain a high signal-to-noise ratio, but it is very worthwhile.</p>
]]></content:encoded>
			<wfw:commentRss>http://exortech.com/blog/2009/05/13/weekly-release-blog-25-improving-the-signal-to-noise-ratio/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Weekly Release Blog #24 &#8211; Downgrading from Glassfish 2.1</title>
		<link>http://exortech.com/blog/2009/05/06/weekly-release-blog-downgrading-from-glassfish-21/</link>
		<comments>http://exortech.com/blog/2009/05/06/weekly-release-blog-downgrading-from-glassfish-21/#comments</comments>
		<pubDate>Thu, 07 May 2009 06:22:33 +0000</pubDate>
		<dc:creator>exortech</dc:creator>
				<category><![CDATA[release blog]]></category>
		<category><![CDATA[Glassfish]]></category>
		<category><![CDATA[weekly release]]></category>

		<guid isPermaLink="false">http://exortech.com/blog/?p=156</guid>
		<description><![CDATA[Last week, one of our Glassfish instances stopped responding. The process was running, but no longer handling requests. The good news is that the load balancer automatically failed over so there was no downtime to the site. The bad news is that we didn&#8217;t receive any direct notification of the failure. We have monitoring on [...]]]></description>
			<content:encoded><![CDATA[<p>Last week, one of our <a href="https://glassfish.dev.java.net/">Glassfish</a> instances stopped responding. The process was running, but no longer handling requests. The good news is that the load balancer automatically failed over so there was no downtime to the site. The bad news is that we didn&#8217;t receive any direct notification of the failure. We have monitoring on the box, but it is primarily at the system level. In this case, everything was fine with the system; it was just the JVM that was having issues. And the problem wasn&#8217;t load per se, more lack thereof. Looking at the Ganglia graphs, the only thing suspicious was the absence of activity.</p>
<p>To rectify the situation, we brought the application server up and down a few times and tried redeploying the application, but still no dice. We had previously seen occasions where Glassfish had become corrupted, so the next action was to rebuild the instance. One nice feature of Glassfish is that it is quite scriptable and we fleshed out our script for rebuilding a production instance. Strangely, rebuilding the application server didn&#8217;t seem to help. The clean instance would run for a while and then just lock up. It seemed to do this non-deterministically.</p>
<p>We were feeling really stumped. As a last resort, we decided to reboot the server. This is something that I would have considered earlier if it was a Windows box, but this was a Linux server that had been up and running reliably since we first commissioned it 7 months earlier. Also this seemed to be a JVM issue and the JVM process was being brought up and down with each application server restart. Fortunately, rebooting seemed to do the trick. There must have been some malignant process or OS lock that was interfering with the JVM, but it wasn&#8217;t clear what was the cause.</p>
<p>Unfortunately this wasn&#8217;t the end of our problems. When we decided to rebuild Glassfish, we had opted to upgrade from V2ur2 to 2.1. Many of us had been running Glassfish 2.1 in development and it seemed more reliable than the V2ur2 release. Besides, it was just a minor point upgrade. When we went to reconnect our remote clients with the rebuilt server, they started throwing SerializationExceptions on a Sun library OrderedSet class. The IIOP/CORBA communication protocol uses binary serialization to transmit objects for remote JNDI lookups as part of the JMS handshake. Some genius on the project had decided to upgrade a key library as part of a point release that broke backwards compatibility for standard JMS clients. Nice.</p>
<p>Buried in the Glassfish 2.1 upgrade guide, the <a href="http://docs.sun.com/app/docs/doc/820-4331/geyyk?a=view">Application Client Interoperability section</a> states:</p>
<blockquote><p>You cannot run application clients with one version of the application server runtime with a server that has a different version. Most often, this would happen if you upgraded the server but had not upgraded all the application client installations. You can use the Java Web Start support to distribute and launch the application client. If the runtime on the server has changed since the end-user last used the application client, Java Web Start automatically retrieves the updated runtime. Java Web Start enables you to keep the clients and servers synchronized and using the same runtime.</p></blockquote>
<p>WTF!?! What kind of an upgrade process is this? Upgrading the application server requires simultaneously upgrading all clients? Ain&#8217;t gonna happen. It&#8217;s essentially guaranteeing version lock-down. And recommending Java Web Start is fine for distributed client applications, but not for long-running autonomous processes.</p>
<p>Anyway, downgrading Glassfish back to v2ur2 resolved the connectivity problem. The v2.1 compatibility problem exposed the deeper issue: that JMS, at least the default CORBA implementation, is a tightly coupled train-wreck waiting to happen, especially with Sun&#8217;s cavalier attitude toward upgrades. It&#8217;s time to pursue alternate communication protocols built on open standards like, say, XMPP.</p>
]]></content:encoded>
			<wfw:commentRss>http://exortech.com/blog/2009/05/06/weekly-release-blog-downgrading-from-glassfish-21/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Weekly Release Blog #23 &#8211; Continuous Deployment&#8230; to test?</title>
		<link>http://exortech.com/blog/2009/04/28/release-blog-23-continuous-deployment-to-test/</link>
		<comments>http://exortech.com/blog/2009/04/28/release-blog-23-continuous-deployment-to-test/#comments</comments>
		<pubDate>Tue, 28 Apr 2009 17:49:39 +0000</pubDate>
		<dc:creator>exortech</dc:creator>
				<category><![CDATA[agile]]></category>
		<category><![CDATA[release blog]]></category>
		<category><![CDATA[continuous deployment]]></category>
		<category><![CDATA[Glassfish]]></category>
		<category><![CDATA[weekly release]]></category>

		<guid isPermaLink="false">http://exortech.com/blog/?p=150</guid>
		<description><![CDATA[Last week, we were fortunate to have Eric Ries come out and spend some time talking with our team while he was here for the Agile Vancouver event. We had the chance to talk about 5 whys, split testing and other topics. I would have liked to spend a bit more time discussing continuous deployment, [...]]]></description>
			<content:encoded><![CDATA[<p>Last week, we were fortunate to have <a href="http://startuplessonslearned.blogspot.com/">Eric Ries</a> come out and spend some time talking with our team while he was here for the <a href="http://agilevancouver.ca">Agile Vancouver</a> event. We had the chance to talk about <a href="http://startuplessonslearned.blogspot.com/2008/11/five-whys.html">5 whys</a>, <a href="http://startuplessonslearned.blogspot.com/2008/09/one-line-split-test-or-how-to-ab-all.html">split testing</a> and other topics. I would have liked to spend a bit more time discussing <a href="http://timothyfitz.wordpress.com/2009/02/10/continuous-deployment-at-imvu-doing-the-impossible-fifty-times-a-day/">continuous deployment</a>, but I did get some more insight into how they got started with CD at <a href="http://imvu.com/">IMVU</a>.</p>
<p>One thing that I was surprised to learn was that IMVU started out with continuous deployment. They were deploying to production with every commit before they had an automated build server or extensive automated test coverage in place. Intuitively this seemed completely backwards to me &#8211; surely it would be better to start with CI, build up the test coverage until it reached an acceptable level and then work on deploying continuously. In retrospect and with a better understanding of their context, their approach makes perfect sense. Moreover, approaching the problem from the direction that I had intuitively is a recipe for never reaching a point where continuous deployment is feasible.</p>
<p>First, IMVU set out to quickly build a product that would prove out the soundness of their ideas and test the validity of their business model. Their initial users were super early adopters who were willing to trade quality for access to new features. Getting features and fixes into the hands of users was the greatest priority &#8211; a test environment would just get in the way and slow down the validation coming from having code running in production. As the product matured, they were able to <a href="http://skizz.biz/blog/2008/03/11/fixing-broken-windows-with-ratcheting/">ratchet up the quality</a> to prevent regression on features that had been truly embraced by their customers.</p>
<p>Second, leveraging a dynamic scripting language (like PHP) for building web applications made it easy to quickly set up a <a href="http://radar.oreilly.com/2009/03/continuous-deployment-5-eas.html">simple, non-disruptive deployment process</a>. There are no compilation or packaging steps, which would generally be performed by an automated build server &#8211; just copy and change the symlink.</p>
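<p>The &#8220;copy and change the symlink&#8221; step can be sketched as follows. This is an illustration in Python rather than IMVU&#8217;s actual tooling; the function name and directory layout are assumptions, and it relies on POSIX rename semantics.</p>

```python
import os


def activate_release(release_dir, current_link):
    """Point `current_link` at `release_dir` by swapping in a new symlink."""
    temp_link = current_link + ".tmp"
    if os.path.lexists(temp_link):
        os.remove(temp_link)  # clear a leftover from an interrupted deploy
    os.symlink(release_dir, temp_link)
    # os.replace renames the new link over the old one in a single step on
    # POSIX systems, so there is no moment at which the link is missing.
    os.replace(temp_link, current_link)
```

<p>Rolling back is the same call with the previous release directory, which is why keeping old releases on disk makes rollback as cheap as deployment.</p>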
<p>Third, they evolved ways to selectively expose functionality to sets of users. As Eric said, &#8220;at IMVU, &#8216;release&#8217; is a marketing term&#8221;. New functionality could be living in production for days or weeks before being released to the majority of users. They could test, get feedback and refine a new feature with a subset of users until it was ready for wider consumption. Users were not just an extension of the testing team &#8211; they were an extension of the product design team.</p>
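<p>One common way to expose a feature to a subset of users is a stable percentage rollout with an explicit allowlist. The post does not describe how IMVU implemented this, so the sketch below is only an assumption about the general technique; the function and feature names are hypothetical.</p>

```python
import hashlib


def is_exposed(feature, user_id, percent, allowlist=()):
    """Decide whether `user_id` should see `feature`.

    Users on the allowlist always see it; everyone else is placed in a
    stable bucket from 0 to 99 and sees it if `percent` exceeds the bucket.
    """
    if user_id in allowlist:
        return True
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return percent > bucket


# Internal testers always get the feature; other users depend on their bucket.
print(is_exposed("new-charts", "tester-1", 0, allowlist={"tester-1"}))
print(is_exposed("new-charts", "user-42", 10))
```

<p>Hashing the feature name together with the user id keeps each user&#8217;s answer consistent between requests while giving different features different subsets of users.</p>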
<p>Understanding these three factors makes it clear as to why continuous deployment was a starting point for IMVU. In contrast, at most organizations &#8211; especially those with mature products &#8211; high quality is the starting point. It is assumed that users will not tolerate any decrease in quality. Users should only see new functionality once it is ready, fully implemented and thoroughly tested, lest they get a bad impression of the product that could adversely affect the company&#8217;s brand. They would rather build the wrong product well than risk this kind of exposure. In this context, the automated test coverage would need to be so good as to render continuous deployment infeasible for most systems. Starting instead from a position where feedback cycle time is the priority and allowing quality to ratchet up as the product matures provides a more natural lead in to continuous deployment.</p>
<p>For my company, even though we do weekly deployments, we&#8217;re still a fair way off from being able to deploy continuously. As we are operating in a new and rapidly evolving market, we focus on building and releasing a simple initial version of new features that demonstrate the potential of the software. We can then receive feedback and invest more effort in expanding those features that resonate with our clients. While we do routinely selectively expose new functionality to a subset of users (generally internal users) to solicit feedback, we still need to create more sophisticated ways to do user segmentation. Aside from the obvious bugbear of automated test coverage (we use JUnit and Selenium, but our coverage isn&#8217;t nearly good enough), our main blocking issue from a technology perspective is the deployment process itself.</p>
<p>To deploy continuously, the deployment has to be quick and it has to be transparent to end users (i.e., there should be no visible downtime). Performing a rollback should have the same characteristics. Our deployment process <em>is</em> automated, but in the world of Java application servers (even lightweight ones like Glassfish) deployment is anything but fast. Deployment entails all kinds of work that the app server needs to do (parsing configuration files, generating WSDLs, starting thread pools, etc.) during which the application is unresponsive. Also, because of memory leak issues in the container, we always restart the application server with each deployment anyway. All in all, the only way to avoid downtime is to pull the application server out of the load balancer pool until the deployment completes. Rollback is the same process in reverse.</p>
<p>A bit of an aside, but I know of some teams that package Glassfish with their app, inverting the container metaphor and simply treating it as another library/dependency. This makes it easier to just flip the symlink on deployment and rollback. It&#8217;s an interesting idea, as long as you don&#8217;t mind copying a massive WAR to production with each deploy (which for us would just shift the deployment bottleneck to the network).</p>
<p>We have made a fair bit of headway on streamlining our deployment process, and while we&#8217;re not ready to do continuous deployments into production, I am trying to get us into a position where we can do continuous deployment to test. I used to be of the opinion that deployment to test was something that should be controlled by testers (via a deploy button on the automated build server). Most testers want to work against a stable baseline, limiting the number of variables that they are dealing with when testing the app. But this is a fallacy because a batch of changes is simply piling up behind whatever version is deployed into test. It&#8217;s classic batch-and-queue thinking.</p>
<p>What if deployments happened without downtime in a way that was invisible to the tester or the end user? What if test coverage was sufficient to ensure that there would be no regression on major areas of functionality? I think that the fears of continuous deployment into test and the need for a stable baseline would evaporate. Moreover, this is something that we would want to test because it would mirror the experience of users using the site when a new version goes into production. In our office, every time we do a deployment to test, someone needs to call out &#8220;deploying to test&#8221;. This too would go away.</p>
<p>That&#8217;s the plan anyway. Over the next couple of weeks, I&#8217;ll see if we can move closer to achieving it. I&#8217;ll let you know how it goes.</p>
]]></content:encoded>
			<wfw:commentRss>http://exortech.com/blog/2009/04/28/release-blog-23-continuous-deployment-to-test/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Weekly Release Blog #20 &#8211; Far Future Expires</title>
		<link>http://exortech.com/blog/2009/04/07/weekly-release-blog-20-far-future-expires/</link>
		<comments>http://exortech.com/blog/2009/04/07/weekly-release-blog-20-far-future-expires/#comments</comments>
		<pubDate>Wed, 08 Apr 2009 06:42:05 +0000</pubDate>
		<dc:creator>exortech</dc:creator>
				<category><![CDATA[agile]]></category>
		<category><![CDATA[release blog]]></category>
		<category><![CDATA[weekly release]]></category>

		<guid isPermaLink="false">http://exortech.com/blog/?p=140</guid>
		<description><![CDATA[If you&#8217;re looking for some quick ways to improve the performance of your site, Steve Souders&#8217; High Performance Web Sites is packed with great advice. You don&#8217;t even need to buy the book as most of the information is available through links from the Firefox YSlow plugin. We have been picking one rule every couple [...]]]></description>
			<content:encoded><![CDATA[<p>If you&#8217;re looking for some quick ways to improve the performance of your site, Steve Souders&#8217; <a href="http://www.amazon.ca/High-Performance-Web-Sites-Essential/dp/0596529309">High Performance Web Sites</a> is packed with great advice. You don&#8217;t even need to buy the book as most of the <a href="http://developer.yahoo.com/performance/rules.html">information is available</a> through links from the <a href="http://developer.yahoo.com/yslow/">Firefox YSlow plugin</a>. We have been picking one rule every couple of weeks to focus on and this past week we spent a bit of time adding <a href="http://developer.yahoo.net/blog/archives/2007/05/high_performanc_2.html">far future expires headers</a> for the Flex SWFs on our site.</p>
<p><strong>Far future expires</strong> means setting the Expires HTTP header for static content to some date far in the future. Effectively, this means that static content within a web site will always be loaded from the browser cache after it is first requested. This greatly improves the load time for your site and reduces the number of requests sent to your web servers. The flip-side, however, is that because the cached content never expires, if you do need to change an image or a stylesheet then the user will need to clear their browser cache before they see it.</p>
<p>Hence, taking advantage of far future expires means taking responsibility for versioning static content on the server. Any time static content changes, it needs to be served up from a different URL. In his book, Steve Souders alludes to the approach that they follow at Yahoo! to achieve this, but he doesn&#8217;t give enough detail to just go ahead and implement it. So here is how we&#8217;re solving the problem.</p>
<p>We&#8217;re currently using two approaches to versioning static content: one for images and one for SWFs, CSS, and Javascript:</p>
<ul>
<li>Every time an image is added to our site, we place it in a folder named after the current release (e.g. <em>/images/1.21/header.png</em>). If we need to update an image then we move it from the folder it&#8217;s in to the folder for the current release and then update all links to the image accordingly. While this approach does require some manual effort, it has the advantage of being incredibly simple and easy to get going immediately. Because images change relatively infrequently within the site, this approach creates minimal overhead. It also means that only images that have changed from release to release get reloaded. The majority of the images will stay cached because they haven&#8217;t changed.</li>
<li>Other static content like stylesheets, scripts and Flex applications change more frequently. They are versioned automatically with every build by getting copied to a folder named after the current build number and then bundled into the deployment package. We then dynamically build the path/URL to these resources using the current build number loaded from a bundled text file resource. This approach has the benefit of being completely automated. The only (small) disadvantage is that the version of the content changes with every release (as we&#8217;re releasing every week, this is quite often) regardless of whether the content has changed or not. However, given that this content generally changes weekly, it isn&#8217;t a problem.</li>
</ul>
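<p>The second approach, building resource URLs from a build number stored in a bundled text file, can be sketched like this. The sketch is in Python for brevity; the <em>/static</em> base path, the function names and the example build number are assumptions made for illustration.</p>

```python
import posixpath


def read_build_number(path):
    """Load the build number from the bundled text file."""
    with open(path, encoding="utf-8") as handle:
        return handle.read().strip()


def versioned_url(build_number, resource, base="/static"):
    """Build the URL for a resource stored under a per-build folder."""
    return posixpath.join(base, build_number.strip(), resource.lstrip("/"))


# Hypothetical build number; in practice it would come from read_build_number.
print(versioned_url("1.21.345", "css/site.css"))
```

<p>Because the build number is part of the path, every build produces URLs the browser has never seen, so the far future cache entries for the previous build are simply never requested again.</p>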
<p>As far as setting the expires and cache control headers, we&#8217;re using <a href="http://nginx.net/">nginx</a> as a reverse proxy server which makes it trivial to <a href="http://wiki.nginx.org/NginxHttpHeadersModule">set HTTP headers</a> by the file extension for each requested URI.</p>
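<p>For reference, the headers themselves look roughly like the output of the sketch below. It computes them in Python rather than in nginx configuration, and the ten-year default lifetime and the function name are arbitrary assumptions.</p>

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def far_future_headers(now, days=3650):
    """Build Expires and Cache-Control headers that expire `days` after `now`.

    `now` must be a timezone-aware datetime in UTC.
    """
    expires_at = now + timedelta(days=days)
    return {
        "Expires": format_datetime(expires_at, usegmt=True),
        "Cache-Control": f"public, max-age={days * 24 * 60 * 60}",
    }


for name, value in far_future_headers(datetime.now(timezone.utc)).items():
    print(f"{name}: {value}")
```

<p>Sending both headers covers older clients that only understand Expires as well as caches that prefer the max-age directive.</p>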
<p>If you have suggestions for a better way to version static content or if I can provide more clarity on the approach that we&#8217;re using, please let me know.</p>
]]></content:encoded>
			<wfw:commentRss>http://exortech.com/blog/2009/04/07/weekly-release-blog-20-far-future-expires/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
