<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>exortech.com &#187; continuous monitoring</title>
	<atom:link href="http://exortech.com/blog/tag/continuous-monitoring/feed/" rel="self" type="application/rss+xml" />
	<link>http://exortech.com/blog</link>
	<description>Peripatetic thinking</description>
	<lastBuildDate>Tue, 01 Dec 2009 05:56:13 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Weekly Release Blog #25 &#8211; Improving the signal-to-noise ratio</title>
		<link>http://exortech.com/blog/2009/05/13/weekly-release-blog-25-improving-the-signal-to-noise-ratio/</link>
		<comments>http://exortech.com/blog/2009/05/13/weekly-release-blog-25-improving-the-signal-to-noise-ratio/#comments</comments>
		<pubDate>Thu, 14 May 2009 06:21:05 +0000</pubDate>
		<dc:creator>exortech</dc:creator>
				<category><![CDATA[agile]]></category>
		<category><![CDATA[release blog]]></category>
		<category><![CDATA[continuous monitoring]]></category>
		<category><![CDATA[weekly release]]></category>

		<guid isPermaLink="false">http://exortech.com/blog/?p=161</guid>
		<description><![CDATA[At my company, we use a form of Continuous Monitoring: every time our system logs a warning or an error we immediately receive an email identifying the source and nature of the problem. This allows us to respond rapidly to problems as they arise and gives us good visibility into the health of our system. [...]]]></description>
			<content:encoded><![CDATA[<p>At my company, we use a form of <a href="http://exortech.com/blog/2008/08/14/continuous-monitoring-tutorial-at-agile-2008/">Continuous Monitoring</a>: every time our system logs a warning or an error we immediately receive an email identifying the source and nature of the problem. This allows us to respond rapidly to problems as they arise and gives us good visibility into the health of our system. Following the mantra of &#8220;do in test as is done in prod&#8221;, we have the same monitoring system set up in both environments to help us find issues in test before they find their way into production.</p>
<p>The downside to this level of monitoring is that it can amount to <strong>a lot</strong> of messages. Our challenge is to manage the signal-to-noise ratio so that:</p>
<ul>
<li>we are only notified about things that require immediate action,</li>
<li>we don&#8217;t suffer from information overload, and</li>
<li>emails that matter aren&#8217;t buried under a bunch of emails that don&#8217;t.</li>
</ul>
<p>As part of our <a href="http://startuplessonslearned.blogspot.com/2008/11/five-whys.html">5 Whys</a> activity for production issues, we have found that most production issues actually occurred first in test but went unnoticed. This is a compelling reason to keep the signal-to-noise ratio high in all environments. Any time we find ourselves automatically archiving or filtering an alert, we have found an opportunity for improvement.</p>
<p>We have found that refining and tuning these alert messages is an ongoing maintenance activity. At our weekly meeting, we try to select one message to clarify or dispatch. We have a script that trawls the support emails received in the past week and builds a Pareto distribution of the number of messages by logger. This helps us decide where to focus our efforts and lets us quantify the impact of our actions on the volume of messages we receive.</p>
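<p>As a rough illustration of that kind of tally (a minimal sketch, not our actual script), the Python below assumes the logger name has already been extracted from each alert email. The function name <code>pareto_by_logger</code> and the sample logger names are made up for the example.</p>

```python
from collections import Counter


def pareto_by_logger(loggers):
    """Return (logger, count, cumulative_percent) rows, most frequent first."""
    counts = Counter(loggers)
    total = sum(counts.values())
    rows = []
    running = 0
    for logger, count in counts.most_common():
        running += count
        rows.append((logger, count, round(100.0 * running / total, 1)))
    return rows


# Hypothetical input: one logger name per alert email received this week.
week = [
    "billing.Invoice", "auth.Login", "billing.Invoice",
    "billing.Invoice", "search.Index", "auth.Login",
]
for logger, count, cumulative in pareto_by_logger(week):
    print(f"{logger:20} {count:3} {cumulative:5.1f}%")
```

<p>The cumulative percentage column is what makes the list useful for prioritising: the loggers at the top that account for most of the volume are the first candidates for tuning.</p>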
<p>It is difficult to determine in advance what kinds of things we need to be alerted about. Often, things that we are concerned about when building a feature turn out to be less important in production, and conversely, we miss things in development that turn out to be very important once real customers start using the feature. Fortunately, deploying every week gives plenty of opportunity for improvement. Also, if a message is logged more frequently than intended, we only have to put up with it for a week before it can be rectified.</p>
<p>I should mention that we have a <a href="http://www.amazon.com/Release-Production-Ready-Software-Pragmatic-Programmers/dp/0978739213">circuit breaker</a> in place in the log monitor: we do not allow duplicate messages to be sent any more frequently than once per hour. (Relatively early on, we managed to get temporarily blacklisted by a mail provider when an errant message was generated much too frequently.)</p>
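<p>The once-per-hour rule for duplicates can be sketched roughly as follows. This is an assumption-laden illustration in Python, not the code in our log monitor: the class name <code>DuplicateThrottle</code>, the choice of message key, and the injectable clock are all invented for the example.</p>

```python
import time


class DuplicateThrottle:
    """Suppress repeats of the same message within a fixed interval."""

    def __init__(self, interval_seconds=3600, clock=time.monotonic):
        self.interval = interval_seconds
        self.clock = clock
        self.last_sent = {}  # message key -> time it was last sent

    def should_send(self, key):
        now = self.clock()
        last = self.last_sent.get(key)
        if last is not None and now - last < self.interval:
            return False  # a duplicate was already sent within the interval
        self.last_sent[key] = now
        return True


# Hypothetical usage: key each alert by its logger and message text.
throttle = DuplicateThrottle()
if throttle.should_send(("billing.Invoice", "Payment gateway timeout")):
    print("send alert email")
```

<p>Suppressed duplicates do not reset the timer in this sketch, so a message that keeps firing is still re-sent once per interval rather than silenced indefinitely.</p>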
<p>In terms of managing the signal-to-noise ratio, I&#8217;ve found that there are a few broad categories of messages to deal with:</p>
<ul>
<li>Message source: did the message originate in our code or in one of the libraries that we depend on? Clearly, warnings coming from our code are easier to deal with than those from outside. I&#8217;ve been frustrated by the laissez-faire attitude that various open source Java frameworks take to logging errors and warnings. We use <a href="http://cxf.apache.org/">Apache CXF</a>, and it generates over 10 severe messages with lengthy stacktraces every time the application starts up to inform us that JMS integration through JNDI is not enabled. WTF?!? Sometimes these messages can be controlled by setting custom log levels for specific loggers, but not always. And it typically feels a bit disconcerting to shut down logging, in case something important is missed.</li>
<li>System conditions: was the message generated during normal operations, during a shutdown, or during a crash? I&#8217;ve found that systems tend to be very noisy during shutdown, but (perversely) pretty quiet during a crash. In the world of Java app servers where memory leaks across deployments are common, trying to quietly quiesce a server is a real challenge.</li>
</ul>
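<p>To make the idea of a custom log level for a specific logger concrete, here is a small sketch using Python&#8217;s standard <code>logging</code> module. Our system is Java, so this is only an analogy, and both logger names are hypothetical.</p>

```python
import logging

logging.basicConfig(level=logging.WARNING)

# Hypothetical logger names: one for a chatty third-party library, one for our own code.
noisy = logging.getLogger("thirdparty.jms")
ours = logging.getLogger("ourapp.billing")

# Raise the threshold for the noisy library only; everything else keeps the root level.
noisy.setLevel(logging.CRITICAL)

noisy.error("JMS integration through JNDI is not enabled")  # suppressed
ours.error("Payment gateway timeout")                       # still reported
```

<p>The trade-off is the one described above: anything the silenced logger reports below the new threshold is lost, including messages that might have mattered.</p>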
<p>In the (enterprise) environments where I&#8217;ve worked in the past, there was very little interaction between development and operations. Logs were used only for analyzing severe production problems &#8211; generally after a crash or after a user had reported a problem. The log files were poorly tuned for diagnosing problems and they tended to be full of junk &#8211; problems that no one had noticed or reported that may have been going on for months (or longer).</p>
<p>In contrast, the approach that we follow at my current company means we are able to use logs to proactively find and remedy problems. It requires effort to maintain a high signal-to-noise ratio, but it is very worthwhile.</p>
]]></content:encoded>
			<wfw:commentRss>http://exortech.com/blog/2009/05/13/weekly-release-blog-25-improving-the-signal-to-noise-ratio/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Continuous Monitoring Tutorial at Agile 2008</title>
		<link>http://exortech.com/blog/2008/08/14/continuous-monitoring-tutorial-at-agile-2008/</link>
		<comments>http://exortech.com/blog/2008/08/14/continuous-monitoring-tutorial-at-agile-2008/#comments</comments>
		<pubDate>Thu, 14 Aug 2008 05:57:03 +0000</pubDate>
		<dc:creator>exortech</dc:creator>
				<category><![CDATA[agile]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[technology]]></category>
		<category><![CDATA[agile 2008]]></category>
		<category><![CDATA[continuous monitoring]]></category>

		<guid isPermaLink="false">http://exortech.com/blog/?p=39</guid>
		<description><![CDATA[Last week, I conducted a tutorial on Continuous Monitoring at the Agile 2008 conference in Toronto. The title of the session was Continuous Monitoring: Beyond Continuous Integration. Unfortunately, the track organizers changed the topic title on me twice and as a result I ended up with a number of attendees who had come to learn [...]]]></description>
			<content:encoded><![CDATA[<p>Last week, I conducted a tutorial on <a href="http://exortech.com/blog/2008/05/20/continuous-monitoring-on-hanselminutes/">Continuous Monitoring</a> at the <a href="http://agile2008.org/">Agile 2008</a> conference in Toronto. The title of the session was <b>Continuous Monitoring: Beyond Continuous Integration</b>. Unfortunately, the track organizers changed the <a href="http://submissions.agile2008.org/node/4381">topic title on me twice</a> and as a result I ended up with a number of attendees who had come to learn about setting up an automated build server. Ack! Hopefully, they didn&#8217;t go away disappointed and still got something valuable out of the tutorial.</p>
<p>The session was divided into three sections: I began with a presentation introducing the topic; next, participants were encouraged to work in small groups to design an <a href="http://en.wikipedia.org/wiki/Andon">andon dashboard</a> for their project teams; the remainder of the session was spent discussing the implementation details involved in building a dashboard. My plan for the latter half of the session had been to get participants to integrate metric data from different sources via RESTful XML web services into a simple Rails-based dashboard that I had thrown together, but given the size and interest of the group, it seemed easiest to just discuss the implementation rather than go through with the exercise. I had also intended to demo using a digital photo frame as a digital dashboard, but my photo frame couldn&#8217;t get onto the hotel&#8217;s wireless.</p>
<p>If you are interested in a copy of the presentation, I&#8217;ve uploaded it in <a href="http://exortech.com/blog/wp-content/uploads/2008/08/continuousmonitoringkey.zip">Keynote</a> or <a href="http://exortech.com/blog/wp-content/uploads/2008/08/continuousmonitoringppt.zip">PowerPoint 2003</a>. Please feel free to use the contents of the slides. The presentation is done in the <a href="http://www.presentationzen.com/presentationzen/2005/10/the_lessig_meth.html">Lessig style</a>, so it might not be the easiest to follow. If you end up presenting on the topic, let me know; I&#8217;m interested in tracking the thinking and ideas as they evolve. Here&#8217;s the embedded slideshow from Slideshare:</p>
<div style="width:425px;text-align:center;padding-left:50px" id="__ss_554521"><a style="font:14px Helvetica,Arial,Sans-serif;display:block;margin:12px 0 3px 0;text-decoration:underline;" href="http://www.slideshare.net/exortech/continuous-monitoring?src=embed" title="Continuous Monitoring">Continuous Monitoring</a><object style="margin:0px" width="425" height="355"><param name="movie" value="http://static.slideshare.net/swf/ssplayer2.swf?doc=continuousmonitoring-1218696146829392-9&#038;stripped_title=continuous-monitoring" /><param name="allowFullScreen" value="true"/><param name="allowScriptAccess" value="always"/><embed src="http://static.slideshare.net/swf/ssplayer2.swf?doc=continuousmonitoring-1218696146829392-9&#038;stripped_title=continuous-monitoring" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="355"></embed></object>
<div style="font-size:11px;font-family:tahoma,arial;height:26px;padding-top:2px;">View SlideShare <a style="text-decoration:underline;" href="http://www.slideshare.net/exortech/continuous-monitoring?src=embed" title="View Continuous Monitoring on SlideShare">presentation</a> (tags: <a style="text-decoration:underline;" href="http://slideshare.net/tag/continuousmonitoring">continuousmonitoring</a> <a style="text-decoration:underline;" href="http://slideshare.net/tag/agile2008">agile2008</a>)</div>
</div>
<p>As for the code that I used in the demo, I&#8217;ll get it uploaded to <a href="http://github.com/">GitHub</a> soon.</p>
]]></content:encoded>
			<wfw:commentRss>http://exortech.com/blog/2008/08/14/continuous-monitoring-tutorial-at-agile-2008/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Continuous Monitoring on Hanselminutes</title>
		<link>http://exortech.com/blog/2008/05/20/continuous-monitoring-on-hanselminutes/</link>
		<comments>http://exortech.com/blog/2008/05/20/continuous-monitoring-on-hanselminutes/#comments</comments>
		<pubDate>Tue, 20 May 2008 05:18:53 +0000</pubDate>
		<dc:creator>exortech</dc:creator>
				<category><![CDATA[.net]]></category>
		<category><![CDATA[agile]]></category>
		<category><![CDATA[technology]]></category>
		<category><![CDATA[continuous monitoring]]></category>
		<category><![CDATA[hanselminutes]]></category>
		<category><![CDATA[podcast]]></category>

		<guid isPermaLink="false">http://exortech.com/blog/2008/05/20/continuous-monitoring-on-hanselminutes/</guid>
		<description><![CDATA[While at DevTeach, I was interviewed by Scott Hanselman for his Hanselminutes Podcast. We started out talking about the history of the CruiseControl.NET project, but I opted to segue into discussing Continuous Monitoring. Continuous Monitoring focuses on providing continuous feedback to a team by leveraging visible dashboard displays to ambiently communicate information about the health [...]]]></description>
			<content:encoded><![CDATA[<p>While at DevTeach, I was interviewed by Scott Hanselman for his <a href="http://www.hanselminutes.com/default.aspx?showID=131">Hanselminutes Podcast</a>. We started out talking about the history of the <a href="http://ccnet.thoughtworks.com">CruiseControl.NET project</a>, but I opted to segue into discussing Continuous Monitoring. Continuous Monitoring focuses on providing continuous feedback to a team by leveraging visible dashboard displays to ambiently communicate information about the health and state of their project. I intend to write more about the practice here on this blog, but for now the podcast is the best place to learn more about it. I will be presenting on it at Agile 2008, and if you are interested in joining the discussion, feel free to join the <a href="http://groups.google.com/group/continuousmonitoring/">Continuous Monitoring group</a>.</p>
<p><strong>Corrections:</strong><br />
There are a few statistics that I cited incorrectly off the top of my head during the podcast:</p>
<ul>
<li>The CruiseControl.NET project has consumed <a href="http://www.ohloh.net/projects/cruisecontrol">over 46 person years of effort</a> &#8211; at least based on what Ohloh can divine from our Subversion repository.</li>
<li>The CruiseControl.NET project has had <a href="http://sourceforge.net/project/stats/?group_id=71179&#038;ugn=ccnet&#038;type=&#038;mode=alltime">over 800,000 downloads</a> &#8211; not 80,000 as I said during the interview. I was off by an order of magnitude. Oh, and this doesn&#8217;t include all of the direct downloads from <a href="http://ccnetlive.thoughtworks.com">CCNetLive</a>.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://exortech.com/blog/2008/05/20/continuous-monitoring-on-hanselminutes/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

