Chainsaw.Net with Style

March 29, 2011
I've been working pretty diligently during my nights at finishing version 2.0 of my custom ETL engine.  It's coming along nicely and I'm pretty happy with this revision.  Look for more on the engine in the near future, but today I'm revisiting my .NET implementation of Chainsaw, a log viewer for log4net.  As some of you might remember, I finished an initial version of the viewer a few weeks back.  I've been using it pretty heavily during the development of my ETL engine and I got tired of it looking crappy.  Tonight I took the night off from ETL development to "beautify" the log4net reporting viewer and I have to say, it's easier on the eyes. 
If you compare the two screenshots you'll notice that I moved all my controls up to a single bar at the top of the page and incorporated the jQuery UI button components to replace the basic checkboxes, radio buttons and drop-downs.  I've also removed the logger tree from being a permanent fixture on the page; it now toggles with a button in the top left that gives you access to the tree to make your logger selections.
 
That screenshot is full resolution if you open it in a new tab, which makes it easier to check out the new look and feel.
I found that it can be a bit tricky to pull the state out of these modified controls, but it only required a few changes to the JavaScript.  Notably, pulling the value out of the converted checkboxes and radio buttons required a slightly more sophisticated selector, which I'll share here just in case you find yourself in a similar situation.
var sort_order = $('input:radio[name=radio-message-order]:checked').val(); 
var per_page = $('input:radio[name=option-perpage]:checked').val();
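One thing to watch for: `.val()` returns undefined when no radio in the group is checked, so it can help to normalize the values before using them. Here is a small sketch of that idea; the function name, the 'asc'/'desc' values and the defaults are my own assumptions for illustration, not part of the viewer's actual code.

```javascript
// Hypothetical helper (not part of the original viewer): normalize the
// values pulled from the jQuery UI radio groups before building a request.
// $().val() yields undefined when nothing is checked, so fall back to defaults.
function normalizeOptions(sortOrder, perPage) {
  return {
    sort: sortOrder === 'asc' || sortOrder === 'desc' ? sortOrder : 'desc',
    perpage: parseInt(perPage, 10) || 25
  };
}
```

With this in place, a missing selection quietly degrades to sensible defaults instead of sending "undefined" to the server.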
I played with swapping in some of the other themes available from jQuery UI, but they don't seem to want to play all that nicely, which likely means I'm doing something wrong or the browsers are caching things. Either way, it will have to wait for another day.
Oh yeah, I also discovered that the comment system wasn't exactly working.  I had CAPTCHA enabled but not the CAPTCHA tool.  Whoops. It's working now, so comment away.  Now, back to the real stuff.

Data Visualization from Hans Rosling

March 21, 2011

Today I was getting my fix from TED and ended up watching a recent talk by Hans Rosling on the magic washing machine. It was an excellent talk, as are many by Hans, including an old favorite that I was reminded of while watching this one.

Back in 2009 Hans did another talk called Let my dataset change your mindset where he discussed public health and visualization techniques, two topics that I'm passionate about. Most of the visualizations in his talk are created with the Gapminder software which was acquired by Google back in 2007. Gapminder is excellent at showing how data changes over time.

Take a look at the talk, and if you share my excitement with the visualization, be happy to know that it lives on both at Gapminder.com and as part of Google's public data explorer. You can even leverage the visualization technique yourself by using Google's motion chart.  If you find yourself captivated by Hans, you'll likely also enjoy this earlier talk where he shows the best statistics you've ever seen.

Logging and Triggers and Email Oh My!

March 17, 2011

I've been doing a lot of work lately using SSIS to load our production databases.  We're loading very large databases and our SSIS packages are long running.  Many take days, not hours.  It can be difficult to understand what parts of the packages are taking most of the time and when, or if, they have issues.  SSIS provides a logging feature that you can enable, and it goes a long way toward solving those problems.  We have enabled logging in our packages and direct the output to a centralized database where we log all of our loading package executions.  From there I've written a web application that analyzes the logging database so we can view all the information generated by our packages.  I even have a nifty little timeline viewer I've written that depicts our package flow; it's simple, but it does the job.

The report has a ton more detail than the timeline, but I'll save that for another day.  What I wanted to focus on here is how to generate alerts.  We wanted a way to get notifications when packages had issues.  Yes, we could do things in the packages themselves to route failure paths to email tasks, but that just complicated the packages, and we would have to configure it in each package for each task; you see where this is going.  Since all of our packages log centrally, we thought it would be easier to set up a trigger on the logging table to alert us when particular events occurred.  This ended up working really well once we worked through a few minor issues.

First you have to set up Database Mail on the server that hosts the logging database, which is done by just following through the wizard.  If anyone posts any questions, I'll go back and provide some more information on how it's done.

With Database Mail set up, you have to make sure that the credentials you are using to write your log entries to the database have permission to send email using Database Mail.  The script to do that is straightforward:

USE [msdb]
GO
CREATE USER [<ACCOUNT NAME HERE>] FOR LOGIN [<ACCOUNT NAME HERE>]
GO
USE [msdb]
GO
EXEC sp_addrolemember N'DatabaseMailUserRole', N'<ACCOUNT NAME HERE>'
GO

Now that your account has the right permissions, you just need to set up a trigger to send the email.  You can customize it any way you like to provide different levels of notification based on what you capture in your log table.  In the following example we send an email when a package encounters an error.

CREATE TRIGGER [dbo].[SSISError] ON [dbo].[sysssislog]
FOR INSERT
AS

IF EXISTS (SELECT * FROM inserted WHERE [event] = 'OnError')
BEGIN

exec msdb.dbo.sp_send_dbmail 
	@profile_name = 'Default Email Profile',
	@recipients = 'somebody@who.cares.about.alerts.com',
	@body_format = 'HTML',
	@body = 'Go check your error.',
	@subject = 'SSIS Error'
END
And there you have it: logging, reporting and alerts from your SSIS packages.  Now go load some data!

ETL : Made from scratch.

March 14, 2011

So for those of you dedicated readers (there are currently about 40 of you, so keep it up, and tell your friends), you already know that I wrote a bit about what it takes to get Rhino-ETL built.  I've since been amazed at how complex Rhino-ETL is; I mean, it has its own scripting language, or DSL component as it's called.  It's impressive, but I don't find it to be all that simple.  I'm a big fan of simple.

I use SQL Server Integration Services (SSIS) in my professional work.  It's not really all that simple either.  It is very capable, but not simple.  DTS was simple.  For those who aren't familiar, DTS was Data Transformation Services, the ETL engine before SSIS.  Ever wonder why SSIS packages are called DTSX files?  They didn't know what they were calling SSIS in time to change the file extension. Much like old school ASP became ASPX, DTS became DTSX.  Now you know.  I also believe it was to lure us all in.  Let's face it, a lot of people really liked DTS.  No, it wasn't an enterprise ETL engine like SSIS or Informatica, but it was lightweight; I'll even go as far as to say that it was fun.  That's right, FUN.  It inferred types, loaded data in ways it probably shouldn't have, but it just kind of worked.  It couldn't handle complicated scenarios, you know, the 20 part of the 80/20 split? But it handled the 80 part pretty well and I liked it.  Flash forward to today and DTS is gone.  Out, out, brief candle. 

I have a few personal projects ongoing that require the help of an ETL engine, and it must be open source or, more specifically, free.  We have SSIS, but it certainly isn't free, and Rhino-ETL looked great but doesn't meet my criterion of being simple, and so...

ETL : Made from scratch.  That's right, I'm re-inventing the wheel. Too bad.  Everyone else gets to do it.  I mean, did we really need Git?  We have Subversion.  Subversion tore me from my comfortable little world of Microsoft tools and quietly convinced me of its superiority to SourceSafe and at least the first few versions of TFS.  Then someone decided we needed Git.  So, too bad, I'm writing my own ETL engine and besides, it will be from scratch!  You know, because for some reason we are all pretty much convinced that things are better when they are made from scratch.  Cupcakes, pasta, tomato sauce, they are all better when made from scratch instead of from the boxes and jars we buy in the store.  Therefore, it logically follows that my ETL engine will be better. Well, probably not, but it will be simpler.

I wrote version 1 and promptly threw it away, because version one is never any good anyway.  I'm just putting the finishing touches on version 2 and hope to finish it up in the next few weeks.  It's clearly over-engineered, just like any good V2 is supposed to be.  Once I finish version 2 I will put it through the paces of loading all the data for my personal projects and see how it does.  To be fair, version 1 was able to fulfill about 80% of my needs, but it was damn ugly.  V2 is much prettier; hopefully it works better too.  Look for more details on V2 and even some sample code in the near future.

I couldn't find a chainsaw, so...

March 2, 2011

I recently started working on a web-based reporting console for log4net, since I couldn't find a .NET version of Chainsaw, the Java-based app that serves a similar purpose.  

For those who aren't familiar, log4net is a .NET port of the log4j logging framework.  You can find out more about the framework and download the bits from the Apache site.  Essentially, it's a logging library that you can use instead of rewriting the way you log for each project you develop.  What's really great about it is that it is completely abstracted from the storage layer, meaning you can log your messages to the local file system, just about any relational database, the Windows event log, even email via SMTP, all without changing your code.

My reporting console isn't nearly as flexible but works for my purposes.  I've set up an ASP.NET web application that utilizes Entity Framework to access my SQL Server based logging data created by log4net, which I obviously have configured to use SQL Server for storage.  The data is exposed through a web service and accessed via AJAX from my front end.  

I've integrated a jQuery extension called jsTree to provide a hierarchical way to filter which loggers' messages to display in the report, which is actually somewhat similar to what the Java-based log viewer Chainsaw does in its reporting.  Here is a screenshot of the reporting interface:

The top left shows the treeview, courtesy of jsTree.  To the right of the tree are a few simple reporting options I have implemented so far, including a sort order for the log messages, a limit on the number of messages for the interface to download, and filters on which columns to include in the report.  

I'm pulling the state of the options using jQuery and then putting them all together to make the AJAX call to the web service.  Gotta give a shout out to the folks over at ElegantCode for having a nice, quick tutorial on reading the state of select objects, since I always forget how to do that. Check out the basics of the data update routine below:

var sort_order = $('.option-sortorder').val();
var per_page = $('.option-perpage').val();

var bit_id = ($(".option-mask-id")[0].checked) ? '1' : '0';
var bit_date = ($(".option-mask-date")[0].checked) ? '1' : '0';
var bit_level = ($(".option-mask-level")[0].checked) ? '1' : '0';
var bit_logger = ($(".option-mask-logger")[0].checked) ? '1' : '0';
var bit_message = ($(".option-mask-message")[0].checked) ? '1' : '0';

var mask = bit_id + bit_date + bit_level + bit_logger + bit_message;

$.ajax({
	type: "POST",
	contentType: "application/json; charset=utf-8",
	url: "LoggerDetails.svc/GetMessages",
	data: "{\"ids\" : \"" + id_list.toString() + "\"," +
		"\"sort\" : " + sort_order + "," +
		"\"perpage\" : " + per_page + "," +
		"\"mask\" : \"" + mask + "\"" +
		"}",
	success: function (data, status) {
		$(".messages").html(data.d);
		$(".message-table").tablesorter({ widgets: ['zebra'] });
	},
	error: function (err) {
		alert(err.statusText);
	},
	dataType: "json"
});
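Hand-building a JSON string like the data parameter above is easy to get wrong: one stray comma or unescaped quote and the service rejects the request. A safer sketch of the same idea uses JSON.stringify to do the quoting, along with the column mask expressed as a pure function. The helper names here are my own, not part of the original code.

```javascript
// Hypothetical helper: build the five-character column mask
// ('1' = include, '0' = exclude) from an array of checkbox states.
function buildMask(flags) {
  return flags.map(function (f) { return f ? '1' : '0'; }).join('');
}

// Hypothetical helper: build the request body for GetMessages.
// JSON.stringify handles quoting and escaping, so there is no
// trailing-comma or unescaped-quote risk.
function buildPayload(idList, sortOrder, perPage, mask) {
  return JSON.stringify({
    ids: idList.toString(),
    sort: sortOrder,
    perpage: perPage,
    mask: mask
  });
}
```

The result can be passed straight to the data option of the $.ajax call in place of the concatenated string.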

I've been working a bit on timeline visualization controls and I might integrate one of them into this interface, since at this point it looks like it would be useful.  I'm also considering adding a date filter, so you can pull back entries from specific time periods, and some aggregate statistics, so you can get a higher-level view of the volume and category of entries in your log.

Source code available upon request, just hit up the comments below.

Git Rhino-ETL Going

March 1, 2011

For one of my projects I'm trying to identify a decent, open source, .NET based ETL component.  Yes, those are some picky requirements, but there appears to be at least one answer, called Rhino-ETL.  Getting it built isn't the most straightforward task.  Here is what it takes:

First off, grab the source from Git, which might first require you to install some Git utilities; I chose Git Extensions from SourceForge.  After installing and launching it, I get to look at a cow.  I'm not sure why.  I'm sure there is a reason.  I don't think I want to know what it is, though, so I'm going to skip googling "Git Extensions" and "cow" and just get on with it, but for your own viewing pleasure I've included a screenshot.

Once you have Git Extensions installed you can use the UI to download the current source for Rhino-ETL.  Currently the URL for read-only access to the Rhino-ETL source is:

https://github.com/hibernating-rhinos/rhino-etl.git

To grab a copy of the source, click 'Clone repository' in the above UI and then fill out the clone screen something like this:

Hit clone and now you'll have the source in the destination you specified.  


Informatics

March 1, 2011

The term informatics covers a wide spectrum of areas.  Let us begin with the definition from none other than Wikipedia:

"Informatics is the science of information, the practice of information processing, and the engineering of information systems. Informatics studies the structure, algorithms, behavior, and interactions of natural and artificial systems that store, process, access and communicate information. It also develops its own conceptual and theoretical foundations and utilizes foundations developed in other fields. Since the advent of computers, individuals and organizations increasingly process information digitally. This has led to the study of informatics that has computational, cognitive and social aspects, including study of the social impact of information technologies."

Part of what I want to communicate through this blog is my experience working in the field of informatics.  Until recently, I would not have even called myself an informatician, but the more familiar I became with the field, the more I understood that it is what I am.  To me, informatics is simply working with information.  Sure, the Wikipedia definition wins the competition for most words, but mine is simpler, and there is definitely something to be said for The Power Of Simplicity.  As simple as my definition is, the field is very complicated for a variety of reasons.  I'll be posting more about informatics in the future, including some practical examples of the trials and tribulations of an informatician. I'm sure you can hardly wait. 

About the author

Frank DeFalco has worked as a software developer and informatics professional since 1996.  He is currently employed by Johnson & Johnson where he manages the Health Informatics Center of Excellence.

 

Reach me via email at frank (at) thedefalcos (dot) com.

 

Follow me on twitter @fjdefalco