Friday, August 23, 2013

An Error? Carry on anyway!

Lately I've had to do a lot of development in our core internal application, which is responsible for managing data flow between our clients' systems and our network.  This system also coordinates data for some of our 3rd-party partner integrations, such as Pay-as-you-Go Worker's Compensation.  Multiple times a day, a number of scheduled tasks run that import data into databases, download and process XML files, generate and upload more XML files, and perform other similar large-scale data-processing tasks.

So you'd think a system at this core level would be well-designed, right?  Wrong!  There are a number of faults in the architecture of this system, but by far the most blatant is a pattern called "Exception? Won't happen!"  It's also the most pernicious, because it leads to complex, hard-to-debug data defects:

  • Data that is partially processed, or processed incorrectly, and has to be rolled back manually.
  • Files that are generated but never uploaded properly.
  • Multiple days' worth of data just "missing" from generated files.

Here's what the pattern looks like, in a nutshell:

Try
  PrepareData()
Catch ex As Exception
  LogException(ex)
End Try

Try
  GenerateFile()
Catch ex As Exception
  LogException(ex)
End Try

Try
  ArchiveData()
Catch ex As Exception
  LogException(ex)
End Try

Try
  UploadFile()
Catch ex As Exception
  LogException(ex)
End Try

At first glance you might think there's nothing wrong - the developer has wisely added exception handling and logging so that errors can be detected and reviewed.  The problem comes when something does indeed fail.  For example, what happens if the "UploadFile()" step fails?  Well, the first three steps will have already finished and been committed.  The data has been archived permanently, but the generated file never got uploaded to the 3rd-party network.  That means they will never receive the data, and we will never send it again, because it's been marked "complete"!  Apparently the developer assumed that "nothing could go wrong".

Resolving this defect can be a little time-consuming, but it's definitely worth it. I generally approach it this way (a sketch follows the list):

  1. Wrap all sub-steps in some sort of transaction.
  2. Pass the same transaction to each method.
  3. If the entire process (work unit) is successful, commit the transaction at the end.
  4. If anything fails on any step, roll the entire process back, including deleting any temp files that were partially created.
  5. Everything is then left in a consistent state, ready for another try in the next processing cycle.
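
Here's a minimal sketch of what that can look like, assuming the data work happens in SQL Server and that the database-touching steps have been reworked to accept a shared connection and transaction (the method signatures and the connectionString variable below are hypothetical):

' Requires Imports System.Data.SqlClient and System.IO.
' Hypothetical work unit: every database step participates in one transaction.
Dim tempFilePath As String = Nothing

Using connection As New SqlConnection(connectionString)
  connection.Open()

  Using transaction As SqlTransaction = connection.BeginTransaction()
    Try
      PrepareData(connection, transaction)
      tempFilePath = GenerateFile(connection, transaction)
      ArchiveData(connection, transaction)
      UploadFile(tempFilePath)

      ' Only mark the work unit complete once every step has succeeded.
      transaction.Commit()
    Catch ex As Exception
      LogException(ex)
      transaction.Rollback()

      ' Delete any partially generated file so the next cycle starts clean.
      If tempFilePath IsNot Nothing AndAlso File.Exists(tempFilePath) Then
        File.Delete(tempFilePath)
      End If
    End Try
  End Using
End Using

The key point is that the work only ever gets marked "complete" inside the same transaction as everything else, so a failure in any step - including the upload - leaves the data eligible to be picked up again on the next run.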

Just for fun, here's another gem from the same codebase, captured while stepping through with the debugger in Visual Studio (the highlighted line is the next statement to execute):



Happy programming!
