Friday, September 13, 2013

Friday TIP: Switch Application By Name

I have for a while been searching for a program that would let me manage all my open windows, as I typically have anywhere from 30 to 40 windows open on any given day.  The most frustrating thing was scanning over the taskbar icons looking for a certain window and not being able to find it.  So enter some new tools for switching to an open program by name:

  • Switcher – Adds a new shortcut, Windows+~, that pulls up an overlay with a grid of screenshots of open windows, sorted by most often active.  You can also start typing, and it will search windows by name.  Downside: it requires Windows Aero to be enabled, and it is somewhat graphics/CPU intensive.  Probably not going to work well on a laptop.
  • ENSO Launcher – Repurposes your Caps Lock key as a shortcut to a popup window where you can type various commands (if you’re like me and hardly ever use Caps Lock anyway, it’s not a big deal to lose this key).  You can type “go abcd” where abcd is part of the name of an open window.  As you type, a smart list dynamically populates with the closest matches, and you simply press Enter to switch to the application.



I have both installed, and I’m quickly coming to prefer ENSO Launcher because it is less graphics-intensive and performs better.

Wednesday, September 11, 2013

Performance Tuning C#.Net Code

If you've developed a project with any level of complexity, you have surely run into the problem of needing to do performance optimization.  You've got too many calls to the database; or your inner loops are inefficient; or you need to convert some large arrays into dictionary lookups, etc.  As it happens, my project this week has been to optimize the performance of some functionality for creating records in a cloud-based system that live synchronizes every update with a second database.

The most complicated calls in this system involve what we call "business logic".  A record update made in the cloud system is synchronized to the back-end database (via a web-service call), and then other records in related tables are automatically updated as well (think of a trigger), and those changes are sent back.  On top of that, if new records in related tables are being created, the ID values of those new records have to be sent in a second web-service call to keep record IDs in sync.  For some of our larger customers, this entire two-way communication process has been taking upwards of a couple of minutes to execute end-to-end, which is not acceptable to them or to us.

So how do you approach a problem like this, where you have a performance issue in a complex system composed of many moving parts in separate locations?  A couple of basic rules come to mind, like the old mantra "Premature optimization is the root of all evil" (Donald Knuth).  Basically, we must first measure the code to determine what actually needs optimization, because it is very often not in the place you intuitively think it is.  The simplest way to measure code performance in .NET is to make use of the DateTime and TimeSpan classes:

var start = DateTime.Now;
DoCode();
var end = DateTime.Now;
var msElapsed = (end - start).TotalMilliseconds;

Console.WriteLine("Execution Time: " + msElapsed + " ms");
One nice thing about the DateTime approach is the natural conversion to a TimeSpan by subtracting one DateTime from another.  This gives us the time difference, or total time elapsed in the code, to a reasonably high accuracy in milliseconds (+/- 20 ms in my experience).  As you can imagine, your code can get pretty cluttered as you litter it with these timing calls.  So to simplify a little, I developed a set of helper methods to do performance timings:
private readonly DateTime _start = DateTime.Now;   // when logging began (used for the running total)
private DateTime _lastEvent = DateTime.Now;        // timestamp of the previous log call
private string _indent = "";                       // current nesting depth for wrapped tasks
private string _logging = "";                      // accumulated log text

// Total time elapsed since logging began.
private TimeSpan TimeElapsed
{
    get { return DateTime.Now - _start; }
}

public void LogTimingMessage(string method, string message, bool includeElapsedTime = true, TimeSpan? eventElapsed = null)
{
    var eventTime = DateTime.Now;
    var logFormat = "{0} ({1})-{2}{3}:{4}";
    if (includeElapsedTime) logFormat += " ({5} ms)";

    // Default to the time elapsed since the previous log call.
    if (eventElapsed == null) eventElapsed = eventTime - _lastEvent;

    var log = string.Format(logFormat, TimeElapsed, eventTime, _indent, method, message, eventElapsed.Value.TotalMilliseconds);
    Console.WriteLine(log);
    _logging += log + "\n";
    _lastEvent = eventTime;
}

public void LogTimingVariable(string method, string variable, string value)
{
    var message = string.Format("{0}={1}", variable, value);
    LogTimingMessage(method, message);
}

public T LogTimingTask<T>(string method, string taskDescription, Func<T> task)
{
    // Indent any log calls made inside the task, then report the task's own elapsed time.
    _indent += "\t";
    var start = DateTime.Now;
    var result = task();
    var end = DateTime.Now;
    _indent = _indent.Substring(0, _indent.Length - 1);
    LogTimingMessage(method, "Task " + taskDescription, eventElapsed: (end - start));
    return result;
}

public void LogTimingTask(string method, string taskDescription, Action task)
{
    LogTimingTask<bool>(method, taskDescription, () => { task(); return true; });
}

public void LogTimingStart(string method)
{
    LogTimingMessage(method, "Start");
}

public void LogTimingEnd(string method)
{
    // Report the total elapsed time for the whole run.
    LogTimingMessage(method, "End", eventElapsed: DateTime.Now - _start);
}
The basic idea is that every call to a Log method prints a message showing how much time has elapsed since the last Log call.  Furthermore, you can wrap groups of code or method calls, and it will time the execution of that group as a whole, as well as indent any inner log calls so you can distinguish inner from outer timings.  Perhaps a real-world example will demonstrate what I mean:
var createResult = LoggingContext.Current.LogTimingTask("API.CreateEmployeeChangeSet", "Create Employee", () =>
{
    return APIMethods.CreateEmployee(ee, depInfo, wage, recordId, (SetupContext context, int employeeId) =>
    {
        var companyCode = ee.CompanyCode;

        var emp = LoggingContext.Current.LogTimingTask("API.CreateEmployeeChangeSet", "Get Employee", () =>
        {
            return context.GetEmployee(employeeId);
        });

        var resolver = LoggingContext.Current.LogTimingTask("API.CreateEmployeeChangeSet", "Build Indexed Resolver", () =>
        {
            return new IndexedCompanySnapshotForeignKeyMapper(context, companyCode);
        });

        var locationId = LoggingContext.Current.LogTimingTask("API.CreateEmployeeChangeSet", "Resolve Location Id", () =>
        {
            return resolver.ResolveLocationId(companyCode).Id;
        });

        var cpobjs = LoggingContext.Current.LogTimingTask("API.CreateEmployeeChangeSet", "emp.AsEnumerable", () =>
        {
            return emp.AsEnumerable(EmployeeFlag.All, false).ToList();
        });

        var hrobjs = LoggingContext.Current.LogTimingTask("API.CreateEmployeeChangeSet", "Convert cpobjs", () =>
        {
            return cpobjs.Select(c => { var hrobj = HRObjectUtility.Convert(c, resolver); hrobj.LocationId = locationId; return hrobj; }).ToList();
        });

        // csi is declared in the enclosing scope.
        csi = new ChangeSetInfo();
        csi.Inserts = cpobjs;
        csi.HRInserts = hrobjs;
        csi.Updates = new List<HRObject>() { HREmployee.Convert(emp, resolver, EmployeeFlag.None) };
        csi.Updates[0].LocationId = locationId;
        csi.Deletes = new List<HRObject>();

        return true;
    });
});

The output looks something like this:
00:00:00.2260226 (9/6/2013 2:44:32 PM)-API.CreateEmployeeChangeSet:Start (226.0226 ms)
00:00:02.5132513 (9/6/2013 2:44:34 PM)-  APIMethods.CreateEmployee:Task wrapper.CreateEmployee (2224.2224 ms)
00:00:02.5302530 (9/6/2013 2:44:34 PM)-  APIMethods.CreateEmployee:Task context.InsertTranslationID (17.0017 ms)
00:00:02.5622562 (9/6/2013 2:44:34 PM)-  APIMethods.CreateEmployee:Task context.SubmitChanges (32.0032 ms)
00:00:03.2523252 (9/6/2013 2:44:35 PM)-  API.CreateEmployeeChangeSet:Task Get Employee (688.0688 ms)
00:00:06.2986298 (9/6/2013 2:44:38 PM)-  API.CreateEmployeeChangeSet:Task Build Indexed Resolver (3046.3046 ms)
00:00:06.3156315 (9/6/2013 2:44:38 PM)-  API.CreateEmployeeChangeSet:Task Resolve Location Id (17.0017 ms)
00:00:06.3206320 (9/6/2013 2:44:38 PM)-  API.CreateEmployeeChangeSet:Task emp.AsEnumerable (5.0005 ms)
00:00:06.3766376 (9/6/2013 2:44:38 PM)-  API.CreateEmployeeChangeSet:Task Convert cpobjs (56.0056 ms)
00:00:06.4176417 (9/6/2013 2:44:38 PM)-  APIMethods.CreateEmployee:Task context.AddEmployeeUpdateRecord (30.003 ms)
00:00:06.5426542 (9/6/2013 2:44:38 PM)-API.CreateEmployeeChangeSet:Task Create Employee (6316.6316 ms)
00:00:06.9636963 (9/6/2013 2:44:39 PM)-API.CreateEmployeeChangeSet:Task APIMethods.ProcessChangeSet (421.0421 ms)
00:00:06.9636963 (9/6/2013 2:44:39 PM)-API.CreateEmployeeChangeSet:End (6963.6963 ms)
At a glance it is fairly obvious which methods need the most attention: the "APIMethods.CreateEmployee" call, which takes about 2.2 seconds, and the "Build Indexed Resolver" step, which takes about 3 seconds by itself.  With this data in hand, I know where to focus my attention to get the biggest return on performance.

Happy programming!

Friday, August 23, 2013

An Error? Carry on anyway!

Lately I've had to do a lot of development in our core internal application, which is responsible for managing data flow between our clients' systems and our network.  This system also coordinates data for some of our 3rd-party partner integrations, such as Pay-as-you-Go Workers' Compensation.  Multiple times throughout the day, a number of scheduled tasks run which import data into databases, download and process XML files, generate and upload more XML files, and perform other similar large-scale data processing tasks.

So you'd think a system at this core level should be well-designed, right?  Wrong!  There are a number of faults in the architecture of this system, but by far the most blatant is a pattern called "Exception? Won't happen!"  It's the most pernicious of them all because it results in a number of complex, hard-to-debug data defects:

  • Data partially processed, or processed incorrectly and has to be manually rolled back.
  • Files that are generated, but not uploaded properly.
  • Multiple days' worth of data just "missing" from generated files.
Here's what the pattern looks like in a nutshell:

Try
  PrepareData()
Catch ex As Exception
  LogException(ex)
End Try

Try
  GenerateFile()
Catch ex As Exception
  LogException(ex)
End Try

Try
  ArchiveData()
Catch ex As Exception
  LogException(ex)
End Try

Try
  UploadFile()
Catch ex As Exception
  LogException(ex)
End Try

At first glance you might think there's nothing wrong: the developer has wisely added exception handling and logging to the system to make sure errors can be detected and reviewed.  The problem comes if something does indeed fail.  For example, what happens if the "UploadFile()" step fails?  Well, the first three steps will have already finished and been committed.  The data has been archived permanently, but the generated file never got uploaded to the 3rd-party network.  That means they will never receive the data, and we will never send it again because it's been marked "complete"!  Apparently the developer assumed that "nothing could go wrong".

Resolving this defect can be a little time-consuming, but it is definitely worth it.  I generally approach it this way (a sketch of the corrected pattern follows the list):

  1. Wrap all sub-steps in some sort of transaction.
  2. Pass the same transaction to each method.
  3. If the entire process (work unit) is successful, commit the transaction at the end.
  4. If anything fails on any step, roll the entire process back, including deleting any temp files that were partially created.
  5. Everything is then left in a consistent state for another try in the next processing cycle.
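Here is a minimal C# sketch of that approach, assuming the database work can enlist in a TransactionScope.  The step methods are hypothetical placeholders for the real PrepareData/GenerateFile/ArchiveData/UploadFile logic, so treat this as an outline rather than the actual implementation:

using System;
using System.IO;
using System.Transactions;

public class ExportCycle
{
    public void Run()
    {
        string tempFile = null;
        using (var scope = new TransactionScope())
        {
            try
            {
                PrepareData();               // database work enlists in the ambient transaction
                tempFile = GenerateFile();   // write to a temp path, not the final destination
                ArchiveData();               // also enlists in the same transaction
                UploadFile(tempFile);        // last step; only reached if everything above succeeded

                scope.Complete();            // commit the whole unit of work
            }
            catch (Exception ex)
            {
                LogException(ex);
                // Leaving scope.Complete() uncalled rolls the database work back;
                // partially created temp files have to be cleaned up by hand.
                if (tempFile != null && File.Exists(tempFile)) File.Delete(tempFile);
                throw;                       // surface the failure so the next cycle retries
            }
        }
    }

    // Hypothetical placeholders for the real processing steps.
    private void PrepareData() { }
    private string GenerateFile() { return Path.GetTempFileName(); }
    private void ArchiveData() { }
    private void UploadFile(string path) { }
    private void LogException(Exception ex) { Console.Error.WriteLine(ex); }
}
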
Just for fun, here's another gem from the same code-base that occurred while stepping through with the debugger in Visual Studio (the highlighted line is the next statement to execute):



Happy programming!

Friday, August 16, 2013

Windows Azure RDP

So the other day I was looking for a good solution for managing multiple RDP windows.  On a normal day I will be working with up to 10 different RDP sessions at once, and it gets annoying managing all those icons on the taskbar.  After hunting around and comparing a few different products, I settled on Terminals, which is an open-source RDP session manager hosted on CodePlex.  My ideal RDP manager would have the following core properties:

  • Has good connection management - groups, saved credentials, etc.
  • Captures special keys correctly (Windows key, Alt+Tab, Ctrl+Shift+Esc, etc.)
  • Supports Windows Azure
Terminals comes close, because it handles the first two very well, but unfortunately, it does not support Windows Azure.  However, since it is open-source and written in C#.Net, I thought I would just modify it myself to add support.

If you've worked with Windows Azure, you'll be familiar with the Connect link at the bottom of the Azure Portal when viewing information about a service instance.  That link gives you an .RDP file to download which can be used to connect to the instance.  If you type the public instance name (name.cloudapp.net) into an RDP session without going through the .RDP file download, you'll quickly discover that it doesn't work.  So what is different about the .RDP file generated directly from the portal?

When you open the file, there is a special property value unique to Windows Azure that is required for the connection.  Without it, you get no response whatsoever, and no idea why your connection is not working.  I guess that's a security feature by design in Azure.  Here's an example file (with values redacted):

full address:s:subdomain.cloudapp.net
username:s:bobbytest
LoadBalanceInfo:s:Cookie: mstshash=subdomain#subdomain_IN_0
You'll notice there is a special value called "LoadBalanceInfo" which is not present in typical RDP sessions.  So I added the property to the UI of the Terminals program and modified all the places that generate RDP sessions to make use of it as well.  However, every time I tried to connect, there was still no response.  After doing a little research, I became convinced that I was just missing a small detail and that Wireshark (network capture software) would provide the answer.

With Wireshark I quickly discovered exactly what the issue was: my strings were being sent over the wire as UTF-16, which means every character was 2 bytes.  Here is an example of what the TCP conversation dump looks like:

C.o.o.k.i.e.:. .m.s.t.s.h.a.s.h.=.s.u.b.d.o.m.a.i.n.#.s.u.b.d.o.m.a.i.n._.I.N._.0.

The dots in between each character are zero bytes, the other half of each 2-byte Unicode character.  Since the class used for the RDP connections is an ActiveX component provided by Microsoft, actually modifying the class was out of the question.  But there is a workaround: take the single-byte (UTF-8) bytes of the string and pack two of them into each 2-byte character of a new string.  The result is a really weird string if you try to interpret it as UTF-16 (the default in .NET), but when the ActiveX component writes those 2-byte characters back out, the original single-byte sequence is what goes over the wire, and it works beautifully:

var b = Encoding.UTF8.GetBytes(temp);
var newLBI = Encoding.Unicode.GetString(b);

The only catch with this solution is that if the original string has an odd number of characters, the last byte ends up unpaired and the value is rejected.  The simple solution is to pad with an extra space at the end, which seems to work fine.  Add to that the requirement that LoadBalanceInfo always be suffixed with "\r\n", and the full working solution is below:

var temp = LoadBalanceInfo;
if (temp.Length % 2 == 1) temp += " ";
temp += "\r\n";
var b = Encoding.UTF8.GetBytes(temp);
var newLBI = Encoding.Unicode.GetString(b);
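To see why this works, here is a small stand-alone check (a hypothetical console snippet using a made-up cookie value): unpacking the re-encoded string back into bytes yields exactly the single-byte sequence we want on the wire.

using System;
using System.Text;

class LoadBalanceInfoDemo
{
    static void Main()
    {
        // Hypothetical example value; the real cookie comes from the Azure-generated .RDP file.
        var loadBalanceInfo = "Cookie: mstshash=subdomain#subdomain_IN_0";

        var temp = loadBalanceInfo;
        if (temp.Length % 2 == 1) temp += " ";      // pad to an even length
        temp += "\r\n";                             // required terminator

        var b = Encoding.UTF8.GetBytes(temp);       // one byte per character for this ASCII string
        var newLBI = Encoding.Unicode.GetString(b); // pack two bytes into each 2-byte character

        // When the string is turned back into UTF-16 bytes (as the RDP control does),
        // we get the original single-byte sequence.
        var wireBytes = Encoding.Unicode.GetBytes(newLBI);
        Console.WriteLine(Encoding.UTF8.GetString(wireBytes)); // prints the padded cookie string
    }
}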

Magically, it works!  I created a patch that adds Windows Azure support and uploaded it to the Terminals project site, so you can try it for yourself.  Happy programming!

Friday, August 9, 2013

Basic Auth > Anonymous Auth

Had a very strange issue today.  We have a WCF-service-based system we're deploying on a client's network.  One of the services uses Basic Authentication for normal calls, which is handled via a custom HttpModule.  However, we wanted one specific subfolder of the WCF service (/downloads/) to use only anonymous auth so that files could be downloaded without a password.

It seemed like it should be relatively straightforward.  I modified the logic in the Basic Auth module to skip the authentication step for any path starting with /downloads/.  It worked beautifully in our testing environment.  However, the problems began when we moved the code onto our client's network.  Every time I tried to access a URL containing /downloads/, I would incorrectly get the Basic Auth prompt, even though that path was supposed to be exempt.
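For context, the exemption logic in the module looked roughly like the sketch below.  This is a simplified, hypothetical version (class name and path matching are illustrative, not our actual module):

using System;
using System.Web;

public class BasicAuthModule : IHttpModule
{
    public void Init(HttpApplication app)
    {
        app.AuthenticateRequest += (sender, e) =>
        {
            var context = ((HttpApplication)sender).Context;

            // Skip Basic Auth entirely for the public downloads folder.
            if (context.Request.AppRelativeCurrentExecutionFilePath
                       .StartsWith("~/downloads/", StringComparison.OrdinalIgnoreCase))
                return;

            // ...otherwise validate the Authorization header and challenge with a 401 if it is missing...
        };
    }

    public void Dispose() { }
}
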

In an attempt to debug the issue, I commented the Basic Auth module out of web.config completely so the website would use Anonymous Auth globally.  However, when I tried to access any path in the service from a web browser, it would generate a 401.3 error, which is a physical ACL access-denied error.  That didn't make any sense, because the application pool identity for the IIS website had full permissions on the folder containing the service files.

After doing a little research I discovered that the account used by default for anonymous auth is specified separately from the application pool identity.  Even if you specify in the site's Basic Settings that "Connect As" should be Pass-through (Application Pool Identity), that is separate from the setting for Anonymous Authentication.  It turns out that if you right-click the Anonymous Authentication entry for an IIS site and choose Edit, you can specify the account used for anonymous requests, and by default that account is IUSR.  We changed it to use the application pool identity and everything started working beautifully (the equivalent configuration change is sketched below).
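For reference, the same change can be scripted instead of clicked through in IIS Manager.  Here is a hedged C# sketch using Microsoft.Web.Administration ("Default Web Site" is a placeholder for the real site name); an empty userName tells IIS to run anonymous requests under the application pool identity:

using Microsoft.Web.Administration;

class AnonymousAuthFix
{
    static void Main()
    {
        using (var serverManager = new ServerManager())
        {
            var config = serverManager.GetApplicationHostConfiguration();

            // "Default Web Site" is a placeholder for the real site name.
            var anonymousAuth = config.GetSection(
                "system.webServer/security/authentication/anonymousAuthentication",
                "Default Web Site");

            anonymousAuth["enabled"] = true;
            anonymousAuth["userName"] = "";   // empty user name = use the application pool identity

            serverManager.CommitChanges();
        }
    }
}
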

However, this leaves me somewhat puzzled as to why Basic Auth accounts were working when Anonymous Auth was not.  The Basic Auth accounts are tied to a database, which is entirely separate from the ACL-level permissions in Windows, which are tied to Active Directory on the network.  Apparently, just by virtue of using Basic Auth with any account, the request runs under the Application Pool Identity, but if there is no username at all, it falls back to the Anonymous Auth default user setting, regardless of whether your Basic Auth username has anything to do with the network.  Very unexpected behavior, and very frustrating to debug.

Happy programming!

A Debugging Nightmare

A few weeks ago, I ran into the most complicated bug I think I have ever had to solve in my entire programming career.

We are developing a fairly complex system that involves 4 WCF services, a SQL database, and an integration component to a Microsoft Access front-end application.  The bulk of the system involves synchronizing data between the Microsoft Access database and the SQL database via XML files.  The system was largely developed by a 3rd-party contractor who came in-house on a particular day so that we could work together to try to resolve the issue.

The basic problem was that the sync would work fine when we manually started it via a stored procedure in SQL Server, but when running the end-to-end process from Microsoft Access, it failed every time.  The two calls should have been 100% identical, because we were manually calling the very stored procedure that eventually gets called by Microsoft Access.  We could even demonstrate through our logging that the exact same stored procedure was being called in both cases with the same parameters, but it would only work when run manually.

We traced the calls through using a combination of database logging, text-file logging, and Fiddler tracing to try to see what was going on.  There was nothing we could see that was different about the two requests and no clear reason why one would fail, until suddenly we stumbled on a clue.  When running the end-to-end test from Microsoft Access, it would fail after 30 seconds with a timeout error message.

At first, we thought one of the WCF services was timing out because the error message looked exactly like a web service timeout.  But eventually with the help of Fiddler (which I *highly* recommend btw!) it was clear the error message came from the server-side via a FaultException, not from the client-side.  So the error was occurring inside the service.  Eventually, I pinpointed it down to a single database call that was generating a timeout error, but only when done through the end-to-end client call.

It wasn't until I pulled out the stack trace for the FaultException and tracked down the exact line with the error that I had the "aha!" moment.  It turns out the real problem was a timeout caused by the process running inside a transaction.  A record is modified right before the stored procedure is called; the procedure then calls .NET code that tries to modify the same record, but the row is still locked.  Apparently, the .NET code called from a stored procedure runs in a separate session from the caller, so the inner call ends up blocked by the caller's own lock until it times out.  Once I saw the error message, it was immediately obvious what the problem was.  I simply removed the conflicting update statements, and now it works perfectly.
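For illustration, here is a minimal C# sketch of that kind of two-session conflict.  The connection string, table, and column names are hypothetical; the point is simply that the inner connection is a different session from the caller and has to wait on the caller's open transaction:

using System;
using System.Data.SqlClient;

class SyncLockSketch
{
    static void Demo(string connectionString, int recordId)
    {
        using (var caller = new SqlConnection(connectionString))
        {
            caller.Open();
            using (var tx = caller.BeginTransaction())
            {
                // Session 1 (the caller): modify the record and keep the transaction open.
                var update = new SqlCommand(
                    "UPDATE dbo.SyncRecord SET Status = 'Pending' WHERE Id = @id", caller, tx);
                update.Parameters.AddWithValue("@id", recordId);
                update.ExecuteNonQuery();   // the row is now locked by this transaction

                // Session 2 (the .NET code invoked from the stored procedure): a separate
                // connection touches the same row and blocks on the lock held above.
                using (var inner = new SqlConnection(connectionString))
                {
                    inner.Open();
                    var conflicting = new SqlCommand(
                        "UPDATE dbo.SyncRecord SET LastSyncedAt = GETDATE() WHERE Id = @id", inner);
                    conflicting.CommandTimeout = 30;
                    conflicting.Parameters.AddWithValue("@id", recordId);
                    conflicting.ExecuteNonQuery();   // throws SqlException: "Timeout expired"
                }

                tx.Commit();   // never reached in the failing scenario
            }
        }
    }
}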

Happy programming!

Monday, July 29, 2013

Middle-Click Conundrum

So today I am once again reminded of how stubbornly frustrating computer software can be when it goes awry.  It seems like on a daily basis I am confounded by some new unexpected behavior, an expectation mismatch, or an obscure error message that I have never seen before.

Today it is a middle mouse-button problem.  This isn't the first time I've seen it, but I think I first noticed it a couple of weeks ago on Slashdot.  I have a habit of using middle-click to open a link in a new tab, but recently, in Slashdot articles, middle-clicking both opens a new tab *and* navigates in the current tab.  Quite annoying!

Doing a little research I found (as usual) that I'm not the only one plagued by this behavior.  According to this SuperUser post, it has to do with links that use onclick events for navigation.  Sure enough, when I examine the link in Chrome Dev Tools, there is indeed an onclick event in the anchor that performs navigation.

The fix is simple enough.  Install this Chrome extension, and the problem goes away.  Works for me!