Wednesday, September 11, 2013

Performance Tuning C#.Net Code

If you've developed a project with any level of complexity, you have surely run into the problem of needing to do performance optimization.  You've got too many calls to the database; or your inner loops are inefficient; or you need to convert some large arrays into dictionary lookups, etc.  As it happens, my project this week has been to optimize the performance of some functionality for creating records in a cloud-based system that live synchronizes every update with a second database.

The most complicated calls in this system involve what we call "business logic".  A record update made in the cloud system is synchronized to the back-end database (via web-service call), and then there are other records in related tables that are automatically updated as well (think of a trigger), and those changes are sent back.  On top of that, if new records in related tables are being created, the ID values of those new records have to be sent in a second web service call to keep record ID's in sync.  For some of our larger customers, this entire two-way communication process from end-to-end has been taking upwards of a couple of minutes to execute, which is not acceptable to them or to us.

So how to approach a problem like this, where you have a performance issue in a complex system composed of many moving parts in separate locations?  A couple of basic rules come to mind, like the old mantra "Premature optimization is the root of all evil" (Donald Knuth).  Basically, we must first measure the code to determine what is in need of optimization, because it is very often not in the place you intuitively think it is.  The simplest way to measure code performance in .NET is to make use of the DateTime and TimeSpan classes.

var start = DateTime.Now;
DoCode();
var end = DateTime.Now;
var msElapsed = (end - start).TotalMilliseconds;

Console.WriteLine("Execution Time: " + msElapsed + " ms");
One nice thing about the DateTime API as shown above is a natural conversion to TimeSpan's by subtracting one DateTime from another.  This gives us the time difference, or total time elapsed in the code, to a pretty high accuracy in milliseconds (+/- 20 ms in my experience).  As you can imagine, your code can get pretty cluttered as you litter it with these timing calls.  So to simplify a little, I developed a set of helper methods to do performance timings:
private string _indent = "";
private DateTime _lastEvent = DateTime.Now;
public void LogTimingMessage(string method, string message, bool includeElapsedTime = true, TimeSpan? eventElapsed = null)
{
 var eventTime = DateTime.Now;
 var logFormat = "{0} ({1})-{2}{3}:{4}";
 if (includeElapsedTime) logFormat += " ({5} ms)";
 if (eventElapsed == null) eventElapsed = eventTime - _lastEvent;
 var log = string.Format(logFormat, TimeElapsed, eventTime, _indent, method, message, eventElapsed.Value.TotalMilliseconds);
 Console.WriteLine(log);
 _logging += log + "\n";
}

public void LogTimingVariable(string method, string variable, string value)
{
 var message = string.Format("{0}={1}", variable, value);
 LogTimingMessage(method, message);
}

public T LogTimingTask<T>(string method, string taskDescription, Func<T> Task)
{
 _indent += "\t";
 var start = DateTime.Now;
 var result = Task();
 var end = DateTime.Now;
 _indent = _indent.Substring(0, _indent.Length - 1);
 LogTimingMessage(method, "Task " + taskDescription, eventElapsed:(end - start));
 return result;
}

public void LogTimingTask(string method, string taskDescription, Action Task)
{
 LogTimingTask<bool>(method, taskDescription, () => { Task(); return true; });
}

public void LogTimingStart(string method)
{
 LogTimingMessage(method, "Start");
}

public void LogTimingEnd(string method)
{
 LogTimingMessage(method, "End", eventElapsed:DateTime.Now - Start);
}
The basic idea is that every call to a Log method prints out a message of how much time has elapsed since the last Log call.  Furthermore, you can wrap groups of code or method calls, and it will time execution of that group of code, as well as indent inner calls to show the nature of inner vs. outer timings.  Perhaps a real-world example usage will demonstrate what I mean:
var createResult = LoggingContext.Current.LogTimingTask("API.CreateEmployeeChangeSet", "Create Employee", () =>
 {
   return APIMethods.CreateEmployee(ee, depInfo, wage, recordId, (SetupContext context, int employeeId) =>
   {
    //Common.LogMessage((DateTime.Now - start).TotalMilliseconds.ToString() + " ms", "APIMethods.CreateEmployee");

    var innerStart = DateTime.Now;
    var companyCode = ee.CompanyCode;

    var emp = LoggingContext.Current.LogTimingTask("API.CreateEmployeeChangeSet", "Get Employee", () =>
     {
       return context.GetEmployee(employeeId);
     });
    //Common.LogMessage((DateTime.Now - innerStart).TotalMilliseconds.ToString() + " ms", "context.GetEmployee");
    innerStart = DateTime.Now;

    var resolver = LoggingContext.Current.LogTimingTask("API.CreateEmployeeChangeSet", "Build Indexed Resolver", () =>
     {
       return new IndexedCompanySnapshotForeignKeyMapper(context, companyCode);
     });

    //Common.LogMessage((DateTime.Now - innerStart).TotalMilliseconds.ToString() + " ms", "IndexedCompanySnapshotForeignKeyMapper");
    innerStart = DateTime.Now;


    var locationId = LoggingContext.Current.LogTimingTask("API.CreateEmployeeChangeSet", "Resolve Location Id", () =>
     {
      return resolver.ResolveLocationId(companyCode).Id;
     });

    var cpobjs = LoggingContext.Current.LogTimingTask("API.CreateEmployeeChangeSet", "emp.AsEnumerable", () =>
     {
      return emp.AsEnumerable(EmployeeFlag.All, false).ToList();
     });

    var hrobjs = LoggingContext.Current.LogTimingTask("API.CreateEmployeeChangeSet", "Convert cpobjs", () =>
     {
      return cpobjs.Select(c => { var hrobj = HRObjectUtility.Convert(c, resolver); hrobj.LocationId = locationId; return hrobj; }).ToList();
     });

    csi = new ChangeSetInfo();
    csi.Inserts = cpobjs;
    csi.HRInserts = hrobjs;
    csi.Updates = new List<HRObject>() { HREmployee.Convert(emp, resolver, EmployeeFlag.None) };
    csi.Updates[0].LocationId = locationId;
    csi.Deletes = new List<HRObject>();

    return true;
   }
  );
 }
);

The output looks something like this:
00:00:00.2260226 (9/6/2013 2:44:32 PM)-API.CreateEmployeeChangeSet:Start (226.0226 ms)
00:00:02.5132513 (9/6/2013 2:44:34 PM)-  APIMethods.CreateEmployee:Task wrapper.CreateEmployee (2224.2224 ms)
00:00:02.5302530 (9/6/2013 2:44:34 PM)-  APIMethods.CreateEmployee:Task context.InsertTranslationID (17.0017 ms)
00:00:02.5622562 (9/6/2013 2:44:34 PM)-  APIMethods.CreateEmployee:Task context.SubmitChanges (32.0032 ms)
00:00:03.2523252 (9/6/2013 2:44:35 PM)-  API.CreateEmployeeChangeSet:Task Get Employee (688.0688 ms)
00:00:06.2986298 (9/6/2013 2:44:38 PM)-  API.CreateEmployeeChangeSet:Task Build Indexed Resolver (3046.3046 ms)
00:00:06.3156315 (9/6/2013 2:44:38 PM)-  API.CreateEmployeeChangeSet:Task Resolve Location Id (17.0017 ms)
00:00:06.3206320 (9/6/2013 2:44:38 PM)-  API.CreateEmployeeChangeSet:Task emp.AsEnumerable (5.0005 ms)
00:00:06.3766376 (9/6/2013 2:44:38 PM)-  API.CreateEmployeeChangeSet:Task Convert cpobjs (56.0056 ms)
00:00:06.4176417 (9/6/2013 2:44:38 PM)-  APIMethods.CreateEmployee:Task context.AddEmployeeUpdateRecord (30.003 ms)
00:00:06.5426542 (9/6/2013 2:44:38 PM)-API.CreateEmployeeChangeSet:Task Create Employee (6316.6316 ms)
00:00:06.9636963 (9/6/2013 2:44:39 PM)-API.CreateEmployeeChangeSet:Task APIMethods.ProcessChangeSet (421.0421 ms)
00:00:06.9636963 (9/6/2013 2:44:39 PM)-API.CreateEmployeeChangeSet:End (6963.6963 ms)
At a glance it is fairly obvious which methods need the most attention - the "APIMethods.CreateEmployee" call which takes about 2.2 seconds, and the "Build Indexed Resolver" step, which takes about 3 seconds by itself.  With this data analysis in hand, I know where to focus my attention to get the biggest return on performance.

Happy programming!

No comments:

Post a Comment