Wednesday, September 25, 2013

The Power of Indexes

Indexing is one of those things developers tend to forget about, because it seems like a "DBA" responsibility for maintaining the database.  After a week of performance tuning our cloud application, I am convinced that indexing should be a key part of my process for any performance analysis.  There are a few simple steps to find appropriate indexes for your database and address the low-hanging fruit of poor database performance.  The process looks something like this:
  1. Identify slow performing queries in your application.
  2. Develop a SQL script containing test queries that are consistently low performance.
  3. Execute the script in SSMS using the option to include the Execution Plan.
  4. Review and apply the suggested indexes, then retest.
The first step can be done in multiple ways.  My favorites are to use timing code like I posted previously, or to use system DMVs (Dynamic Management Views) to identify the most expensive queries.  You can also use a tool like SQL Profiler to capture SQL traffic and timings, although that method can be hard to apply in practice because of excessive traffic "noise".
Here is one such example to find expensive queries:
SELECT TOP 10 SUBSTRING(qt.TEXT, (qs.statement_start_offset/2)+1,
((CASE qs.statement_end_offset
WHEN -1 THEN DATALENGTH(qt.TEXT)
ELSE qs.statement_end_offset
END - qs.statement_start_offset)/2)+1),
qs.execution_count,
qs.total_logical_reads, qs.last_logical_reads,
qs.total_logical_writes, qs.last_logical_writes,
qs.total_worker_time,
qs.last_worker_time,
qs.total_elapsed_time/1000000 total_elapsed_time_in_S,
qs.last_elapsed_time/1000000 last_elapsed_time_in_S,
qs.last_execution_time,
qp.query_plan
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) qt
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) qp
ORDER BY qs.total_logical_reads DESC -- logical reads
-- ORDER BY qs.total_logical_writes DESC -- logical writes
-- ORDER BY qs.total_worker_time DESC -- CPU time
The result is a list of queries along with their execution statistics; the raw DMV timings are in microseconds, and the query above converts elapsed time to seconds.  A rule of thumb I use: infrequent queries taking more than a couple of seconds should be analyzed, as should frequent queries taking more than about 100 ms in a high-performance, low-latency application.  These thresholds will vary depending on the needs of your application.
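For the test script in step 2, something as simple as the captured query wrapped with statistics output works well.  Here is a minimal sketch; the table and column names are hypothetical placeholders, not queries from our application:
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- hypothetical slow query captured from the application
SELECT e.EmployeeId, e.LastName, d.Name
FROM dbo.Employee e
JOIN dbo.Department d ON d.DepartmentId = e.DepartmentId
WHERE e.HireDate >= '2013-01-01';

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
Running the script repeatedly gives consistent read and CPU numbers to compare before and after an index is added.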
Once you identify a query to review, enable the Include Actual Execution Plan and Include Client Statistics options on the SSMS toolbar so you have good data for analyzing the query.

With luck, the Execution Plan will provide a clue about the reason for the poor performance.  Reading execution plans is a whole topic worth researching in and of itself.  If you have a clear need for an index and a good test set, SSMS 2012 will suggest a missing index at the top of the plan window, which you can click to generate the script.  Running that script will apply the index and let you retest to see whether the query performance has improved.  Using this method we found 7-8 indexes that improved the overall performance of our application by about 10-15%.
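The generated script is just a CREATE INDEX statement.  As a rough sketch (the index, table, and column names here are made up for illustration, not the actual suggestion from our database), it looks something like this:
CREATE NONCLUSTERED INDEX IX_Employee_CompanyCode
ON dbo.Employee (CompanyCode)
INCLUDE (EmployeeId, LastName);
The key columns are what the query filters or joins on, and the INCLUDE columns let the index cover the SELECT list without additional lookups.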

While indexing a database is a good way to quickly improve performance with large amounts of data, there are a few caveats to keep in mind:
  • An index must be maintained, which means INSERT/UPDATE statements take a slight performance hit for each index.  This trade-off must be weighed carefully for each index, so at least one performance test should include real-world traffic patterns (see the sketch after this list for one way to check it after the fact).
  • There is no replacement for good programming practices.  It is better to fix chattiness in your application or cache often-used data rather than just use indexing to make the queries faster.  For instance, using the wrong algorithm in your application can have a much more severe impact on performance than using a slow query.
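One way to sanity-check the first caveat after the indexes have been in place for a while is to compare reads against writes per index.  This is a sketch using the sys.dm_db_index_usage_stats DMV; note that its counters reset when SQL Server restarts:
SELECT OBJECT_NAME(s.object_id) AS table_name,
       i.name AS index_name,
       s.user_seeks + s.user_scans + s.user_lookups AS reads, -- queries that benefited from the index
       s.user_updates AS writes -- modifications that had to maintain the index
FROM sys.dm_db_index_usage_stats s
JOIN sys.indexes i ON i.object_id = s.object_id AND i.index_id = s.index_id
WHERE s.database_id = DB_ID()
ORDER BY s.user_updates DESC;
An index with many writes and hardly any reads is a candidate for removal.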
Happy programming!

Friday, September 13, 2013

Friday TIP: Switch Application By Name

For a while I have been searching for a program that would let me manage all my open windows, as I typically have anywhere from 30-40 windows open on any given day.  The most frustrating thing was scanning the taskbar icons for a certain window and not being able to find it.  So enter some new tools for switching to an open program by name:

  • Switcher – Adds a new shortcut (Windows+~) that pulls up an overlay with a grid of screenshots of open windows, sorted by most often active.  You can also start typing, and it will search windows by name.  Downside: it requires Windows Aero to be enabled, and it is somewhat graphics/CPU intensive, so it probably won't work well on a laptop.
  • ENSO Launcher – Replaces your Caps Lock key as a shortcut to a popup window where you can type various commands (if you're like me and hardly ever use Caps Lock anyway, it's not a big deal to lose this key).  You can type "go abcd" where abcd is part of the name of an open window.  As you type, a smart list dynamically populates with the closest matches, and you simply press Enter to switch to the application.



I have both installed, and I'm quickly coming to prefer ENSO Launcher because it is less graphics-intensive and more performant.

Wednesday, September 11, 2013

Performance Tuning C#.Net Code

If you've developed a project with any level of complexity, you have surely run into the problem of needing to do performance optimization.  You've got too many calls to the database; or your inner loops are inefficient; or you need to convert some large arrays into dictionary lookups, etc.  As it happens, my project this week has been to optimize the performance of some functionality for creating records in a cloud-based system that live synchronizes every update with a second database.

The most complicated calls in this system involve what we call "business logic".  A record update made in the cloud system is synchronized to the back-end database (via a web service call), then other records in related tables are automatically updated as well (think of a trigger), and those changes are sent back.  On top of that, if new records in related tables are being created, the ID values of those new records have to be sent in a second web service call to keep record IDs in sync.  For some of our larger customers, this entire two-way communication process has been taking upwards of a couple of minutes end-to-end, which is not acceptable to them or to us.

So how to approach a problem like this, where you have a performance issue in a complex system composed of many moving parts in separate locations?  A couple of basic rules come to mind, like the old mantra "Premature optimization is the root of all evil" (Donald Knuth).  Basically, we must first measure the code to determine what is in need of optimization, because it is very often not in the place you intuitively think it is.  The simplest way to measure code performance in .NET is to make use of the DateTime and TimeSpan classes.

var start = DateTime.Now;
DoCode();
var end = DateTime.Now;
var msElapsed = (end - start).TotalMilliseconds;

Console.WriteLine("Execution Time: " + msElapsed + " ms");
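As an aside, the helpers below stick with DateTime, but System.Diagnostics.Stopwatch wraps a high-resolution timer and avoids the coarse resolution of DateTime.Now.  A minimal sketch of the same measurement with Stopwatch:

// Stopwatch uses the high-resolution performance counter
var sw = System.Diagnostics.Stopwatch.StartNew();
DoCode();
sw.Stop();
Console.WriteLine("Execution Time: " + sw.Elapsed.TotalMilliseconds + " ms");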
One nice thing about the DateTime API is the natural conversion to a TimeSpan by subtracting one DateTime from another.  This gives us the time difference, or total time elapsed in the code, to a pretty high accuracy in milliseconds (+/- 20 ms in my experience).  As you can imagine, your code can get pretty cluttered as you litter it with these timing calls.  So to simplify a little, I developed a set of helper methods to do performance timings:
private string _indent = "";
private string _logging = ""; // accumulated log output
private DateTime _lastEvent = DateTime.Now; // time of the previous Log call
private readonly DateTime Start = DateTime.Now; // when this logging context was created
private TimeSpan TimeElapsed { get { return DateTime.Now - Start; } } // total time since Start
public void LogTimingMessage(string method, string message, bool includeElapsedTime = true, TimeSpan? eventElapsed = null)
{
 var eventTime = DateTime.Now;
 var logFormat = "{0} ({1})-{2}{3}:{4}";
 if (includeElapsedTime) logFormat += " ({5} ms)";
 if (eventElapsed == null) eventElapsed = eventTime - _lastEvent;
 var log = string.Format(logFormat, TimeElapsed, eventTime, _indent, method, message, eventElapsed.Value.TotalMilliseconds);
 Console.WriteLine(log);
 _logging += log + "\n";
 _lastEvent = eventTime; // so the next log entry reports time since this one
}

public void LogTimingVariable(string method, string variable, string value)
{
 var message = string.Format("{0}={1}", variable, value);
 LogTimingMessage(method, message);
}

public T LogTimingTask<T>(string method, string taskDescription, Func<T> Task)
{
 _indent += "\t"; // indent nested log entries while this task runs
 var start = DateTime.Now;
 var result = Task();
 var end = DateTime.Now;
 _indent = _indent.Substring(0, _indent.Length - 1);
 LogTimingMessage(method, "Task " + taskDescription, eventElapsed:(end - start));
 return result;
}

public void LogTimingTask(string method, string taskDescription, Action Task)
{
 LogTimingTask<bool>(method, taskDescription, () => { Task(); return true; });
}

public void LogTimingStart(string method)
{
 LogTimingMessage(method, "Start");
}

public void LogTimingEnd(string method)
{
 LogTimingMessage(method, "End", eventElapsed:DateTime.Now - Start);
}
The basic idea is that every call to a Log method prints out a message of how much time has elapsed since the last Log call.  Furthermore, you can wrap groups of code or method calls, and it will time execution of that group of code, as well as indent inner calls to show the nature of inner vs. outer timings.  Perhaps a real-world example usage will demonstrate what I mean:
var createResult = LoggingContext.Current.LogTimingTask("API.CreateEmployeeChangeSet", "Create Employee", () =>
 {
   return APIMethods.CreateEmployee(ee, depInfo, wage, recordId, (SetupContext context, int employeeId) =>
   {
    //Common.LogMessage((DateTime.Now - start).TotalMilliseconds.ToString() + " ms", "APIMethods.CreateEmployee");

    var innerStart = DateTime.Now;
    var companyCode = ee.CompanyCode;

    var emp = LoggingContext.Current.LogTimingTask("API.CreateEmployeeChangeSet", "Get Employee", () =>
     {
       return context.GetEmployee(employeeId);
     });
    //Common.LogMessage((DateTime.Now - innerStart).TotalMilliseconds.ToString() + " ms", "context.GetEmployee");
    innerStart = DateTime.Now;

    var resolver = LoggingContext.Current.LogTimingTask("API.CreateEmployeeChangeSet", "Build Indexed Resolver", () =>
     {
       return new IndexedCompanySnapshotForeignKeyMapper(context, companyCode);
     });

    //Common.LogMessage((DateTime.Now - innerStart).TotalMilliseconds.ToString() + " ms", "IndexedCompanySnapshotForeignKeyMapper");
    innerStart = DateTime.Now;


    var locationId = LoggingContext.Current.LogTimingTask("API.CreateEmployeeChangeSet", "Resolve Location Id", () =>
     {
      return resolver.ResolveLocationId(companyCode).Id;
     });

    var cpobjs = LoggingContext.Current.LogTimingTask("API.CreateEmployeeChangeSet", "emp.AsEnumerable", () =>
     {
      return emp.AsEnumerable(EmployeeFlag.All, false).ToList();
     });

    var hrobjs = LoggingContext.Current.LogTimingTask("API.CreateEmployeeChangeSet", "Convert cpobjs", () =>
     {
      return cpobjs.Select(c => { var hrobj = HRObjectUtility.Convert(c, resolver); hrobj.LocationId = locationId; return hrobj; }).ToList();
     });

    csi = new ChangeSetInfo();
    csi.Inserts = cpobjs;
    csi.HRInserts = hrobjs;
    csi.Updates = new List<HRObject>() { HREmployee.Convert(emp, resolver, EmployeeFlag.None) };
    csi.Updates[0].LocationId = locationId;
    csi.Deletes = new List<HRObject>();

    return true;
   }
  );
 }
);

The output looks something like this:
00:00:00.2260226 (9/6/2013 2:44:32 PM)-API.CreateEmployeeChangeSet:Start (226.0226 ms)
00:00:02.5132513 (9/6/2013 2:44:34 PM)-  APIMethods.CreateEmployee:Task wrapper.CreateEmployee (2224.2224 ms)
00:00:02.5302530 (9/6/2013 2:44:34 PM)-  APIMethods.CreateEmployee:Task context.InsertTranslationID (17.0017 ms)
00:00:02.5622562 (9/6/2013 2:44:34 PM)-  APIMethods.CreateEmployee:Task context.SubmitChanges (32.0032 ms)
00:00:03.2523252 (9/6/2013 2:44:35 PM)-  API.CreateEmployeeChangeSet:Task Get Employee (688.0688 ms)
00:00:06.2986298 (9/6/2013 2:44:38 PM)-  API.CreateEmployeeChangeSet:Task Build Indexed Resolver (3046.3046 ms)
00:00:06.3156315 (9/6/2013 2:44:38 PM)-  API.CreateEmployeeChangeSet:Task Resolve Location Id (17.0017 ms)
00:00:06.3206320 (9/6/2013 2:44:38 PM)-  API.CreateEmployeeChangeSet:Task emp.AsEnumerable (5.0005 ms)
00:00:06.3766376 (9/6/2013 2:44:38 PM)-  API.CreateEmployeeChangeSet:Task Convert cpobjs (56.0056 ms)
00:00:06.4176417 (9/6/2013 2:44:38 PM)-  APIMethods.CreateEmployee:Task context.AddEmployeeUpdateRecord (30.003 ms)
00:00:06.5426542 (9/6/2013 2:44:38 PM)-API.CreateEmployeeChangeSet:Task Create Employee (6316.6316 ms)
00:00:06.9636963 (9/6/2013 2:44:39 PM)-API.CreateEmployeeChangeSet:Task APIMethods.ProcessChangeSet (421.0421 ms)
00:00:06.9636963 (9/6/2013 2:44:39 PM)-API.CreateEmployeeChangeSet:End (6963.6963 ms)
At a glance it is fairly obvious which methods need the most attention: the "APIMethods.CreateEmployee" call, which takes about 2.2 seconds, and the "Build Indexed Resolver" step, which takes about 3 seconds by itself.  With this data in hand, I know where to focus my attention to get the biggest return on performance.

Happy programming!