I have written previously questioning the performance of Entity Framework based upon the SQL it was generating in Examining Entity Framework’s SQL Generation. Finally things lightened up enough to give me time to examine the performance in more detail within the context of my application. Here’s what I found.
I started by running instrumentation on my WCF service and tracking a few calls that I knew caused the gigantic SQL statement explosion that I had previously seen. Immediately I saw that the execution of the SQL was the least of my concerns. It accounted for only 75ms of the 9500ms call time. The rest of the call time was hung up inside of Entity Framework, not visible to me through the performance profiler. I chose the RetrieveAllEmployees method to dissect. Here is some background on it:
- The method makes no apologies and simply retrieves all employees with some of their basic related data.
- 100 employee records in the database
- Each employee on average has 1 address, 1 phone number and 1 position
- There is little to no data in the Evaluations, Leave Of Absence and related tables.
So, the original code looks like:
public IList<Employee> RetrieveAllEmployees()
{
using (var containerScope = SrsIoCContainer.Current.BeginLifetimeScope())
{
var repository = containerScope.Resolve<IEmployeeRepository>();
return employees = repository.FindAll("PhoneNumbers", "Addresses", "Position", "Evaluations", "LeaveOfAbsences.Comments", "LeaveOfAbsences.LeaveOutOfOffices");
}
}
Inside the repository, this ends in a call to ObjectQuery with Include called on it for each parameter. This is a convenient calling syntax, but the result is the aforementioned ugly 5000 line SQL statement and horrific performance of 9.5 seconds!
So, I took the includes list down to nothing to get a baseline and the result was 150ms. I then added Address and PhoneNumbers back in resulting in a jump up to 800ms. From there each additional include added more than 1 second!
Lesson learned, do not use Include ever! Okay, maybe a bit drastic, but it definitely looks like the Entity Framework has some serious scaling issues with the Include functionality.
In order to address this issue, I decided to get rid of the Include calls and explicitly call LoadProperty on the Context. The nice thing about this method is that the SQL that gets generated is much more straight forward. However, it will be significantly more chatty with SQL Server as LoadProperty needs to be called on each retrieved entity instance, which then results in a database call too. The resulting code is uglier as well, but the performance speaks for itself, 350ms total down from 9500ms. Still not great, but I’ll take it as a win for the time being. Here is the new code:
public IList<Employee> RetrieveAllEmployees()
{
using (var containerScope = SrsIoCContainer.Current.BeginLifetimeScope())
{
var repository = containerScope.Resolve<IEmployeeRepository>();
var employees = repository.FindAll("PhoneNumbers", "Addresses", "Position", "Evaluations", "LeaveOfAbsences");
foreach (var employee in employees)
{
foreach (var loa in employee.LeaveOfAbsences)
{
repository.LoadProperty(loa, "Comments");
repository.LoadProperty(loa, "LeaveOutOfOffices");
}
}
return employees;
}
}
The repository now does the LoadProperty calls, but does not try to interpret the mutli-include syntax that the Include call handles. So, the caller needs to loop and include any additional navigation properties. I thought this would perform worse than the single SQL query that I started with, given all of the looping and querying going on, but to my surprise manual brute-force beats the ‘honed’ Entity Framework performance. Very scary.
I am still looking into performance and will update or add more blog entries if I come across anything interesting.