Saturday, May 28, 2016

Programming tips: Entity Framework (any ORM) is NOT the GOD of Data! You need to know how to work with it

Introduction

In my professional life, I've worked with different ORMs and of course the best ones were Hibernate and Entity Framework. They will save almost half of the time that you had to spend for your software by handling the requests to DB so you just need to know how to work with objects instead of relational data. However, they are no gods! They will do the same thing that you are telling them, They cannot work miracles because they don't know what is your intention! They are simply some software systems!

The problem

A simple question can result to the answer. Take an individual who knows Object Oriented and give him this classes.

Code:

class A

{

 public string Status{get;set;}

 public List<B> Bs{get;set;}

}



class B

{

 public string Status{get;set;}

 public List<C> Cs{get;set;}

}



class C

{

 public string Status{get;set;}

 public string Name{get;set;}

 }



Now ask him to give you name of cs that are in status 'BLBlah' he can find in B with Status 'Blah' and are in As with Status 'BlahBlah'. You will end up with a code like this:

Code:

var results = new List<string>();

Var selectedAs = context.As.Where(a=> a.Status== "BlahBlah" );

foreach(a in selectedAs)

{

     var selectedBs = a.Bs.Where(b=>b.Status=="Blah");

     foreach(b in selectedBs)

    {

      var selectedNames = b.Cs.Where(c=>c.Status=="BLBlah").Select(c=>c.Name).ToList();

       results.AddRange(selectedNames);

     }

}

return results;



Pretty straight forward, huh? He rocks! and the ORM rocks! Right?!
Well he just added a lot of overhead to your DB!

What is wrong with this code?

The way you've asked your query! Lets say there are 100 As with wanted status and for each of them you have 100 Bs with wanted status.
You've asked your ORM to find those 100 A, then for each of them you've asked it to find Bs and then for each one find Cs.
It is completely fine if you are working with your memory, but this code means :
1 call for As + (100 calls for Bs * 100 Call for Cs * (1 projection + addrange)) = 100001

So you've sent 10,001 requests to DB for a simple 3 layer select! How fast can it be?!!!

Lets See another Code

So now you might say that it is because of the foreachs I wrote, but that is not the case.
Lets say I want to convert the structure above to this one and then use it somewhere.

Code:

Calss ConvertedA

{

 public string Status{get;set;}

 public List<string> BStatuses{get;set;}

public List<string> CNames{get;set;}
}
With simple Object Oriented view you will probably end up with this code (With no foreach):
Code:

List<ConvertedA> GetConvertedList()

{

  var As= context.As;

  return context.As.Select(a=> new ConvertedA()

  {

   Status = a.Status,

  BStatuses = a.Bs.Select(b=>b.Status).ToList()

  CNames= a.Bs.SelectMany(b=>b.Cs).Select(c=>c.Name)

  };

}


OMG! What a wonderful query! right?! So Simple! But again, you are sending lots of requests to DB! why?
1  selectAs+ 100 Select Bs+ 100Bs*100Cs = 10,101 requests!

The Solution

There are 2 solutions to this problem. One is from the view of Software Architect, and the other from the view of programmer.


The programmer

Write the best query
As Mahdi Hasheminejad in the comments, there are many cases that you can select the correct data with one query. It is pretty useful and of course is the best way to solve the problem. As he mentioned, the query can be written like this:

var results = context.As
.Where(a => a.Status == "BlahBlah").SelectMany(a => a.Bs)
.Where(b => b.Status == "Blah").SelectMany(b => b.Cs)
.Where(c => c.Status == "BLBlah");

And the result will be translated to this query:
SELECT
[Extent3].[Status] AS [Status],
[Extent3].[Name] AS [Name],
FROM [A] AS [Extent1]
INNER JOIN [B] AS [Extent2] ON ...
INNER JOIN [C] AS [Extent3] ON ...
WHERE (N'BlahBlah' = [Extent1].[Status]) AND (N'Blah' = [Extent2].[Status]) AND (N'BLBlah' = [Extent3].[Status])

Load needed data in memory
If you cannot handle your request with a good query, load your data first! A good example for this case is when you need to compare something with the result of something out of your DB, like when you need to call a service.
Like for the first example, you can say:
var Cs = context.Cs.Where(c=>c.Stauts="Blah").ToList();
Var Bs = context.Bs..Where(b=>b.Status=="Blah").ToList();

Now use these 2 lists inside your foreach and compare them with their Ids.


The Architect

ORMs simply map tables to related objects in memory. But, there are 2 ways of  working with relations. Lazy loading (which is the default in most ORMS) and Eager Loading.
Lazy loading simply means that ORM will wait for you to ask for something, and then it will load the data. For instance :
var a = context.As.First();
This will only load 1 A object from the memory and nothing more.
Now if you write:
Var bs = a.Bs.ToList()
Your ORM will send another request to fetch the Bs.
This is exactly what most codes needs. But in some cases, we know that the a is not usable without their Bs. So the architect can decide to use Eager loading for that relation. So when you say
var a = context.As.First();

Your ORM will retries your A and all Bs that are related to it.

*The Eager loading, is not a good solution for 90% of the times. It depends to nature of your data! So don't use it perfunctory.





3 comments:

  1. In EF, eager loading is the answer to a few edge cases but this is not one of them. By loading all the data in memory, not only we're abusing system memory, but also we won't be able to take advantage of DBMS features like indexes.
    For this example I'd write the query this way:
    var results = context.As
    .Where(a => a.Status == "BlahBlah").SelectMany(a => a.Bs)
    .Where(b => b.Status == "Blah").SelectMany(b => b.Cs)
    .Where(c => c.Status == "BLBlah");

    And the result will be translated to this query:
    SELECT
    [Extent3].[Status] AS [Status],
    [Extent3].[Name] AS [Name],
    FROM [A] AS [Extent1]
    INNER JOIN [B] AS [Extent2] ON ...
    INNER JOIN [C] AS [Extent3] ON ...
    WHERE (N'BlahBlah' = [Extent1].[Status]) AND (N'Blah' = [Extent2].[Status]) AND (N'BLBlah' = [Extent3].[Status])

    This will be just one query and it won't load any additional data in system memory. Also we're taking advantage of indexes on primary keys which is much faster.

    ReplyDelete
    Replies
    1. Thanks buddy, I've updated the post. :)
      Certainly when all of your needs is inside the DB, there is no need to take it out and use the memory.

      Anyways, about the Eager loading, I emphasized that it is a decision by the architect, based on the nature of data. In other work, I won't look at a functionality and change my mapping, but if it does, then I prevented the additional calls.
      The simplest example can be a 1 to 1 relation. It makes sense the have the first table attached to the second one, every time you load it, and of course you don't change it considering the queries that people are writing later on.

      Delete
  2. Hi there, I found your blog via Google while searching for such kinda informative post and your post looks very interesting for me.
    ORM

    ReplyDelete