Saturday, May 28, 2016

Programming tips: Entity Framework (any ORM) is NOT the GOD of Data! You need to know how to work with it

Introduction

In my professional life, I've worked with different ORMs, and of course the best ones were Hibernate and Entity Framework. They can save almost half of the time you would otherwise spend on your software by handling the requests to the DB, so you only need to know how to work with objects instead of relational data. However, they are no gods! They do exactly what you tell them to do. They cannot work miracles, because they don't know what your intention is. They are simply software systems!

The problem

A simple question can lead to the answer. Take someone who knows object-oriented programming and give him these classes.

Code:

class A
{
    public string Status { get; set; }
    public List<B> Bs { get; set; }
}

class B
{
    public string Status { get; set; }
    public List<C> Cs { get; set; }
}

class C
{
    public string Status { get; set; }
    public string Name { get; set; }
}



Now ask him to give you the names of the Cs with status 'BLBlah' that belong to Bs with status 'Blah', which in turn belong to As with status 'BlahBlah'. You will end up with code like this:

Code:

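var results = new List<string>();

// 1 query: load the As with the wanted status.
var selectedAs = context.As.Where(a => a.Status == "BlahBlah");

foreach (var a in selectedAs)
{
    // Touching a.Bs lazy-loads the Bs of this A: 1 query per A.
    var selectedBs = a.Bs.Where(b => b.Status == "Blah");

    foreach (var b in selectedBs)
    {
        // Touching b.Cs lazy-loads the Cs of this B: 1 query per B.
        var selectedNames = b.Cs.Where(c => c.Status == "BLBlah").Select(c => c.Name).ToList();
        results.AddRange(selectedNames);
    }
}

return results;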



Pretty straightforward, huh? He rocks, and the ORM rocks! Right?!
Well, he just added a lot of overhead to your DB!

What is wrong with this code?

The way you've asked your query! Let's say there are 100 As with the wanted status, and each of them has 100 Bs with the wanted status.
You've asked your ORM to find those 100 As, then for each of them you've asked it to find its Bs, and then for each B to find its Cs.
That is completely fine if you are working in memory, but with lazy loading this code means:
1 call for As + 100 calls for Bs (one per A) + 100 * 100 calls for Cs (one per B) = 10,101

So you've sent 10,101 requests to the DB for a simple 3-layer select! How fast can it be?!
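To make the counting concrete, here is a minimal sketch that replays the nested loops against a fake store. FakeLazyStore and RoundTripDemo are hypothetical names invented for this illustration; they are not part of Entity Framework, and each method call simply stands for one round trip that a lazy-loading ORM would make (one for the As, one per A for its Bs, one per B for its Cs).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a lazy-loading ORM: every method call below
// represents one round trip to the database and bumps a counter.
public class FakeLazyStore
{
    public int RoundTrips { get; private set; }

    private readonly int _asCount;
    private readonly int _bsPerA;

    public FakeLazyStore(int asCount, int bsPerA)
    {
        _asCount = asCount;
        _bsPerA = bsPerA;
    }

    // Stands for: SELECT ... FROM A WHERE Status = 'BlahBlah'
    public List<int> LoadAs()
    {
        RoundTrips++;
        return Enumerable.Range(0, _asCount).ToList();
    }

    // Stands for the lazy load of a.Bs: one query per A.
    public List<int> LoadBsOf(int aId)
    {
        RoundTrips++;
        return Enumerable.Range(0, _bsPerA).ToList();
    }

    // Stands for the lazy load of b.Cs: one query per B.
    public List<string> LoadCNamesOf(int aId, int bId)
    {
        RoundTrips++;
        return new List<string> { "c-" + aId + "-" + bId };
    }
}

public static class RoundTripDemo
{
    // Mirrors the nested foreach loops and returns the number of round trips.
    public static int CountRoundTrips(int asCount, int bsPerA)
    {
        var store = new FakeLazyStore(asCount, bsPerA);
        var results = new List<string>();
        foreach (var a in store.LoadAs())
        {
            foreach (var b in store.LoadBsOf(a))
            {
                results.AddRange(store.LoadCNamesOf(a, b));
            }
        }
        return store.RoundTrips;
    }
}
```

Calling RoundTripDemo.CountRoundTrips(100, 100) returns 10101, that is 1 + 100 + 100 * 100. The number of round trips grows with the data, not with the number of lines you wrote.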

Let's see another code

Now you might say that it is because of the foreach loops I wrote, but that is not the case.
Let's say I want to convert the structure above to this one and then use it somewhere.

Code:

class ConvertedA
{
    public string Status { get; set; }
    public List<string> BStatuses { get; set; }
    public List<string> CNames { get; set; }
}

With a simple object-oriented view, you will probably end up with this code (with no foreach):
Code:

List<ConvertedA> GetConvertedList()
{
    return context.As.Select(a => new ConvertedA()
    {
        Status = a.Status,
        // Statuses of all Bs of this A.
        BStatuses = a.Bs.Select(b => b.Status).ToList(),
        // Names of all Cs under all Bs of this A.
        CNames = a.Bs.SelectMany(b => b.Cs).Select(c => c.Name).ToList()
    }).ToList();
}


OMG! What a wonderful query, right?! So simple! But again, when the ORM resolves each nested collection with its own request, you are sending lots of requests to the DB! Why?
1 select for As + 100 selects for Bs + 100 * 100 selects for Cs = 10,101 requests!

The Solution

There are 2 solutions to this problem. One is from the point of view of the software architect, and the other from the point of view of the programmer.


The programmer

Write the best query
As Mahdi Hasheminejad pointed out in the comments, there are many cases in which you can select the correct data with one query. It is pretty useful and, of course, it is the best way to solve the problem. As he mentioned, the query can be written like this:

var results = context.As
.Where(a => a.Status == "BlahBlah").SelectMany(a => a.Bs)
.Where(b => b.Status == "Blah").SelectMany(b => b.Cs)
.Where(c => c.Status == "BLBlah");

And the result will be translated to this query:
SELECT
[Extent3].[Status] AS [Status],
[Extent3].[Name] AS [Name]
FROM [A] AS [Extent1]
INNER JOIN [B] AS [Extent2] ON ...
INNER JOIN [C] AS [Extent3] ON ...
WHERE (N'BlahBlah' = [Extent1].[Status]) AND (N'Blah' = [Extent2].[Status]) AND (N'BLBlah' = [Extent3].[Status])

Load needed data in memory
If you cannot handle your request with a good query, load your data first! A good example of this case is when you need to compare something with a result that comes from outside your DB, like when you need to call a service.
For the first example, you can say:
var Cs = context.Cs.Where(c => c.Status == "BLBlah").ToList();
var Bs = context.Bs.Where(b => b.Status == "Blah").ToList();

Now use these 2 lists inside your foreach and match their items by their Ids.
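A minimal sketch of that in-memory matching could look like the code below. The ARow, BRow and CRow classes and their Id, AId and BId key columns are assumptions made for this example (the classes in this post do not show any keys), and InMemoryMatcher is a hypothetical helper, not an ORM API.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical flat rows, as they would come back from three separate queries.
// The Id / AId / BId key columns are assumptions made for this sketch.
public class ARow { public int Id; public string Status; }
public class BRow { public int Id; public int AId; public string Status; }
public class CRow { public int Id; public int BId; public string Status; public string Name; }

public static class InMemoryMatcher
{
    // All three lists are already filtered and materialized with ToList(),
    // so everything below runs in memory without touching the DB.
    public static List<string> NamesOfMatchingCs(
        List<ARow> selectedAs, List<BRow> selectedBs, List<CRow> selectedCs)
    {
        // Index the children by their parent key once,
        // instead of asking the DB again for every parent.
        var bsByA = selectedBs.ToLookup(b => b.AId);
        var csByB = selectedCs.ToLookup(c => c.BId);

        var results = new List<string>();
        foreach (var a in selectedAs)
        {
            // A lookup returns an empty sequence for a key it does not contain.
            foreach (var b in bsByA[a.Id])
            {
                results.AddRange(csByB[b.Id].Select(c => c.Name));
            }
        }
        return results;
    }
}
```

The foreach loops stay exactly as readable as before, but the DB only sees one request per list instead of one request per parent object.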


The Architect

ORMs simply map tables to related objects in memory. But there are 2 ways of working with relations: lazy loading (which is the default in most ORMs) and eager loading.
Lazy loading simply means that the ORM waits for you to ask for something, and only then loads the data. For instance:
var a = context.As.First();
This will load only 1 A object from the DB and nothing more.
Now if you write:
var bs = a.Bs.ToList();
your ORM will send another request to fetch the Bs.
This is exactly what most code needs. But in some cases, we know that an A is not usable without its Bs. So the architect can decide to use eager loading for that relation. Then, when you say
var a = context.As.First();

your ORM will retrieve your A and all the Bs that are related to it.

*Eager loading is not a good solution 90% of the time. It depends on the nature of your data! So don't use it without thinking.





Friday, May 27, 2016

Software Architecture Tips- Don't put logs in your production DB


Some time ago, I had a discussion with one of my friends about keeping logs inside the DB.
While it might look attractive to many people, it is really a bad idea.

Why people decide to have their logs in the DB

First things first :) We have to see what is good about having your logs inside a DB. Most people claim that when you have multiple servers (like web servers), it is hard to collect your logs and combine them in order to understand what went wrong.
That is a fair point, but there are many other ways to do that without affecting your important resources.

What are the disadvantages?

* First, you are sacrificing the most valuable resource in your system to keep some stupid logs for future use! You just need to have them somewhere for future reference! That is all! No realtime access, no need for indexing, etc.

* Secondly, you might lose your logs. When the network goes down, or the DB goes down, you will not have a clue about what happened in your code.

* Then, you will stop logging other important parts of your code, since you will be thinking of the resources you are using. You will lose all Debugs and Infos, because if you start logging them, your DB will struggle to return even a simple select to you! So you will skip many important logs that you will need in the future.

What is a log from the point of view of software architecture

Logs are simply data that has to be managed separately. They are not there to provide your system's functionality. The only reason for having them is to help you find out what is wrong with the system and fix it ASAP.

What to do then?

First, your logging system has to be implemented in a way that lets you change its behavior easily. If you want to add debug logs or remove them, you have to be able to do it. The thread that handles your logs has to be different from the ones that handle your system's work. No critical resource should get busy because of logging.
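As a rough illustration of those two requirements (a level you can change, and a dedicated logging thread), here is a minimal sketch. BackgroundLogger and LogLevel are hypothetical names made up for this example; this is not how any particular logging library is implemented.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;

public enum LogLevel { Debug, Info, Error, Fatal }

// Hypothetical minimal logger: callers only enqueue a line,
// a single dedicated thread does the (slow) writing.
public sealed class BackgroundLogger : IDisposable
{
    private readonly BlockingCollection<string> _queue = new BlockingCollection<string>();
    private readonly TextWriter _output;
    private readonly Thread _worker;

    // Entries below this level are dropped; it can be changed while running.
    public LogLevel MinLevel { get; set; }

    public BackgroundLogger(TextWriter output, LogLevel minLevel)
    {
        _output = output;
        MinLevel = minLevel;
        _worker = new Thread(WriteLoop) { IsBackground = true };
        _worker.Start();
    }

    // Called from the system's own threads: filters and enqueues,
    // and never touches the output itself.
    public void Log(LogLevel level, string message)
    {
        if (level < MinLevel) return;
        _queue.Add(level + " " + message);
    }

    // Runs on the logging thread until the queue is marked as complete.
    private void WriteLoop()
    {
        foreach (var line in _queue.GetConsumingEnumerable())
        {
            _output.WriteLine(line);
        }
        _output.Flush();
    }

    // Stops accepting entries and waits until the pending ones are written.
    public void Dispose()
    {
        _queue.CompleteAdding();
        _worker.Join();
        _queue.Dispose();
    }
}
```

With a file-backed writer passed to the constructor, the threads that serve your requests only pay for adding a string to a queue. Setting MinLevel to LogLevel.Debug turns the debug logs on, and setting it back to LogLevel.Info turns them off again.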

But it takes a lot of time!

There are lots of different logging libraries. One of the best is log4net for .NET applications. It is an Apache project and it is very easy to use.
It has an XML file in which you can say what to do with a log, what the format of the output should be, etc. You can also have your output handled in several different ways at once, and the actual writing can be moved to a separate thread, so it won't slow your system down.

Where shall I put my logs then?

Of course, the first place to put your logs is on the disk. It is available all the time (if not, your server will go down, so you have no logs anyway :) ), it is not a valuable resource, and it is almost free since you are already on the web server. I suggest having one file for all of your logs and another one only for Errors and Fatals, so you can see the errors easily.

But I can't check my logs every day, especially since I have several servers

As I mentioned, that is not a problem of your system. The log manager has to handle it, and you can do it easily by configuring your logger. There are many solutions for collecting logs. Sentry is one of the simplest. You can tell your logger to push all errors to your Sentry account and then check them easily in a managed third-party web application. There are also other systems that watch your log files and update their status based on those files.

So if you have several applications to take care of, you can follow all of them in a third-party application somewhere else, and you've only used 1 thread of your system, your web server's disk and a small portion of your network.


Thursday, May 26, 2016

Programming Tips: Don't mix 2 tasks together


In my professional life, I've seen lots of people with different coding styles, but one of the most important issues that most programmers have is that they mix different tasks together.

Rule of 30

You've probably heard of the rule of 30 in clean code, but did you ever think about why a simple rule like this is so important?
The reason is simple: to make you create simpler methods.
So if you have a method that does 10 things together, you will be forced to write 10 different methods and then call them.

Why some people avoid it

Well, in my opinion 30 lines is enough for implementing a single task most of the time. I reserve 1% for something that is an exception, but what about the other times?!
In my opinion, most of the time it is hard for people to decide how to separate the different parts of a big task into several smaller tasks, so they try to solve the whole problem in one big and ugly method.

Example

Any normal method that you write on a daily basis can be a simple example of this matter. For instance, reading data from some text and filling your tables in the DB. Very easy, right?

If you don't divide your big task, you will have one big method for doing everything, and then you will probably read every line and, for each line, try to save its data into your DB. Right?
It is ugly code and I haven't even started! :)
OK, hopefully you have a DB with a good structure, so you need to put your data into parent tables that have child items of their own.
....And most of the time your text data is not normalized ( you wish! :) )
So you will start reading the data line by line, storing the common data into the parent tables the first time you see it and then ignoring it the next times.

You see where it is going! right?! UGLY!UGLY! LAKH!

What should you do?!

You have a task to read data from text and store it in the DB, right?! So first read the data into structured objects (like DTOs), and then save them into the DB in another method. You have complex objects?! Easy! Create a method for reading each one and call them inside the parent!
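A minimal sketch of that separation might look like this. The file format ("customer;product;quantity" per line), the DTO names and the OrderImport class are all assumptions invented for the illustration, and the save delegate stands in for whatever real DB code you have.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical DTOs for a denormalized text file whose lines look like
// "customer;product;quantity" (the customer repeats on every line).
public class OrderLineDto { public string Product; public int Quantity; }

public class CustomerDto
{
    public string Name;
    public List<OrderLineDto> Lines = new List<OrderLineDto>();
}

public static class OrderImport
{
    // Task 1: turn the fields of one line into one child object.
    public static OrderLineDto ParseLine(string[] fields)
    {
        return new OrderLineDto { Product = fields[1].Trim(), Quantity = int.Parse(fields[2]) };
    }

    // Task 2: read the whole text into parents with their children.
    // The "first time store it, next times ignore it" logic lives only here.
    public static List<CustomerDto> Parse(IEnumerable<string> lines)
    {
        var customers = new Dictionary<string, CustomerDto>();
        foreach (var line in lines.Where(l => !string.IsNullOrWhiteSpace(l)))
        {
            var fields = line.Split(';');
            var name = fields[0].Trim();
            CustomerDto customer;
            if (!customers.TryGetValue(name, out customer))
            {
                customer = new CustomerDto { Name = name };
                customers.Add(name, customer);
            }
            customer.Lines.Add(ParseLine(fields));
        }
        return customers.Values.ToList();
    }

    // Task 3: persist the objects. The save delegate is a placeholder
    // for the real DB code, which knows nothing about the text format.
    public static void Save(IEnumerable<CustomerDto> customers, Action<CustomerDto> save)
    {
        foreach (var customer in customers)
        {
            save(customer);
        }
    }
}
```

Each method stays far below 30 lines, the parsing can be tested without a DB, and the saving can be reused for data that comes from somewhere other than a text file.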


What are the benefits?
* It is easy to understand what you did, so the code is easy to maintain and debug.
* You have separate methods based on your functionalities, so you can reuse your code.
* Each method deals only with what it has to do, so your code handles each situation better.