Tuesday, October 23, 2018

Adding Data Lake Analytics to Data Factory

To work with Data Lake Analytics in Data Factory, follow the steps below:
1- Add a new service principal if you don't have one
    • In the portal, go to Active Directory > App registrations > + New application registration, then provide a name and URL for the application

2- Take the principal Id and key for the application:
Id: Active Directory > App registrations > Application ID
Key: Active Directory > App registrations > Settings > Keys > add a new one
3- Add the key to the Key Vault and note the name of the secret
Go to the Key Vault account that belongs to the resource group you are in and add the key to it as a secret
4- Give the service principal permission to both the Data Lake Analytics account and the Data Lake Store it uses.
Grant the service principal permission to your Azure Data Lake Analytics account using the Add User Wizard
Giving access to Data Lake Analytics
In the Azure portal, go to your Data Lake Analytics account.
  • On the left, under Getting Started, click Add User Wizard.
  • Select a role (we can use Contributor), and then click Select.
  • Paste the principal Id from step 2 in the search field. It will find the principal (user) name.
  • Select the access control lists (ACLs) for the U-SQL databases. When you're satisfied with your choices, click Select.
  • Select the ACLs for files. For the default store, don't change the ACLs for the root folder "/" and for the /system folder. Click Select.
  • Review all your selected changes, and then click Run.
  • When the wizard is finished, click Done.
Giving access to Data Lake Store
In the Azure portal, go to your Data Lake Store account.
  • Open Access control (IAM)
  • Click Add
  • Select the role (Contributor)
  • Paste the principal Id in the search field to find the principal (user)
5- Add a Linked service in Data Factory
Go to the Data Factory > Connections > + New
  • From the Compute tab, select Azure Data Lake Analytics
  • Give it a good name and fill in the values for tenant and service
  • Set the principal Id from step 2
  • Select the Key Vault option and fill in the secret name
  • Click Test connection to check that Azure Data Factory can connect to Data Lake Analytics

Accessing resources in Azure


The old way of accessing resources like Blob storage was a connection string. Later, services like Key Vault were added to Azure so that people can store keys, secrets and certificates there, letting developers do their job while protecting the production environments from the rogue ones :).

That being said, there are new ways to give platforms access so they can reach each other. For example, in Azure Data Lake Analytics you might want to access Azure Data Lake Store, Blob storage and Data Factory, and to do that you need to give it access to each of them.

Where to find values that we need?


For the Data Factory, if we followed the same lead we would need a service principal Id and key, which can be created using Active Directory. It turns out that we don't need to, because Azure has a newer concept called "managed identity", and one was already created when we created the Data Factory in the portal. We just need to make sure that the Data Factory identity has access to the Data Lake Store.
So the only parameters we need to set in the release pipeline are the subscription and the resource group.

* We could also use managed identities to connect to Blob storage on Azure.

You can read more about managed identities for Azure here

Data Lake Analytics

In any case, if we need a service principal (for example for Data Lake Analytics), we can just create an app in Active Directory and make sure that the app has access to the resources we need. Then we can read the Id and key as below:

Id: Active Directory > App registrations > Application ID
Key: Active Directory > App registrations > Settings > Keys > add a new one

read more about how to do it in here:

Thursday, April 5, 2018

Deleting old files on Azure web app (app service)


It is going to be a short post :)
In Azure we usually use Application Insights or something similar to keep track of our logs. If for some reason you also like to keep everything in log files, for example in App_Data, you will have a lot of files in no time. Usually the size does not matter much since you have 100 GB on your disk, but it is annoying to have to look through thousands of files (usually 1 file per day per scaled instance).

Where are my files

You probably know it already since you are trying to delete them :) but for people who have searched for something else: let's say your app service is at test.azurewebsites.net/, then you can access the disk using test.scm.azurewebsites.net/

Then you can click on "Debug console" in the top menu and click on CMD to see all folders and files. To get to the place where your files are hosted, just go to site > wwwroot

The simple solution

One simple solution is to delete the old files that you know you are not going to need. For example, you might have a policy saying that you don't care about what happened more than 30 days ago.


Well, basically it is Windows, so you can simply use Command Prompt commands to do stuff.
Use this command to list the files older than 30 days:
forfiles /S /M *.* /D -30 /C "cmd /c echo @file"

If you are happy with the results, change echo to del to delete those files:
forfiles /S /M *.* /D -30 /C "cmd /c del @file"
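
The same "list first, delete second" idea can be sketched in a script. This is only an illustration in Python, not something the post's setup requires; the function names and the dry_run flag are made up for the example, and it assumes "older than 30 days" means the last-modified time.

```python
import time
from pathlib import Path


def find_old_files(root, days=30, now=None):
    """Return files under root (recursively) last modified more than `days` days ago."""
    now = time.time() if now is None else now
    cutoff = now - days * 24 * 60 * 60
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.stat().st_mtime < cutoff
    )


def delete_old_files(root, days=30, dry_run=True):
    """Print the old files; delete them only when dry_run is False."""
    old_files = find_old_files(root, days)
    for path in old_files:
        print(("would delete " if dry_run else "deleting ") + str(path))
        if not dry_run:
            path.unlink()
    return old_files
```

Running it with dry_run=True first plays the role of the echo command, and switching to dry_run=False plays the role of del.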

Wednesday, November 22, 2017

Umbraco for beginners: How to see versions of a property on a page (back in history)

All CMS systems log data versions and the person who changed them. Umbraco is of course no exception :)
But how can we check who made a change?

There are probably some add-ins for that, but what if you don't want to let other people see old data? What if you don't have time, or are not allowed to install a package on the production server?

It is pretty simple :) You just need to query the database.

Wrong text back in history

Let's say your client sends you a bug report, and when you check it you see that a text field is not correct. For the sake of example, let's say you have several email subjects in your CMS and you have a bug report saying someone has received "Subject 2" instead of "Subject 1". What will you do?

Of course the first action is to check the CMS and then the code, but what if everything is right? Isn't it possible that for a period of time an editor changed it to the wrong value and after a while corrected it? How would you check that?

A simple query like this

SELECT TOP (1000) 
    pt.Alias ,pt.Name,  cpd.dataNvarchar, cd.updateDate, u.username
--,pt.*, cpd.*, cd.*

  FROM [dbo].[cmsPropertyData] cpd
  inner join [dbo].[cmsDocument] cd on cpd.[versionId] = cd.[versionId]
  inner join [dbo].[cmsPropertyType] pt on pt.id = cpd.propertytypeid
  inner join [dbo].[umbracoUser] u on u.id =  cd.documentuser
  where cpd.dataNvarchar like '%Subject 2%'
With this query you will find every property in the history that "Subject 2" has been assigned to, together with the date and the user who is responsible. You can also filter on other fields like nodeId (page id), and you can check the other data columns (dataInt, dataDecimal, dataDate, dataNvarchar, dataNtext).

Friday, November 10, 2017

Caching partials in Umbraco 7.7

Umbraco has a nice way of implementing output caching in the views. It is exactly the same as calling a partial, but you can say how long you want that partial to be cached and how Umbraco should decide whether to return the cached data or build it again.
You can find the documentation in here.

Quick examples:

You can use CachedPartial without specifying anything else, which means that you want the same partial output shown every time, no matter where it is.
For instance, if you want to show the footer or menu of your site, it is pretty easy to have them in a partial and then just call Html.CachedPartial("_menu",....) on the master page / layout page.

  @Html.CachedPartial("_partial", Model, cacheTime)//caching for every page
On the other hand, you might want to show a different view for each member. An example could be a module that shows the username or other user-specific data.
  @Html.CachedPartial("_partial", Model, cacheTime,cacheByMember:true)//caching based on the member
Then there are times when you want your data to show differently depending on the page it is on. For instance, you want to write the meta data of each page, but you don't care which user is viewing it.
  @Html.CachedPartial("_partial", Model, cacheTime,cacheByPage:true)//caching based on pages
We can also combine page and member.
  @Html.CachedPartial("_partial", Model, cacheTime,cacheByPage:true,cacheByMember:true)//caching based on pages and members

But what if you want to cache your partial based on something else? What if your querystring contains some items that the output should vary by?
Let's say you have a small advertising site and, for some reason, you have your product Ids in the querystring. If you cache by page, everyone will see the first product that someone browsed for the whole cache period, which is not good.
cacheByMember is not usable either, because you don't care whether the person is logged in or who they are. You just care about the querystring that you have (productIds).
The way to do it is to use contextualKeyBuilder, like below:

@Html.CachedPartial("_partial", Model, cacheTime, 
contextualKeyBuilder:(o, dictionary) => 
CurrentPage.Id.ToString() + Request.QueryString.ToString())
You can specify the key based on your scenario or even pageview data.
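
To see why the key matters, here is a minimal sketch of the idea in Python. It is not Umbraco code: PartialCache, cached_partial and product_key are hypothetical names for this illustration. The cache entry is looked up by partial name plus a contextual key, so two requests only share cached output when page id and querystring are both the same.

```python
import time


class PartialCache:
    """Tiny in-memory cache keyed by partial name plus a contextual key."""

    def __init__(self, clock=time.monotonic):
        self._entries = {}  # (name, contextual_key) -> (expires_at, html)
        self._clock = clock

    def cached_partial(self, name, render, cache_seconds, contextual_key=""):
        key = (name, contextual_key)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: reuse the stored output
        html = render()      # missing or expired: build it again
        self._entries[key] = (now + cache_seconds, html)
        return html


def product_key(page_id, query_string):
    """Contextual key: the page id plus the raw querystring (hypothetical helper)."""
    return f"{page_id}?{query_string}"
```

With this key, productIds=1 and productIds=2 on the same page get separate cache entries, while a repeated request for productIds=1 is served from the cache.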

Friday, October 20, 2017

Beginners Guide: Output caching in EPiServer


Output caching is not a new concept. Plenty of projects implement it, but how do you do it in EPiServer?

What is Output caching?

The main purpose of using Output Caching is to dramatically improve the performance of an ASP.NET MVC Application. It enables us to cache the content returned by any controller method so that the same content does not need to be generated each time the same controller method is invoked. 

Why use output caching?

Output caching has huge advantages, such as:
* Reduces the load on the server because there is no need to generate the results again
* Reduces the load on the DB (same reason)
* Faster response

How is it done in MVC?

The OutputCacheAttribute class is implemented in the System.Web.Mvc namespace and can be used simply by decorating a controller or action with [OutputCache]. There are different variations for using it, which I am not going to discuss now, but you can read more in here.

What is the problem? Let's use it on EPiServer.

Although the [OutputCache] attribute works fine for MVC applications, there are some requirements that are specific to CMS systems. Let me elaborate with an example:
- You have a page on your CMS and someone tries to view it for the first time. The MVC engine generates the result and then saves it in the cache.
A second later, another person requests the exact same page. As you've guessed, MVC receives the request and, since it has the page in its cache, responds with the exact same result.
- In the next step the editor changes something on the page and publishes it, but (s)he will see the exact same result as before, since MVC doesn't know about the change and still returns the same response. This also happens for the next people who request the same page.

So to summarize, we have 2 big problems:
1- Content should not be cached for the editors
2- After publishing a page, the old cache should be invalidated and the whole process should run again
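
The two requirements can be sketched with a toy cache. This is an illustration in Python only, not how EPiServer implements it; PageOutputCache and its methods are hypothetical names, and clearing the whole cache on publish is the simplest possible invalidation strategy.

```python
class PageOutputCache:
    """Toy output cache: visitors get cached HTML, editors always get a fresh render."""

    def __init__(self):
        self._html_by_url = {}

    def get(self, url, render, is_editor=False):
        if is_editor:
            # Problem 1: never serve (or store) cached output for editors.
            return render()
        if url not in self._html_by_url:
            self._html_by_url[url] = render()
        return self._html_by_url[url]

    def on_published(self):
        # Problem 2: publishing invalidates the old output (here: drop everything).
        self._html_by_url.clear()
```

A visitor keeps getting the old HTML until on_published is called, while a request flagged as is_editor always sees the current content.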

How to fix it?

To fix the problems by yourself you need a lot of knowledge about EPiServer and you need some time to implement it. Luckily EPiServer has implemented their own ContentOutputCacheAttribute that will handle those for you :)

Simply add ContentOutputCache on top of your action

 public class MyPageController : PageController<StartPage>
 {
   [ContentOutputCache(Duration = 3600, VaryByCustom = "*")]
   public ActionResult Index(StartPage currentPage) { return View(currentPage); }
 }

Handle your GetVaryByCustomString in your Global.asax and you are good to go

public class EPiServerApplication : EPiServer.Global
{
 public override string GetVaryByCustomString(HttpContext context, string custom)
 { return base.GetVaryByCustomString(context, custom); } // replace with your own key
}

What to put into GetVaryByCustomString depends on your setup, but one simple example is returning the AbsoluteUri of the request. This way different pages will differ and language versions will be handled automatically (since the AbsoluteUri will be different regardless of your routing config), but be aware that personalization will not work since the AbsoluteUri is the same for all users.

 public override string GetVaryByCustomString(HttpContext context, string custom)
 {
        return context.Request.Url.AbsoluteUri;
 }

****Very Important!!!!

ContentOutputCache will not work unless you have httpCacheExpiration in your web.config.
Simply go to your web.config > configuration > episerver > applicationSettings and make sure that it has the httpCacheability and httpCacheExpiration attributes.

It should be something like this:

    <applicationSettings httpCacheability="Public" httpCacheExpiration="0:10:00" .... />

Monday, September 11, 2017

Clean code trap for Entity framework

When writing code, it is pretty normal to extract a part of it into a new method in order to have cleaner, more readable code.
Take this example:

PostTable (1) ===> (n) Votes

You have a post table that contains a lot of columns, and you would also like to retrieve the total number of votes for each post.
A simple conversion method would be:

public static Dto Convert(Post p)
{
 return new Dto{
  Title= p.Title, 
  // other fields
  VoteCount= p.Votes.Count
 };
}

Simple. Right?
But then your query will be translated into one query that fetches the Post objects and one more query for each post to read its votes, and that is not the worst part! It fetches all the vote rows instead of counting them:
exec sp_executesql N'SELECT 
    [Extent1].[Id] AS [Id], 
    [Extent1].[PostId] AS [PostId]
    FROM [dbo].[Votes] AS [Extent1]
    WHERE [Extent1].[PostId] = @EntityKeyValue1',N'@EntityKeyValue1 int',@EntityKeyValue1=6

Then you might say that you can Include the Votes table in your query.

But then it will be translated into a big join:

SELECT
    [Project1].[Id] AS [Id], 
    [Project1].[Title] AS [Title], 
    [Project1].[C1] AS [C1], 
    [Project1].[Id1] AS [Id1], 
    [Project1].[Pid] AS [Pid], 
    [Project1].[PostId] AS [PostId]
    FROM ( SELECT
        [Extent1].[Id] AS [Id], 
        [Extent1].[Title] AS [Title], 
        [Extent2].[Id] AS [Id1], 
        [Extent2].[PostId] AS [PostId], 
        CASE WHEN ([Extent2].[Id] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C1]
        FROM  [dbo].[Posts] AS [Extent1]
        LEFT OUTER JOIN [dbo].[Votes] AS [Extent2] ON [Extent1].[Id] = [Extent2].[PostId]
    )  AS [Project1]
    ORDER BY [Project1].[Id] ASC, [Project1].[C1] ASC
So what is wrong?!!

As you may have already guessed: do not put the select into a separate method. Simply write the projection inline in your query, like this:

.Select(p => new PostDto{Title= p.Title, VoteCount= p.Votes.Count, ....})
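
The difference between the two shapes of query can be reproduced outside Entity Framework. The sketch below is an analogy in Python with an in-memory SQLite database, not EF itself; the Posts and Votes tables follow the post, while the function names and sample rows are made up. counts_lazy mimics the separate-method version (one query per post that loads every vote row), and counts_projected mimics the inline projection (one query that lets the database count).

```python
import sqlite3


def make_db():
    """Build a small in-memory database with the Posts (1) -> (n) Votes shape."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE Posts (Id INTEGER PRIMARY KEY, Title TEXT);
        CREATE TABLE Votes (Id INTEGER PRIMARY KEY, PostId INTEGER REFERENCES Posts(Id));
        INSERT INTO Posts (Id, Title) VALUES (1, 'First'), (2, 'Second');
        INSERT INTO Votes (PostId) VALUES (1), (1), (1), (2);
    """)
    return db


def counts_lazy(db):
    """One query for the posts, then one more per post that fetches all its vote rows."""
    queries = 1
    result = []
    for post_id, title in db.execute("SELECT Id, Title FROM Posts ORDER BY Id").fetchall():
        votes = db.execute(
            "SELECT Id, PostId FROM Votes WHERE PostId = ?", (post_id,)
        ).fetchall()
        queries += 1
        result.append((title, len(votes)))  # counting happens in memory
    return result, queries


def counts_projected(db):
    """A single query; the database does the counting."""
    rows = db.execute("""
        SELECT p.Title, (SELECT COUNT(*) FROM Votes v WHERE v.PostId = p.Id)
        FROM Posts p
        ORDER BY p.Id
    """).fetchall()
    return rows, 1
```

Both functions return the same counts, but the first one issues one extra query per post and transfers every vote row, which is the cost described above.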