Tuesday, October 23, 2018

Adding Data Lake Analytics to Data Factory

To work with Data Lake Analytics in Data Factory, follow the steps below:
1- Add a new service principal if you don't have one
  • In the portal, go to Active Directory > App registrations > + New application registration, then provide a name and URL for the application

2- Take the principal key and Id for the user:
Id: Active Directory > App registrations > Application ID
Key: Active Directory > App registrations > Settings > Keys --> add a new one
3- Add the key to the Key Vault and note the name of the secret
Go to the Key Vault account related to the resource group that you are in and add a secret to it
4- Give the service principal permission to both Data Lake Analytics and the Data Lake Store it uses.
Grant the service principal permission to your Azure Data Lake Analytics account using the Add User Wizard
Giving access to Data Lake Analytics
In the Azure portal, go to your Data Lake Analytics account.
  • On the left, under Getting Started, click Add User Wizard.
  • Select a role (we can use Contributor), and then click Select.
  • Paste the principal Id from step 2 in the search field. It will find the principal (user) name.
  • Select the access control lists (ACLs) for the U-SQL databases. When you're satisfied with your choices, click Select.
  • Select the ACLs for files. For the default store, don't change the ACLs for the root folder "/" and for the /system folder. Click Select.
  • Review all your selected changes, and then click Run.
  • When the wizard is finished, click Done.
Giving access to Data Lake Store
In the Azure portal, go to your Data Lake Store account.
  • Go to Access control (IAM)
  • Click Add
  • Select the role (Contributor)
  • Paste the principal Id in the search field to find the principal (user)
5- Add a Linked service in Data Factory
Go to the Data Factory > Connections > + New
  • From the Compute tab, select Azure Data Lake Analytics
  • Give it a good name and fill in the values for tenant and service
  • Set the principal Id from step 2
  • Select the Key Vault option and fill in the secret name
  • Click Test connection to check that Azure Data Factory can connect to Data Lake Analytics
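
The values entered in step 5 end up in a JSON definition of the linked service. The snippet below is a minimal sketch of what that definition roughly looks like, built as a Python dictionary. The function name and all sample values (account name, tenant, Key Vault linked service name, secret name) are made up for illustration, and the property names are written from memory of the Data Factory JSON format, so compare them with the JSON that the portal generates for you.

```python
import json


def build_adla_linked_service(name, account_name, tenant, principal_id,
                              key_vault_linked_service, secret_name,
                              subscription_id=None, resource_group=None):
    """Return a Data Lake Analytics linked-service definition as a dict.

    Hypothetical helper: it only assembles the JSON, it does not call Azure.
    """
    type_properties = {
        "accountName": account_name,
        "tenant": tenant,
        # Application ID of the app registration (step 2).
        "servicePrincipalId": principal_id,
        # The key itself is not stored here; Data Factory looks it up
        # in Key Vault by secret name (step 3).
        "servicePrincipalKey": {
            "type": "AzureKeyVaultSecret",
            "store": {
                "referenceName": key_vault_linked_service,
                "type": "LinkedServiceReference",
            },
            "secretName": secret_name,
        },
    }
    # Optional: only needed when the account lives in another
    # subscription or resource group than the Data Factory.
    if subscription_id:
        type_properties["subscriptionId"] = subscription_id
    if resource_group:
        type_properties["resourceGroupName"] = resource_group
    return {
        "name": name,
        "properties": {
            "type": "AzureDataLakeAnalytics",
            "typeProperties": type_properties,
        },
    }


if __name__ == "__main__":
    definition = build_adla_linked_service(
        name="AdlaLinkedService",            # sample values, not real ones
        account_name="myadlaaccount",
        tenant="my-tenant-id",
        principal_id="my-application-id",
        key_vault_linked_service="KeyVaultLinkedService",
        secret_name="adla-principal-key",
    )
    print(json.dumps(definition, indent=2))
```

Keeping the key as a Key Vault reference means the definition can be checked into source control without exposing the secret.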

Accessing resources in Azure


The old way of accessing resources like Blob storage was a connection string. Later, services such as Key Vault were added to Azure to let people store keys, secrets and certificates, so that developers can do their job while the production environments stay protected from the rogue ones :).

That being said, there are new ways to give platforms access so they can reach each other. For example, in Azure Data Lake Analytics you might want to access Azure Data Lake Store, Blob storage and Data Factory, and to do that you need to give it access to each of them.

Where to find values that we need?


For the Data Factory, if we followed the same approach, we would need a service principal Id and key, which can be created using Active Directory. It turns out that we don't need to, as Azure has a newer concept called "managed identity", and one has already been created when we created the Data Factory using the portal. We just need to make sure that the Data Factory identity has access to the Data Lake Store.
So all we need to set in the parameters in the release pipeline is the subscription and resource group.

* We could also use managed identities to connect to Blob storage on Azure.

You can read more about managed identities for Azure here

Data Lake Analytics

If we do need a service principal (like for Data Lake Analytics), we can just create an app in Active Directory and make sure that the app has access to the resources we need. Then we can read the Id and key as below:

Id: Active Directory > App registrations > Application ID
Key: Active Directory > App registrations > Settings > Keys --> add a new one
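
To show where the Id and key end up, here is a rough sketch of the token request that a client makes with them (the OAuth 2.0 client credentials grant against Azure Active Directory). It only builds the request and does not send it. The function name and sample values are made up, and the default resource URI for Data Lake is an assumption that you should check against the service you are calling.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(tenant_id, application_id, key,
                        resource="https://datalake.azure.net/"):
    """Build (but do not send) a client-credentials token request.

    Hypothetical helper; `resource` is an assumed default.
    """
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": application_id,   # the Application ID
        "client_secret": key,          # the key created under Settings > Keys
        "resource": resource,
    }).encode("ascii")
    return Request(url, data=body, method="POST")


if __name__ == "__main__":
    request = build_token_request("my-tenant-id", "my-application-id", "my-key")
    print(request.get_method(), request.full_url)
```

Sending the request with `urllib.request.urlopen(request)` should return a JSON document containing an access token, provided the Id, key and tenant are valid.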

Read more about how to do it here:

Thursday, April 5, 2018

Deleting old files on Azure web app (app service)


It is going to be a short post :)
In Azure we usually use Application Insights or something similar to keep track of our logs. If for some reason you also like to keep everything in log files, for example in App_Data, you will have a lot of files in no time. Usually the size doesn't matter much since you have 100 GB on your disk, but it is annoying to have to look through thousands of files (usually 1 file per day per scaled instance).

Where are my files

You probably know it already since you are trying to delete them :) but for people who have searched for something else: let's say your app service is at test.azurewebsites.net/, then you can access the disk using test.scm.azurewebsites.net/

Then you can click on "Debug console" in the top menu and click on CMD to see all folders and files. To get to the place where your files are hosted, just go to site > wwwroot
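
The naming rule above (put ".scm" in front of "azurewebsites.net") is easy to script if you have many apps. This is a tiny sketch with a made-up function name. It assumes the default azurewebsites.net host name and will refuse custom domains.

```python
def kudu_host(app_host):
    """Turn test.azurewebsites.net into test.scm.azurewebsites.net."""
    suffix = ".azurewebsites.net"
    if not app_host.endswith(suffix):
        raise ValueError("expected a host name ending in " + suffix)
    app_name = app_host[:-len(suffix)]
    return app_name + ".scm" + suffix


if __name__ == "__main__":
    print(kudu_host("test.azurewebsites.net"))
```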

The simple solution

One simple solution is to delete the old files that you know you are not going to need. For example, you might have a policy that says you don't care about what happened more than 30 days ago.


Well, basically it is Windows, so you can simply use Command Prompt commands to do stuff.
Use this command to see the list of files older than 30 days:
forfiles -s -m *.* -d -30 -c "cmd /c echo @path"

If you are happy with the results, change echo to del to delete those files:
forfiles -s -m *.* -d -30 -c "cmd /c del @path"
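
If you would rather run the cleanup from a script, the same "list first, then delete" idea can be sketched in Python. This is not what forfiles does internally, just a rough equivalent: the function names are made up, it assumes Python is available where you run it, and it compares exact modification timestamps instead of whole dates, so files right at the 30-day boundary may be treated slightly differently.

```python
import os
import time


def old_files(root, days=30, now=None):
    """Yield files under root (including subfolders) last modified
    more than `days` days ago."""
    now = time.time() if now is None else now
    cutoff = now - days * 24 * 60 * 60
    for folder, _subfolders, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            if os.path.getmtime(path) < cutoff:
                yield path


def delete_old_files(root, days=30, dry_run=True):
    """Print the old files (dry_run=True) or delete them (dry_run=False).

    Returns the list of matched paths in both cases.
    """
    matched = list(old_files(root, days))
    for path in matched:
        if dry_run:
            print(path)        # same role as the "echo" variant
        else:
            os.remove(path)    # same role as the "del" variant
    return matched
```

Call `delete_old_files(folder)` first to check the list, and only then call it again with `dry_run=False`.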