Tuesday, October 23, 2018

Adding Data Lake Analytics to Data Factory

To work with Data Lake Analytics in Data Factory, follow the steps below:
1- Add a new service principal if you don't have one
  • In the portal, go to Active Directory > App registrations > + New application registration, then provide a name and URL for the application

2- Get the principal Id and key for the application:
Id: Active Directory > App registrations > Application ID
Key: Active Directory > App registrations > Settings > Keys, then add a new one
3- Add the key to Key Vault and note the name of the secret
Go to the Key Vault account that belongs to your resource group and add the key to it as a secret.
4- Give the service principal permission to both the Data Lake Analytics account and the Data Lake Store it uses.
Grant the service principal permission to your Azure Data Lake Analytics account using the Add User Wizard.
Giving access to Data Lake Analytics
In the Azure portal, go to your Data Lake Analytics account.
  • On the left, under Getting Started, click Add User Wizard.
  • Select a role (we can use Contributor), and then click Select.
  • Paste the principal Id from step 2 in the search field. It will find the principal (user) name.
  • Select the access control lists (ACLs) for the U-SQL databases. When you're satisfied with your choices, click Select.
  • Select the ACLs for files. For the default store, don't change the ACLs for the root folder "/" and for the /system folder. Click Select.
  • Review all your selected changes, and then click Run.
  • When the wizard is finished, click Done.
Giving access to Data Lake Store
In the Azure portal, go to your Data Lake Store account.
  • Open Access control (IAM)
  • Click Add
  • Select the role (Contributor)
  • Paste the principal Id from step 2 in the search field to find the principal (user)
5- Add a linked service in Data Factory
Go to the Data Factory > Connections > + New
  • On the Compute tab, select Azure Data Lake Analytics
  • Give it a good name and fill in the values for tenant and service
  • Set the principal Id from step 2
  • Select the Key Vault option and fill in the secret name from step 3
  • Click Test connection to check that Azure Data Factory can connect to Data Lake Analytics
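
The values entered in step 5 end up in a JSON definition of the linked service. The sketch below builds such a definition in Python so you can see where each value from the steps above goes. The property names follow the Data Factory linked service JSON format as I understand it, so compare the output with the Code view of your own linked service before relying on it. The function name and all argument values (account name, tenant, Key Vault linked service name, secret name) are hypothetical placeholders.

```python
import json


def build_adla_linked_service(name, account_name, tenant, principal_id,
                              key_vault_linked_service, secret_name):
    """Return a Data Lake Analytics linked service definition as a dict."""
    return {
        "name": name,
        "properties": {
            "type": "AzureDataLakeAnalytics",
            "typeProperties": {
                "accountName": account_name,
                "tenant": tenant,
                # Application ID of the service principal (step 2).
                "servicePrincipalId": principal_id,
                # The key itself is not stored here; Data Factory reads it
                # from the Key Vault secret created in step 3.
                "servicePrincipalKey": {
                    "type": "AzureKeyVaultSecret",
                    "store": {
                        "referenceName": key_vault_linked_service,
                        "type": "LinkedServiceReference",
                    },
                    "secretName": secret_name,
                },
            },
        },
    }


# Placeholder values for illustration only.
definition = build_adla_linked_service(
    name="DataLakeAnalyticsLinkedService",
    account_name="my-adla-account",
    tenant="my-tenant-id",
    principal_id="my-application-id",
    key_vault_linked_service="KeyVaultLinkedService",
    secret_name="adla-principal-key",
)
print(json.dumps(definition, indent=2))
```

Note that the Key Vault option refers to another linked service (the Key Vault one), so that linked service has to exist in the Data Factory as well.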

Accessing resources in Azure


The old way of accessing resources like Blob storage was a connection string. Later, services such as Key Vault were added to Azure to let people store keys, secrets and certificates, so that developers can do their job while the production environments stay protected from the rogue ones :).

That being said, there are new ways to give platforms access to each other. For example, from Azure Data Lake Analytics you might want to access Azure Data Lake Store, Blob storage and Data Factory, and to do that you need to give it access to each of them.

Where to find the values that we need?


For the Data Factory, if we followed the same approach, we would need a service principal Id and key created through Active Directory. It turns out that we don't need them, because Azure has a newer concept called "managed identity", and one was already created when we created the Data Factory in the portal. We just need to make sure that the Data Factory identity has access to the Data Lake Store.
So the only parameters we need to set in the release pipeline are the subscription and the resource group.

* We could also use managed identities to connect to Blob storage on Azure.

You can read more about managed identities for Azure here

Data Lake Analytics

If we do need a service principal (as for Data Lake Analytics), we can create an app in Active Directory and make sure that the app has access to the resources we need. Then we can read the Id and key as below:

Id: Active Directory > App registrations > Application ID
Key: Active Directory > App registrations > Settings > Keys, then add a new one
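
To show what the Id and key are used for, here is a minimal sketch of the OAuth 2.0 client credentials request that a service principal sends to Azure Active Directory to get an access token. It only builds the request and does not send it. The function name and the placeholder values are hypothetical, and the default scope for Data Lake is my assumption, so check it against the service you are calling.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(tenant_id, principal_id, principal_key,
                        scope="https://datalake.azure.net/.default"):
    """Build (but do not send) a client credentials token request."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": principal_id,       # the Application ID
        "client_secret": principal_key,  # the key created under Keys
        "scope": scope,                  # assumed scope, see the note above
    }).encode("ascii")
    return Request(url, data=body, method="POST")


# Placeholder values for illustration only.
request = build_token_request("my-tenant-id", "my-application-id", "my-key")
print(request.get_method(), request.full_url)
```

In practice the key would be read from Key Vault rather than written in code, and passing the request to `urllib.request.urlopen` would return a JSON body that contains the token.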

Read more about how to do it here: