
Microsoft Azure – Introduction to Azure Data Factory


Azure Data Factory, commonly known as ADF, is an ETL (Extract, Transform, Load) tool for integrating data from various sources, in various formats and sizes. In other words, it is a fully managed, serverless data integration service for ingesting, preparing, and transforming all your data at scale. Azure Data Factory pipelines are commonly used to transfer data from on-premises systems to the cloud at scheduled intervals.

What Is Azure Data Factory (ADF)?

Azure Data Factory helps you automate and manage workflows that move data between on-premises and cloud-based sources and destinations. It manages these data-driven workflows as pipelines. Azure Data Factory stands out among ETL tools because it is easy to use, cost-effective, and offers a powerful, intelligent code-free experience.

As data volumes grow day by day around the world, many enterprises and businesses are shifting to cloud-based technology to make their businesses scalable. This increase in cloud adoption creates a need for reliable cloud-based ETL tools to handle the integration.

Azure Data Factory (ADF) Features

  1. Data flows: Data flows use Apache Spark to transform data as it moves from source to destination. A data flow is a code-free way to transform data: you drag and drop sources, transformations, and sinks onto a canvas, and ADF builds the complex pipeline logic for you.
  2. Pipelines: Pipelines play a major role in data transfer; they orchestrate data movement and transformation processes. Pipelines can be triggered by events or scheduled to run at time intervals.
  3. Datasets: Datasets are named references to the data you want to use in your activities as inputs or outputs.
  4. Activity: Activities in a pipeline define actions to perform on data. For example, a copy activity can read data from one location in Blob Storage and load it into another.
  5. Integration Runtime: The Integration Runtime (IR) is the compute infrastructure used by ADF to provide capabilities such as Data Flow, Data Movement, Activity Dispatch, and SSIS package execution across different network environments.
  6. Linked Services: Linked services connect Azure Data Factory to external data stores and compute services; they act much like connection strings for the resources they point to.
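To illustrate how linked services and datasets fit together, both are defined as JSON documents in ADF. The sketch below is a simplified example, not a complete definition: the names, account, and container are placeholders, and only a subset of the real schema is shown.

```json
{
  "name": "ExampleBlobLinkedService",
  "properties": {
    "type": "AzureBlobStorage",
    "typeProperties": {
      "connectionString": "DefaultEndpointsProtocol=https;AccountName=<storage-account>;..."
    }
  }
}
```

A dataset then references the linked service by name to describe a concrete piece of data:

```json
{
  "name": "ExampleBlobDataset",
  "properties": {
    "type": "DelimitedText",
    "linkedServiceName": {
      "referenceName": "ExampleBlobLinkedService",
      "type": "LinkedServiceReference"
    },
    "typeProperties": {
      "location": {
        "type": "AzureBlobStorageLocation",
        "container": "<container>",
        "fileName": "input.csv"
      }
    }
  }
}
```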

Azure Data Factory (ADF) Benefits

  1. Scalability and Flexibility: The volume of data moving between on-premises and cloud-based sources and destinations is unpredictable; it may be high at some times and low at others. Azure Data Factory scales to meet these changing requirements.
  2. Hybrid data integration: Data held in both on-premises and cloud-based sources can be managed by Azure Data Factory.
  3. Data Orchestration: Azure Data Factory helps us manage large amounts of data in a centralized manner, which makes the data easier to maintain.
  4. Integration with Azure services: A few of the Azure services that work closely with Azure Data Factory include Azure Synapse Analytics, Azure Databricks, and Azure Blob Storage. This makes it simple to create and manage data pipelines that utilise a variety of services.

Azure Data Factory (ADF) Architecture

The figure below describes the architecture of a data engineering flow using Azure Data Factory.

The flow starts at the source. The source data can come from a variety of places, such as on-premises databases, cloud storage services, and SaaS applications.

Azure Data Factory

From the source, the data is first moved into a staging area, where it is stored temporarily and arranged for further processing. The data is then transformed with the help of data flows.

  1. Integration runtime: Executes the pipelines, whether they are hosted on-premises or in the cloud.
  2. Linked service: Connects the data source and destination.
  3. Dataset: A dataset represents the data that is being processed by a pipeline.
  4. Pipelines: A pipeline is a sequence of activities that are executed in order to process data.

Azure Data Factory transfers the required data from the on-premises data centre to the cloud. For example, suppose a company needs to analyze its data daily using Azure Synapse Analytics. The company can create a three-step procedure in an Azure Data Factory pipeline to achieve this:

  1. Copy the data from the on-premises database to a staging area in Azure Blob Storage.
  2. A data flow activity to transform the data in the staging area.
  3. A copy activity to copy the transformed data from the staging area to the data warehouse in Azure Synapse Analytics.
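The three steps above could be expressed as a pipeline definition roughly like the following JSON sketch. The activity and dataset names are placeholders, and only a simplified subset of the real pipeline schema is shown; a working pipeline would also need source and sink settings on each activity.

```json
{
  "name": "DailyLoadPipeline",
  "properties": {
    "activities": [
      {
        "name": "CopyToStaging",
        "type": "Copy",
        "inputs": [ { "referenceName": "OnPremSqlDataset", "type": "DatasetReference" } ],
        "outputs": [ { "referenceName": "StagingBlobDataset", "type": "DatasetReference" } ]
      },
      {
        "name": "TransformStagedData",
        "type": "ExecuteDataFlow",
        "dependsOn": [
          { "activity": "CopyToStaging", "dependencyConditions": [ "Succeeded" ] }
        ]
      },
      {
        "name": "LoadToSynapse",
        "type": "Copy",
        "dependsOn": [
          { "activity": "TransformStagedData", "dependencyConditions": [ "Succeeded" ] }
        ],
        "inputs": [ { "referenceName": "StagingBlobDataset", "type": "DatasetReference" } ],
        "outputs": [ { "referenceName": "SynapseDataset", "type": "DatasetReference" } ]
      }
    ]
  }
}
```

The `dependsOn` entries chain the activities so each step runs only after the previous one succeeds.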

The pipeline is scheduled to run daily; whenever it is triggered, the data is transferred from the on-premises source to the cloud destination.
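A daily schedule like this is configured as a trigger resource attached to the pipeline. The JSON below is a simplified sketch: the pipeline name and start time are placeholders, and only part of the real trigger schema is shown.

```json
{
  "name": "DailyTrigger",
  "properties": {
    "type": "ScheduleTrigger",
    "typeProperties": {
      "recurrence": {
        "frequency": "Day",
        "interval": 1,
        "startTime": "2023-11-01T00:00:00Z"
      }
    },
    "pipelines": [
      {
        "pipelineReference": {
          "referenceName": "<pipeline-name>",
          "type": "PipelineReference"
        }
      }
    ]
  }
}
```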

Azure Data Factory (ADF) Pricing

Data Pipelines: Helps to integrate data from cloud and hybrid data sources, at scale. Pricing starts from ₹72.046 per 1,000 activity runs per month.

SQL Server Integration Services: Helps to easily move your existing on-premises SQL Server Integration Services projects to a fully managed environment in the cloud. Pricing for SQL Server Integration Services integration runtime nodes starts from ₹60.498/hour.

  1. No upfront cost
  2. No termination fees
  3. Pay only for what you use
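To make the pay-per-use model concrete, the sketch below estimates the monthly orchestration cost of a pipeline using the activity-run rate quoted above. This is an illustration only: real Azure bills include other meters (such as data movement and data flow compute hours), and rates vary by region.

```python
# Rate quoted in this article: INR per 1,000 activity runs per month.
RATE_PER_1000_RUNS = 72.046

def monthly_activity_run_cost(runs_per_day: int, days: int = 30) -> float:
    """Estimate the monthly orchestration cost for a given number of
    activity runs per day, ignoring all other pricing meters."""
    total_runs = runs_per_day * days
    return total_runs / 1000 * RATE_PER_1000_RUNS

# A daily pipeline with 3 activities -> 90 activity runs per month.
print(round(monthly_activity_run_cost(runs_per_day=3), 2))
```

For the three-activity daily pipeline described earlier, this works out to well under ₹10 per month on the orchestration meter alone, which shows why there is no minimum or upfront cost.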

FAQs On Azure Data Factory (ADF)

1. Difference Between Azure Data Factory and Azure Databricks

  1. Azure Data Factory: Azure Data Factory is mainly used to manage data-driven workflows (pipelines) that move and transform data between sources and destinations.
  2. Azure Databricks: Azure Databricks, built on Apache Spark, allows data scientists and data engineers to work together on data.

2. Difference Between Azure Data Factory and Azure Data Factory Pipelines

  1. Azure Data Factory: Azure Data Factory is the overall service for creating and managing data-driven workflows (pipelines) that move and transform data.
  2. Azure Data Factory Pipelines: Azure Data Factory pipelines are the individual data processing workflows that are created and managed within Azure Data Factory.

3. Difference Between Azure Data Factory and Azure Data Lake

  1. Azure Data Factory: Azure Data Factory is mainly used to manage data-driven workflows (pipelines) that move and transform data between sources and destinations.
  2. Azure Data Lake: Azure Data Lake is a highly scalable and secure storage service that can hold large amounts of data in many formats, including structured, semi-structured, and unstructured data.



Last Updated : 07 Nov, 2023