Anyone who wants to merge, transform and orchestrate data from different sources in the cloud will sooner or later come across Azure Data Factory, Microsoft's managed cloud data integration service. But what is behind it, how exactly does ADF work, and for whom does it make sense to use it?
In this article, we explain the core concepts, show typical use cases and look at how Azure Data Factory compares to alternatives such as AWS Glue or Apache Airflow - and what role it plays in Microsoft's Fabric strategy.
What is Azure Data Factory?
Azure Data Factory (ADF) is a fully managed, serverless cloud service from Microsoft for large-scale data integration. The service makes it possible to design, orchestrate and monitor ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform) processes - without having to operate your own server infrastructure.
In short: Azure Data Factory is the answer to the question of how companies can merge, prepare and transfer structured and unstructured data from dozens of different sources - from on-premises databases to SaaS applications and streaming services - into analytics platforms or data lakes.
ADF has been generally available since 2015 and has become one of the most widely used data integration services in the Microsoft Azure world. According to current market data, more than 5,500 companies worldwide now rely on Azure Data Factory - with a market share of around 6.7% in the data storage and integration tools segment.
The core components of Azure Data Factory
Azure Data Factory is based on six central concepts that work together to define and execute data pipelines:
1. Pipelines
Pipelines are the logical bracket for a group of activities that jointly fulfill a task. They define the sequence or parallelization of processing steps - for example: reading data from an SQL database, transforming it and writing it to an Azure Blob Storage.
2. Activities
Activities are the specific processing steps within a pipeline. ADF distinguishes between three types:
- Data movement activities - e.g. the copy activity for transferring data between source and target
- Data transformation activities - e.g. mapping data flows, Spark activities or Azure Machine Learning activities
- Control flow activities - e.g. conditional branches, loops or waiting for external events
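How a pipeline groups activities is easiest to see in ADF's JSON definition format (the same format the visual designer generates). The following sketch builds a minimal pipeline with a single Copy activity as a Python dictionary; the pipeline, activity and dataset names are illustrative placeholders, not fixed ADF identifiers:

```python
import json

# Minimal ADF pipeline definition in the JSON shape used by the ADF authoring
# tools and REST API. All names here (CopyOrdersPipeline, SqlOrdersDataset,
# BlobOrdersDataset) are placeholders for this sketch.
pipeline = {
    "name": "CopyOrdersPipeline",
    "properties": {
        "activities": [
            {
                "name": "CopyOrdersToBlob",
                "type": "Copy",  # a data movement activity
                "inputs": [
                    {"referenceName": "SqlOrdersDataset", "type": "DatasetReference"}
                ],
                "outputs": [
                    {"referenceName": "BlobOrdersDataset", "type": "DatasetReference"}
                ],
                "typeProperties": {
                    "source": {"type": "AzureSqlSource"},
                    "sink": {"type": "BlobSink"},
                },
            }
        ]
    },
}

print(json.dumps(pipeline, indent=2))
```

Control flow activities (loops, branches) would appear as additional entries in the same `activities` array, optionally chained via `dependsOn` conditions.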
3. Datasets
Datasets describe the data structures in the connected data stores - they show where and in what format data is available. A dataset always refers to a linked service and describes the actual data resource (e.g. a specific table or file).
4. Linked services
Linked services are the connection definitions for external resources - comparable to connection configurations in classic ETL tools. They can point to data stores (SQL Server, Oracle, Blob Storage, Salesforce, etc.) or to compute resources (HDInsight, Azure Databricks).
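The relationship between datasets and linked services can be sketched in ADF's JSON format as two Python dictionaries. The storage account, container and file names below are invented for illustration:

```python
# A linked service holds the connection configuration to an external resource
# (the name "MyBlobStorage" and the placeholder connection string are
# illustrative assumptions for this sketch).
linked_service = {
    "name": "MyBlobStorage",
    "properties": {
        "type": "AzureBlobStorage",
        "typeProperties": {
            "connectionString": "<connection string or Key Vault reference>"
        },
    },
}

# The dataset references that linked service and describes the concrete data
# resource - here a CSV file in a specific container.
dataset = {
    "name": "BlobOrdersDataset",
    "properties": {
        "type": "DelimitedText",
        "linkedServiceName": {
            "referenceName": "MyBlobStorage",
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "orders",
                "fileName": "orders.csv",
            }
        },
    },
}

# The dataset's reference must match the linked service's name.
assert (
    dataset["properties"]["linkedServiceName"]["referenceName"]
    == linked_service["name"]
)
```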
5. Integration Runtime (IR)
The Integration Runtime is the execution environment for pipelines. ADF offers three variants:
- Azure IR - fully managed by Microsoft, for cloud-to-cloud scenarios
- Self-hosted IR - runs on your own infrastructure, for on-premises connections
- Azure-SSIS IR - for rehosting existing SSIS packages in the cloud
6. Triggers
Triggers control when pipelines run. ADF supports time-based schedule triggers (cron-like), tumbling window triggers for fixed time windows, and event-based triggers that react to storage events or custom events.
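A schedule trigger that runs a pipeline once a day could look like this in ADF's JSON format, again sketched as a Python dictionary (the trigger name, start time and referenced pipeline name are placeholders):

```python
# A daily schedule trigger in ADF's JSON shape; "DailyTrigger" and
# "CopyOrdersPipeline" are placeholder names for this sketch.
trigger = {
    "name": "DailyTrigger",
    "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Day",   # also: Minute, Hour, Week, Month
                "interval": 1,        # every 1 day
                "startTime": "2024-01-01T06:00:00Z",
                "timeZone": "UTC",
            }
        },
        # A trigger can start one or more pipelines.
        "pipelines": [
            {
                "pipelineReference": {
                    "referenceName": "CopyOrdersPipeline",
                    "type": "PipelineReference",
                }
            }
        ],
    },
}

print(trigger["properties"]["type"])
```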
Important functions at a glance
Over 90 ready-made connectors
Azure Data Factory offers more than 90 built-in, maintenance-free connectors at no extra charge. These include connections to big data sources such as Amazon S3 or HDFS, enterprise database platforms such as Oracle Exadata or Teradata, SaaS applications such as Salesforce or ServiceNow and all native Azure services.
Visual pipeline designer
The browser-based authoring interface makes it possible to design pipelines using drag-and-drop - without any programming knowledge. Ready-made templates for common ETL/ELT patterns speed up the process of getting started. An integrated debug mode allows interactive testing directly in the Designer.
Mapping data flows
Mapping Data Flows are visually designed data transformations that are executed in the background on managed Spark clusters - without the need for Spark knowledge. Joins, aggregations, pivoting, conditional splits and derived columns are supported. Metadata, column counts and data types can be viewed at any time via the Inspect tab.
ETL and ELT
ADF supports both integration patterns: In classic ETL, data is transformed before loading. With the ELT approach - the optimal variant for modern cloud data warehouses such as Azure Synapse Analytics or Microsoft Fabric - data is first loaded raw and only transformed in the target system, which makes optimum use of the target's computing capacities.
CI/CD integration and Git support
Azure Data Factory natively supports Azure DevOps and GitHub. Pipelines can be versioned, developed in feature branches and transferred to production environments using proven CI/CD processes.
Typical use cases for Azure Data Factory
Data migration to the cloud
ADF is the standard tool for migrating large amounts of data from on-premises systems or other cloud platforms to Azure. Typical scenarios include the migration of big data workloads from Amazon S3 or HDFS as well as EDW migration from Oracle Exadata, Netezza, Teradata or Amazon Redshift - even on a petabyte scale with minimal downtime requirements.
Data lake filling and consolidation
Many companies use ADF to automatically transfer data from a wide variety of sources - IoT devices, cloud services, on-premises systems, streaming sources - into a central data lake. Partitioning and integrated data catalog management can be used to systematically improve the discoverability of that data.
Cloud analytics and business intelligence
ADF serves as the data pipeline layer for analytical platforms: Operational systems deliver raw data, which ADF prepares and transfers to Azure Data Lake Storage or a data warehouse - where data scientists and analysts can work with it. The close integration with Power BI enables up-to-date dashboards based on this data.
Hybrid and multi-cloud scenarios
With the self-hosted integration runtime, on-premises databases can be securely connected without having to release public endpoints. The wide range of connectors also enables genuine multi-cloud integrations between AWS, GCP and Azure.
Azure Data Factory vs. alternatives
Azure Data Factory is not the only cloud ETL tool on the market. An overview of the most important alternatives:
| Tool | Type | Special features & strengths |
| --- | --- | --- |
| Azure Data Factory | Managed cloud (Azure) | 90+ connectors, visual designer, hybrid scenarios, deeply integrated into the Azure ecosystem, low-code approach |
| AWS Glue | Managed cloud (AWS) | Serverless, Spark-based, automatic schema discovery, ideal for pure AWS environments, code-first approach (Python/Scala) |
| Google Cloud Dataflow | Managed cloud (GCP) | Apache Beam-based, strong in real-time streaming, portable pipelines (Java, Python, Go), ideal for real-time scenarios |
| Talend | Platform-independent | Over 1,000 connectors, graphical interface, accessible without programming knowledge, broad SaaS/DB/big data ecosystem |
| Apache Airflow | Open source | Python-based, maximum flexibility and customizability, community-driven, ideal for teams with strong developer resources |
ADF scores particularly well where companies already rely on the Azure ecosystem, require hybrid on-premises/cloud scenarios and prefer a low-code approach. However, those who require complete control at code level and cloud independence should also consider Airflow or cloud-native alternatives from other providers.
Azure Data Factory and Microsoft Fabric
Microsoft has introduced Microsoft Fabric, a new, comprehensive analytics platform that combines the data integration capabilities of Azure Data Factory with a modern SaaS interface and AI integration. The Data Factory experience in Microsoft Fabric is considered the next generation of ADF.
What does this mean in concrete terms?
- Fabric Data Factory offers over 170 connectors (compared to 90+ in ADF) and combines the ease of use of Power Query with the scalability of ADF
- AI-powered features such as Copilot for Data Factory allow pipelines to be created and edited using natural language
- Data is stored natively in OneLake - the central data lake of Microsoft Fabric - which structurally breaks down data silos between services
- Microsoft officially recommends starting new projects via Microsoft Fabric instead of Azure Data Factory
Note on the roadmap: Azure Data Factory remains a fully supported service and will continue to receive updates. Existing ADF workloads can be gradually migrated to Microsoft Fabric. For new data integration projects in the Azure ecosystem, Microsoft today recommends starting with the Data Factory experience in Microsoft Fabric.
Costs and license model
Azure Data Factory follows a usage-based billing model - there are no fixed costs for standing infrastructure. The costs are made up of several components:
- Activity Runs: Billing per pipeline execution and activity call
- Data Integration Units (DIU): Computing unit for Copy Activities on the Azure Integration Runtime - combines CPU, memory and network resources. Standard configuration: 4 DIU (configurable from 2 to 256). Price: approx. 0.25 USD per DIU hour
- vCore hours: For compute-intensive transformations via mapping data flows
- Monitoring & Management: Billing for monitoring and management operations
The serverless approach means that you only pay for the resources you actually use. Microsoft's official price calculator for Azure Data Factory helps to estimate the costs in advance based on specific workload parameters.
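Based on the list prices above, the copy portion of a workload can be roughly estimated. The sketch below is a simplification that ignores activity-run and monitoring charges; the ~0.25 USD per DIU-hour rate is taken from the figures in this article and should be verified against the official price calculator:

```python
def estimate_copy_cost(runs_per_month: int, hours_per_run: float,
                       diu: int = 4, usd_per_diu_hour: float = 0.25) -> float:
    """Rough monthly cost of Copy activities on the Azure Integration Runtime.

    Simplified model: DIU-hours * rate only; activity-run and monitoring
    charges are deliberately left out of this sketch.
    """
    if not 2 <= diu <= 256:
        raise ValueError("DIU must be between 2 and 256")
    return runs_per_month * hours_per_run * diu * usd_per_diu_hour


# Example: a daily copy job (30 runs/month) taking 0.5 h at the default 4 DIU:
print(f"{estimate_copy_cost(30, 0.5):.2f} USD")  # 30 * 0.5 * 4 * 0.25 = 15.00 USD
```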
Security and governance
Azure Data Factory offers a range of security features designed for enterprise deployments:
Managed Virtual Network
The Azure Integration Runtime can be operated in a managed virtual network. All network connections then run exclusively over the Microsoft backbone - data traffic never traverses the public internet. This protects against data exfiltration and simplifies compliance requirements.
Private endpoints
Managed private endpoints enable secure connections to supported data stores and Azure services without public network exposure. Azure Key Vault - the recommended mechanism for managing connection secrets - can also be connected via a private endpoint.
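With Key Vault in place, the secret never appears in the linked service definition itself; the sensitive property instead references a Key Vault secret. A sketch of this pattern in ADF's JSON format, with the linked service, vault and secret names as placeholders:

```python
# ADF linked service whose password is resolved from Azure Key Vault at
# runtime. "MySqlDatabase", "MyKeyVault" and "sql-admin-password" are
# placeholder names for this sketch.
sql_linked_service = {
    "name": "MySqlDatabase",
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
            # Connection string without the secret ...
            "connectionString": "Server=<server>;Database=<db>;User ID=<user>;",
            # ... and the password as a Key Vault secret reference, pointing to
            # a Key Vault linked service.
            "password": {
                "type": "AzureKeyVaultSecret",
                "store": {
                    "referenceName": "MyKeyVault",
                    "type": "LinkedServiceReference",
                },
                "secretName": "sql-admin-password",
            },
        },
    },
}

props = sql_linked_service["properties"]["typeProperties"]
assert props["password"]["type"] == "AzureKeyVaultSecret"
```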
Access control and audit
ADF is fully integrated into Microsoft Entra ID (formerly Azure Active Directory). Granular authorizations for pipeline authors, operators and readers can be defined via role-based access control (RBAC). Audit logs seamlessly document all executions and configuration changes.
Azure Purview integration
Azure Data Factory can be connected to Microsoft Purview for company-wide data governance. Purview automatically records data lineage for all ADF pipelines - a key requirement in regulated industries.
Conclusion: When is Azure Data Factory worthwhile?
Azure Data Factory is a mature, production-proven choice for companies that want to orchestrate data from heterogeneous sources in the cloud - especially if they already rely on the Azure ecosystem. The combination of a visual development environment, broad connector selection, hybrid connectivity and serverless billing makes ADF one of the most versatile data integration services on the market.
ADF is particularly useful when...
- ...data from on-premises systems and cloud services is to be merged
- ...a low-code approach without in-depth Spark or Python knowledge is desired
- ...existing SSIS packages are to be lifted into the cloud
- ...Azure Synapse Analytics, Power BI or Azure Machine Learning are used as target systems
- ...there are high security and compliance requirements
For new projects that want to use the full breadth of the Microsoft Analytics world right from the start, it is worth taking a look at the Data Factory experience in Microsoft Fabric: it is based on the same concepts, but offers a more modern interface, more connectors and deep AI integration.
Would you like to know whether Azure Data Factory is the right foundation for your data architecture - and how you can get started? Get in touch with us.