Shoaib

I’m curious about how things work. Everything I share is part of that exploration to inspire smarter ways of thinking, building, and solving.

Handling Duplicate Data in Azure Data Flows

In this task, we remove duplicate rows from a CSV file stored in a Blob Storage container.

Open Data Factory and work through the following steps.

1. Create a linked service to Blob Storage

2. Create datasets for the source and sink
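
For readers who prefer to see what the first two steps produce, here is a rough sketch of the JSON definitions behind a Blob Storage linked service and a delimited-text dataset, built as Python dictionaries. The helper names, resource names, container, file name, and the placeholder connection string are all assumptions made for illustration; the post itself creates these objects through the UI.

```python
import json


def blob_linked_service(name, connection_string):
    """Build a linked service definition of type AzureBlobStorage."""
    return {
        "name": name,
        "properties": {
            "type": "AzureBlobStorage",
            "typeProperties": {"connectionString": connection_string},
        },
    }


def csv_dataset(name, linked_service_name, container, file_name=None):
    """Build a DelimitedText dataset that points at a blob container."""
    location = {"type": "AzureBlobStorageLocation", "container": container}
    if file_name is not None:
        # A sink dataset can point at the container only, without a file name.
        location["fileName"] = file_name
    return {
        "name": name,
        "properties": {
            "type": "DelimitedText",
            "linkedServiceName": {
                "referenceName": linked_service_name,
                "type": "LinkedServiceReference",
            },
            "typeProperties": {
                "location": location,
                "columnDelimiter": ",",
                "firstRowAsHeader": True,
            },
        },
    }


# Hypothetical names and placeholder values, for illustration only.
linked_service = blob_linked_service("BlobStorageLS", "<connection-string>")
source = csv_dataset("SourceCsv", "BlobStorageLS", "input", "data.csv")
sink = csv_dataset("SinkCsv", "BlobStorageLS", "output")

print(json.dumps(source, indent=2))
```

The source dataset names a specific file, while the sink in this sketch only names a container, since the data flow writes the output file itself.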

3. Create a new data flow

The data flow can also be edited as a script. Click the script icon at the top right of the data flow canvas to write the transformations directly.

We use a DistinctRows transformation to remove the duplicate rows.
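
To make the idea concrete, here is a small local sketch in Python of what a distinct-rows step does. It assumes that two rows count as duplicates only when every column matches, and it keeps the first occurrence of each row. The function names are hypothetical, and this is not the code Data Factory runs; it only mirrors the logic on an in-memory CSV string.

```python
import csv
import hashlib
import io


def row_fingerprint(row):
    """Hash all column values of a row into one comparable key."""
    # The unit separator keeps ("ab", "c") distinct from ("a", "bc").
    joined = "\x1f".join(row)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def distinct_rows(rows):
    """Return rows in their original order, keeping the first of each duplicate."""
    seen = set()
    unique = []
    for row in rows:
        key = row_fingerprint(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def dedupe_csv_text(text):
    """Remove duplicate data rows from CSV text whose first row is a header."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ""
    header, data = rows[0], rows[1:]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(distinct_rows(data))
    return out.getvalue()


sample = "id,name\n1,a\n2,b\n1,a\n"
print(dedupe_csv_text(sample))
```

Hashing the whole row means the check does not depend on knowing the column names in advance, which is useful when the schema of the file can change.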

4. Create a new pipeline and run it

After the pipeline run completes, the container holds a new file with the de-duplicated data.
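
One way to double-check the result is to download the new file and look for any rows that still repeat. The sketch below assumes the file contents are already available as a string and that the first row is a header; `find_duplicate_rows` is a hypothetical helper, not part of any Azure tooling.

```python
import csv
import io
from collections import Counter


def find_duplicate_rows(csv_text, has_header=True):
    """Return a mapping of each repeated row to the number of times it appears."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    data = rows[1:] if has_header else rows
    counts = Counter(tuple(row) for row in data)
    return {row: n for row, n in counts.items() if n > 1}


# An empty result means the output file contains no repeated rows.
output_text = "id,name\n1,a\n2,b\n"
print(find_duplicate_rows(output_text))
```

Running the same check on the source file first gives a quick count of how many rows the data flow was expected to drop.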

Done!

20 Dec 2024
