As an ETL developer, data cleansing is the first step in loading any data into your system, and identifying duplicates comes right after it, because those records have to be eliminated from the processing job. Here we will learn how to remove duplicate records from a file by using the tUniqRow component in Talend Open Studio.
There are multiple ways to remove duplicate records from raw data files or data tables, such as -
To Learn more, please visit our YouTube channel at -
http://www.youtube.com/c/Sql-datatools
To Learn more, please visit our Instagram account at -
https://www.instagram.com/asp.mukesh/
To Learn more, please visit our twitter account at -
https://twitter.com/macxima
- We can eliminate the duplicate rows by using the tUniqRow component in Talend Open Studio. (This removes only the repeats and keeps the original.)
- Remove all duplicate rows from the flow (including the original). An efficient and clean way is to use the tAggregateRow component to count the rows for each key column value, join the counts back to the input with the tMap component, and then filter out every row whose key occurs more than once.
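The difference between the two approaches is easier to see on a small data set. The sketch below is plain Python, not Talend code; the sample rows and the function names `keep_first`, `drop_all_duplicated` and `by_id` are made up for illustration, and the comments only map each step loosely onto the components named above.

```python
from collections import Counter


def keep_first(rows, key):
    """Keep the first row seen for each key and drop later repeats."""
    seen = set()
    unique = []
    for row in rows:
        k = key(row)
        if k not in seen:
            seen.add(k)
            unique.append(row)
    return unique


def drop_all_duplicated(rows, key):
    """Drop every row whose key occurs more than once, original included."""
    # Count rows per key (the role played by the aggregate step).
    counts = Counter(key(row) for row in rows)
    # Look the count up for each input row and keep only keys seen once
    # (the role played by the join and filter steps).
    return [row for row in rows if counts[key(row)] == 1]


def by_id(row):
    """Hypothetical key: treat rows with the same id as duplicates."""
    return row["id"]


# Made-up sample data: id 1 appears twice.
rows = [
    {"id": 1, "name": "Ann"},
    {"id": 2, "name": "Bob"},
    {"id": 1, "name": "Ann"},
    {"id": 3, "name": "Cid"},
]

first_kept = keep_first(rows, by_id)
singles_only = drop_all_duplicated(rows, by_id)
```

With this input, `first_kept` still contains one row for id 1, while `singles_only` contains no row for id 1 at all.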
The main benefit of the tUniqRow component is that it still gives you one unique record from each set of duplicates, so every distinct record appears exactly once in the output.
To build this job, you need the following components -
tFileInputDelimited: We can use this component to read a file and separate the fields contained in this file using a defined separator. It allows you to create a data flow.
tUniqRow: This component is very useful for maintaining data quality because it compares entries and sorts out duplicate entries from the input flow, which ensures the data quality of the input or output flow in a Job. It handles a flow of data, so it requires both an input and an output and is therefore defined as an intermediary step.
tLogRow: This component is used to monitor the data being processed and displays data or results in the Run console. It can be used as an intermediate step in a data flow or as an end object in the Job flowchart.
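To make the flow through these three components concrete, here is a rough Python sketch of the same read, de-duplicate and log pipeline. It is an analogy only and does not use any Talend API: the sample content, the `;` separator, the choice of `id` as the key column, and the helper names `read_delimited`, `uniq_rows` and `log_rows` are all assumptions made for the example.

```python
import csv
import io

# Made-up file content; a real job would read this from disk.
SAMPLE = (
    "id;name;city\n"
    "1;Ann;Pune\n"
    "2;Bob;Delhi\n"
    "1;Ann;Pune\n"
    "3;Cid;Agra\n"
    "2;Bob;Delhi\n"
)


def read_delimited(text, separator=";"):
    """Split delimited text into rows of fields (the file input step)."""
    return list(csv.DictReader(io.StringIO(text), delimiter=separator))


def uniq_rows(rows, key_columns):
    """Split rows into first occurrences and repeats by the key columns."""
    seen = set()
    uniques, duplicates = [], []
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        if key in seen:
            duplicates.append(row)
        else:
            seen.add(key)
            uniques.append(row)
    return uniques, duplicates


def log_rows(label, rows):
    """Print rows to the console (the logging step)."""
    print(label)
    for row in rows:
        print("|".join(row.values()))


rows = read_delimited(SAMPLE)
uniques, duplicates = uniq_rows(rows, key_columns=["id"])
log_rows("uniques", uniques)
log_rows("duplicates", duplicates)
```

Returning the repeats as a second list, instead of discarding them, makes it easy to log or inspect the rejected rows separately from the clean ones.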
To see a demo video, please visit our YouTube channel