Sunday, December 22, 2019

Talend ETL - Remove duplicate values

As an ETL developer guy, data cleansing is the first step of the processing any data into your system and identify the duplicates just comes after this, where you have to eliminate these records from the processing job. So here, we will learn How to remove duplicate records from a file by using tUniqRow component is Talend Open Studio.

There are multiple ways to remove duplicate records from a raw data files or data tables. Such as -

  1. We can eliminate the duplicate rows by using tUniqRow component is Talend Open Studio. (Excluding original)
  2. Remove all duplicate rows from flow (including original). An efficient and clean way is to use tAggregateRow component to count key column, join to input again by tMap component and then filter all row have more than 1.
The main and recommended benefits from tUniqRow component is that it also gives a unique record from the duplicates, means you have a unique record from each set of the data.

To build this job, you need the following processing components -

tFileInputDelimated: We can use this component to read a file and separate fields contained in this file using a defined separator. It allows you to create a data flow.

tUniqRow: This component is very useful to maintain the data quality because it compares entries and sorts out duplicate entries from the input flow and ensures data quality of input or output flow in a Job. This component handles flow of data therefore it requires input and output, hence is defined as an intermediary step.

tLogRow: This component is used to monitor data processed and displays data or results in the Run console. This component can be used as intermediate step in a data flow or as a n end object in the Job flowchart.

To see a demo video, please visit our YouTube channel





To Learn more, please visit our YouTube channel at - 
http://www.youtube.com/c/Sql-datatools
To Learn more, please visit our Instagram account at -
https://www.instagram.com/asp.mukesh/
To Learn more, please visit our twitter account at -

https://twitter.com/macxima

Tuesday, December 17, 2019

Talend ETL - Lookup data for Insert, Update and delete

Here you will learn "How to use Lookup data for Insert, Update and Delete by using tMap?" in Talend Open Studio.


We have an ideal business scenario where business wants to ensure that lookup in output file, check if there is any change in the new input file and if the record doesn't exists in the output file, business needs to insert these records into the output file, if the row already exists then they needs to update the data in the output file.

We have fields such as unique key such as SalespersonId and Salesyear in the both files. So, we will just use tMap with inner join with new file as main row and previous one as lookup.

Our source and target are raw data files and using tFileInputDelimited, tFileOutputDelimited, tLogRow, tMap components in the example.

If we need to detect deleted records, you need another subjob where main row becomes the lookup, then rejected records are deleted ones. All the records of the lookup flow need to be loaded before processing each record of the main flow. 

Three types of lookup loading models are provided suiting various types of business requirement and the performance needs: Load once, Reload at each row, and Reload at each row (cache).

Used components to accomplished this jobs are -
tFileInputDelimated: We can use this component to read a file and separate fields contained in this file using a defined separator. It allows you to create a data flow.

tMap: tMap is an advanced component to transforms and routes data from single or multiple sources to single or multiple destinations, which integrates itself as plugin to Talend Studio. Possible uses are from a simple reorganization of fields to the most complex Jobs of data multiplexing or demultiplexing transformation, concatenation, inversion, filtering and more.

tLogRow: This component is used to monitor data processed and displays data or results in the Run console. This component can be used as intermediate step in a data flow or as a n end object in the Job flowchart.

tFileOutputDelimated: tFileOutputDelimited outputs data to a delimited file and This component writes a delimited file that holds data organized according to the defined schema. Use this component to write a delimited file and separate fields using a field separator value.

To watch a live demo, please check the below YouTube video -



To Learn more, please visit our YouTube channel at - 

To Learn more, please visit our Instagram account at -

To Learn more, please visit our twitter account at -

Thursday, December 5, 2019

Talend ETL - Delete Files After Processing

In our day to  day work as an ETL developer, we need to remove the files from the source after processing them into the system. If we are working as Talend developer then we can see that it is containing a lot of user friendly graphical controls to accomplish this kind of jobs.

We are going to show you how to check a file exists or not and then delete it after processing  in Talend Open Studio.


We are using following components -

tFileExist: This component can be used as standalone component and responsible to check if a file exists or not on a defined location. In fact, it helps to streamline processes by automating recurrent and tedious tasks such as checking if a file exists.

tFileInputDelimated: We can use this component to read a file and separate fields contained in this file using a defined separator. It allows you to create a data flow.

tLogRow: This component is used to monitor data processed and displays data or results in the Run console. This component can be used as intermediate step in a data flow or as a n end object in the Job flowchart.

tMsgBox: This component is useful to display a message on the screen and can be used as standalone component.

How to check if a File does exist or not- This scenario describes a simple Job that: checks if a given file exists, displays a graphical message to confirm that the file does not exist, reads the input data in another given file and display it by tLogRow component.



tFileDelete: This component suppresses a file from a defined directory. Means, helps to streamline processes by automating recurrent and tedious tasks such as delete. This component can be used as standalone component.

tJava: enables you to enter personalized code in order to integrate it in Talend program. keep in mind that we can execute this code only once.

How to delete a file after processing - This scenario describes a simple job to delete a file after processing it into the system.


To Learn more, please visit our blog at - 
http://www.sql-datatools.com
To Learn more, please visit our YouTube channel at - 
http://www.youtube.com/c/Sql-datatools
To Learn more, please visit our Instagram account at -
https://www.instagram.com/asp.mukesh/
To Learn more, please visit our twitter account at -
https://twitter.com/macxima

Monday, December 2, 2019

Talend ETL - Copy Files from Source To Destination

As an Talend ETL developer, you have to move files from source to destination location before processing them into the databases and you know also that Talend Open Studio is a very fabulous and mind blowing tool. It provides unlimited functionalities to accomplished many jobs. 

If you want to move your processing files from one to another location then you can use the following components -

tFileList: This component retrieves a set of files or folders based on a filemask pattern and iterates on each unity. This component helps to streamline processes by automating recurrent and tedious copy tasks. tFileList provides a list of files or folders from a defined directory on which it iterates.

tJava enables you to enter personalized code in order to integrate it in Talend program. You can execute this code only once. tJava makes it possible to extend the functionalities of a Talend Job using custom Java commands. tJava component is generally used as a one-component sub-job.

tFileCopy: This component copies a source file or folder into a target directory. Apart from this, it can be used as a standalone component. It comes with a lot of options -
  1. You can rename the file 
  2. You can remove the source file after moving it to destination location
  3. Replace existing file or overwrite any existing file with the newly copied file
  4. Create the directory if it doesn't exist means create the specified destination directory if it does not exist

To see a demo, please play the below video-



To Learn more, please visit our YouTube channel at - 

To Learn more, please visit our Instagram account at -

To Learn more, please visit our twitter account at -

Tuesday, November 5, 2019

PowerBI - Maintain Slicer Values through Bookmarks

You are creating a Power BI dashboard and using more than one bookmarks with multiple slicers and your dashboard is not able to maintain the selected slicer values across the navigation of the bookmarks. 
This is because the slicer does not maintain the value between different views of Summary and Detail.

If you want to maintain the slicer selected values though the bookmarks then it is very easy task to do it in Power BI. 

Simply, go to your bookmark pane and right click on it and deselect the Data option as given below-


If you are using more than one bookmarks in your dashboard then you have to unselected all the Data options for each bookmark. When you create a bookmark with this option active it will always return to that specific filter in your case is the All.

If you do this for all your bookmarks you should be able to go from one to the other keeping slicer choice.
Note: Please keep in mind that this feature is selected by default. 
By unchecking the Data option from the bookmarks, it will keep slicer values across bookmarks, however it removes ALL filters regardless of if its a Page Filter, Visual Filter, or Slicer, which means that any default filters are dropped.
If you want keep some default values on your dashboard then you should need to place the default values on your slicer and then selecting all those slicers create a bookmark and select the option Selected visuals, keeping the Data option active.
When you do that the all the slicer will keep the values on other slicers but the one you select will go back to the default values.

Friday, September 27, 2019

Talend ETL - Filtering data based on advanced conditions

Sometimes, we have to filter our source data based on the some advance conditions in the data processing to populate multiple outputs. It is very easy task if you are using Talend Open Studio which has a lot of rich data processing components and you can pass multiple logical conditions as well.

We are here to tell you "How to Filtering data source based on advanced conditions by using tFilterRow?" in Talend Open Studio.

tFilterRow processing component filters input rows by setting one or more conditions on the selected columns. It helps parametrizing filters on the source data. It uses a logical operator to combine simple conditions and to combine the filter results of both modes if any advanced conditions are defined.

Based on the logical operator selected, the conditions are evaluated one after the other in sequential order for each row. When evaluated, each condition returns the Boolean value of true or false.

Input column: Select the column of the schema the function is to be operated on
Function: Select the function on the list
Operator: Select the operator to bind the input column with the value
Value: Type in the filtered value, between quotes if needed.

Warning - In the Value field, you must type in your values between double quotes for all types of values, except for integer values, which do not need quotes.

Example - We want to separate players for "Tennis" and "Cricket" based in Games column as given below:
Id
FirstName
LastName
Games
1
Ryan
Arjun
Baseball
1
Ryan
Arjun
Table Tennis
1
Ryan
Arjun
 Cricket
1
Ryan
Arjun
Tennis
2
Kimmy
Wang
Cricket
2
Kimmy
Wang
Tennis
2
Kimmy
Wang
Basketball
2
Kimmy
Wang
Hockey
3
Rosy
Gray
Badminton
3
Rosy
Gray
 Volleyball
3
Rosy
Gray
Soccer
4
Will
Smith
Tennis
4
Will
Smith
 Basketball
4
Will
Smith
Volleyball
4
Will
Smith
Soccer
5
John
Godson
Swimming
5
John
Godson
 Volleyball
5
John
Godson
Soccer

We will use  tFileInputDelimated, tFilterRow and tLogRow from the Palette onto the design workspace.


Process flow -

  1. On the tFileInputDelimited Basic settings panel, set the filepath to the file to be denormalized and define the Row and Field separators, the Header and other information if required.
  2. Now, connect the tFileInputDelimited to the tLogRow, using a Row > Filter link and rename it as Source Data.
  3. Double-click tFilterRow to display its Basic settings view and define its properties. In the Conditions table, add four conditions and fill in the filtering parameters. From the InputColumn list field of the first row, select Games, from the Function list field, select Lower case, from the Operator list field, select Equals, and in the Value column, type in "tennis".
  4. Now, connect the tFilterRow to the tLogRow, using a Row > Filter link and rename it as Filter.  Then again drop tLogRow from the Palette into the design workspace and rename it as Reject and then again connect the tFilterRow to the reject, using a Row > Reject link.
Final output from Filter tLogRow components - They are in filtered records and rest of them are in rejected records. 
Each rejected record has a corresponding error message that explains the reason of rejection.

To Learn more, please visit our YouTube channel at - 

To Learn more, please visit our Instagram account at -

To Learn more, please visit our twitter account at -

YouTube demo example- 

Popular Posts