Showing posts with label tdenormalize in talend. Show all posts
Showing posts with label tdenormalize in talend. Show all posts

Tuesday, January 7, 2020

Talend ETL - How to Perform Rolling or Cumulative

If you are working as Talend ETL guy then you have to do some kinds of jobs in Talend which can be easily achieved in database end like in SQL Server. One of them is just Rolling or Cumulative sum also. Talend is full of many mind-blowing components which are fully capable to accomplished of any kind of jobs. 
   
Here you will learn "How to create a rolling or cumulative sum over different groups by using tJavaFlex ?" in Talend Open Studio.

We will use the following components -
tFileInputDelimated: We can use this component to read a file and separate fields contained in this file using a defined separator. It allows you to create a data flow.

tLogRow: This component is used to monitor data processed and displays data or results in the Run console. This component can be used as intermediate step in a data flow or as a n end object in the Job flowchart. 
SalesPersonId
SalesYear
TotalSales
201
2015
100
202
2015
200
203
2015
300
204
2015
400
205
2016
50
206
2016
100
207
2016
150
208
2016
200
209
2017
1000
210
2017
2000
211
2017
3000
212
2017
4000
213
2018
1050
214
2018
1100
215
2018
1150

tJavaFlex: This enables you to enter personalized code in order to integrate it in Talend program. With tJavaFlex, you can enter the three java-code parts (start, main and end) that constitute a kind of component dedicated to do a desired operation. 
It lets you add Java code to the Start/Main/End code sections of this component itself. You can use this component as a start, intermediate or output component. 

SalesPersonId
SalesYear
TotalSales
RunSal
201
2015
100
100
202
2015
200
300
203
2015
300
600
204
2015
400
1000
205
2016
50
1050
206
2016
100
1150
207
2016
150
1300
208
2016
200
1500
209
2017
1000
2500
210
2017
2000
4500
211
2017
3000
7500
212
2017
4000
11500
213
2018
1050
12550
214
2018
1100
13650
215
2018
1150
14800

You can as well use it as a one-component sub-job but you must know the Java language.

To watch a live demo -

To learn more, please follow us -
To Learn more, please visit our YouTube channel at - 
To Learn more, please visit our Instagram account at -
To Learn more, please visit our twitter account at -

Friday, September 27, 2019

Talend ETL - Filtering data based on advanced conditions

Sometimes, we have to filter our source data based on the some advance conditions in the data processing to populate multiple outputs. It is very easy task if you are using Talend Open Studio which has a lot of rich data processing components and you can pass multiple logical conditions as well.

We are here to tell you "How to Filtering data source based on advanced conditions by using tFilterRow?" in Talend Open Studio.

tFilterRow processing component filters input rows by setting one or more conditions on the selected columns. It helps parametrizing filters on the source data. It uses a logical operator to combine simple conditions and to combine the filter results of both modes if any advanced conditions are defined.

Based on the logical operator selected, the conditions are evaluated one after the other in sequential order for each row. When evaluated, each condition returns the Boolean value of true or false.

Input column: Select the column of the schema the function is to be operated on
Function: Select the function on the list
Operator: Select the operator to bind the input column with the value
Value: Type in the filtered value, between quotes if needed.

Warning - In the Value field, you must type in your values between double quotes for all types of values, except for integer values, which do not need quotes.

Example - We want to separate players for "Tennis" and "Cricket" based in Games column as given below:
Id
FirstName
LastName
Games
1
Ryan
Arjun
Baseball
1
Ryan
Arjun
Table Tennis
1
Ryan
Arjun
 Cricket
1
Ryan
Arjun
Tennis
2
Kimmy
Wang
Cricket
2
Kimmy
Wang
Tennis
2
Kimmy
Wang
Basketball
2
Kimmy
Wang
Hockey
3
Rosy
Gray
Badminton
3
Rosy
Gray
 Volleyball
3
Rosy
Gray
Soccer
4
Will
Smith
Tennis
4
Will
Smith
 Basketball
4
Will
Smith
Volleyball
4
Will
Smith
Soccer
5
John
Godson
Swimming
5
John
Godson
 Volleyball
5
John
Godson
Soccer

We will use  tFileInputDelimated, tFilterRow and tLogRow from the Palette onto the design workspace.


Process flow -

  1. On the tFileInputDelimited Basic settings panel, set the filepath to the file to be denormalized and define the Row and Field separators, the Header and other information if required.
  2. Now, connect the tFileInputDelimited to the tLogRow, using a Row > Filter link and rename it as Source Data.
  3. Double-click tFilterRow to display its Basic settings view and define its properties. In the Conditions table, add four conditions and fill in the filtering parameters. From the InputColumn list field of the first row, select Games, from the Function list field, select Lower case, from the Operator list field, select Equals, and in the Value column, type in "tennis".
  4. Now, connect the tFilterRow to the tLogRow, using a Row > Filter link and rename it as Filter.  Then again drop tLogRow from the Palette into the design workspace and rename it as Reject and then again connect the tFilterRow to the reject, using a Row > Reject link.
Final output from Filter tLogRow components - They are in filtered records and rest of them are in rejected records. 
Each rejected record has a corresponding error message that explains the reason of rejection.

To Learn more, please visit our YouTube channel at - 

To Learn more, please visit our Instagram account at -

To Learn more, please visit our twitter account at -

YouTube demo example-