Showing posts with label check existence of files in talend. Show all posts

Monday, March 16, 2020

Talend ETL - Email Validation


How can we verify that the values in an email address column contain "@" and ".", and, if they do not, load the rejected rows into a different table? In addition to a custom check, Talend supplies many Apache Commons libraries, which offer hundreds of useful, efficient solutions built and checked by the Java community. The Apache Commons Validator library, for example, comes with validation routines for email addresses, URLs, IP addresses, and more.



Processing file data: we use the data below to validate the email addresses.

Id   Name          Age  Email
201  Ryan Arjun    22   Ryan.Arjun@gmail.com
202  Mini Cooper   18   Mini.cooper@data.net
203  Kimmy Wang    34   Kimmy_Wang@dataspan.co.uk
204  Bill Willson  45   bill.willson@@microsoft.com
205  Donald Trump  56   donald..trump@usgov.gov

How to write Custom Code?
In the Repository, right-click Code and create a folder (here called "custom"). Then right-click "custom", create a routine, and define the function that validates the email address as given below:

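package routines;

import java.util.regex.Pattern;

public class CheckEmail {
    // Compiled once and reused: local part, "@", dot-separated domain labels, 2-6 letter TLD.
    private static final Pattern EMAIL_PATTERN = Pattern.compile(
            "^[\\w!#$%&'*+/=?`{|}~^-]+(?:\\.[\\w!#$%&'*+/=?`{|}~^-]+)*@(?:[a-zA-Z0-9-]+\\.)+[a-zA-Z]{2,6}$");

    // Returns true only when the whole string matches; null input is treated as invalid.
    public static boolean isEmailValid(String email) {
        return email != null && EMAIL_PATTERN.matcher(email).matches();
    }
}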
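
To see what the routine does with the sample data, here is a minimal standalone sketch that applies the same regular expression to the five email values from the table and splits them into accepted and rejected lists. The class name EmailRoutingSketch and the route method are hypothetical helpers for illustration only; in the actual job this routing is done by tMap or tFilterRow, not by hand-written code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical standalone harness; not part of the Talend job itself.
class EmailRoutingSketch {
    // Same pattern as the CheckEmail routine above.
    private static final Pattern EMAIL = Pattern.compile(
        "^[\\w!#$%&'*+/=?`{|}~^-]+(?:\\.[\\w!#$%&'*+/=?`{|}~^-]+)*@(?:[a-zA-Z0-9-]+\\.)+[a-zA-Z]{2,6}$");

    static boolean isEmailValid(String email) {
        return email != null && EMAIL.matcher(email).matches();
    }

    // Splits the emails into accepted (index 0) and rejected (index 1) lists.
    static List<List<String>> route(List<String> emails) {
        List<String> accepted = new ArrayList<>();
        List<String> rejected = new ArrayList<>();
        for (String email : emails) {
            (isEmailValid(email) ? accepted : rejected).add(email);
        }
        return List.of(accepted, rejected);
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
            "Ryan.Arjun@gmail.com", "Mini.cooper@data.net", "Kimmy_Wang@dataspan.co.uk",
            "bill.willson@@microsoft.com", "donald..trump@usgov.gov");
        List<List<String>> result = route(sample);
        System.out.println("Accepted: " + result.get(0));
        System.out.println("Rejected: " + result.get(1));
    }
}
```

The first three addresses are accepted; the address with "@@" and the one with ".." in the local part are rejected. Inside a tMap output filter, the equivalent check would typically be an expression such as CheckEmail.isEmailValid(row1.Email), where row1 is an assumed name for the input flow.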

To build this job, you need the following processing components -

tFileInputDelimited: We can use this component to read a file and separate the fields it contains using a defined separator. It allows you to create a data flow.

tLogRow: This component monitors the processed data and displays data or results in the Run console. It can be used as an intermediate step in a data flow or as an end object in the Job flowchart.

tFilterRow: This component filters input rows by setting one or more conditions on the selected columns. It helps parametrize filters on the source data. This component is not startable (green background) and requires an output component.

tMap: This is an advanced component that integrates itself as a plugin to Talend Studio. It transforms and routes data from single or multiple sources to single or multiple destinations. Possible uses range from a simple reorganization of fields to the most complex Jobs of data multiplexing or demultiplexing, transformation, concatenation, inversion, filtering, and more.
Using tMap requires a minimum of Java knowledge in order to fully exploit its functionality. This component is a junction step, and for this reason it cannot be a start or an end component in the Job.

To Learn more, please visit our twitter account at -
https://twitter.com/macxima 


Tuesday, December 17, 2019

Talend ETL - Lookup data for Insert, Update and delete

Here you will learn how to use lookup data for insert, update, and delete with tMap in Talend Open Studio.


Consider a typical business scenario: the business wants to look up each record of a new input file in the existing output file and check whether anything has changed. If a record does not exist in the output file, it must be inserted into the output file; if the row already exists, its data in the output file must be updated.

Both files share a unique key made up of SalespersonId and Salesyear. So we simply use tMap with an inner join, with the new file as the main row and the previous file as the lookup.

Our source and target are raw data files, and the example uses the tFileInputDelimited, tFileOutputDelimited, tLogRow, and tMap components.

To detect deleted records, we need another subjob in which the roles are swapped: the previous file becomes the main row and the new file becomes the lookup, so the records rejected by the inner join are the deleted ones. All the records of the lookup flow need to be loaded before each record of the main flow is processed.

Three lookup loading models are provided to suit different business requirements and performance needs: Load once, Reload at each row, and Reload at each row (cache).
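
The join logic described above can be sketched in plain Java as follows. This is only an in-memory illustration under stated assumptions, not what Talend generates: the class LookupDiffSketch, the "|" key separator, and the single sales amount value per row are all hypothetical, and a matched row is treated as an update only when its value differs from the previous file.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical in-memory stand-in for the two tMap subjobs.
class LookupDiffSketch {

    // Builds the composite join key from SalespersonId and Salesyear.
    static String key(int salespersonId, int salesYear) {
        return salespersonId + "|" + salesYear;
    }

    // Both maps: composite key -> remaining row data (an assumed sales amount column).
    static Map<String, List<String>> classify(Map<String, String> previous,
                                              Map<String, String> incoming) {
        List<String> inserts = new ArrayList<>();
        List<String> updates = new ArrayList<>();
        List<String> deletes = new ArrayList<>();

        // Subjob 1: new file is the main flow, previous file is the lookup.
        for (Map.Entry<String, String> row : incoming.entrySet()) {
            if (!previous.containsKey(row.getKey())) {
                inserts.add(row.getKey());   // inner-join reject: no match in the lookup
            } else if (!Objects.equals(row.getValue(), previous.get(row.getKey()))) {
                updates.add(row.getKey());   // matched, but the data changed
            }
        }
        // Subjob 2: roles swapped, so the rejects are keys missing from the new file.
        for (String k : previous.keySet()) {
            if (!incoming.containsKey(k)) {
                deletes.add(k);
            }
        }

        Map<String, List<String>> result = new LinkedHashMap<>();
        result.put("insert", inserts);
        result.put("update", updates);
        result.put("delete", deletes);
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> previous = new LinkedHashMap<>();
        previous.put(key(1, 2018), "1000");
        previous.put(key(2, 2018), "2000");
        Map<String, String> incoming = new LinkedHashMap<>();
        incoming.put(key(1, 2018), "1500");   // changed amount
        incoming.put(key(3, 2019), "3000");   // new key
        System.out.println(classify(previous, incoming));
    }
}
```

With this made-up data, key 3|2019 is classified as an insert, 1|2018 as an update, and 2|2018 as a delete.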

The components used to accomplish this job are -
tFileInputDelimited: We can use this component to read a file and separate the fields it contains using a defined separator. It allows you to create a data flow.

tMap: tMap is an advanced component, integrated as a plugin to Talend Studio, that transforms and routes data from single or multiple sources to single or multiple destinations. Possible uses range from a simple reorganization of fields to the most complex Jobs of data multiplexing or demultiplexing, transformation, concatenation, inversion, filtering, and more.

tLogRow: This component monitors the processed data and displays data or results in the Run console. It can be used as an intermediate step in a data flow or as an end object in the Job flowchart.

tFileOutputDelimited: This component writes a delimited file that holds data organized according to the defined schema, separating the fields with a field separator value.


Thursday, December 5, 2019

Talend ETL - Delete Files After Processing

In our day-to-day work as ETL developers, we need to remove files from the source after processing them into the system. Talend provides many user-friendly graphical components to accomplish this kind of job.

We are going to show you how to check whether a file exists and then delete it after processing in Talend Open Studio.


We are using the following components -

tFileExist: This component checks whether a file exists at a defined location and can be used as a standalone component. It helps streamline processes by automating recurrent and tedious tasks such as checking whether a file exists.

tFileInputDelimited: We can use this component to read a file and separate the fields it contains using a defined separator. It allows you to create a data flow.

tLogRow: This component monitors the processed data and displays data or results in the Run console. It can be used as an intermediate step in a data flow or as an end object in the Job flowchart.

tMsgBox: This component displays a message on the screen and can be used as a standalone component.

How to check whether a file exists: This scenario describes a simple Job that checks whether a given file exists, displays a graphical message to confirm that the file does not exist, reads the input data from another given file, and displays it with the tLogRow component.
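
For readers who want to see the same idea outside the graphical designer, here is a rough plain-Java sketch of the check-then-read flow using java.nio.file. It is an illustration, not Talend-generated code: the class FileExistSketch, the method readIfExists, and the file name input.csv are hypothetical, and a console message stands in for the tMsgBox dialog.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical stand-in for tFileExist -> tMsgBox / tFileInputDelimited -> tLogRow.
class FileExistSketch {

    // Returns the file's lines, or an empty list when the file is missing.
    static List<String> readIfExists(Path file) throws IOException {
        if (!Files.exists(file)) {
            // Console message used here in place of the tMsgBox pop-up.
            System.out.println("File does not exist: " + file);
            return List.of();
        }
        return Files.readAllLines(file);
    }

    public static void main(String[] args) throws IOException {
        // "input.csv" is an assumed file name for illustration.
        for (String line : readIfExists(Paths.get("input.csv"))) {
            System.out.println(line);   // plays the role of tLogRow
        }
    }
}
```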



tFileDelete: This component deletes a file from a defined directory. It helps streamline processes by automating recurrent and tedious tasks such as deleting files. It can be used as a standalone component.

tJava: This component lets you enter custom Java code and integrate it into the Talend program. Keep in mind that this code is executed only once, not once per row.

How to delete a file after processing: This scenario describes a simple Job that deletes a file after processing it into the system.
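
The ordering that matters here (process first, delete only afterwards) can be sketched in plain Java as below. This is a simplified illustration under assumptions, not the code Talend produces: the class DeleteAfterProcessingSketch, the method processAndDelete, and the file name input.csv are hypothetical, and "processing" is reduced to printing each line.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical stand-in for tFileInputDelimited -> tLogRow followed by tFileDelete.
class DeleteAfterProcessingSketch {

    // Reads and prints every line, then deletes the file; returns the number of lines processed.
    static int processAndDelete(Path file) throws IOException {
        List<String> lines = Files.readAllLines(file);
        for (String line : lines) {
            System.out.println(line);   // "processing" is just logging in this sketch
        }
        // Reached only if reading succeeded, so a failed read leaves the file in place.
        Files.delete(file);
        return lines.size();
    }

    public static void main(String[] args) throws IOException {
        // "input.csv" is an assumed file name for illustration.
        Path file = Paths.get("input.csv");
        if (Files.exists(file)) {
            System.out.println("Processed rows: " + processAndDelete(file));
        }
    }
}
```

Deleting only after the read has completed mirrors the Job design, where the delete step is triggered once the processing subjob has finished.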


To Learn more, please visit our blog at - 
http://www.sql-datatools.com
To Learn more, please visit our YouTube channel at - 
http://www.youtube.com/c/Sql-datatools
To Learn more, please visit our Instagram account at -
https://www.instagram.com/asp.mukesh/
To Learn more, please visit our twitter account at -
https://twitter.com/macxima