Monday, March 16, 2020

Talend ETL - Email Validation


How do you verify that the values in an email address column contain "@" and ".", and load the rows that fail the check into a different table? One option is custom code. In addition, Talend ships with many Apache Commons libraries, which provide hundreds of useful, efficient solutions built and checked by the Java community. The Apache Commons Validator library, for example, comes with validation methods for emails, phone numbers, URLs, and more.
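To make the starting point concrete, here is a minimal sketch of the simplest possible check, one that only looks for "@" and "." anywhere in the value. The class and method names (NaiveEmailCheck, hasAtAndDot) are hypothetical and are not part of Talend. The sketch is meant to show why such a check is too weak: malformed addresses from the sample data in this post still pass it.

```java
// Hypothetical helper, for illustration only: checks that '@' and '.' both occur.
class NaiveEmailCheck {
    static boolean hasAtAndDot(String email) {
        // null is treated as invalid rather than throwing an exception
        return email != null && email.contains("@") && email.contains(".");
    }

    public static void main(String[] args) {
        String[] samples = {
            "Ryan.Arjun@gmail.com",          // well formed
            "bill.willson@@microsoft.com",   // double '@'
            "donald..trump@usgov.gov"        // consecutive dots
        };
        for (String s : samples) {
            // All three values contain '@' and '.', so all three pass this check.
            System.out.println(s + " -> " + hasAtAndDot(s));
        }
    }
}
```

Because the malformed values pass, a stricter rule such as a regular expression or a library validator is the safer choice.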



Sample file data: we use the rows below to test the email validation.
Id | Name | Age | Email
201 | Ryan Arjun | 22 | Ryan.Arjun@gmail.com
202 | Mini Cooper | 18 | Mini.cooper@data.net
203 | Kimmy Wang | 34 | Kimmy_Wang@dataspan.co.uk
204 | Bill Willson | 45 | bill.willson@@microsoft.com
205 | Donald Trump | 56 | donald..trump@usgov.gov

How to write Custom Code?
In the Repository, right-click Code and create a folder (here called "custom"). Then right-click "custom", create a routine, and define the function that validates the email address, as given below:

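package routines;

import java.util.regex.Pattern;

public class CheckEmail {
    // Compiled once and reused for every row instead of recompiling per call.
    private static final Pattern EMAIL_PATTERN = Pattern.compile(
            "^[\\w!#$%&'*+/=?`{|}~^-]+(?:\\.[\\w!#$%&'*+/=?`{|}~^-]+)*@(?:[a-zA-Z0-9-]+\\.)+[a-zA-Z]{2,6}$");

    // Returns true only when the whole value matches the pattern; null is treated as invalid.
    public static boolean isEmailValid(String email) {
        return email != null && EMAIL_PATTERN.matcher(email).matches();
    }
}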
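To see how the routine behaves on the sample data, here is a small standalone sketch that can be run outside Talend. It copies the regular expression from the routine above into a hypothetical class named CheckEmailDemo (not part of the job) and prints a verdict for each of the five sample addresses.

```java
import java.util.regex.Pattern;

// Hypothetical standalone demo; the pattern is copied from the CheckEmail routine.
class CheckEmailDemo {
    private static final Pattern EMAIL = Pattern.compile(
            "^[\\w!#$%&'*+/=?`{|}~^-]+(?:\\.[\\w!#$%&'*+/=?`{|}~^-]+)*@(?:[a-zA-Z0-9-]+\\.)+[a-zA-Z]{2,6}$");

    static boolean isEmailValid(String email) {
        return email != null && EMAIL.matcher(email).matches();
    }

    public static void main(String[] args) {
        String[] emails = {
            "Ryan.Arjun@gmail.com",
            "Mini.cooper@data.net",
            "Kimmy_Wang@dataspan.co.uk",
            "bill.willson@@microsoft.com",
            "donald..trump@usgov.gov"
        };
        for (String email : emails) {
            System.out.println(email + " -> " + (isEmailValid(email) ? "valid" : "rejected"));
        }
    }
}
```

The first three addresses come out as valid. The fourth is rejected because "@" may appear only once, between the local part and the domain, and the fifth because every dot in the local part must be followed by at least one other character.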

To build this job, you need the following components:

tFileInputDelimited: This component reads a file and separates the fields it contains using a defined separator. It allows you to create a data flow.

tLogRow: This component monitors the data being processed and displays data or results in the Run console. It can be used as an intermediate step in a data flow or as an end object in the Job flowchart.

tFilterRow: This component filters input rows by setting one or more conditions on the selected columns. It helps parametrize filters on the source data. This component is not startable (it does not have the green background of a start component) and it requires an output component.

tMap: This is an advanced component that integrates itself as a plugin to Talend Studio. It transforms and routes data from single or multiple sources to single or multiple destinations. Possible uses range from a simple reorganization of fields to the most complex Jobs involving data multiplexing or demultiplexing, transformation, concatenation, inversion, filtering and more.
Using tMap assumes a minimum of Java knowledge in order to fully exploit its functionality. This component is a junction step, and for this reason it cannot be a start or end component in the Job.
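The flow these components implement (read delimited rows, test the Email column, send failures to a reject output) can be sketched in plain Java. This is an illustration under stated assumptions, not the code Talend generates: the class EmailRoutingSketch and its methods are hypothetical, the ";" separator and the column order Id, Name, Age, Email are assumed, and the pattern is copied from the CheckEmail routine. In the job itself, the routine would typically be called from a filter expression such as CheckEmail.isEmailValid(row1.Email), where row1 is an assumed connection name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of the read -> validate -> route flow; not Talend-generated code.
class EmailRoutingSketch {
    private static final Pattern EMAIL = Pattern.compile(
            "^[\\w!#$%&'*+/=?`{|}~^-]+(?:\\.[\\w!#$%&'*+/=?`{|}~^-]+)*@(?:[a-zA-Z0-9-]+\\.)+[a-zA-Z]{2,6}$");

    static boolean isEmailValid(String email) {
        return email != null && EMAIL.matcher(email).matches();
    }

    // Splits one delimited line into fields; the -1 limit keeps trailing empty fields.
    static String[] parseLine(String line, String separator) {
        return line.split(Pattern.quote(separator), -1);
    }

    // Returns two lists: index 0 holds accepted rows, index 1 holds rejected rows.
    static List<List<String[]>> route(List<String> lines, String separator) {
        List<String[]> accepted = new ArrayList<>();
        List<String[]> rejected = new ArrayList<>();
        for (String line : lines) {
            String[] fields = parseLine(line, separator);
            // Assumption: Email is the fourth column (Id, Name, Age, Email).
            boolean ok = fields.length > 3 && isEmailValid(fields[3]);
            (ok ? accepted : rejected).add(fields);
        }
        List<List<String[]>> result = new ArrayList<>();
        result.add(accepted);
        result.add(rejected);
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "201;Ryan Arjun;22;Ryan.Arjun@gmail.com",
                "202;Mini Cooper;18;Mini.cooper@data.net",
                "203;Kimmy Wang;34;Kimmy_Wang@dataspan.co.uk",
                "204;Bill Willson;45;bill.willson@@microsoft.com",
                "205;Donald Trump;56;donald..trump@usgov.gov");
        List<List<String[]>> routed = route(lines, ";");
        System.out.println("Accepted: " + routed.get(0).size());
        System.out.println("Rejected: " + routed.get(1).size());
        for (String[] row : routed.get(1)) {
            System.out.println("Reject Id " + row[0] + ": " + row[3]);
        }
    }
}
```

With the sample data, three rows go to the accepted list and rows 204 and 205 go to the rejected list, which corresponds to the separate table for rejected data.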


