Wednesday, January 8, 2020

Python - Extracting Email Addresses Using Regular Expressions

As Python developers, we often have to do jobs such as cleansing data from a file before carrying out other business operations.

For example, you may have a raw text data file from which you need to read specific data, such as email addresses, by performing regular expression matching.
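Here is a minimal sketch of that task. It assumes the raw file is UTF-8 plain text and reuses the simple email pattern from the example in this post. The helper name extract_emails_from_file, the file name raw_data.txt and its contents are made up for the demo.

```python
import re
import tempfile
from pathlib import Path

# Same simple pattern as the example in this post: word characters, dots and hyphens around "@"
EMAIL_PATTERN = re.compile(r'[\w\.-]+@[\w\.-]+')


def extract_emails_from_file(path):
    """Hypothetical helper: return every email-like match in a text file, in order."""
    emails = []
    # Read line by line so a large raw data file is not loaded into memory at once
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            emails.extend(EMAIL_PATTERN.findall(line))
    return emails


# Demo: write a small raw data file to a temporary folder, then scan it
raw_text = "id,contact\n1,john.d@yahoo.com\n2,no email here\n3,ryan.arjun@gmail.com\n"
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / "raw_data.txt"
    sample_path.write_text(raw_text, encoding="utf-8")
    found = extract_emails_from_file(sample_path)

print(found)  # ['john.d@yahoo.com', 'ryan.arjun@gmail.com']
```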

What is a Regular Expression and which module is used in Python?
A regular expression is a sequence of characters that defines a search pattern using a specialized syntax. It is mainly used to find and replace patterns in a string or file.
The Python module re provides full support for Perl-like regular expressions in Python. The re module raises the exception re.error if an error occurs while compiling or using a regular expression.
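The sketch below makes the find, replace and re.error points concrete. It uses only standard re functions (re.compile, findall and sub). The sample text and the "<email>" masking placeholder are illustrative assumptions.

```python
import re

# Compile the pattern once and reuse it; re.compile raises re.error for an invalid pattern
email_re = re.compile(r'[\w\.-]+@[\w\.-]+')

text = "Write to rosy.gray@amazon.co.uk or ryan.arjun@gmail.com"

# Find: return all non-overlapping matches as a list of strings
found = email_re.findall(text)
print(found)  # ['rosy.gray@amazon.co.uk', 'ryan.arjun@gmail.com']

# Replace: mask every match, e.g. before sharing a data sample
masked = email_re.sub("<email>", text)
print(masked)  # Write to <email> or <email>

# The character set below is never closed with "]", so compiling it raises re.error
error_message = None
try:
    re.compile(r'[unclosed')
except re.error as exc:
    error_message = str(exc)
    print("Invalid pattern:", error_message)
```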
Example -
# Python program to extract email addresses from a string using a regular expression.

# Import the module that provides regular
# expression support
import re

# Example string
txt = "Ryan has sent an invoice email to john.d@yahoo.com by using his email id ryan.arjun@gmail.com and he also shared a copy to his boss rosy.gray@amazon.co.uk on the cc part."

# [\w\.-] matches a word character (letter, digit or underscore), a dot or a hyphen
# @ matches the literal "@" that separates the user name from the domain
# + repeats the preceding character class one or more times
findEmail = re.findall(r'[\w\.-]+@[\w\.-]+', txt)

# Print the list of email addresses found
print(findEmail)


Output - 
['john.d@yahoo.com', 'ryan.arjun@gmail.com', 'rosy.gray@amazon.co.uk']
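The pattern above is intentionally simple. Because it allows . and - anywhere, an address at the end of a sentence keeps the full stop. Repeated addresses are also returned more than once. The sketch below uses an assumed, slightly stricter pattern: the domain must be dot-separated labels, and "+" is allowed in the user name. It then removes duplicates with dict.fromkeys. It is still not a complete RFC 5322 validator.

```python
import re

# Simple pattern from the example above
simple = re.compile(r'[\w\.-]+@[\w\.-]+')

# Stricter sketch (an assumption, not a full validator): the domain must be labels joined
# by dots, so a trailing full stop or comma after the address is not captured
stricter = re.compile(r'[\w.+-]+@[\w-]+(?:\.[\w-]+)+')

text = "Mail support@example.org. Copy sales-team@shop.example.co.uk, and support@example.org again."

print(simple.findall(text))
# ['support@example.org.', 'sales-team@shop.example.co.uk', 'support@example.org']

found = stricter.findall(text)
# dict.fromkeys drops repeated addresses but keeps first-seen order
unique = list(dict.fromkeys(found))
print(unique)  # ['support@example.org', 'sales-team@shop.example.co.uk']
```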


To learn more, please follow us -

To Learn more, please visit our YouTube channel at - 
To Learn more, please visit our Instagram account at -
To Learn more, please visit our twitter account at -

Tuesday, January 7, 2020

Talend ETL - How to Perform a Rolling or Cumulative Sum

If you work as a Talend ETL developer, you will sometimes need to do jobs in Talend that are easily achieved at the database end, for example in SQL Server. One of them is a rolling or cumulative sum. Talend is full of powerful components that can accomplish almost any kind of job.
   
Here you will learn how to create a rolling or cumulative sum over different groups by using tJavaFlex in Talend Open Studio.

We will use the following components -
tFileInputDelimited: We can use this component to read a file and separate the fields it contains using a defined separator. It allows you to create a data flow.

tLogRow: This component is used to monitor processed data and displays data or results in the Run console. It can be used as an intermediate step in a data flow or as an end object in the Job flowchart.
Sample input data:
SalesPersonId  SalesYear  TotalSales
201            2015       100
202            2015       200
203            2015       300
204            2015       400
205            2016       50
206            2016       100
207            2016       150
208            2016       200
209            2017       1000
210            2017       2000
211            2017       3000
212            2017       4000
213            2018       1050
214            2018       1100
215            2018       1150
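In Talend, reading this data is configured in tFileInputDelimited rather than coded. Purely to illustrate what that step does, here is a rough Python analogue using the csv module: it splits each line on a separator and passes typed rows downstream. It is not how Talend works internally. The ";" separator, the header row and the read_delimited name are assumptions.

```python
import csv
import io

# The sample sales data above as ";"-separated text. The separator and header row
# are assumptions; set them to match the real input file.
SAMPLE = """SalesPersonId;SalesYear;TotalSales
201;2015;100
202;2015;200
203;2015;300
204;2015;400
205;2016;50
206;2016;100
207;2016;150
208;2016;200
209;2017;1000
210;2017;2000
211;2017;3000
212;2017;4000
213;2018;1050
214;2018;1100
215;2018;1150
"""


def read_delimited(handle, separator=";"):
    """Hypothetical helper: split each line on the separator and yield one typed record per row."""
    reader = csv.DictReader(handle, delimiter=separator)
    for row in reader:
        yield {
            "SalesPersonId": int(row["SalesPersonId"]),
            "SalesYear": int(row["SalesYear"]),
            "TotalSales": int(row["TotalSales"]),
        }


rows = list(read_delimited(io.StringIO(SAMPLE)))
print(len(rows), rows[0])  # 15 {'SalesPersonId': 201, 'SalesYear': 2015, 'TotalSales': 100}
```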

tJavaFlex: This component lets you enter personalized code in order to integrate it into a Talend program. With tJavaFlex, you can enter the three Java code parts (Start, Main and End) that together form a kind of component dedicated to a desired operation.
You can use this component as a start, intermediate or output component.

Output with the running total (RunSal):
SalesPersonId  SalesYear  TotalSales  RunSal
201            2015       100         100
202            2015       200         300
203            2015       300         600
204            2015       400         1000
205            2016       50          1050
206            2016       100         1150
207            2016       150         1300
208            2016       200         1500
209            2017       1000        2500
210            2017       2000        4500
211            2017       3000        7500
212            2017       4000        11500
213            2018       1050        12550
214            2018       1100        13650
215            2018       1150        14800

You can also use it as a one-component subjob, but you must know the Java language.
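The Start/Main/End split of tJavaFlex can be sketched in Python just to show the logic. tJavaFlex itself expects Java code, and the variable names here are hypothetical. The global running total reproduces the RunSal column shown in the output table. The per-year accumulator is an assumed variant for a sum that restarts for each SalesYear group, which the sample output does not do.

```python
from collections import defaultdict

# (SalesPersonId, SalesYear, TotalSales) rows from the sample data, already sorted
sales = [
    (201, 2015, 100), (202, 2015, 200), (203, 2015, 300), (204, 2015, 400),
    (205, 2016, 50), (206, 2016, 100), (207, 2016, 150), (208, 2016, 200),
    (209, 2017, 1000), (210, 2017, 2000), (211, 2017, 3000), (212, 2017, 4000),
    (213, 2018, 1050), (214, 2018, 1100), (215, 2018, 1150),
]

# Start (runs once, before the first row): initialise the accumulators
running_total = 0
running_by_year = defaultdict(int)
output = []  # stands in for the outgoing flow, e.g. towards tLogRow

# Main (runs once per incoming row)
for person_id, year, total in sales:
    running_total += total          # cumulative sum over all rows (the RunSal column)
    running_by_year[year] += total  # assumed variant: restarts for each SalesYear
    output.append((person_id, year, total, running_total, running_by_year[year]))

# End (runs once, after the last row)
print("Grand total:", running_total)  # Grand total: 14800
```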

To watch a live demo -
