Wednesday, May 22, 2019

Talend ETL - How to remove special characters in the string

Talend allows us to define each data lake as a data source and then develop processes that combine and filter data from those sources to produce new products (reports, new data, spreadsheets, etc.). All of this is done in a diagrammatic interface based on Eclipse.

tMap Component Transformation 
The tMap component is part of the Processing family of components. tMap is one of the core components and is primarily used for mapping input data to output data, that is, mapping one Schema to another.
The Map Editor allows us to enter a Mapping Expression for each of the columns in each output Schema.

As well as performing mapping functions, tMap may also be used to Join multiple inputs, and to write multiple outputs. Additionally, we can Filter data within the tMap component. We'll cover these features in a later article.

If you want to remove special characters from an input string, you can do it with an expression in a tMap component. tMap expressions are Java, so two String methods are available: replaceAll(), which treats its first argument as a regular expression, and replace(), which treats its first argument as a literal CharSequence. Take care to pick the right one, because characters such as * and ? have a special meaning in a regular expression. Use tMap with one of the following expressions to get the expected result:
To remove all non-digit characters from a string
row1.inputField.replaceAll("\\D", "")

To remove all digit characters from a string
row1.inputField.replaceAll("[0-9]", "")

To remove a specific set of special characters (here *, &, ; and !, plus the comma) from a string
row1.inputField.replaceAll("[*,&;!]", "")

To remove a literal sequence of characters (here ??) from a string, without regular expressions
row1.inputField.replace("??", "")
Each of these expressions replaces the matched characters with an empty string, which removes them. If you want to see the whole process, you can also watch the demo on our YouTube channel.
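Since tMap expressions are plain Java, the same calls can be tried outside Talend before pasting them into the Map Editor. The following is only a minimal sketch: the class name, method names, and sample strings are made up for illustration, and the null check in the last method is an assumption about how you might want to treat empty input columns, not something tMap requires.

```java
public class SpecialCharDemo {

    // Keep only digits: in a regular expression, \D matches any non-digit.
    static String keepDigits(String s) {
        return s.replaceAll("\\D", "");
    }

    // Remove every digit from 0 to 9.
    static String removeDigits(String s) {
        return s.replaceAll("[0-9]", "");
    }

    // Remove a chosen set of characters. Inside [...] the * is a literal
    // character; outside the brackets it would be a regex quantifier.
    static String removeListed(String s) {
        return s.replaceAll("[*,&;!]", "");
    }

    // replace() takes literal text, so ?? needs no escaping here.
    static String removeLiteral(String s) {
        return s.replace("??", "");
    }

    // Hypothetical null-safe variant: calling replaceAll() on a null
    // column value would throw a NullPointerException.
    static String safeKeepDigits(String s) {
        return s == null ? null : keepDigits(s);
    }

    public static void main(String[] args) {
        System.out.println(keepDigits("Tel: +1 (555) 010-9999"));
        System.out.println(removeDigits("Order42 of 2019"));
        System.out.println(removeListed("a*b,c&d;e!f"));
        System.out.println(removeLiteral("what?? really?"));
        System.out.println(safeKeepDigits(null));
    }
}
```

Inside tMap, the null-safe form would be written inline, for example row1.inputField == null ? null : row1.inputField.replaceAll("\\D", ""), assuming the input column is nullable.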
