Saturday, October 8, 2016

SSIS - How to build Lookup Transformation Logic

The Lookup transformation performs lookups by joining data in input columns with columns in a reference dataset. Lookup transformation takes input value and searches for a row in dataset that contains this value in the specified field; once it finds it we are able to extract values from different fields that belong to the same. This means use the lookup to access additional information in a related dataset which is based on values in other datasets common columns.

For better understanding, we can use two data tables such as Employee having employee details like as EmpCode, EmpName and DeptId and another table such as Department having DeptId and Department.
Tables
Employee
Department
EmpId
EmpName
DeptId
DeptId
Department
1
Ryan Arjun
101
101
Finance
2
Kimmy Wang
102
102
Admin
3
Lucy Gray
103
103
Sales
4
Billy Doug
104
104
Admin
5
Gery Dean
105
105
IT
6
Uday Neil
106
106
HR

Matched - Lookup Transformation
EmpId
EmpName
DeptId
Department
1
Ryan Arjun
101
Finance
2
Kimmy Wang
102
Admin
3
Lucy Gray
103
Sales
4
Billy Doug
104
Admin
5
Gery Dean
105
IT
6
Uday Neil
106
HR
We can use this transformation in many scenarios to meet the business requirements because this is most necessary transformation component in building of a data warehouse. During our data extract, we need to pull the source system key (business key) that we want to find in our dimension table and replace with ID (surrogate key) which will be used to load our Fact data table.

Here, I’m showing only matching and un-matching output with a very simple example as given below: 


We’ll create a new package and add data flow task in source control as given below:

In data flow control, add source item and set the data source connection also as given below:

Now, we are using Employee table to extract all employee information in the OLE DB Source as given below:
We are able to view all the source columns in the above image. Now, we need to use the lookup transformation to pull the entire department from the department table which are based on the common column called DeptId in both datasets. So, put a lookup transformation in data flow control and link with the data source control as given below:

Configuring the General page of the Lookup Transformation Editor
Now, we need to show all unmatched records and matched records with the help of lookup transformation. In Lookup Transformation Editor Window’s General selection, we configure the lookup transformation by using one option from three configuring cache mode options which are defined as:
Full cache: The referenced dataset is generated and loaded into cache before the Lookup transformation is executed.
Partial cache: The referenced dataset is generated when the Lookup transformation is executed, and the dataset is loaded into cache.
No cache: The referenced dataset is generated when the Lookup transformation is executed, but no data is loaded into cache.

Connection type section, which supports two options: Cache Connection Manager and OLE DB Connection Manager. In our tutorials, we are using OLE DB connection manager option because we are retrieving data from the database.
Type of lookup. At the bottom there is a drop down box with four values specified. We need to pull both matched and unmatched rows from lookup transformation, so we will have to choose Redirect rows to no match output as given below:
Configuring the Connection page of the Lookup Transformation Editor
In Connection page, set OLE DB connection manager which is the database that contains both fields that we are looking for, specify department table and clicked preview button so we can see values from the data table as given below:
Configuring the Columns page of the Lookup Transformation editor
In Columns page, we can see two datasets. The left side is input (comes from source item in data flow) and the right side is what we look up against which we specified in previous step (connection page section). Here, we need to drag DeptId input field input lookup DeptId field which is the main part responsible for finding the correct row and checked Department column from the lookup dataset also as given below:
Note: Lookup match is based on case sensitive algorithm. If we are matching records in lookup transformation then both should have the same data types. If they are characters then both string should be in lower or capital format.

Selecting an output for the Lookup transformation
We have configured lookup transformation successfully. We need to show both matched and unmatched outputs, for this just drop two Row Sampling Transformations as given below:
When we try to connect lookup transformation then it will ask for Lookup Match Output or Lookup No Match Output as given below:

Connect both Row Sampling transformation with Lookup Transformation such as one for Matched and another for No Match output. Now, we need to Enable Data Viewer for both outputs as given below:
Time to run our SSIS Package
The final step, of course, is to run the SSIS package in BIDS. When we run the package, we can monitor that how many rows matched the lookup dataset and how many did not as given below:

Conclusion
The Lookup transformation makes it relatively easy to access referenced data in order to supplement the data flow. The coming referenced data could be from any source whose data could be saved to cache, or directly from another database through an OLE DB connection, Lookup Transformation component will work in particular situation in order to create an effective lookup operation.

No comments:

Post a Comment