Monday, April 23, 2018

SSIS - Data Flow


Microsoft provided SQL Server Integration Services, an amazing ETL tool to create packages to meet the business requirements. In SSIS, a package is a graphical design tool which may consist of organized collection of connections, control flow elements, data flow elements, event handlers, variables, parameters, and configurations.
After Control Flow, the Data Flow task is one of the most important and powerful components in SSIS which is responsible to retrieve data from heterogeneous  data sources, transform that data, and insert it into heterogeneous  destinations.

If you want to go Data Flow section then we must have to add Data Flow task in the control flow section tab as given below –


After adding Data Flow Task in Control Flow tab, double-click the Data Flow task in the control flow. This will move you to the Data Flow tab as given below –

How can we Configuring the Data Flow? 
As we already stated that SSIS is completely a graphical design tool and we can configure a Data Flow task by adding components to the Data Flow tab. In this area, SSIS tool supports three types of data flow components to meet the business:
  1. Sources: Extract data from external data sources, such as line-of-business systems, CRM systems, relational databases, web services, and SharePoint lists
  2. Transformations: This includes cleansing the data and converting it to an OLAP-friendly data model. The OLAP-friendly data model traditionally consists of dimension and fact tables in a star or snowflake schema, and closely maps SSAS’s dimensional model (SSAS stands for SQL Server Analysis Services)
  3. Destinations: Load the data in data warehouse/data marts, so that it can be quickly accessed by querying tools, such as reports. In practice, this implies processing SSAS cubes

Conclusion
Data Flow task will always start with a source and will usually end with a destination, but not always. We can also add as many transformations as necessary to prepare the data for the destination. For example, we can use the Derived Column transformation to add a computed column to the data flow, or we can use a Conditional Split transformation to split data into different destinations based on specified criteria. 


2 comments:

  1. The Data Flow, conversely, can branch, split, and merge, providing parallel processing, but this concept is different from the Control Flow. Even though there may be a set of connected linear transformations, you cannot necessarily call the process a serial process, because the transformations in most cases will run at the same time, handling subsets of the data in parallel and passing groups of data downstream.

    ReplyDelete
  2. Multiple components are running at the same time because the Data Flow Transformations are working together in a coordinated streaming fashion, and the data is being transformed in groups (called buffers) as it is passed down from the source to the subsequent transformations.

    ReplyDelete