Microsoft provides SQL Server Integration
Services (SSIS), a powerful ETL tool for building packages that meet business
requirements. In SSIS, a package is an organized collection of connections,
control flow elements, data flow elements, event handlers, variables,
parameters, and configurations, assembled in a graphical design tool.
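Although packages are normally designed and run from the SSIS tooling, they can also be loaded and executed programmatically. Below is a minimal C# sketch using the Microsoft.SqlServer.Dts.Runtime API; the package path is a placeholder, and the code assumes the SSIS client components are installed on the machine:

```csharp
using System;
using Microsoft.SqlServer.Dts.Runtime;

class RunPackage
{
    static void Main()
    {
        var app = new Application();

        // Load a package from the file system; the path is hypothetical.
        Package pkg = app.LoadPackage(@"C:\SSIS\LoadCustomers.dtsx", null);

        // Execute() runs the Control Flow, which in turn runs
        // any Data Flow tasks the package contains.
        DTSExecResult result = pkg.Execute();
        Console.WriteLine("Package result: " + result);
    }
}
```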
After the Control Flow, the Data Flow task
is one of the most important and powerful components in SSIS. It is
responsible for retrieving data from heterogeneous data sources, transforming
that data, and inserting it into heterogeneous destinations.
To work in the Data Flow section, we must
first add a Data Flow task to the Control Flow tab, as given below –
After adding the Data Flow task to the Control Flow tab,
double-click the Data Flow task in the control flow.
This will take you to the Data Flow tab, as given
below –
How Can We
Configure the Data Flow?
As stated earlier, SSIS is a completely graphical
design tool, and we configure a Data Flow task by adding
components to the Data Flow tab. SSIS supports
three types of data flow components to meet business requirements
(a stand-alone sketch of all three stages follows the list):
- Sources: Extract data from external data sources, such as line-of-business systems, CRM systems, relational databases, web services, and SharePoint lists
- Transformations: Cleanse the data and convert it to an OLAP-friendly data model. Such a model traditionally consists of dimension and fact tables in a star or snowflake schema, and closely maps to the dimensional model of SSAS (SQL Server Analysis Services)
- Destinations: Load the data into data warehouses/data marts, so that it can be quickly accessed by querying tools, such as reports. In practice, this often implies processing SSAS cubes
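To make these three stages concrete, here is a minimal stand-alone C# analogy (not SSIS itself) of extract, transform, and load: rows are read from a hypothetical source table, a column is computed in flight, and the result is bulk-loaded into a hypothetical destination table. All connection strings, tables, and column names are assumptions for illustration:

```csharp
using System.Data;
using System.Data.SqlClient;

class TinyEtl
{
    static void Main()
    {
        // An in-memory staging area for the transformed rows.
        var rows = new DataTable();
        rows.Columns.Add("OrderId", typeof(int));
        rows.Columns.Add("TotalPrice", typeof(decimal));   // derived column

        // Extract: read from a hypothetical source table.
        using (var src = new SqlConnection("Server=.;Database=SourceDb;Integrated Security=true"))
        {
            src.Open();
            var cmd = new SqlCommand("SELECT OrderId, Price, Quantity FROM dbo.Orders", src);
            using (SqlDataReader reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    // Transform: compute a new column from existing ones.
                    decimal total = reader.GetDecimal(1) * reader.GetInt32(2);
                    rows.Rows.Add(reader.GetInt32(0), total);
                }
            }
        }

        // Load: bulk insert into a hypothetical destination table.
        using (var dest = new SqlConnection("Server=.;Database=WarehouseDb;Integrated Security=true"))
        {
            dest.Open();
            using (var bulk = new SqlBulkCopy(dest) { DestinationTableName = "dbo.FactOrders" })
            {
                bulk.WriteToServer(rows);
            }
        }
    }
}
```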
Conclusion
A Data Flow task will always
start with a source and will usually, but not always, end with a destination. We
can also add as many transformations as necessary to prepare the data for the
destination. For example, we can use the Derived Column transformation
to add a computed column to the data flow, or the Conditional
Split transformation to route data to different destinations based
on specified criteria.
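For comparison, the same two ideas can be expressed in code inside an SSIS Script Component. The sketch below is hypothetical: Input0Buffer, the column names, and the two outputs (HighValue and LowValue, configured as synchronous outputs with an exclusion group) are all members SSIS generates from whatever you configure in the designer:

```csharp
// Runs once per row flowing through the component.
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Derived Column analogue: compute a new output column
    // from existing input columns.
    Row.TotalPrice = Row.Price * Row.Quantity;

    // Conditional Split analogue: route the row to exactly one
    // of the two configured outputs based on a criterion.
    if (Row.TotalPrice >= 1000)
        Row.DirectRowToHighValue();
    else
        Row.DirectRowToLowValue();
}
```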
Unlike the Control Flow, the Data Flow can branch, split, and merge, providing parallel processing. Even a set of connected linear transformations is not necessarily a serial process, because in most cases the transformations run at the same time, handling subsets of the data in parallel and passing groups of data downstream.
Multiple components run at the same time because the Data Flow transformations work together in a coordinated streaming fashion: the data is transformed in groups (called buffers) as it is passed down from the source to the subsequent transformations.
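As a toy analogy (not SSIS code), the following C# sketch shows the idea of handing data downstream in fixed-size groups, so a downstream stage can start working before the source has finished reading:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class BufferDemo
{
    // Yield rows from the source in fixed-size groups ("buffers").
    static IEnumerable<int[]> ReadInBuffers(IEnumerable<int> source, int bufferSize)
    {
        var buffer = new List<int>(bufferSize);
        foreach (int row in source)
        {
            buffer.Add(row);
            if (buffer.Count == bufferSize)
            {
                yield return buffer.ToArray();   // hand a full buffer downstream
                buffer.Clear();
            }
        }
        if (buffer.Count > 0)
            yield return buffer.ToArray();       // flush the final partial buffer
    }

    static void Main()
    {
        // The downstream "transformation" processes each buffer as it arrives.
        foreach (int[] buffer in ReadInBuffers(Enumerable.Range(1, 10), 4))
            Console.WriteLine("Transforming a buffer of " + buffer.Length + " rows");
    }
}
```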