Microsoft Business Intelligence (Data Tools)|April 2018

Wednesday, April 25, 2018

TSQL - GROUP BY with ROLLUP and CUBE

The ROLLUP operator is a very useful in generating reports that contain subtotals and totals which generates a result set that shows aggregates for a hierarchy of values in the selected columns. The super-aggregated column is represented by a NULL value and we can assign any value such as ‘ALL’, although the super-aggregate rows will always be added last. Multiple aggregates over different columns will be added if there are multiple GROUP BY columns.
It is useful-
A. By using the ROLLUP option, you can use a single query to generate multiple grouping sets.
B. ROLLUP generates a result set that shows aggregates for a hierarchy of values in the selected columns.
C. By using ROLLUP, you can find total for corresponding to one or more columns

Examples
1. SQL ROLLUP with single column
2. SQL ROLLUP with multiple columns

3. SQL ROLLUP with partial rollup

WITH ROLLUP, as well as WITH CUBE, both are non-standard and deprecated.

To understand both operators, we will observe both the ROLLUP and CUBE operators with the help of a simple example as given below-

----- Create table variable

DECLARE @Item_List Table

(Product Varchar(20), Maker Varchar(20), Quantity Int)

----- Insert Data into table variable

INSERT INTO @Item_List(Product,Maker, Quantity )

VALUES ('Laptop', 'Dell', 300),

('Laptop', 'Lenovo', 150),

('Laptop', 'HP', 400),

('Desktop', 'Dell', 120),

('Desktop', 'Lenovo', 80),

('Desktop', 'HP', 250)

----- Pull the data from table variable

SELECT * FROM @Item_List;

Product	Maker	Quantity
Desktop	Dell	120
Desktop	HP	250
Desktop	Lenovo	80
Laptop	Dell	300
Laptop	HP	400
Laptop	Lenovo	150

----- Result by using WITH ROLLUP;

SELECT Product=ISNULL(Product,'ALL'), Maker=ISNULL(Maker,'ALL'), Quantity=Sum(Quantity)

FROM @Item_List

GROUP BY Product, Maker WITH ROLLUP;

Product	Maker	Quantity
Desktop	Dell	120
Desktop	HP	250
Desktop	Lenovo	80
Desktop	ALL	450
Laptop	Dell	300
Laptop	HP	400
Laptop	Lenovo	150
Laptop	ALL	850
ALL	ALL	1300

In addition to the subtotals generated by the ROLLUP extension, the WITH CUBE operator will generate subtotals for all combinations of the dimensions specified. In general, WITH CUBE generates a result set that shows aggregates for all combinations of values in the selected columns.

---- Result by using With Cube

SELECT Product=ISNULL(Product,'ALL'), Maker=ISNULL(Maker,'ALL'), Quantity=Sum(Quantity)

FROM @Item_List

GROUP BY Product , Maker WITH Cube ;

Product	Maker	Quantity
Desktop	Dell	120
Laptop	Dell	300
ALL	Dell	420
Desktop	HP	250
Laptop	HP	400
ALL	HP	650
Desktop	Lenovo	80
Laptop	Lenovo	150
ALL	Lenovo	230
ALL	ALL	1300
Desktop	ALL	450
Laptop	ALL	850

As the number of dimensions/columns increase, so do the combinations of subtotals that need to be calculated. The ROLLUP operator can also be used to calculate sub-totals for each column, based on the groupings within that column.

The Difference between ROLLUP and CUBE

There is only one major difference between the functionality of the ROLLUP operator and the CUBE operator. ROLLUP operator generates aggregated results for the selected columns in a hierarchical way whereas CUBE generates an aggregated result that contains all the possible combinations for the selected columns or dimensions.

To Learn more, please visit our YouTube channel at -

http://www.youtube.com/c/Sql-datatools

Mukesh Singh

With over 17 years of experience in the Data Engineering stack across a variety of cloud and on-premises systems, I have successfully delivered more than ten complete business product solutions. My expertise lies in building robust infrastructure and architecture to support data engineering, data analytics, and machine learning processes. These solutions have significantly improved collaboration among cross-functional teams, including data scientists, business analysts, software engineers, and stakeholders. Key Contributions Data Modelling and Integration • Data Modeling: Developed various data models to produce suitable data for business users, data analytics, data science, and data visualization teams. • Legacy Systems and Cloud Technologies: Integrated legacy systems with modern cloud-based technologies (AWS, Azure, GCP), data lakes, and data warehouses. • Streamlined Data Pipelines: Built efficient data pipelines, data warehouses, BI reports, and dashboards to streamline data access and insights.

Monday, April 23, 2018

SSIS - Data Flow

Microsoft provided SQL Server Integration Services, an amazing ETL tool to create packages to meet the business requirements. In SSIS, a package is a graphical design tool which may consist of organized collection of connections, control flow elements, data flow elements, event handlers, variables, parameters, and configurations.

After Control Flow, the Data Flow task is one of the most important and powerful components in SSIS which is responsible to retrieve data from heterogeneous data sources, transform that data, and insert it into heterogeneous destinations.

If you want to go Data Flow section then we must have to add Data Flow task in the control flow section tab as given below –

After adding Data Flow Task in Control Flow tab, double-click the Data Flow task in the control flow. This will move you to the Data Flow tab as given below –

How can we Configuring the Data Flow?

As we already stated that SSIS is completely a graphical design tool and we can configure a Data Flow task by adding components to the Data Flow tab. In this area, SSIS tool supports three types of data flow components to meet the business:

Sources: Extract data from external data sources, such as line-of-business systems, CRM systems, relational databases, web services, and SharePoint lists
Transformations: This includes cleansing the data and converting it to an OLAP-friendly data model. The OLAP-friendly data model traditionally consists of dimension and fact tables in a star or snowflake schema, and closely maps SSAS’s dimensional model (SSAS stands for SQL Server Analysis Services)
Destinations: Load the data in data warehouse/data marts, so that it can be quickly accessed by querying tools, such as reports. In practice, this implies processing SSAS cubes

Conclusion

A Data Flow task will always start with a source and will usually end with a destination, but not always. We can also add as many transformations as necessary to prepare the data for the destination. For example, we can use the Derived Column transformation to add a computed column to the data flow, or we can use a Conditional Split transformation to split data into different destinations based on specified criteria.

Mukesh Singh

Friday, April 13, 2018

Review ColumnStore Index in SQL Server 2016

Microsoft development team has been introduced ColumnStore indexes technologies in SQL Server 2012 onward and now they are guiding us to which feature in SQL Server 2016 could help us in our workload. The most important things is that ColumnStore indexes are the significant technology of SQL Server where Microsoft development team is trying their best to improve this technology in every new release of the product.

SQL Server 2016 provides the ability to create a columnstore index on top of a memory-optimized table also which was introduced in SQL Server 2014 to allow a complete table to stay in memory all the time.

This is the same code base technology which has been delivered in SQL Server, Azure SQL Server and VM Servers also as the services. The main objective of ColumnStore indexes to use in analytics and can achieve very high performance without make any changes in the running applications.

Before using ColumnStore indexes, we are simply using row store model which is the most common traditional model for any database/data warehouse project in SQL Server. If we want to improve the performance of any query which is based on row store model then we have to chance their indexes as Columnstore which convert the mode from row store, storage of the data to columnstore model.

The query run faster after building a columnstore index on the table, it compressed our data at column store level instead of row level which is the key factor to give us an amazing query performance.

Query performance is much-much faster by building a columnstore clustered index because they have a special processing system due to of improved compression, reduced I/O and more data can easily fits in memory optimized for CPU utilization also.

Some point to be remember -

We do not need to have table with Primary Key when we create a column store index because in this situation a Columnstore Index and Primary Key can be on different index.
If we are using non-clustered columnstore index and clustered columnstore index on data tables then they will carry different structures
For clustered columnstore index, the table itself changes its internal structure quite differently compared to a normal table with non-clustered columnstore index.

Mukesh Singh

Wednesday, April 25, 2018

TSQL - GROUP BY with ROLLUP and CUBE

Monday, April 23, 2018

SSIS - Data Flow

Friday, April 13, 2018

Review ColumnStore Index in SQL Server 2016

Popular Posts