Wednesday, April 25, 2018

TSQL - GROUP BY with ROLLUP and CUBE


The ROLLUP operator is a very useful in generating reports that contain subtotals and totals which generates a result set that shows aggregates for a hierarchy of values in the selected columns. The super-aggregated column is represented by a NULL value and we can assign any value such as ‘ALL’, although the super-aggregate rows will always be added last. Multiple aggregates over different columns will be added if there are multiple GROUP BY columns.
It is useful- 
A. By using the ROLLUP option, you can use a single query to generate multiple grouping sets.
B. ROLLUP generates a result set that shows aggregates for a hierarchy of values in the selected columns.
C. By using ROLLUP, you can find total for corresponding to one or more columns

Examples
1. SQL ROLLUP with single column
2. SQL ROLLUP with multiple columns

3. SQL ROLLUP with partial rollup

WITH ROLLUP, as well as WITH CUBE, both are non-standard and deprecated.
To understand both operators, we will observe both the ROLLUP and CUBE operators with the help of a simple example as given below-
----- Create table variable
DECLARE @Item_List Table
(Product Varchar(20), Maker Varchar(20), Quantity Int)

----- Insert Data into table variable
INSERT INTO @Item_List(Product,Maker, Quantity )
VALUES ('Laptop', 'Dell', 300),
('Laptop', 'Lenovo', 150),
('Laptop', 'HP', 400),
('Desktop', 'Dell', 120),
('Desktop', 'Lenovo', 80),
('Desktop', 'HP', 250)

----- Pull the data from table variable
SELECT * FROM @Item_List;
Product
Maker
Quantity
Desktop
Dell
120
Desktop
HP
250
Desktop
Lenovo
80
Laptop
Dell
300
Laptop
HP
400
Laptop
Lenovo
150

----- Result by using WITH ROLLUP;
SELECT Product=ISNULL(Product,'ALL'), Maker=ISNULL(Maker,'ALL'), Quantity=Sum(Quantity)
FROM @Item_List
GROUP BY Product, Maker WITH ROLLUP;
Product
Maker
Quantity
Desktop
Dell
120
Desktop
HP
250
Desktop
Lenovo
80
Desktop
ALL
450
Laptop
Dell
300
Laptop
HP
400
Laptop
Lenovo
150
Laptop
ALL
850
ALL
ALL
1300



In addition to the subtotals generated by the ROLLUP extension, the WITH CUBE operator will generate subtotals for all combinations of the dimensions specified. In general, WITH CUBE generates a result set that shows aggregates for all combinations of values in the selected columns.
---- Result by using With Cube
SELECT Product=ISNULL(Product,'ALL'), Maker=ISNULL(Maker,'ALL'), Quantity=Sum(Quantity)
FROM @Item_List
GROUP BY Product , Maker WITH Cube ;
Product
Maker
Quantity
Desktop
Dell
120
Laptop
Dell
300
ALL
Dell
420
Desktop
HP
250
Laptop
HP
400
ALL
HP
650
Desktop
Lenovo
80
Laptop
Lenovo
150
ALL
Lenovo
230
ALL
ALL
1300
Desktop
ALL
450
Laptop
ALL
850
As the number of dimensions/columns increase, so do the combinations of subtotals that need to be calculated. The ROLLUP operator can also be used to calculate sub-totals for each column, based on the groupings within that column.

The Difference between ROLLUP and CUBE
There is only one major difference between the functionality of the ROLLUP operator and the CUBE operator. ROLLUP operator generates aggregated results for the selected columns in a hierarchical way whereas CUBE generates an aggregated result that contains all the possible combinations for the selected columns or dimensions.

To Learn more, please visit our YouTube channel at - 

http://www.youtube.com/c/Sql-datatools



Monday, April 23, 2018

SSIS - Data Flow


Microsoft provided SQL Server Integration Services, an amazing ETL tool to create packages to meet the business requirements. In SSIS, a package is a graphical design tool which may consist of organized collection of connections, control flow elements, data flow elements, event handlers, variables, parameters, and configurations.
After Control Flow, the Data Flow task is one of the most important and powerful components in SSIS which is responsible to retrieve data from heterogeneous  data sources, transform that data, and insert it into heterogeneous  destinations.

If you want to go Data Flow section then we must have to add Data Flow task in the control flow section tab as given below –


After adding Data Flow Task in Control Flow tab, double-click the Data Flow task in the control flow. This will move you to the Data Flow tab as given below –

How can we Configuring the Data Flow? 
As we already stated that SSIS is completely a graphical design tool and we can configure a Data Flow task by adding components to the Data Flow tab. In this area, SSIS tool supports three types of data flow components to meet the business:
  1. Sources: Extract data from external data sources, such as line-of-business systems, CRM systems, relational databases, web services, and SharePoint lists
  2. Transformations: This includes cleansing the data and converting it to an OLAP-friendly data model. The OLAP-friendly data model traditionally consists of dimension and fact tables in a star or snowflake schema, and closely maps SSAS’s dimensional model (SSAS stands for SQL Server Analysis Services)
  3. Destinations: Load the data in data warehouse/data marts, so that it can be quickly accessed by querying tools, such as reports. In practice, this implies processing SSAS cubes

Conclusion
Data Flow task will always start with a source and will usually end with a destination, but not always. We can also add as many transformations as necessary to prepare the data for the destination. For example, we can use the Derived Column transformation to add a computed column to the data flow, or we can use a Conditional Split transformation to split data into different destinations based on specified criteria. 


Friday, April 13, 2018

Review ColumnStore Index in SQL Server 2016


Microsoft development team has been introduced ColumnStore indexes technologies in SQL Server 2012 onward and now they are guiding us to which feature in SQL Server 2016 could help us in our workload. The most important things is that ColumnStore indexes are the significant technology of SQL Server where Microsoft development team is trying their best to improve this technology in every new release of the product.
SQL Server 2016 provides the ability to create a columnstore index on top of a memory-optimized table also which was introduced in SQL Server 2014 to allow a complete table to stay in memory all the time.  
This is the same code base technology which has been delivered in SQL Server, Azure SQL Server and VM Servers also as the services. The main objective of ColumnStore indexes to use in analytics and can achieve very high performance without make any changes in the running applications.
Before using ColumnStore indexes, we are simply using row store model which is the most common traditional model for any database/data warehouse project in SQL Server. If we want to improve the performance of any query which is based on row store model then we have to chance their indexes as Columnstore which convert the mode from row store, storage of the data to columnstore model.
The query run faster after building a columnstore index on the table, it compressed our data at column store level instead of row level which is the key factor to give us an amazing query performance. 
Query performance is much-much faster by building a columnstore clustered index because they have a special processing system due to of improved compression, reduced I/O and more data can easily fits in memory optimized for CPU utilization also.
Some point to be remember -
  1. We do not need to have table with Primary Key when we create a column store index because in this situation a Columnstore Index and Primary Key can be on different index. 
  2. If we are using non-clustered columnstore index and clustered columnstore index on data tables then they will carry different structures
  3. For clustered columnstore index, the table itself changes its internal structure quite differently compared to a normal table with non-clustered columnstore index.