Thursday, March 29, 2018

TSQL - Set operations in SQL Server


In SQL Server, Set operations allow the results of multiple queries to be combined into a single result set. Set operators include UNION, UNION ALL, INTERSECT, and EXCEPT. Before using the set operations in SQL Server, we should keep the following things in our mind -
1. These set operations are design to work around more than two datasets or queries
2. Each dataset or query must have the same number of columns
3. Each column must have compatible data types,
4. Column names for the final result set are taken from the first query or datasets

Union - This operator is design to merge more than two datasets and it eliminates duplicate rows if any. It gives unique record from all data set. 

Union all - This operator to design merge more than two datasets and does not eliminate duplicate rows if any. It gives all  records from all data set.


Intersect - It will show when rows are identical and not include rows which are different. It will remove duplicate rows from the final result set.


EXCEPT - It takes the distinct rows of one query and returns the rows that do not appear in a second result set. The EXCEPT ALL operator does not remove duplicates. For purposes of row elimination and duplicate removal, the EXCEPT operator does not distinguish between NULLs.


Tuesday, March 20, 2018

TSQL - Row Number function in SQL Server

SQL Server Management Studio is one of the great tools for SQL Server which are capable to give us a wealth of information about its data components. SQL Server comes with many pretty and awesome functions which make developer’s life easy to go in the development environments. Today, we are going to understand Ranking functions in SQL Server and how do they work.
As we know that they provide a runtime assigning number to the records in the result set in SQL. One of them is Row_Number() function which allows the code to assign rankings or numbering against each row in result set data. Transact-SQL provides the following ranking functions: RANK (), DENSE_RANK (), NTILE () and ROW_NUMBER ().
Depending on the function that is used, some rows might receive the same value as other rows. Ranking functions are nondeterministic.
ROW_NUMBER is nondeterministic function which is responsible to calculate a temporary value whenever the query is run. In ROW_NUMBER() function, Partition By Clause is used for the grouping functionality on the result set data. When used in combination with the order by clause, it results in grouping the data and then the ranking of the rows is done where as Order By Clause decides how the ranking or numbering of the data starts.
More specifically, returns the sequential number of a row within a partition of a result set, starting at 1 for the first row in each partition. In SQL Server, we use the following syntax-
ROW_NUMBER() OVER ( [PARTITION BY value expression… [n]] ORDER BY clause)
Arguments
PARTITION BY value expression divides the result set produced by the FROM clause into partitions to which the ROW_NUMBER function is applied. value_expression specifies the column by which the result set is partitioned. If PARTITION BY is not specified, the function treats all rows of the query result set as a single group.
ORDER BY clause determines the sequence in which the rows are assigned their unique ROW_NUMBER within a specified partition. It is required.
General Remarks - There is no guarantee that the rows returned by a query using ROW_NUMBER() will be ordered exactly the same with each execution unless the following conditions are true.
Values of the partitioned column are unique.
Values of the ORDER BY columns are unique.
Combinations of values of the partition column and ORDER BY columns are unique.

To understand ROW_NUMBER function, we can take an example of the product data table where a single item is associated with a specific category as given below -
------ CREATE TEMP Product table
SELECT CATEGORY, CATEOGRY_NAME, ITEM_CODE, ITEM_NAME
INTO #PRODUCT
FROM
(
SELECT CATEGORY=101,CATEOGRY_NAME='Socks',ITEM_CODE=1201,ITEM_NAME='Mountain Bike Socks' UNION
SELECT CATEGORY=101,CATEOGRY_NAME='Socks',ITEM_CODE=1032,ITEM_NAME='Mountain Bike Socks' UNION
SELECT CATEGORY=101,CATEOGRY_NAME='Socks',ITEM_CODE=1503,ITEM_NAME='Racing Socks' UNION
SELECT CATEGORY=101,CATEOGRY_NAME='Socks',ITEM_CODE=1034,ITEM_NAME='Racing Socks' UNION
SELECT CATEGORY=102,CATEOGRY_NAME='Caps',ITEM_CODE=1205,ITEM_NAME='Baseball Caps' UNION
SELECT CATEGORY=102,CATEOGRY_NAME='Caps',ITEM_CODE=4106,ITEM_NAME='Fisherman Caps' UNION
SELECT CATEGORY=102,CATEOGRY_NAME='Caps',ITEM_CODE=5107,ITEM_NAME='Flat Caps' UNION
SELECT CATEGORY=103,CATEOGRY_NAME='Jeans & Pants',ITEM_CODE=1018,ITEM_NAME='Skinny Fit Jeans' UNION
SELECT CATEGORY=103,CATEOGRY_NAME='Jeans & Pants',ITEM_CODE=1019,ITEM_NAME='Skinny Fit Jeans' UNION
SELECT CATEGORY=103,CATEOGRY_NAME='Jeans & Pants',ITEM_CODE=1110,ITEM_NAME='Skinny Fit Jeans'
)P

---- Pull data from Product temp table
SELECT CATEGORY, CATEOGRY_NAME, ITEM_CODE, ITEM_NAME
FROM #PRODUCT

CATEGORY
CATEOGRY_NAME
ITEM_CODE
ITEM_NAME
101
Socks
1032
Mountain Bike Socks
101
Socks
1034
Racing Socks
101
Socks
1201
Mountain Bike Socks
101
Socks
1503
Racing Socks
102
Caps
1205
Baseball Caps
102
Caps
4106
Fisherman Caps
102
Caps
5107
Flat Caps
103
Jeans & Pants
1018
Skinny Fit Jeans
103
Jeans & Pants
1019
Skinny Fit Jeans
103
Jeans & Pants
1110
Skinny Fit Jeans
Row_Number function with Order By Clause
Now, you can see in the above product table, there is without any row number and we want to give a Row Id for each row then we can use Row_Number() function as given below -
---- CASE1:  ROW_NUMBER ORDER BY CATEGORY, ITEM_CODE
SELECT
---- ASSIGN ROW ID AGAINST EACH ROW ORDER BY CATEGORY, ITEM_CODE
ROWID= ROW_NUMBER() OVER (ORDER BY CATEGORY, ITEM_CODE),
CATEGORY, CATEOGRY_NAME, ITEM_CODE, ITEM_NAME
FROM #PRODUCT

ROWID
CATEGORY
CATEOGRY_NAME
ITEM_CODE
ITEM_NAME
1
101
Socks
1032
Mountain Bike Socks
2
101
Socks
1034
Racing Socks
3
101
Socks
1201
Mountain Bike Socks
4
101
Socks
1503
Racing Socks
5
102
Caps
1205
Baseball Caps
6
102
Caps
4106
Fisherman Caps
7
102
Caps
5107
Flat Caps
8
103
Jeans & Pants
1018
Skinny Fit Jeans
9
103
Jeans & Pants
1019
Skinny Fit Jeans
10
103
Jeans & Pants
1110
Skinny Fit Jeans


---- CASE2:  ROW_NUMBER ORDER BY ITEM_CODE
SELECT
---- ASSIGN ROW ID AGAINST EACH ROW ORDER BY ITEM_CODE
ROWID= ROW_NUMBER() OVER (ORDER BY ITEM_CODE),
CATEGORY, CATEOGRY_NAME, ITEM_CODE, ITEM_NAME
FROM #PRODUCT

ROWID
CATEGORY
CATEOGRY_NAME
ITEM_CODE
ITEM_NAME
1
103
Jeans & Pants
1018
Skinny Fit Jeans
2
103
Jeans & Pants
1019
Skinny Fit Jeans
3
101
Socks
1032
Mountain Bike Socks
4
101
Socks
1034
Racing Socks
5
103
Jeans & Pants
1110
Skinny Fit Jeans
6
101
Socks
1201
Mountain Bike Socks
7
102
Caps
1205
Baseball Caps
8
101
Socks
1503
Racing Socks
9
102
Caps
4106
Fisherman Caps
10
102
Caps
5107
Flat Caps

---- CASE3:  ROW_NUMBER ORDER BY CATEGORY DESC
SELECT
---- ASSIGN ROW ID AGAINST EACH ROW ORDER BY ITEM_CODE
ROWID= ROW_NUMBER() OVER (ORDER BY CATEGORY DESC),
CATEGORY, CATEOGRY_NAME, ITEM_CODE, ITEM_NAME
FROM #PRODUCT
ROWID
CATEGORY
CATEOGRY_NAME
ITEM_CODE
ITEM_NAME
1
103
Jeans & Pants
1018
Skinny Fit Jeans
2
103
Jeans & Pants
1019
Skinny Fit Jeans
3
103
Jeans & Pants
1110
Skinny Fit Jeans
4
102
Caps
1205
Baseball Caps
5
102
Caps
4106
Fisherman Caps
6
102
Caps
5107
Flat Caps
7
101
Socks
1032
Mountain Bike Socks
8
101
Socks
1034
Racing Socks
9
101
Socks
1201
Mountain Bike Socks
10
101
Socks
1503
Racing Socks
In the above query, you can see that we have used Row_Number() function with Order by clause.
In the first case, we have used order by category, item_code where resulted RowId is generated as per the category and item_code.
In the second case, we have used order by item_code where resulted RowId is generated as per the item_code and ignored the other columns.
In the third case, we have used order by category desc where resulted RowId is generated as per the category descending order and ignored the other columns.

Using ROW_NUMBER() with PARTITION
In the following example, we have used the PARTITION BY argument to partition the query result set by the column Category. The ORDER BY clause specified in the OVER clause orders the rows in each partition by the column Item_code. The ORDER BY clause in the SELECT statement orders the entire query result set by Category as given -
SELECT
---- ASSIGN ROW ID AGAINST EACH ROW ORDER BY ITEM_CODE
ROWID= ROW_NUMBER() OVER (PARTITION BY CATEGORY ORDER BY ITEM_CODE),
CATEGORY, CATEOGRY_NAME, ITEM_CODE, ITEM_NAME
FROM #PRODUCT
ROWID
CATEGORY
CATEOGRY_NAME
ITEM_CODE
ITEM_NAME
1
101
Socks
1503
Racing Socks
2
101
Socks
1201
Mountain Bike Socks
3
101
Socks
1034
Racing Socks
4
101
Socks
1032
Mountain Bike Socks
1
102
Caps
5107
Flat Caps
2
102
Caps
4106
Fisherman Caps
3
102
Caps
1205
Baseball Caps
1
103
Jeans & Pants
1110
Skinny Fit Jeans
2
103
Jeans & Pants
1019
Skinny Fit Jeans
3
103
Jeans & Pants
1018
Skinny Fit Jeans
In the above query, we have used partition by and order by clause to generate on the fly or runtime RowId for each row.
Conclusion
Row_Number() function works by assigning continuous ranking to the records, without skipping any number in the result set, whether it is partitioned or not. In SQL, it generates a run-time column in the result set which generates sequential number to each row according to the column used in order by clause.