Thursday, December 8, 2016

Cumulative Sum in SQL Server

Cumulative Sum or Running Total is a sequence of partial sums of a given sequence which is used to display the total sum of data as it grows with time or any other series or progression. This lets us know and view the total contribution so far of a given measure against a given sequence or time.
In SQL Server, getting running totals in T-SQL is not hard task because there are different ways of calculating cumulative sums or running totals. To understand it in a better way, we are taking an employee example to accumulate their salaries across the company as well as accumulate their salary within their departments also.
Demo data of Employee Master – We are using table variable to view the total contribution so far of a given measure against a given sequence or time as given below:
------ Declare table varible
DECLARE @EmpSalary TABLE
(
Id INT IDENTITY(1,1),
EmpId INT,
DeptId INT,
Salary FLOAT
)

----- Insert values into @EmpSalary
INSERT INTO @EmpSalary(EmpId, DeptId, Salary)
VALUES
(101,10,25000), (102,10,35000),
(103,11,15000), (104,11,18500)

---- Pull result from @EmpSalary
SELECT Id, EmpId, DeptId, Salary
FROM @EmpSalary
Id
EmpId
DeptId
Salary
1
101
10
25000
2
102
10
35000
3
103
11
15000
4
104
11
18500

Expected Output for Cumulative Sum or Running Total
By using T-SQL in SQL Server, we can get the expected result as given below:
Running Salary across the data table
Id
EmpId
DeptId
Salary
RunningSalary
1
101
10
25000
25000
2
102
10
35000
60000
3
103
11
15000
75000
4
104
11
18500
93500

Running Salary across the data table within department
Id
EmpId
DeptId
Salary
RunningSalary
RunningDeptSal
1
101
10
25000
25000
25000
2
102
10
35000
60000
60000
3
103
11
15000
75000
15000
4
104
11
18500
93500
33500

Cumulative Sum or Running Totals in SQL Server 2005 or 2008 R2
SQL Server supports most of the aggregation functions such as SUM and AVG in this context with the exceptions of grouping and we can use self-join within the table, nested select statements to pull the cumulative sum or running total in SQL Server 2005 or 2008 R2 as given below:
----- Running Salary across the data table
SELECT E.Id, E.EmpId, E.DeptId, E.Salary,
SUM(S.Salary) As RunningSalary
FROM @EmpSalary E
---- self-join
INNER JOIN @EmpSalary S on E.Id>=S.Id
GROUP BY E.Id, E.EmpId, E.DeptId, E.Salary
ORDER BY E.Id,Sum(S.Salary)

Id
EmpId
DeptId
Salary
RunningSalary
1
101
10
25000
25000
2
102
10
35000
60000
3
103
11
15000
75000
4
104
11
18500
93500

----- Running Salary across the data table within department
SELECT E.Id, E.EmpId, E.DeptId
,E.Salary, E.RunningSalary
,Sum(D.Salary) As RunningDeptSal
FROM
----- Drived data table
(
----- Pull Running Total Salary accross the data
SELECT E.Id, E.EmpId, E.DeptId, E.Salary,
SUM(S.Salary) As RunningSalary
FROM @EmpSalary E
---- self join
INNER JOIN @EmpSalary S on E.Id>=S.Id
GROUP BY E.Id, E.EmpId, E.DeptId, E.Salary
)E
---- Self join within dept
INNER JOIN @EmpSalary D on E.Id>=D.Id
AND E.DeptId=D.DeptId
----- group all column of derived table
GROUP BY E.Id, E.EmpId, E.DeptId,
E.Salary, E.RunningSalary
ORDER BY E.Id,Sum(D.Salary)
Id
EmpId
DeptId
Salary
RunningSalary
RunningDeptSal
1
101
10
25000
25000
25000
2
102
10
35000
60000
60000
3
103
11
15000
75000
15000
4
104
11
18500
93500
33500
The above queries calculates a cumulative sum of salary per department and ORDER BY Id and Aggregate sum () function. The rows are cross joined restricting the join only to equal or smaller ID values in right table.

Cumulative Sum or Running Totals in SQL Server 2012 on wards
In SQL Server 2012 on wards, more complex business problems such as running totals or cumulative sums could be solved without the extensive use of cursors or nested select statement. We can use OVER clause, PARTITION BY with the GROUP BY clause and Aggregates function to pull running totals or cumulative sums against each row as given below:
----- SQL Server 2012 Onwards
----- Running Salary across the data table
SELECT E.Id, E.EmpId, E.DeptId, E.Salary,
SUM(E.Salary) Over (Order by E.Id) As RunningSalary
FROM @EmpSalary E
GROUP BY E.Id, E.EmpId, E.DeptId, E.Salary
ORDER BY E.Id,Sum(E.Salary)

Id
EmpId
DeptId
Salary
RunningSalary
1
101
10
25000
25000
2
102
10
35000
60000
3
103
11
15000
75000
4
104
11
18500
93500

----- Running Salary across the data table within department
SELECT E.Id, E.EmpId, E.DeptId, E.Salary,
SUM(E.Salary) Over (Order by E.Id) As RunningSalary,
---- Use Over with Partition By and Order By
SUM(E.Salary) Over (Partition By E.DeptId Order by  E.Id) As RunningDeptSal
FROM @EmpSalary E
GROUP BY E.Id, E.EmpId, E.DeptId, E.Salary
ORDER BY E.Id,Sum(E.Salary)
Id
EmpId
DeptId
Salary
RunningSalary
RunningDeptSal
1
101
10
25000
25000
25000
2
102
10
35000
60000
60000
3
103
11
15000
75000
15000
4
104
11
18500
93500
33500

Note: This works with SQL Server 2012 and up, 2008 has limited support for window functions.
We can see here that the OVER clause allows us to manage the grouping based on the context specified in relationship to the current row. With the expansion of the OVER clause to include PARTITION BY and ORDER BY support with aggregates, window functions increased their value substantially in SQL Server 2012 on wards.

Conclusion
There are several ways but it depends on your SQL Server version to choose the best approach to pull the cumulative sum or running total against each row. If the running total needs to be calculated to different partitions of data, just to use more conditions in PARTITION BY clause, ORDER BY clause in the OVER clause.

No comments:

Post a Comment

Popular Posts