Thursday, September 28, 2017

Part 2 - Slowly Changing Dimensions in Data Warehouse

Data Warehouse has a core need for maintaining historical information. So, it would become very necessary that how an entity has moved and changed the shape through the passage of time. There are many scenarios where the dimension data (Supplier, Salesperson, Customers, Product Price etc.) changes are expected to happen often. 
Type 3: Add new Attributes (columns) technique – In this case, we can preserves limited history by creating separate columns in Salesperson dimensional data table. This technique preserves limited history because it is limited to the number of columns designated for storing historical data. In this case, we have to add new columns to preserve limited data history as given below-
SalesPersonKey
SalesPersonCode
SalesPersonName
OldRegional
EffectiveDate
CurrentRegional
1
SP001
Ryan Arjun
CA
1/5/2013
OH
2
SP002
Kimmy Wang
IL
5/6/2013
NULL
3
SP003
Don Wilcox
NY
15/7/2013
NULL
4
SP004
Rosy Gray
TX
23/8/2013
NULL
5
SP005
John Smith
TN
23/10/2013
NULL
The main disadvantage of the technique is that this record contains a column for the old regional and current regional but we cannot track the changes if second time the salesperson relocates to another regional.

Type 4: Add history table technique – In this case, we can maintain the movement of the salesperson from one regional to another regional by creating another history table to store old movement history. Both the tables have the surrogate keys which are referenced in the Fact table to enhance query performance.
Now, we have two tables such as Salesperson Dim given below-
SalesPersonKey
SalesPersonCode
SalesPersonName
RegionalCode
1
SP001
Ryan Arjun
OH
2
SP002
Kimmy Wang
IL
3
SP003
Don Wilcox
NY
4
SP004
Rosy Gray
TX
5
SP005
John Smith
TN
and Salesperson History Dim as given below-
SalesPersonKey
SalesPersonCode
SalesPersonName
OldRegional
CreateDate
1
SP001
Ryan Arjun
CA
1/5/2013
2
SP002
Kimmy Wang
IL
5/6/2013
3
SP003
Don Wilcox
NY
15/7/2013
4
SP004
Rosy Gray
TX
23/8/2013
5
SP005
John Smith
TN
23/10/2013
6
SP001
Ryan Arjun
OH
15/10/2014
Conclusion
Dimension/Attribute that changes over the period of time are known as the slowly changing dimension. SCD like the ones in the case of salesperson regions example will perhaps need history with correctly tagged validity period.  It is very important to examine all the dimensions often with an independent perspective to be able to uncover and address the business risks and issues the changing dimensions bring.

Tuesday, September 26, 2017

Part 1 - Slowly Changing Dimensions in Data Warehouse

Dimensional modeling is different from data modeling because it is fundamentally a logical modeling of business requirements. In dimensions, all the entities are often physical in nature such as Customers, Patients, Products, Stores, and Salespersons etc. Identifying Dimensions and their attributes is easy when you know the trick.

Dimensions that change over the time are known as Slowly Changing Dimensions. For example, Customer demographics, product characteristics, classification rules, status of customers and Salespersons regional offices etc. lead to changes in the attributes of dimensions.

If we try to see into a transaction system where many a times the changes are overwritten and track of changes are lost easily.
Salesperson example - a source system may have only the latest Salesperson regional office, in which they work. However, the salespeople are sometimes transferred from one regional office to another.
For historical sales reporting purposes it may be necessary to keep a record of the fact that a particular sales person had been assigned to a particular regional office at an earlier date, whereas that sales person is now assigned to a different regional office.

Customer example - a source system may have only the latest customer PIN Code which is needed to send the marketing and billing statements. However, the customers are sometimes transferred from one location to another.

For historical sales reporting purposes, a data warehouse needs to maintain all the previous PIN Codes as well, because we need to track on how many customers move to new locations over what frequency.
To handle these kinds of issues, a Slowly Changing Dimensions methodology comes into the picture to provide the accurate reporting data to the management. There are various techniques to maintain the historical data information. To understand this, we are taking the example of the Salesperson with the following columns-
SalesPersonKey
SalesPersonCode
SalesPersonName
RegionalCode
1
SP001
Ryan Arjun
CA
2
SP002
Kimmy Wang
IL
3
SP003
Don Wilcox
NY
4
SP004
Rosy Gray
TX
5
SP005
John Smith
TN

You can see that SalesPersoncode and RegionalCode are the natural key and SalesPersonKey is using as the surrogate key. With the help of the above Salesperson dimensional data table, we are going to demonstrate the various Slowly Changing Dimensions (SCD) methodologies as given below-

Type 1:Overwrite technique - This is the technique, simply overwrite the existing salesperson records. For example, Ryan Arjun is aligned with CA regional code and after sometime he is transferred to another OH regional code. So, we can overwrite this record in Salesperson dimensional data table as given below-
SalesPersonKey
SalesPersonCode
SalesPersonName
RegionalCode
1
SP001
Ryan Arjun
OH
2
SP002
Kimmy Wang
IL
3
SP003
Don Wilcox
NY
4
SP004
Rosy Gray
TX
5
SP005
John Smith
TN
The biggest disadvantage of the Type 1 method is that there is no history in the data warehouse and but it has the advantage however that it's easy to maintain.

If one has calculated an aggregate table summarizing facts by RegionalCode, it will need to be recalculated when the RegionalCode is changed.

Type 2: Addnew row technique – In this case, we can easily maintain the movement history of the Salesperson by adding the new column called IsActive in Salesperson dimensional data table.

In case of the salesperson is transferred from one regional office to another then his IsActive status will be changed for the old record as 0 and for the latest record it would be 1.

Now, we are taking the same example where Ryan Arjun was aligned with CA regional code and after sometime he is transferred to another OH regional code. So, we are going to add new row into the data table as given below-
SalesPersonKey
SalesPersonCode
SalesPersonName
RegionalCode
IsActive
1
SP001
Ryan Arjun
CA
0
2
SP002
Kimmy Wang
IL
1
3
SP003
Don Wilcox
NY
1
4
SP004
Rosy Gray
TX
1
5
SP005
John Smith
TN
1
6
SP001
Ryan Arjun
OH
1
There is another method also where we can add start date and end date in the Salesperson dimensional data table and maintain the movement history of the Salesperson as given below-
SalesPersonKey
SalesPersonCode
SalesPersonName
RegionalCode
StartDate
EndDate
1
SP001
Ryan Arjun
CA
1/5/2013
14/10/2014
2
SP002
Kimmy Wang
IL
5/6/2013
NULL
3
SP003
Don Wilcox
NY
15/7/2013
NULL
4
SP004
Rosy Gray
TX
23/8/2013
NULL
5
SP005
John Smith
TN
23/10/2013
NULL
6
SP001
Ryan Arjun
OH
15/10/2014
NULL

The null value in the EndDate column indicates the latest or current status of the SalesPerson in the Salesperson dimensional data table. 
Please visit - Part 2 - Slowly Changing Dimensions in Data Warehouse