Dimensional
modeling is different from data modeling because it is fundamentally a logical
modeling of business requirements. In dimensions, all the entities are
often physical in nature such as Customers, Patients, Products, Stores, and
Salespersons etc. Identifying Dimensions and their attributes is easy when you
know the trick.
Dimensions that change over the time are known
as Slowly Changing Dimensions. For
example, Customer demographics, product characteristics,
classification rules, status of customers and Salespersons regional offices etc.
lead to changes in the attributes of dimensions.
If we try to see into a transaction system where
many a times the changes are overwritten and track of changes are lost easily.
Salesperson
example
- a source system may have only the latest Salesperson regional office, in
which they work. However, the salespeople are sometimes transferred from one
regional office to another.
For historical sales reporting purposes it may
be necessary to keep a record of the fact that a particular sales person had
been assigned to a particular regional office at an earlier date, whereas that
sales person is now assigned to a different regional office.
Customer
example -
a source system may have only the latest customer PIN Code which is needed to
send the marketing and billing statements. However, the customers are sometimes
transferred from one location to another.
For historical sales reporting purposes, a data
warehouse needs to maintain all the previous PIN Codes as well, because we need
to track on how many customers move to new locations over what frequency.
To handle these kinds of issues, a Slowly Changing Dimensions methodology
comes into the picture to provide the accurate reporting data to the
management. There are various techniques to maintain the historical data information.
To understand this, we are taking the example of the Salesperson with the
following columns-
SalesPersonKey
|
SalesPersonCode
|
SalesPersonName
|
RegionalCode
|
1
|
SP001
|
Ryan Arjun
|
CA
|
2
|
SP002
|
Kimmy Wang
|
IL
|
3
|
SP003
|
Don Wilcox
|
NY
|
4
|
SP004
|
Rosy Gray
|
TX
|
5
|
SP005
|
John Smith
|
TN
|
You can see that SalesPersoncode and RegionalCode
are the natural key and SalesPersonKey is using as the surrogate key. With the
help of the above Salesperson dimensional data table, we are going to demonstrate
the various Slowly Changing Dimensions (SCD)
methodologies as given below-
Type 1:Overwrite technique
- This is the technique, simply overwrite the existing salesperson records. For
example, Ryan Arjun is aligned with CA regional code and after sometime he is transferred
to another OH regional code. So, we can overwrite this record in Salesperson dimensional
data table as given below-
SalesPersonKey
|
SalesPersonCode
|
SalesPersonName
|
RegionalCode
|
1
|
SP001
|
Ryan Arjun
|
OH
|
2
|
SP002
|
Kimmy Wang
|
IL
|
3
|
SP003
|
Don Wilcox
|
NY
|
4
|
SP004
|
Rosy Gray
|
TX
|
5
|
SP005
|
John Smith
|
TN
|
The biggest disadvantage
of the Type 1 method is that there is no history in the data warehouse and but
it has the advantage however that it's easy to maintain.
If one has calculated an
aggregate table summarizing facts by RegionalCode, it will need to be
recalculated when the RegionalCode is changed.
Type 2: Addnew row technique
– In this case, we can easily maintain the movement history of the Salesperson
by adding the new column called IsActive in Salesperson dimensional data table.
In case of the salesperson is transferred from
one regional office to another then his IsActive status will be changed for the
old record as 0 and for the latest record it would be 1.
Now, we are taking the same example where Ryan
Arjun was aligned with CA regional code and after sometime he is transferred to
another OH regional code. So, we are going to add new row into the data table
as given below-
SalesPersonKey
|
SalesPersonCode
|
SalesPersonName
|
RegionalCode
|
IsActive
|
1
|
SP001
|
Ryan Arjun
|
CA
|
0
|
2
|
SP002
|
Kimmy Wang
|
IL
|
1
|
3
|
SP003
|
Don Wilcox
|
NY
|
1
|
4
|
SP004
|
Rosy Gray
|
TX
|
1
|
5
|
SP005
|
John Smith
|
TN
|
1
|
6
|
SP001
|
Ryan Arjun
|
OH
|
1
|
There is another method also where we can add
start date and end date in the Salesperson dimensional data table and maintain
the movement history of the Salesperson as given below-
SalesPersonKey
|
SalesPersonCode
|
SalesPersonName
|
RegionalCode
|
StartDate
|
EndDate
|
1
|
SP001
|
Ryan Arjun
|
CA
|
1/5/2013
|
14/10/2014
|
2
|
SP002
|
Kimmy Wang
|
IL
|
5/6/2013
|
NULL
|
3
|
SP003
|
Don Wilcox
|
NY
|
15/7/2013
|
NULL
|
4
|
SP004
|
Rosy Gray
|
TX
|
23/8/2013
|
NULL
|
5
|
SP005
|
John Smith
|
TN
|
23/10/2013
|
NULL
|
6
|
SP001
|
Ryan Arjun
|
OH
|
15/10/2014
|
NULL
|
The null value in the EndDate column indicates
the latest or current status of the SalesPerson in the Salesperson dimensional
data table.
Please visit - Part 2 - Slowly Changing Dimensions in Data Warehouse
Please visit - Part 2 - Slowly Changing Dimensions in Data Warehouse
No comments:
Post a Comment