Friday, September 18, 2015

Data Warehouse / Data Marts – Fact Tables

Fact tables play a central role in data warehouses and data marts because they hold the measurements, metrics, or facts of a business process, e.g., sales revenue by month by product.

Example of Fact Table

In the schema above, we have a fact table, FactSales, whose grain gives us the number of units sold by date, by store, by customer, and by product.
All other tables, such as DimDate, DimStore, DimCustomer, and DimProduct, are dimension tables. This arrangement is known as a star schema.

In this fact table, each entry represents the sale of a specific product on a specific day to a specific customer in accordance with a specific promotion at a specific store. The business measurements captured are the value of the sale, the cost to the store, and the quantity sold.
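As a rough illustration, the star schema described above could be sketched with Python's built-in sqlite3 module. Only the table names (FactSales, DimDate, DimStore, DimCustomer, DimProduct) come from the text; every column name and every sample row below is an assumption made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Column names are hypothetical; only the table names come from the post.
conn.executescript("""
CREATE TABLE DimDate     (DateKey INTEGER PRIMARY KEY, CalendarDate TEXT);
CREATE TABLE DimStore    (StoreKey INTEGER PRIMARY KEY, StoreName TEXT);
CREATE TABLE DimCustomer (CustomerKey INTEGER PRIMARY KEY, CustomerName TEXT);
CREATE TABLE DimProduct  (ProductKey INTEGER PRIMARY KEY, ProductName TEXT);

-- Grain: one row per date, store, customer and product.
CREATE TABLE FactSales (
    DateKey     INTEGER REFERENCES DimDate(DateKey),
    StoreKey    INTEGER REFERENCES DimStore(StoreKey),
    CustomerKey INTEGER REFERENCES DimCustomer(CustomerKey),
    ProductKey  INTEGER REFERENCES DimProduct(ProductKey),
    UnitsSold   INTEGER,
    PRIMARY KEY (DateKey, StoreKey, CustomerKey, ProductKey)
);
""")

# Made-up sample data.
conn.executemany("INSERT INTO DimDate VALUES (?, ?)",
                 [(20150917, "2015-09-17"), (20150918, "2015-09-18")])
conn.executemany("INSERT INTO DimStore VALUES (?, ?)",
                 [(1, "Downtown"), (2, "Airport")])
conn.executemany("INSERT INTO DimCustomer VALUES (?, ?)",
                 [(1, "Alice"), (2, "Bob")])
conn.executemany("INSERT INTO DimProduct VALUES (?, ?)",
                 [(1, "Widget"), (2, "Gadget")])
conn.executemany("INSERT INTO FactSales VALUES (?, ?, ?, ?, ?)", [
    (20150917, 1, 1, 1, 3),
    (20150917, 2, 2, 1, 2),
    (20150918, 1, 2, 2, 1),
    (20150918, 1, 1, 1, 4),
])

# Join the fact table to one dimension and sum the measure.
units_by_product = conn.execute("""
    SELECT p.ProductName, SUM(f.UnitsSold)
    FROM FactSales AS f
    JOIN DimProduct AS p ON p.ProductKey = f.ProductKey
    GROUP BY p.ProductName
    ORDER BY p.ProductName
""").fetchall()
print(units_by_product)
```

The fact table holds only keys and a measure; the descriptive names live in the dimension tables and are pulled in by the join.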

Put simply, a fact table might contain business sales events such as cash register transactions, or the contributions and expenditures of a nonprofit organization. Fact tables usually contain large numbers of rows, sometimes hundreds of millions of records when they hold one or more years of history for a large organization.
In the real world, it is possible to have a fact table that contains no measures or facts. These tables are called "factless fact tables" or "junction tables". Factless fact tables can, for example, be used to model many-to-many relationships or to capture events.
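A factless fact table can be sketched in the same way. The example below assumes a hypothetical FactAttendance table that records which student attended which class on which day; the table, its columns, and the rows are all invented for illustration. There is no measure column, so analysis is done by counting rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical factless fact table: keys only, no numeric measure.
conn.execute("""
    CREATE TABLE FactAttendance (
        DateKey    INTEGER,
        StudentKey INTEGER,
        ClassKey   INTEGER,
        PRIMARY KEY (DateKey, StudentKey, ClassKey)
    )
""")

# Each row records the event "this student attended this class on this day".
conn.executemany("INSERT INTO FactAttendance VALUES (?, ?, ?)", [
    (20150917, 1, 10),
    (20150917, 2, 10),
    (20150918, 1, 10),
    (20150918, 1, 20),
])

# With no measure to sum, the question "how many attendances per class?"
# is answered by counting rows.
attendance_by_class = conn.execute("""
    SELECT ClassKey, COUNT(*)
    FROM FactAttendance
    GROUP BY ClassKey
    ORDER BY ClassKey
""").fetchall()
print(attendance_by_class)
```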

Important Features of Fact Tables
  1. They are placed at the center of a star schema or a snowflake schema, surrounded by dimension tables. Where multiple fact tables are used, these are arranged as a fact constellation schema. A fact table typically has two types of columns: those that contain facts and those that are foreign keys to dimension tables.
  2. The primary key of a fact table is usually a composite key that is made up of all of its foreign keys. Fact tables contain the content of the data warehouse and store different types of measures, such as additive, non-additive, and semi-additive measures.
  3. Fact tables mostly contain numerical data (facts) that can be summarized to provide information about the history of the operation of the organization.
  4. Each fact table also includes a multipart index that contains as foreign keys the primary keys of related dimension tables, which contain the attributes of the fact records.
  5. Fact tables should not contain descriptive information or any data other than the numerical measurement fields and the index fields that relate the facts to corresponding entries in the dimension tables.
  6. Fact tables provide the (usually) additive measures that act as independent variables by which dimensional attributes are analyzed. Fact tables are often defined by their grain. The grain of a fact table represents the most atomic level by which the facts may be defined.
  7. Additive measures allow summary information to be obtained by adding various quantities of the measure, such as the sales of a specific item at a group of stores for a particular time period. Measures that are not fully additive, such as inventory quantity-on-hand values (which cannot be meaningfully summed across time), can also be used in fact tables, but different summarization techniques must then be used.
Measure types
A fact table can store different types of measures, as listed below:
  • Additive measure – As the name implies, additive measures are measures that can be added across all dimensions.
  • Non-additive measure – Unlike additive measures, non-additive measures cannot be meaningfully added across any dimension (for example, ratios or unit prices).
  • Semi-additive measure – Semi-additive measures are measures that can be added across some dimensions but not across others.
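The difference between additive and semi-additive measures can be illustrated with a small sketch. The rows below are invented: units sold is treated as an additive measure, and quantity on hand as a semi-additive one that may be summed across stores but not across dates. Using the closing balance or the average over dates is one common choice, not the only one.

```python
from collections import defaultdict

# Hypothetical daily rows: (date, store, units_sold, quantity_on_hand)
rows = [
    ("2015-09-17", "A", 5, 100),
    ("2015-09-17", "B", 3, 50),
    ("2015-09-18", "A", 4, 96),
    ("2015-09-18", "B", 6, 44),
]

# Additive: units sold can be summed across every dimension.
total_units_sold = sum(units for _, _, units, _ in rows)

# Semi-additive: quantity on hand can be summed across stores for one date...
on_hand_by_date = defaultdict(int)
for date, _store, _units, on_hand in rows:
    on_hand_by_date[date] += on_hand

# ...but across dates a sum would count the same stock twice, so take the
# closing balance (latest date) or the average over dates instead.
latest_date = max(on_hand_by_date)  # ISO dates sort correctly as strings
closing_on_hand = on_hand_by_date[latest_date]
average_on_hand = sum(on_hand_by_date.values()) / len(on_hand_by_date)

print(total_units_sold, closing_on_hand, average_on_hand)
```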
A fact table might contain either detail level facts or facts that have been aggregated (fact tables that contain aggregated facts are often instead called summary tables).
Special care must be taken when handling ratios and percentages. One good design rule is to never store percentages or ratios in fact tables, but to calculate them in the data access tool instead. Store only the numerator and denominator in the fact table; these can then be aggregated, and the aggregated values can be used to calculate the ratio or percentage in the data access tool.
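A small worked example shows why this rule matters. The figures are invented, and margin (sales minus cost, divided by sales) is used as the ratio purely for illustration. Averaging the per-row ratios gives a different, misleading answer compared with summing the numerator and denominator first.

```python
# Hypothetical fact rows: (product, sales_amount, cost_amount)
rows = [
    ("Widget", 100, 60),   # row-level margin 40%
    ("Gadget", 900, 810),  # row-level margin 10%
]

# Misleading: average the ratios computed per row.
row_margins = [(sales - cost) / sales for _, sales, cost in rows]
average_of_ratios = sum(row_margins) / len(row_margins)

# Preferred: aggregate numerator and denominator, then divide once.
total_sales = sum(sales for _, sales, _ in rows)
total_cost = sum(cost for _, _, cost in rows)
ratio_of_sums = (total_sales - total_cost) / total_sales

print(round(average_of_ratios, 2), round(ratio_of_sums, 2))
```

The second figure weights each row by its sales volume, which is what an overall margin is expected to mean.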


Types of fact tables
Fact tables are commonly categorized into four basic types of measurement events:
Transactional – A transactional fact table is the most basic type. Its grain is “one row per line in a transaction”, e.g., every line item that appears on an invoice. A transaction fact table stores data at the most detailed level and therefore has a high number of dimensions associated with it.
Periodic snapshots – A periodic snapshot fact table stores a snapshot of the data for a period of time. Its source is a transaction fact table, summarized over a period that you choose. A periodic snapshot table is dependent on the transactional table, as it needs the detailed data held in the transactional fact table in order to deliver the chosen performance output.
Accumulating snapshots – An accumulating snapshot fact table describes the activity of a business process that has a clear beginning and end. This type of fact table therefore has multiple date columns to represent milestones in the process. A good example of an accumulating snapshot fact table is the processing of a material. As steps towards handling the material are finished, the corresponding record in the accumulating snapshot fact table gets updated.
Temporal snapshots – By applying temporal database theory and modelling techniques, the temporal snapshot fact table provides the equivalent of daily snapshots without actually storing daily snapshots. It introduces the concept of time intervals into a fact table, which saves a lot of space and improves performance while still giving the end user the logical equivalent of the "picture of the moment" they are interested in.
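The dependency of a periodic snapshot on transactional data can be sketched as follows. The transaction rows are invented, and a calendar month is assumed as the snapshot period. The detailed rows are rolled up to one row per month and store.

```python
from collections import defaultdict

# Hypothetical transaction-grain rows: (date, store, sales_amount)
transactions = [
    ("2015-08-30", "A", 20),
    ("2015-09-01", "A", 10),
    ("2015-09-18", "A", 15),
    ("2015-09-18", "B", 7),
]

# Build a monthly periodic snapshot: one row per (month, store).
monthly_snapshot = defaultdict(int)
for date, store, amount in transactions:
    month = date[:7]  # "YYYY-MM" prefix of an ISO date
    monthly_snapshot[(month, store)] += amount

for (month, store), amount in sorted(monthly_snapshot.items()):
    print(month, store, amount)
```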

Aggregation in Fact Tables
Aggregation is the process of calculating summary data from detail records. It is often tempting to reduce the size of fact tables by aggregating data into summary records when the fact table is created. However, when data is summarized in the fact table, detailed information is no longer directly available to the analyst. If detailed information is needed, the detail rows that were summarized will have to be identified and located, possibly in the source system that provided the data. Fact table data should be maintained at the finest granularity possible. Aggregating data in the fact table should only be done after considering the consequences.

Designing fact table steps
Kimball describes four simple steps for designing a fact table:
Identify the business process to model – The first step is to decide which business process to model (such as sales) by gathering and understanding business needs and the available data.
Declare the grain – Declaring the grain means describing exactly what a fact table record represents.
Identify the dimensions – Once the grain of the fact table is stated clearly, it is time to determine the dimensions for the fact table.
Identify the facts – Finally, identify the measures (facts) that are recorded at that grain.
