Wednesday, September 2, 2015

DW- Architecture and Multidimensional Model

We know that data warehousing is a collection of methods, techniques and tools which is used to support knowledge workers such as senior managers, directors, managers, and business analysts to conduct data analyses that help with performing decision-making processes and improving information resources.
Fundamentally, data warehouse is the secured place for the historical data where data is never deleted and data updates are normally carried out when data warehouses are offline.

Data Warehouse (with a Staging Area and Data Marts) - Three-Layer Architecture
As per the enterprises/business requirements, you can easily customize your data warehouse architecture according to the different business group. For example, if you want to create a data warehouse for an ERP system (where purchasing, sales, and inventories are separated) then you can do this by adding different data marts, which are systems designed for a particular line (business modules) of business. 

The main advantages of the three-layer architecture is that – 
  1. It creates a common reference data model for a whole enterprise. 
  2. At the same time, it sharply separates the problems of source data extraction and integration from those of data warehouse population. 
  3. Identically, in some cases, the staging layer is also directly used to better accomplish some operational tasks, such as producing daily reports that cannot be satisfactorily prepared using the corporate applications, or generating data flows to feed external processes periodically so as to benefit from cleaning and integration. 
  4. This approach assumes that although a data warehouse is available, it is unable to solve all the queries formulated. 
  5. This means that users may be interested in directly accessing source data from aggregate data (drill-through). 
  6. Data Marts are dependent on the data warehouse and metadata but they are independent to provide information for a particular line of business.
Data Warehouse - Multidimensional Model
Multidimensional databases have generated much research and market interest because they are fundamental for many decision-making support applications, such as data warehouse systems. Before build a data warehouse, you need to have defined the business requirements and also agreed upon the scope of your application, and created a conceptual design where you need to translate your requirements into a system deliverable. To do so, you have to choose the ideal data warehouse architecture to fulfill logical and physical requirement by considering the following things -
  1. The specific data content
  2. Relationships within and between groups of data
  3. The system environment supporting your data warehouse
  4. The data transformations required
  5. The frequency with which data is refreshed

The multidimensional model is the base of any data warehouse and begins with the observation that the factors affecting decision-making processes are enterprise-specific facts, such as sales, shipments, hospital admissions, surgeries, and so on.
The concept of dimension gave life to the broadly used metaphor of cubes to represent multidimensional data. According to this metaphor, events are associated with cube cells and cube edges stand for analysis dimensions. If more than three dimensions exist, the cube is called a hypercube.
Dimensions of the cube are the equivalent of entities in a database. For example, the sales in a store chain can be represented in a three-dimensional space whose dimensions are products, stores, and dates. A data cube allows data to be modeled and viewed in multiple dimensions.

Dimension tables - A dimension is a structure, often composed of one or more hierarchies, that categorizes data. Dimensional attributes help to describe the dimensional value. They are normally descriptive, textual values. Several distinct dimensions, combined with facts, enable you to answer business questions. Commonly used dimensions are customers, products, and time.

Fact tables -A fact table typically has two types of columns: those that contain numeric facts (often called measurements), and those that are foreign keys to dimension tables. A fact table contains either detail-level facts or facts that have been aggregated. Fact tables that contain aggregated facts are often called summary tables. A fact table usually contains facts with the same level of aggregation.
Requirements of Fact Tables
You must define a fact table for each star schema or snowflake schema. From a modeling standpoint, the primary key of the fact table is usually a composite key that is made up of all of its foreign keys.

Conceptual Modeling of Data Warehousing
In relational implementation, the dimensional designs are mapped to a relational set of tables such as dimensions & measures.

Star schema: It supports to de-normalization data concept. A star schema model can be depicted as a simple star: a central table contains fact data and multiple tables radiate out from it, connected by the primary and foreign keys of the database. 

In a star schema implementation, Warehouse Builder stores the dimension data in a single table or view for all the dimension levels.

Snowflake schema: It supports to normalization data concept. The snowflake schema represents a dimensional model which is also composed of a central fact table and a set of constituent dimension tables which are further normalized into sub-dimension tables. 

In a snowflake schema implementation, Warehouse Builder uses more than one table or view to store the dimension data. Separate database tables or views store data pertaining to each level in the dimension.

Fact constellations: Multiple fact tables share dimension tables, viewed as a collection of stars, therefore called galaxy schema or fact constellation.
Please visit to know more on -
  1. Collaboration of OLTP and OLAP systems
  2. Major differences between OLTP and OLAP
  3. Data Warehouse
  4. Data Warehouse - Multidimensional Cube



3 comments:

  1. A star schema is used as a basic implementation of an OLAP cube. If your fact table contains a 1 to many relationship to each of your dimensions in your data warehouse schema then it is appropriate to use a star schema. Although if your fact table has a many to many relationship with its dimensions (i.e. many rows in your fact equal many rows in your dimension) then you must resolve this using a snow flake schema where the bridge table contains a unique key to each row in the fact table.

    ReplyDelete
  2. Good article, what could be added is using the Data Vault methodology for the central storage ("Warehouse" layer). Data Vault has many benefits, which are described on my blog: http://www.hansmichiels.com/2016/04/02/data-vault-and-dimensional-modelling-a-happy-marriage-data-vault-series/

    ReplyDelete

Popular Posts