Wednesday, September 28, 2016

Inmon vs Kimball Data Models Approaches

Data is the business asset for every organisation which is audited and protected. Now days, every organisation want to create their own data warehouse to store their business data in a perfect manner to utilise for decision support.
These data warehouse contain massive amounts of highly detailed, time-series data used for decision support. Data warehouses often contain terabytes of data that can be readily queried by end users. To gain in their business, it is become very urgent for every organization to face a range of choices where it requires an understanding of the two main data warehousing models— Inmon’s and Kimball’s, both in terms software tools and development approaches.


We know that ETL is essential to the achievability of the data warehouse in that it challenges to ensure data integrity within the data warehouse.
Inmon vs. Kimball Data Models
To understand, how these two models are similar and how they differ gives us a foundational knowledge of the most basic data warehouse concepts.
At a Glance

Bill Inmon advocates a top-down development approach that adapts traditional relational database tools to the development needs of an enterprise wide data warehouse where as



Ralph Kimball recommends a bottom-up approach that uses dimensional modeling, a data modeling approach unique to data warehousing. Rather than building a single enterprise-wide database, Kimball suggests creating one database (or data mart) per major business process.




We know that most of the data in the data warehouse is extracts from operational databases, then summarises, reconciles, and manipulates by the help of ETLs tools. Then the data is ready to be stored in carefully designed relational database tables in the data warehouse. In the data warehousing move toward, information is requested, processed, and merged continuously, so the information is readily available for direct querying OLAP and analysis at the warehouse. 

For any organisation, understanding the basics of the architecture and methodology of both models provides a good foundational knowledge of data warehousing. It’s very important factor in term of the project cost and maintenance prospective where programmer/developer/data scientist can build situation-specific knowledge that is appropriate to their organization’s needs.


Bill Inmon Model
Ralph Kimball Model
It consists of all information systems and their databases throughout a given organization. It is inspire from the relational data model (OLTP) and follow the same scenarios to develop relational architecture for the data warehouse (3NF).


Kimball model is based on a data modeling method (Dimensional Data Modeling) unique to the data warehouse. His architecture is also known as Data Warehouse Bus. Dimensional modeling focuses on ease of end user accessibility and provides a high level of performance to the data warehouse.
It divides the overall database environment of the organization into four levels which are given below:
Operational
Atomic data warehouse
Departmental
Individual
It involves a bottom-up approach, which in the case of data warehouses means to build one data mart at a time. The four steps of the dimensional design process are:
Select the business process
Declare the grain
Choose the dimensions
Identify the facts
From the operational systems, data is extensively manipulated and then moved to the atomic data warehouse with the help of various ETLs tools.
Data is copied from operational source systems to a staging area. In the staging area, the data is scrubbed and suitable for end-user queries. From the staging area, data is loaded into data marts. The data marts are the source of data for user queries.
Inmon suggests using enterprise-wide data models if possible to save development time. It proposes three levels of data modeling to support a spiral methodology, in that all user views are consistent with the corporate model.
ERD Level (Entity Relationship Diagrams) is used to explore and refine entities, their attributes, and the relationships among the entities.
The second (Mid-Level) data model, establishes the DIS (Data Item set) for each department.
An ERD created in the first-level data model is the basis for a DIS in the second-level data model.
The final physical model is created from the mid-level data model merely by extending the mid-level data model to include keys and physical characteristics of the model.
A database engine can make very strong assumptions about first constraining the heavily indexed dimension tables, and then attacking the fact table all at once with the Cartesian product of the dimension table keys satisfying the user’s constraints.
Dimensional modeling begins with tables rather than entity-attribute data models such as ERDs. These tables are either fact tables or dimension tables.
Fact tables contain metrics, while dimension tables contain attributes of the metrics in the fact tables. Dimension tables routinely contain repeating groups; this violates normalization rules. In contrast, the dimension tables are likely to have only hundreds or thousands of rows, and be only megabytes in total size. These tables contain all the attributes of the data in the fact table in highly denormalized forms.
It explains various techniques for optimizing the performance of the data warehouse at both the atomic and departmental levels.
Dimensional modeling is a data modeling approach that capitalizes on the unique requirements of the data warehouse.
Once the three-level data model is complete, the data warehouse development has started.
Keeping fact tables to a small number of rows and allowing dimension tables to be highly denormalized are both essential. The resulting data mart is highly accessible to the end user and provides reasonable query response times.
Granularity is a measure of the detail of the data. If the volume of data is massive, then the development team needs to consider multiple levels of granularity for the data.
Declaring the grain is the process of deciding what level of detail the data warehouse will contain. The lowest level of granularity is called atomic, means it cannot be further sub-divided.

When there is enough information to do so, the development team conducts a technical assessment to ensure that the data in the warehouse will be accessible and well managed.
The data model bus stated that all data marts must use standardized conformed dimensions. The basic requirements of conformed dimensions are that keys, column names, attribute definitions, and attribute values are consistent across business processes.
One of the salient aspects of a data-driven methodology is that it builds on previous efforts and utilizing both code and processes that have already been developed.
Each data mart is based on a single business process. The data warehouse bus allows the sum of the data marts to truly be an integrated whole—a data warehouse.
In banking, the Customer table generates a primary grouping of data such as account (primary grouping). An Account may have several manifestations, such as loan, savings, or trust (secondary grouping). Relations show that one customer may have several different accounts.
In the retail example, Kimball chooses to include some computed values as well as truly atomic values, making queries easy for the end user and providing acceptable data warehouse performance.


The most prominent similarities between Inmon’s and Kimball’s models are the use of time-stamped data, and the extract, transform, and load (ETL) process. Although the execution of these two elements differs between the two models, the data attributes and query results are very similar.

Conclusion
Inmon’s evolutionary approach grows out of operational relational database technology and development methods. Inmon explains various techniques for optimizing the performance of the data warehouse at both the atomic and departmental levels. Kimball defines business processes quite broadly. To choose the first business process for the data warehouse project and declaring the grain is the process of deciding what level of detail the data warehouse will contain.

No comments:

Post a Comment