Data is the business asset for every
organisation which is audited and protected. Now days, every organisation want
to create their own data warehouse to store their business data in a perfect
manner to utilise for decision support.
These data warehouse contain massive amounts of
highly detailed, time-series data used for decision support. Data warehouses
often contain terabytes of data that can be readily queried by end users. To
gain in their business, it is become very urgent for every organization to face
a range of choices where it requires an understanding of the two main data
warehousing models— Inmon’s and Kimball’s, both in terms software tools and
development approaches. 
We know that ETL is essential to the achievability of the data warehouse in
that it challenges to ensure data integrity within the data warehouse.
Inmon vs. Kimball Data
Models
To understand, how these two models are similar and how they
differ gives us a foundational knowledge of the most basic data warehouse
concepts.
At a Glance 
Bill
  Inmon advocates a top-down development approach that adapts
  traditional relational database tools to the development needs of an enterprise
  wide data warehouse where as  
Ralph
  Kimball recommends a bottom-up approach that uses dimensional modeling,
  a data modeling approach unique to data warehousing. Rather than building a
  single enterprise-wide database, Kimball suggests creating one database (or
  data mart) per major business process. 
 | 
 
We know that most of the data in the data warehouse is extracts from
operational databases, then summarises, reconciles, and manipulates by the help
of ETLs tools. Then the data is ready to be stored in carefully designed
relational database tables in the data warehouse. In the data warehousing move
toward, information is requested, processed, and merged continuously, so the
information is readily available for direct querying OLAP and analysis at the
warehouse. 
For any organisation, understanding the basics of the
architecture and methodology of both models provides a good foundational
knowledge of data warehousing. It’s very important factor in term of the
project cost and maintenance prospective where programmer/developer/data
scientist can build situation-specific knowledge that is appropriate to their
organization’s needs. 
Bill
  Inmon Model 
 | 
  
Ralph
  Kimball Model 
 | 
 
It consists of all information systems and their databases
  throughout a given organization. It is inspire from the relational data model
  (OLTP) and follow the same scenarios to develop relational architecture for
  the data warehouse (3NF).  
 | 
  
Kimball model is based on a data modeling method (Dimensional
  Data Modeling) unique to the data warehouse. His architecture is also known
  as Data Warehouse Bus. Dimensional modeling focuses on ease of end user
  accessibility and provides a high level of performance to the data warehouse. 
 | 
 
It divides the overall database environment of the
  organization into four levels which are given below:  
Operational  
Atomic data warehouse 
Departmental  
Individual 
 | 
  
It involves a bottom-up approach, which in the case of data
  warehouses means to build one data mart at a time. The four steps of the
  dimensional design process are:  
Select the business process 
Declare the grain 
Choose the dimensions  
Identify the facts 
 | 
 
From the operational systems, data is extensively manipulated
  and then moved to the atomic data warehouse with the help of various ETLs
  tools. 
 | 
  
Data is copied from operational source systems to a staging
  area. In the staging area, the data is scrubbed and suitable for end-user
  queries. From the staging area, data is loaded into data marts. The data
  marts are the source of data for user queries. 
 | 
 
Inmon suggests using enterprise-wide data models if possible
  to save development time. It proposes three levels of data modeling to support
  a spiral methodology, in that all user views are consistent with the
  corporate model. 
ERD Level (Entity Relationship
  Diagrams) is used to explore and refine entities, their attributes, and the
  relationships among the entities. 
The second (Mid-Level) data model, establishes
  the DIS (Data Item set) for each department. 
An ERD created in the first-level data model is the basis for
  a DIS in the second-level data model. 
The final physical
  model is created from the mid-level data model merely by extending the
  mid-level data model to include keys and physical characteristics of the
  model. 
 | 
  
A database engine can make very strong assumptions about first
  constraining the heavily indexed dimension tables, and then attacking the
  fact table all at once with the Cartesian product of the dimension table keys
  satisfying the user’s constraints. 
Dimensional modeling begins with tables rather than
  entity-attribute data models such as ERDs. These tables are either fact
  tables or dimension tables.  
Fact tables contain metrics, while dimension tables contain
  attributes of the metrics in the fact tables. Dimension tables routinely
  contain repeating groups; this violates normalization rules. In contrast, the
  dimension tables are likely to have only hundreds or thousands of rows, and
  be only megabytes in total size. These tables contain all the attributes of
  the data in the fact table in highly denormalized forms. 
 | 
 
It explains various techniques for optimizing the performance
  of the data warehouse at both the atomic and departmental levels. 
 | 
  
Dimensional modeling is a data modeling approach that
  capitalizes on the unique requirements of the data warehouse.  
 | 
 
Once the three-level data model is complete, the data
  warehouse development has started. 
 | 
  
Keeping fact tables to a small number of rows and allowing
  dimension tables to be highly denormalized are both essential. The resulting
  data mart is highly accessible to the end user and provides reasonable query
  response times. 
 | 
 
Granularity is a measure of the detail of the data. If the
  volume of data is massive, then the development team needs to consider
  multiple levels of granularity for the data. 
 | 
  
Declaring the grain is the process of deciding what level of
  detail the data warehouse will contain. The lowest level of granularity is
  called atomic, means it cannot be further sub-divided.  
 | 
 
When there is enough information to do so, the development
  team conducts a technical assessment to ensure that the data in the warehouse
  will be accessible and well managed. 
 | 
  
The data model bus stated that all data marts must use
  standardized conformed dimensions. The basic requirements of conformed
  dimensions are that keys, column names, attribute definitions, and attribute
  values are consistent across business processes. 
 | 
 
One of the salient aspects of a data-driven methodology is that
  it builds on previous efforts and utilizing both code and processes that have
  already been developed. 
 | 
  
Each data mart is based on a single business process. The data
  warehouse bus allows the sum of the data marts to truly be an integrated
  whole—a data warehouse. 
 | 
 
In banking, the Customer table generates a primary grouping of
  data such as account (primary grouping). An Account may have several
  manifestations, such as loan, savings, or trust (secondary grouping). Relations
  show that one customer may have several different accounts. 
 | 
  
In the retail example, Kimball chooses to include some
  computed values as well as truly atomic values, making queries easy for the
  end user and providing acceptable data warehouse performance. 
 | 
 
The most prominent
similarities between Inmon’s and Kimball’s models are the use of time-stamped
data, and the extract, transform, and load (ETL) process. Although the
execution of these two elements differs between the two models, the data
attributes and query results are very similar.
Conclusion
Inmon’s evolutionary approach grows out of operational relational database technology and development methods. Inmon explains various techniques for optimizing the performance of the data warehouse at both the atomic and departmental levels. Kimball defines business processes quite broadly. To choose the first business process for the data warehouse project and declaring the grain is the process of deciding what level of detail the data warehouse will contain.



No comments:
Post a Comment