Friday, July 22, 2016

Data Modeling for a Data Warehouse

A data warehouse is the biggest investment in the solution which implements the process to access heterogeneous data sources; clean, filter, and transform the data; and store the data in a structure that is easy to access, understand, and use. Data warehouse provides the base for the powerful data analysis techniques such as data mining and multidimensional analysis, as well as the more traditional query and reporting.  Data analysis programs are often required to scan vast amounts of that data, which could result in a negative impact on operational applications, which are more performance sensitive.

How will you structure the data in your data warehouse?
Data warehouse modeling is a process that produces abstract data models for one or more database components of the data warehouse. It is one part of the overall data warehouse development process, which is comprised of other major processes such as data warehouse architecture, design, and construction. 

A data warehouse modeling process to consist of all tasks related to requirements gathering, analysis, validation, and modeling. The need to model data warehouse databases in a way that differs from modeling operational databases has been promoted as dimensional data modeling, or fact/dimension modeling.
There are many types of data warehouse applications for which traditional Entity-Relationship (ER) modeling is not well suited, especially those that address the needs of a well-identified community of data analysts interested primarily in analyzing their business measures in their business context.
Why Data Modeling Is Important?

To receive benefits faster, the implementation approach of choice has become bottom up with data marts. Implementing in these small increments of small scope provides a larger return-on-investment in a short amount of time. There are three basic reasons to implement the data models such as –
  1. Visualization of the business world
  2. The essence of the data warehouse architecture
  3. Different approaches of data modeling
Actually, data is simply a record of all business activities, resources, and results of the organization and a well- defined data model is a well-organized abstraction of that data.
Implementing data marts does not preclude the implementation of a global data warehouse.
Dimensional modeling gives us an improved capability to visualize the very abstract questions that the business end users are required to answer. Utilizing dimensional modeling, end users can easily understand and navigate the data structure and fully exploit the data.

What are Data Modeling Techniques?
Two most common data modeling techniques that are relevant in a data warehousing environment are ER modeling and dimensional modeling.
  1. Entity-Relationship (ER) modeling produces a data model of the specific area of interest, using two basic concepts: entities and the relationships between those entities. Detailed ER models also contain attributes, which can be properties of either the entities or the relationships. The ER model is an abstraction tool because it can be used to understand and simplify the ambiguous data relationships in the business world and complex systems environments.
  2. Dimensional modeling uses three basic concepts: measures, facts, and dimensions. Dimensional modeling is powerful in representing the requirements of the business user in the context of database tables.

Visualization of a Dimensional Model
The most popular way of visualizing a dimensional model is to draw a cube. We can represent a three-dimensional model using a cube. Usually a dimensional model consists of more than three dimensions and is referred to as a hyper-cube. However, a hyper-cube is difficult to visualize, so a cube is the more commonly used term.


ER Modeling and Dimensional Modeling

The two techniques for data modeling in a data warehouse environment sometimes look very different from each other, but they have many similarities such as -

  1. Dimensional modeling can use the same notation, such as entity, relationship, attribute, and primary key.
  2. Actually, a fact table is just an entity in which the primary key is a combination of foreign keys, and the foreign keys reference the dimensions. Therefore, we could say that dimensional modeling is a special form of ER modeling.
  3. An ER model provides the structure and content definition of the informational needs of the corporation, which is the base for designing the data warehouse.

Conclusion

Data Modeling in the warehouse data is the process of translating requirements into a picture along with the supporting metadata that represents those requirements.   A data model cannot truly be considered complete until the remainder of the metadata is identified and documented during the design phase. The purpose of a data model is to represent a set of requirements for data in a clear and concise manner. 

No comments:

Post a Comment