We know that data warehousing is a collection of methods,
techniques and tools which is used to support knowledge workers such as senior
managers, directors, managers, and business analysts to conduct data analyses
that help with performing decision-making processes and improving information
resources.
Fundamentally, data warehouse is the secured place for the historical
data where data is never deleted and data updates are normally carried out when
data warehouses are offline.
Data Warehouse (with a Staging Area and Data
Marts) - Three-Layer Architecture
As per the enterprises/business requirements, you can easily
customize your data warehouse architecture according to the different business
group. For example, if you want to create a data warehouse for an ERP system (where
purchasing, sales, and inventories are separated) then you can do this by
adding different data marts, which are systems designed for a particular line (business
modules) of business.
The
main advantages of the three-layer architecture is that –
- It creates a common reference data model for a whole enterprise.
- At the same time, it sharply separates the problems of source data extraction and integration from those of data warehouse population.
- Identically, in some cases, the staging layer is also directly used to better accomplish some operational tasks, such as producing daily reports that cannot be satisfactorily prepared using the corporate applications, or generating data flows to feed external processes periodically so as to benefit from cleaning and integration.
- This approach assumes that although a data warehouse is available, it is unable to solve all the queries formulated.
- This means that users may be interested in directly accessing source data from aggregate data (drill-through).
- Data Marts are dependent on the data warehouse and metadata but they are independent to provide information for a particular line of business.
Multidimensional databases have
generated much research and market interest because they are fundamental for
many decision-making support applications, such as data warehouse systems. Before
build a data warehouse, you need to have defined the business requirements and also
agreed upon the scope of your application, and created a conceptual design
where you need to translate your requirements into a system deliverable. To do
so, you have to choose the ideal data warehouse architecture to fulfill logical
and physical requirement by considering the following things -
- The specific data content
- Relationships within and between groups of data
- The system environment supporting your data warehouse
- The data transformations required
- The frequency with which data is refreshed
The
multidimensional model is the base of any data warehouse and begins with the
observation that the factors affecting decision-making processes are
enterprise-specific facts, such as sales, shipments, hospital admissions,
surgeries, and so on.
The
concept of dimension gave life to the broadly used metaphor of cubes to
represent multidimensional data. According to this metaphor, events are
associated with cube cells and cube edges stand for analysis dimensions. If
more than three dimensions exist, the cube is called a hypercube.
Dimensions of the cube are the
equivalent of entities in a database. For example, the sales in a store chain
can be represented in a three-dimensional space whose dimensions are products,
stores, and dates. A data cube allows data to be modeled and viewed in multiple
dimensions.
Dimension tables - A dimension is a structure, often
composed of one or more hierarchies, that categorizes data. Dimensional
attributes help to describe the dimensional value. They are normally
descriptive, textual values. Several distinct dimensions, combined with facts,
enable you to answer business questions. Commonly used dimensions are customers,
products, and time.
Fact tables -A fact table typically has two types
of columns: those that contain numeric facts (often called measurements), and
those that are foreign keys to dimension tables. A fact table contains either
detail-level facts or facts that have been aggregated. Fact tables that contain
aggregated facts are often called summary tables. A fact table usually contains
facts with the same level of aggregation.
Requirements
of Fact Tables
You must define a fact table for each star schema or snowflake
schema. From a modeling standpoint, the primary key of the fact table is
usually a composite key that is made up of all of its foreign keys.
Conceptual Modeling
of Data Warehousing
In relational implementation, the dimensional designs are
mapped to a relational set of tables such as dimensions & measures.
Star schema: It supports to de-normalization data
concept. A star schema model can be depicted as a simple star: a central table
contains fact data and multiple tables radiate out from it, connected by the
primary and foreign keys of the database.
In a star schema implementation,
Warehouse Builder stores the dimension data in a single table or view for all
the dimension levels.
Snowflake schema: It supports to normalization data
concept. The snowflake schema represents a dimensional model which is also
composed of a central fact table and a set of constituent dimension tables
which are further normalized into sub-dimension tables.
In a snowflake schema
implementation, Warehouse Builder uses more than one table or view to store the
dimension data. Separate database tables or views store data pertaining to each
level in the dimension.
Fact constellations: Multiple fact tables share dimension
tables, viewed as a collection of stars, therefore called galaxy schema or fact
constellation.
Please visit to know more on -
- Collaboration of OLTP and OLAP systems.
- Major differences between OLTP and OLAP.
- Data Warehouse
- Data Warehouse - Multidimensional Cube



