Wednesday, January 20, 2016

DW - Full load & Incremental or Refresh load

Data is the business asset which is used to making the right business decision on the right time. It is very important to design our database correctly, up to whatever normal form we can bear. 
While OLAP cubes are still common across the BI technology marketplace, data warehousing has definitely seen growth as it is serving consumers in more modern and complete ways. In the current business world, every organization is spending lot of money on the BI technologies to quick access of the valuable data because they believe that data visualisation is the key of the success in making the right decisions at the right time.

So, it is become very important urgent or sometimes treated as do and die situation and need to apply very highly recommended approaches which should be based on the real cases.  We are trying to highlight the importance of using an incremental load and change data capture technique for loading data to the dashboards for the end users.
ETL tools such as SQL Server Warehouse Builder have the ability to set table loading to insert/update that will support both full load and incremental load with the use of the same ETL routines. The key to supporting this,  staging tables are created in the staging layer to join to source tables based upon update or create dates of the record.
What is Incremental Load?
As we know that Data is very important factor for an organisation for making the right business decision on the right time. So, Incremental load is the best option and it is an widely used method to load data in data warehouses from the respective source systems. This technique is employed to perform faster load in less time utilizing less system resources. To understand the incremental load functionality, we can take an example for the daily product hierarchy sales chain where we can put the staging tables between destination and source.
In this scenario, staging tables always refreshed in the beginning of the ETL process and then after load from the source files after proper data transformation. In the last section of the ETL process, targeted tables are filled from the staging tables based on the following conditions -

  1. If data is matched with the target table then matching data will be updated.
  2. If data is not matched with the target tables then data will be treated as new and append into the target tables.

  
Incremental data loading is the most common approach to capture ongoing changes and keeping historical data on a predefined schedule.  In the incremental approach, the key to supporting this is a created table in the staging layer to join to source tables based upon update or create dates of the record.

What is Full Load?
In this technique, there is no staging area and always truncate the target tables before load all the data from the source raw data files. This technique is very time consuming and easily lost the information but capable to provide the guaranteed information. Full data loading also known as the fully refresh data loading also.

To know more, click on 
  1. Data Warehouse Architecture and Multidimensional Model
  2. Data Warehouse - Dimension tables.
  3. Data Warehouse - Fact tables.
  4. Data Warehouse - Conceptual Modeling.
  5. Data Warehouse - Star schema.
  6. Data Warehouse - Snowflake schema.
  7. Data Warehouse - Fact constellations
  8. Collaboration of OLTP and OLAP systems
  9. Major differences between OLTP and OLAP
  10. Data Warehouse - Multidimensional Cube 

3 comments:

  1. well... incremental loads is not equal to staging. I create ETLs which incremental loads my DW by pushing data directly without having to maintain a staging area.
    When there is no needs to make the stuff more complicated, keep it simple!
    Loading a staging = more IO, more tables to maintain, more ETLs to maintain... so higher cost.
    having ETLs without staging offers near real time option.
    sometimes there is no other option, a staging area is required, but for me the project must require it and the added value must be higher than the cost.
    Also, even when we talk about incremental, we must optimize the code for the full load process. specially at the beginning of a project when the users changes their mind or we identify issues and we have to fully reloads the DW.

    ReplyDelete
  2. I agree with the comment, each project will work different and needs to cater for a specific need but it is good to follow some best practice design principles. When designing the packages it is always a good idea do have a Full Load and Incremental load package which will make reloads much easier when there is design changes etc..

    ReplyDelete
  3. You always have to provide full load (or reload). Factors that influence the decision to do incremental load are how long a full load takes and how stable is the schema. Often the time spent to create and maintain an incremental ETL isn't worth the effort. Less frequently (in my experience) incremental is essential.

    ReplyDelete

Popular Posts