Saturday, June 25, 2016

SQL - Real Time Operational Analytics

This is very highly recommended feature of SQL Server 2016 to do the real-time operational analytics. In this feature, you have to ability to run both analytics operations and OLTP workloads on the same database tables at the same time. If you are doing real time analytical operation on the same database (with OLTP workloads) then you have the ability to eliminate the need for ETL and a data warehouse.

Draw back of the Old Operational Analytics Systems
As we know that business should need to have separate systems for operational (i.e. OLTP) and analytics workloads.  These systems are based on the ETLs operation to  move the data from the operational store (OLTP) to an analytics store (Warehouse or Data Marts) on the regular basis through some scheduled jobs. This solution covers following three key challenges:

Complexity in ETLs- It can be complex to identify which rows have been modified. If there are some changes in the source files then business should need to re-modify the existing ETL process which increases the big complexity to analyses the data.
Costly Hardware/Software’s - Implementing ETL requires the cost of purchasing additional hardware and software licenses. Apart from this, it always requires the manual manpower support also.
Data Latency- Implementing ETL adds a time delay for running the analytics because business also depends on the ETLs jobs which also depends on other sources. For example, if the ETL job runs at end of each business day, the analytics queries will run on data that is at least a day old
Solution - Real-time operational analytics
Real-time operational analytics will come in the picture as the best solution because it remove the time delay dependencies and provide a way to do real-time operational analytics and OLTP workloads run on the same underlying table.
By using this feature, the costs and complexity are greatly reduced by eliminating the need for ETL and the need to purchase and maintain a separate data warehouse.
Note: Real-time operational analytics does not replace the need for a separate data warehouse when business needs to integrate data from multiple sources before running the analytics workload.

Where does Real-time operational analytics work?
It targets the scenario of a single data source such as an ERP application on which business can run both the operational and the analytics workload.
As we stated that Real-time operational analytics is based on updateable ColumnStore index on a rowstore table. Actually, ColumnStore index maintains a copy of the data. In this case, OLTP and analytics workloads run against separate copies of the data and avoid the transaction blocks because once we create a non-clustered ColumnStore index on a table, we cannot directly modify the data in that table. A query with INSERT, UPDATE, DELETE, or MERGE will fail and return an error message.

Benifits: 
  1. Due to having two separate copies of the same data, minimizes the performance impact of both workloads running at the same time.
  2. SQL Server automatically maintains index changes so OLTP changes are always up-to-date for analytics. 
  3. By this awesome design, it is possible and practical to run analytics in real-time on up-to-date data. 
  4. The best thing is that business can use both disk-based and memory-optimized data tables.

A column-based non-clustered index geared toward increasing query performance for workloads that involve large amounts of data, typically found in data warehouse fact tables. To load data into a non-clustered ColumnStore index, first load data into a traditional rowstore groups as a heap or clustered index, then Column Segment and then create ColumnStore Indexes as given below:


How does Non-Clustered ColumnStore Index(NCCI) work?
To understand the NCCI on the operational data for orders, you can create them if orders are shipped and these orders will be in HOT stages in your OLTP workloads. To get the shipped status of any order you must need to use the filter indexes which were introduced in SQL Server 2005.
SQL Queries will load the data for real time analytics from the updateable stage as well as form HOT stage order also. In the OLTP workloads, data goes into cold stage after five days or in Updateable stage and manage it with the help of time dimension; we can load the real time data analytics with help of NCCI which will use the separate copies of the same data.

In SQL Server 2016, you will have the ability to create NCCI on the In-Memory Table also. In this way, you would have the both technologies such as In-memory table as well as NCCI on your In-memory tables also which will provide us real time data analytics functionality also. This is the uniqueness feature which is coming with SQL Server 2016.

The main thing is that to improve the performance, you need to define the limits where you could say that this data could not be available for NCCI for HOT/WARM (Predicate). The most important point in SQL Server 2016, NCCI and CCI will share the same code means same performances is applicable for the both indexes. 
Still working on.....
References: Microsoft

2 comments:

  1. Database usually is the biggest investment in the solution. If you want to make your system fast and reliable, try to use as much db features as possible. Coding in a database independent way is totally waste until the day you want to change database.

    ReplyDelete
  2. Really nice and interesting post. I was looking for this kind of information and enjoyed reading this one.
    cyber security course training in Guwahati

    ReplyDelete