Wednesday, July 8, 2015

SQL - ColumnStore Index

ColumnStore Index
A columnstore index is a technology for storing, retrieving and managing data by using a columnar data format, called a columnstore.
This feature has been introduced with SQL Server 2012 which intends to significantly speed-up the processing time of common data warehousing queries. The main objectives of columnstore indexes is appropriate for typical data warehousing data sets and improve the performance of the query whenever data is pulled from the huge datasets.
They are column based indexes which are capable to transform the data warehousing experience for users by enabling faster performance for common data warehousing queries such as filtering, aggregating, grouping and star-join queries. They store the data column-wise instead of row-wise, as indexes currently do.
Columnstore: A columnstore is data that is logically organized as a table with rows and columns, and physically stored in a column-wise data format.
Rowstore: A rowstore is data that is logically organized as a table with rows and columns, and then physically stored in a row-wise data format. This has been the traditional way to store relational table data. In SQL Server, rowstore is the underlying data storage format for a heap, a clustered index, and an in-memory table.
The data of a table is stored in separate pages for each of the columns due to the column based indexes. Basically, columnstore indexes are memory-optimized column-oriented indexes.
Columnstore indexes are easily categories into clustered columnstore indexes and nonclustered columnstore indexes. In the both cases, data always stores in columnar format which means data of each individual column from each rows is stored together on same page or we can say store with each column of data separated into its own segment. These segments are organized into a row group, which can contain over one million rows.
SQL Server 2014 adds support for updateable clustered columnstore indexes and clustered columnstore index can be used in place of a traditional rowstore clustered index. They permit data modifications and bulk load operations.
A new query execution mechanism called batch-mode execution has been added to SQL Server 2014 that reduces CPU usage by a large amount. Batch-mode execution is closely integrated with, and optimized around, the columnstore storage format. Batch-mode execution is sometimes known as vector-based or vectorized execution.
Columnstore indexes can achieve higher compression rates than traditional indexes because when saving index on the disk, SQL Server compresses the data at the segment level where data is the most homogeneous. 
SQL Server also lets you reorganize a clustered columnstore index when necessary and does not physically store columns in a sorted order. Instead, it stores data to improve compression and performance.
In SQL Server, a nonclustered columnstore index does not physically store columns in a sorted order. Instead, it stores data to improve compression and performance. Pre-sorting the data before creating the columnstore index is not required, but can improve columnstore compression.
To provide the real time analytics in SQL Server 2016, you can enable Non-Clustered ColumnStore Index on your operational data table and no need to do any changes on your operation application. By doing this, you can get the real time analytics and avoid all the ETL’s stuff on your database.
As we know that Non-Clustered Column Index is read only in SQL Server 2012. But they are available in SQL Server 2016 and if you can create directly NCCI on your operation database that means there are no transactions will happen. To avoid these things, SQL Server is introducing them as Updateable NCCI or CCI on your operational tables.
To understand the NCCI on the operational data for orders, you can create them if orders are shipped and these orders will be in HOT stages in your OLTP workloads. To get the shipped status of any order you must need to use the filter indexes which were introduced in SQL Server 2005.
SQL Queries will load the data for real time analytics from the updateable stage as well as form HOT stage order also. In the OLTP workloads, data goes into cold stage after five days or in Updateable stage and manage it with the help of time dimension; we can load the real time data analytics with help of NCCI.
SQL Server 2014 had a feature called In-memory table. In SQL Server 2016, you will have the ability to create NCCI (Non-Cluster Columnstore Index) on the In-Memory Table also. In this way, you would have the both technologies such as In-memory table as well as NCCI on your In-memory tables also which will provide us real time data analytics functionality also. This is the uniqueness feature which is coming with SQL Server 2016.
The main thing is that to improve the performance, you need to define the limits where you could say that this data could not be available for NCCI for HOT/WARM (Predicate). The most important point in SQL Server 2016, NCCI and CCI will share the same code means same performances is applicable for the both indexes. 
ColumnStore Indexes Improves the Performance of the Query
  1. Performance advantages in columnstore indexes are possible by leveraging the VertiPaq compression technology, which enables large amounts of data to be compressed in-memory. Batch-mode processing is optimized for columnstore indexes to take full advantage of their structure and in-memory capabilities.
  2. Clustered columnstore indexes are especially well-suited to large fact tables in a data warehouse due to  clustered columnstore index is updateable, which can be a big advantage to improve the performance of the query.
  3. They are best for data warehouse workloads consisting primarily of read-only queries that analyze large sets of aggregated data. 
Drawback of ColumnStore Indexes
  1. Every time of data load into the system, nonclustered indexes need to be dropped and re-created and clustered index updates may result in poorer query performance.
  2. A column store cannot be combined with PAGE or ROW compression and Change Data Capture. They cannot be used as filtered index as well as cannot include computed or sparse columns.
  3. Certain data types that are not supported and hence any column of that type cannot be part of a column store index.
Type of Indexes in SQL

1 comment:

  1. My friends always suggest me this for getting the best data scientist, because they are providing the variety of data scientist which are made for solving the data problems which having different categories and level. You can choose the data scientist form here according to your working needs and wants.