Microsoft Business Intelligence (Data Tools)|SQL - ColumnStore Index on In-Memory Tables

Thursday, June 30, 2016

SQL - ColumnStore Index on In-Memory Tables

Database usually is the biggest investment in the solution. If you want to make your system fast and reliable, try to use as much database features as possible. The in-memory features of Microsoft SQL Server are a unique combination of fully integrated tools that are currently running on thousands of production systems. These tools are based on In-memory Online Transactional Processing (OLTP) and in-memory ColumnStore is working as a performance booster.

As we understand that In-memory OLTP has full ACID support—it ensures Atomicity, Consistency, Isolation, and Durability of the data. In-memory OLTP is fully integrated into SQL Server which does not require separate installation and no need to learn different tools. In-memory OLTP is memory-optimized data structures which provide native compilation, creating more resourceful data access and querying proficiencies. Imaging thing is that In-memory features can extremely improve the performances of the transactional workloads.

Columnstore index on In-memory tables?

In-memory ColumnStore index is a column-based data storage technology for storing, logically organized as a table with rows and columns, works on column-based query processing and allows running analytic queries concurrently with data loads. They are updatable, memory-optimized, column-oriented indexes used in data warehousing scenarios as well as in operational analytics. They can be as clustered or non-clustered which are particularly useful on memory-optimized tables. In-memory ColumnStore Indexes can be used to achieve up to 100x query-performance gains over traditional row-oriented storage and significant (typically 10x) data compression for common data patterns.

In-memory Indexes: Every memory-optimized table must have at least one index. For durable memory-optimized tables, a unique index is required to uniquely identify a row when processing transaction log records change during Page 5 recovery.

They reside only in-memory and not stored in checkpoint files nor are any changes to the indexes logged.

They maintained automatically during all modification operations on memory-optimized tables, just like B-tree indexes on disk-based tables, Note: If SQL Server restarts, the indexes on the memory-optimized tables are rebuilt as the data are streamed into memory

Due to having columnar storage features, only the required columns are read into memory, reducing I/O even further, unlike row-based storage format where all columns get loaded into memory as part of the rows.

The rows in ColumnStore indexes are generally grouped in a set of 1 million rows to achieve optimal data compression. This grouping of rows is referred to as a rowgroup. Within a rowgroup, the multiple values for each column are compressed and stored as LOBs. These LOB units are referred to as segments. The column segments are the unit of transfer between disk and memory.

Batch-mode query processing

Batch-mode query processing is basically a vector-based query execution mechanism, which is tightly integrated with the ColumnStore index. Queries that target a Columnstore index can use batch-mode to process up to 900 rows together, which enables efficient query execution, providing 3-4x in query performance improvement. In SQL Server, batch-mode processing is optimized for ColumnStore indexes to take full advantage of their structure and in-memory capabilities.

Materialized views with Columnstore

Materialized views have a very peculiar use case. If a user can create a materialized view on an OLTP table and store that materialized view in a ColumnStore format, this could result in almost cube-like functionality. There would be pre-aggregated data, which are always kept updated, like OLTP. This is quite expensive to maintain, however, and although it has value, it will impede OLTP and data-load performance.

References: Microsoft

Mukesh Singh

With over 17 years of experience in the Data Engineering stack across a variety of cloud and on-premises systems, I have successfully delivered more than ten complete business product solutions. My expertise lies in building robust infrastructure and architecture to support data engineering, data analytics, and machine learning processes. These solutions have significantly improved collaboration among cross-functional teams, including data scientists, business analysts, software engineers, and stakeholders. Key Contributions Data Modelling and Integration • Data Modeling: Developed various data models to produce suitable data for business users, data analytics, data science, and data visualization teams. • Legacy Systems and Cloud Technologies: Integrated legacy systems with modern cloud-based technologies (AWS, Azure, GCP), data lakes, and data warehouses. • Streamlined Data Pipelines: Built efficient data pipelines, data warehouses, BI reports, and dashboards to streamline data access and insights.

Microsoft Business Intelligence (Data Tools)

Thursday, June 30, 2016

SQL - ColumnStore Index on In-Memory Tables

No comments:

Post a Comment

Popular Posts