Monday, August 8, 2016

Managing Memory for In-Memory OLTP

The introduction of SQL Server In-Memory OLTP allows customers to exploit the performance of memory within the SQL Server platform, helping position it as a high-performing data platform at a competitive price.
SQL Server In-Memory technology is built into the database engine, so customers don't have to buy a separate appliance. It is neither necessary nor practical to keep all data in memory, because not all data is valuable enough to justify being placed in expensive memory.
Managing Memory for In-Memory OLTP - How does it Work?

We live in an era of data superabundance. To harness the power of all that information, we need to analyze more data much faster. All data is stored in a database, and each SQL query is processed by a database management system. It is therefore important to know how much memory your data requires: SQL Server needs enough memory to hold the data in memory-optimized tables and indexes, as well as additional memory to support the online workload.

  1. The size of a memory-optimized table corresponds to the size of data plus some overhead for row headers. When migrating a disk-based table to memory-optimized, the size of the memory-optimized table will roughly correspond to the size of the clustered index or heap of the original disk-based table. 
  2. Indexes on memory-optimized tables tend to be smaller than nonclustered indexes on disk-based tables. The size of nonclustered indexes is in the order of [primary key size] * [row count]. The size of hash indexes is [bucket count] * 8 bytes. 
  3. How much memory is needed in practice depends on the workload, but to be safe the recommendation is to start with two times the expected size of memory-optimized tables and indexes, and observe what are the memory requirements in practice. 
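The "start with two times the expected size" rule above can be sketched in a few lines; the table and index sizes used here are illustrative assumptions, not figures from the article:

```python
# Rough starting memory budget for In-Memory OLTP, following the
# "2x the expected size of tables and indexes" guideline.
# The sizes below are illustrative assumptions.

def initial_memory_estimate(table_bytes, index_bytes, safety_factor=2):
    """Return a starting budget: safety factor times data plus indexes."""
    return safety_factor * (table_bytes + index_bytes)

table_size = 5 * 1024**3   # assume 5 GB of memory-optimized row data
index_size = 1 * 1024**3   # assume 1 GB of index structures
budget = initial_memory_estimate(table_size, index_size)
print(budget / 1024**3)    # 12.0 (GB)
```

From that starting point, monitor actual consumption under the production workload and adjust the budget up or down.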

There are some basic components of memory consumption that should be accounted for within system memory.
Memory for the table
A memory-optimized table row is comprised of three parts:

Timestamps - Row header/timestamps = 24 bytes.
Index pointers - For each hash index in the table, each row has an 8-byte address pointer to the next row in the index. Since the example table has four hash indexes, each row allocates 32 bytes for index pointers (an 8-byte pointer per index).
Data - The size of the data portion of the row is determined by summing the type size for each data column. The example table has five 4-byte integer columns, three 50-byte character columns, and one 30-byte character column. Therefore the data portion of each row is 4 + 4 + 4 + 4 + 4 + 50 + 50 + 50 + 30, or 200 bytes.
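The per-row arithmetic above can be checked with a short sketch, using the same example layout (a 24-byte header, four hash indexes, and the column sizes given in the text):

```python
# Per-row memory for the example memory-optimized table in the text:
# a 24-byte row header, one 8-byte index pointer per hash index, and
# the summed fixed sizes of the data columns.

HEADER_BYTES = 24
POINTER_BYTES = 8
NUM_HASH_INDEXES = 4            # the example table has four hash indexes

int_cols = [4] * 5              # five 4-byte integer columns
char_cols = [50, 50, 50, 30]    # three 50-byte and one 30-byte column

data_bytes = sum(int_cols) + sum(char_cols)  # 200 bytes of column data
row_size = HEADER_BYTES + NUM_HASH_INDEXES * POINTER_BYTES + data_bytes
print(row_size)                 # 256 bytes per row
```

Multiplying the per-row size by the expected row count gives the table's data memory footprint.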

Memory for indexes - each hash index
Each hash index is a hash array of 8-byte address pointers. The size of the array is best determined by the number of unique index values for that index. A hash array that is too small hurts performance, since there will be too many collisions from distinct index values hashing to the same bucket.
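SQL Server rounds the bucket count you specify up to the next power of two, so the memory a hash index consumes is that rounded bucket count times 8 bytes. A minimal sketch of the sizing:

```python
# Each hash index is an array of 8-byte bucket pointers. SQL Server rounds
# the requested bucket count up to the next power of two, so hash-index
# memory is (rounded bucket count) * 8 bytes.

def hash_index_bytes(unique_values):
    """Memory for a hash array sized to the number of unique index values."""
    buckets = 1
    while buckets < unique_values:
        buckets *= 2                # round up to the next power of two
    return buckets * 8

# e.g. 5 million unique keys round up to 2**23 = 8,388,608 buckets
print(hash_index_bytes(5_000_000))  # 67108864 bytes (64 MB)
```

Sizing the bucket count close to the number of unique values keeps chains short without wasting memory on mostly-empty buckets.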

Memory for row versioning
To avoid locks, In-Memory OLTP uses optimistic concurrency when updating or deleting rows. This means that when a row is updated, an additional version of the row is created. In addition, deletes are logical - the existing row is marked as deleted, but not removed immediately. The system keeps old row versions (including deleted rows) available until all transactions that could possibly use the version have finished execution.
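Because old versions must survive until the longest concurrently running transaction finishes, the version-store footprint can be bounded with a back-of-the-envelope formula: longest transaction duration times peak update/delete rate times row size. The workload figures below are illustrative assumptions:

```python
# Rough upper bound on row-version memory: versions of updated/deleted
# rows are kept alive for as long as the longest concurrent transaction
# runs. The workload numbers here are illustrative assumptions.

def row_version_bytes(longest_txn_seconds, peak_changes_per_second, row_bytes):
    """Approximate peak memory held by old row versions."""
    return longest_txn_seconds * peak_changes_per_second * row_bytes

# assume a 10-second longest transaction, 1,000 changed rows/sec, 256-byte rows
print(row_version_bytes(10, 1000, 256))  # 2560000 bytes (~2.5 MB)
```

Long-running transactions are the main risk here: they stretch the window during which versions cannot be garbage collected.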
Memory for table variables
Memory used by a table variable is released only when the table variable goes out of scope. Deleted rows, including rows deleted as part of an update, are not subject to garbage collection; no memory is released until the table variable exits scope.
Memory for growth
The above calculations estimate your memory needs for the table as it currently exists. In addition to this memory, you need to estimate the growth of the table and provide sufficient memory to accommodate that growth. 
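One simple way to account for growth is to project the row count forward at an assumed growth rate and multiply by the per-row size; the starting row count, growth rate, and horizon below are illustrative assumptions:

```python
# Growth sketch: project table memory forward from the current row count,
# an assumed monthly growth rate, and a planning horizon in months.
# All of the input figures are illustrative assumptions.

def projected_rows(current_rows, monthly_growth, months):
    """Row count after compounding growth for the given number of months."""
    return int(current_rows * (1 + monthly_growth) ** months)

ROW_BYTES = 256  # per-row size from the earlier example layout
rows_next_year = projected_rows(1_000_000, 0.10, 12)
print(rows_next_year * ROW_BYTES)  # data memory needed after 12 months
```

Revisiting the projection periodically against observed growth keeps the memory budget from being outpaced by the workload.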

References: Microsoft
