Microsoft Business Intelligence (Data Tools)|Tables and Index Data Structures Architecture

Wednesday, November 9, 2016

Tables and Index Data Structures Architecture

Tables are the main components to store our data in the database where as indexes are special lookup tables that the database search engine can use to speed up data retrieval. We know that tables and indexes are stored as a collection of 8-KB pages means a single table or indexed table could be dispersed into multiple pages and every page size is 8KB on the hard disk.

In this topic, we will describe the organized way for table and index pages. First of all, we are going to understand the table organisation. We know that a table may be distributed in one or more partitions where each partition contains data rows in either a heap or a clustered index structure. These pages for heap or a clustered index are managed in one or more allocation units, depending on the column types in the data rows.

Partitions

By nature, a table or index has only one partition that contains all the table or index pages. If data is increased dynamically in a table year by year then some user defined functions are responsible to distribute this data into one or more partitions. These partitions can be stored in one or more file groups in the database. In this case, these tables or indexes are treated as a single logical entity when queries or updates are performed on the data.

Clustered Tables, Heaps, and Indexes

Microsoft stated that SQL Server tables use one of two methods to organize their data pages within a partition. A heap is a table without a clustered index and clustered tables are tables that have a clustered index.

Indexed views have the same storage structure as clustered tables.

When a heap or a clustered table has multiple partitions, each partition has a heap or B-tree structure that contains the group of rows for that specific partition. For example, if a clustered table has four partitions, there are four B-trees; one in each partition.

Non-clustered Indexes

For Non-clustered indexes, they have also B-tree index structure which is similar to clustered indexes structure. Non-clustered indexes do not disturb the order of the data rows. The leaf layer of a non-clustered index is made up of index pages instead of data pages and each index row contains the non-clustered key value, a row locator and any included, or non-key, columns. The locator points to the data row that has the key value.

Allocation Units

An allocation unit is nothing more than a collection of pages within a heap or B-tree which is used to manage data based on their page type. There are three types of allocation units for a heap or B-tree which are given below:

IN_ROW_DATA : These are the pages which contain all type of data except large object data.
LOB_DATA: These pages contain large object data and data types are text, ntext, image, xml, varchar(max), nvarchar(max), varbinary(max), or CLR user-defined types (CLR UDT).
ROW_OVERFLOW_DATA: Variable length data stored in varchar, nvarchar, varbinary, or sql_variant columns which exceed the 8,060 byte row size limit.

References: https://technet.microsoft.com/en-us/library/ms189051(v=sql.105).aspx

Mukesh Singh

With over 17 years of experience in the Data Engineering stack across a variety of cloud and on-premises systems, I have successfully delivered more than ten complete business product solutions. My expertise lies in building robust infrastructure and architecture to support data engineering, data analytics, and machine learning processes. These solutions have significantly improved collaboration among cross-functional teams, including data scientists, business analysts, software engineers, and stakeholders. Key Contributions Data Modelling and Integration • Data Modeling: Developed various data models to produce suitable data for business users, data analytics, data science, and data visualization teams. • Legacy Systems and Cloud Technologies: Integrated legacy systems with modern cloud-based technologies (AWS, Azure, GCP), data lakes, and data warehouses. • Streamlined Data Pipelines: Built efficient data pipelines, data warehouses, BI reports, and dashboards to streamline data access and insights.

Microsoft Business Intelligence (Data Tools)

Wednesday, November 9, 2016

Tables and Index Data Structures Architecture

No comments:

Post a Comment

Popular Posts