Wednesday, November 9, 2016

Tables and Index Data Structures Architecture

Tables are the main components to store our data in the database where as indexes are special lookup tables that the database search engine can use to speed up data retrieval. We know that tables and indexes are stored as a collection of 8-KB pages means a single table or indexed table could be dispersed into multiple pages and every page size is 8KB on the hard disk.
In this topic, we will describe the organized way for table and index pages. First of all, we are going to understand the table organisation. We know that a table may be distributed in one or more partitions where each partition contains data rows in either a heap or a clustered index structure. These pages for heap or a clustered index are managed in one or more allocation units, depending on the column types in the data rows.



Partitions
By nature, a table or index has only one partition that contains all the table or index pages. If data is increased dynamically in a table year by year then some user defined functions are responsible to distribute this data into one or more partitions. These partitions can be stored in one or more file groups in the database. In this case, these tables or indexes are treated as a single logical entity when queries or updates are performed on the data.

Clustered Tables, Heaps, and Indexes
Microsoft stated that SQL Server tables use one of two methods to organize their data pages within a partition. A heap is a table without a clustered index and clustered tables are tables that have a clustered index. 
Indexed views have the same storage structure as clustered tables.
When a heap or a clustered table has multiple partitions, each partition has a heap or B-tree structure that contains the group of rows for that specific partition. For example, if a clustered table has four partitions, there are four B-trees; one in each partition.

Non-clustered Indexes
For Non-clustered indexes, they have also B-tree index structure which is similar to clustered indexes structure. Non-clustered indexes do not disturb the order of the data rows. The leaf layer of a non-clustered index is made up of index pages instead of data pages and each index row contains the non-clustered key value, a row locator and any included, or non-key, columns. The locator points to the data row that has the key value.

Allocation Units
An allocation unit is nothing more than a collection of pages within a heap or B-tree which is used to manage data based on their page type. There are three types of allocation units for a heap or B-tree which are given below:
  1. IN_ROW_DATA : These are the pages which contain all type of data except large object data.
  2. LOB_DATA: These pages contain large object data and data types are text, ntext, image, xml, varchar(max), nvarchar(max), varbinary(max), or CLR user-defined types (CLR UDT).
  3. ROW_OVERFLOW_DATA: Variable length data stored in varchar, nvarchar, varbinary, or sql_variant columns which exceed the 8,060 byte row size limit.


References: https://technet.microsoft.com/en-us/library/ms189051(v=sql.105).aspx

No comments:

Post a Comment