Microsoft Business Intelligence (Data Tools)|Clustered Index structure in SQL Server

Tuesday, February 14, 2017

Clustered Index structure in SQL Server

We know that both types of indexes are following B-Tree (Balanced-Tree) structures except ColumnStore indexes. In the index structure, the top node is known as Root node and the bottom node is known as Leaf node. In the clustered indexes, leaf nodes are the actual data pages. A table having clustered index, the data is physically stored on the data page in ascending order and the order of values of the index pages is also in ascending.

How does SQL Server maintain uniqueness of key values for clustered index?

Internally, SQL Server does maintain uniqueness of key values for a clustered index even if the column data is not unique. SQL Server does this automatically by generating a value that stored with the duplicate key value.

To understand this, we could take an alphabet example where all the data on the clustered page is unique as given below:

What will happen to clustered index page after adding another alphabet ‘N’?

In this case, SQL Server will generate an internal number with this key value that uses to maintain uniqueness among these duplicate keys as given below in yellow colored background on the clustered index page:

This feature enables SQL Server to differentiate between all instances of alphabet ‘N’.

How does SQL Server navigate a clustered index?

To understand this, we are taking the above example and we need to find out the alphabet ‘M’ from our data table where clustered index is built on the first name data field and we are using below script to find out the result as given below:

SELECT FirstName, LastName

FROM dbo.DataTable

WHERE FirstName='M'

Root node Navigation

SQL Server starts with the Root node of the clustered index and evaluated that ‘M’ is greater than or equal to the key value actor ‘A’ and the compression result is to true. Now, SQL server moves to next key value ‘M’ on the root node and do the same compression evaluation which is to true again. Now, SQL server moves to next key value ‘R’ on the root node and do the same compression evaluation which is false. In this case, SQL Server uses the previous key value ‘M’ on the root node and moves to intermediate node.

Intermediate node Navigation

After moving to intermediate node, SQL Server evaluated that ‘M’ is greater than or equal to the key value actor ‘M’ where the compression result is to true. Now, SQL server moves to next key value ‘N’ on the intermediate node and compression evaluation result is false. In this case, SQL Server uses the previous key value ‘M’ on the intermediate node and moves to Leaf node.

Leaf node Navigation

SQL Server reads through the data page until it finds ‘M’ and then it returns the row to the query process. The pages in the data chain and the rows in them are ordered on the value of the clustered index key. All inserts are made at the point where the key value in the inserted row fits in the ordering sequence among existing rows.

Conclusion

The clustered index is implemented as a B-tree index structure that supports fast retrieval of the rows, based on their clustered index key values. Depending on the data types in the clustered index, each clustered index structure will have one or more allocation units in which to store and manage the data for a specific partition.
Index structure in SQL Server

Nonclustered Index structure in SQL Server

How does Nonclustered index navigate on a Heap?

References: https://technet.microsoft.com/en-us/library/ms177443(v=sql.105).aspx

Mukesh Singh

With over 17 years of experience in the Data Engineering stack across a variety of cloud and on-premises systems, I have successfully delivered more than ten complete business product solutions. My expertise lies in building robust infrastructure and architecture to support data engineering, data analytics, and machine learning processes. These solutions have significantly improved collaboration among cross-functional teams, including data scientists, business analysts, software engineers, and stakeholders. Key Contributions Data Modelling and Integration • Data Modeling: Developed various data models to produce suitable data for business users, data analytics, data science, and data visualization teams. • Legacy Systems and Cloud Technologies: Integrated legacy systems with modern cloud-based technologies (AWS, Azure, GCP), data lakes, and data warehouses. • Streamlined Data Pipelines: Built efficient data pipelines, data warehouses, BI reports, and dashboards to streamline data access and insights.

Microsoft Business Intelligence (Data Tools)

Tuesday, February 14, 2017

Clustered Index structure in SQL Server

No comments:

Post a Comment

Popular Posts