Tuesday, July 7, 2015

SQL - Clustered and Non-Clustered Index

In SQL Server, indexes are special lookup tables which are used by database search engine to speed up data retrieval. However, knowing the right types of indexes as well as finding the right balance of indexes takes strategy. Indexes are the main key factors to improve the query performances dynamically but they also carry some draw backs also.
Clustered Index
A clustered index stores the data for the table based on the columns defined in the create index statement.  As such, only one clustered index can be defined for the table because the data can only be stored and sorted one way per table.
A clustered index is built on the physical leaf pages of a table, so a table with a clustered index will always show a clustered index scan rather than a table scan in an execution plan - even if the two have the same query cost.
Interesting things is that a clustered index always covers a query, since it contains all of the data in a table. When a table has a clustered index, the table is called a clustered table. If a table has no clustered index, its data rows are stored in an unordered structure called a heap.
A clustered index is the most important index you can apply to a table. If the database engine can use a clustered index during a query, the database does not need to follow references back to the rest of\ the data.
Syntax for Clustered Index

USE tempdb;
GO
-- Create a new table with three columns.
CREATE TABLE Employee
    (EmpId int NOT NULL,
     Name varchar(30) NULL,
     Address varchar(50) NULL);
GO

-- Create a clustered index called Ind_Employee_EmpId
-- on the dbo.Employee table using the EmpId column.
CREATE CLUSTERED INDEX Ind_Employee_EmpId
    ON dbo.Employee (EmpId);
GO

---- How to see the indexes in the table
EXEC sp_helpindex Employee

---Insert value into the table
Insert into Employee Values (1, 'Ryan Arjun', 'India')

---- Check table size into the database
EXEC  sp_spaceused Employee

---- drop employee table
drop table Employee

Limitations and Restrictions
  1. If we update a record and change the value of an indexed column in a clustered index, the database might need to move the entire row into a new position to keep the rows in sorted order. This behavior essentially turns an update query into a DELETE followed by an INSERT, with an obvious decrease in performance. 
  2. A table's clustered index can often be found on the primary key or a foreign key column, because key values generally do not change once a record is inserted into the database.
  3.  The index key of a clustered index cannot contain varchar columns that have existing data in the ROW_OVERFLOW_DATA allocation unit. 
  4. If a clustered index is created on a varchar column and the existing data is in the IN_ROW_DATA allocation unit, subsequent insert or update actions on the column that would push the data off-row will fail.
Nonclustered Index
Any indexes you will want to define in addition to the clustered index will be nonclustered indexes. A table can have multiple non-clustered indexes because they don’t affect the order in which the rows are stored on disk like clustered indexes.
Nonclustered indexes have a structure separate from the data rows. A nonclustered index contains the nonclustered index key values and each key value entry has a pointer to the data row that contains the key value.
The pointer from an index row in a nonclustered index to a data row is called a row locator. Nonclustered index is also a binary tree but it doesn't create a physical order of rows. So the leave nodes of nonclustered index contain PK (if it exists) or row index. 
They are just used for fast retrieval of data, not sure to have unique data.
Syntax for Nonclustered Index
USE tempdb;
GO
-- Create a new table with three columns.
CREATE TABLE Employee
    (EmpId int NOT NULL,
     Name varchar(30) NULL,
     Address varchar(50) NULL);
GO

-- Create a nonclustered index called Ind_Employee_EmpId
-- on the dbo.Employee table using the name column.
CREATE NONCLUSTERED INDEX Ind_Employee_EmpName
    ON dbo.Employee (Name);
GO

---- How to see the indexes in the table
EXEC sp_helpindex Employee

---Insert value into the table
Insert into Employee Values (1, 'Ryan Arjun', 'India')

---- alter non-clustered index
ALTER INDEX Ind_Employee_EmpName ON Employee
REORGANIZE;

---- Check table size into the database
EXEC  sp_spaceused Employee

---- drop employee table
drop table Employee

Differences between clustered and non-clustered indexes
The followings are the basic differences:
  1. A clustered index determines the order in which the rows of the table will be stored on disk and it actually stores row level data in the leaf nodes of the index itself. A nonclustered index has no effect on which the order of the rows will be stored.
  2. Using a clustered index is an advantage when groups of data that can be clustered are frequently accessed by some queries. Non-clustered indexes own storage separate from data storage in the table.
  3. A clustered index can be a disadvantage because any time a change is made to a value of an indexed column, the subsequent possibility of re-sorting rows to maintain order is a definite performance hit.
  4. A table can have multiple non-clustered indexes. But, a table can have only one clustered index.
  5. Non clustered indexes store both a value and a pointer to the actual row that holds that value. Clustered indexes don’t need to store a pointer to the actual row because of the fact that the rows in the table are stored on disk in the same exact order as the clustered index – and the clustered index actually stores the row-level data in its leaf nodes.
Type of Indexes in SQL
The following types of indexes are available in SQL Server-

2 comments:

  1. Does this mean clustered index decreases performance?

    ReplyDelete
  2. No, a clustered index has a strict row order like any other B-tree index: it sorts the rows according to the index definition.Clustered indexes have the advantage that they support very fast range scans and they can deliver subsequent rows quickly when accessed directly.

    ReplyDelete