Wednesday, July 15, 2015

SQL - Partition Indexes

Table partitions are the most important feature of the SQL and provide a way to spread a single table over multiple partitions. In this way, every partition of the table can behaved like a separate file group. Partitioning can make large tables and indexes more manageable and scalable to improve the performance.
In the table partitioning process, the data is partitioned horizontally because partitioned groups of rows are mapped into individual partitions. To access the data smoothly, it is important to having all partitions of a single index or table must reside in the same database. In this way, these partitioned tables or indexes will be treated as a single logical entity when queries or updates are performed on the data.

Table Partitioning Process
As we know that table partitioning process goes like as
  • Create file groups and files
  • Create partitioning function
  • Partitioning scheme - (map intervals to appropriate file groups)
  • Recreate clustered index - this is the moment when table is physically moved to another files
It is very important to have data partitioning on the large tables or indexes because they can have the following manageability and performance benefits -
  • Partitioned data is very helpful to transfer or access subsets of data quickly and efficiently, while maintaining the integrity of a data collection.
  • Maintenance operations can apply on one or more partitions more quickly. These operations are more efficient because they target only these data subsets, instead of the whole table.
  • Table partitions are also very helpful to increase and improve query performance due to equi-join queries between two or more partitioned tables faster when the partitioning columns in the tables are the same.
  • When SQL Server performs data sorting for I/O operations, it sorts the data first by partition. Although SQL Server still sorts data by partition, it can access all the drives of each partition at the same time.
  • Table partitions can have enabled lock escalation at the partition level and reduce lock contention on the table to improve the query performances.
Components and Concepts
Before implement the components and concepts, I would like to introduced a Sales Margin Details table having the large data for year 2012, 2013, 2014 and 2015 and we will use the following terms which are applicable to table and index partitioning-
Partition function
A database object that defines how the rows of a table or index are mapped to a set of partitions based on the values of certain column, called a partitioning column.

/* Create a partition function. */

Create Partition Function
    [Fun_YearlySalesPartition] (datetime)

---- Where RIGHT Stands for < or >=
---- LEFT stands for <= and >
    As Range Right For Values
    (
                '2012-01-01',
                '2013-01-01',
                '2014-01-01',
                '2015-01-01'
    );
Go
Partition scheme
A database objects that maps the partitions of a partition function to a set of filegroups.

/* Associate the partition function with a partition scheme. */
CREATE PARTITION
SCHEME Sch_YearlySalesPartitionScheme
AS PARTITION
Fun_YearlySalesPartition
---- Optimized file structure
ALL TO ([PRIMARY])
Partitioning column
Partition function uses a column of a table or index to partition the table or index. Computed columns that participate in a partition function must be explicitly marked PERSISTED.

/* Create a partitioned table and column. */
Create Table dbo.SalesOrders
(
      order_id  int Identity(1,1)   Not Null
    , orderDate datetime            Not Null
    , ItemName varchar(30) Not Null
    , SalesPerson varchar(30) Not Null
    , Customer varchar(30) Not Null
    Constraint PK_orders Primary Key Clustered
    ( order_id  , orderDate  )
) On Sch_YearlySalesPartitionScheme (orderDate);
Go

/* Insert records partitioned table and column. */
Insert into dbo.SalesOrders (orderDate,ItemName,SalesPerson, Customer) Values
('2012-01-01','ItemNameA', 'SalesPerson1',' Customer01'),
('2012-10-18','ItemNameB', 'SalesPerson2',' Customer02'),
('2013-03-01','ItemNameA', 'SalesPerson1',' Customer01'),
('2013-04-01','ItemNameB', 'SalesPerson2',' Customer02'),
('2014-03-01','ItemNameA', 'SalesPerson1',' Customer01'),
('2014-06-01','ItemNameB', 'SalesPerson2',' Customer02'),
('2015-01-01','ItemNameA', 'SalesPerson1',' Customer01'),
('2015-01-01','ItemNameB', 'SalesPerson2',' Customer02')

Aligned index
An index which is built on the same partitioned scheme as its corresponding table. When a table and its indexes are in alignment, SQL Server can switch partitions quickly and efficiently while maintaining the partition structure of both the table and its indexes

/* Let’s create an aligned, partitioned index. */
Create NonClustered Index IX_OrderAligned
    On dbo.SalesOrders(order_id)
    On Sch_YearlySalesPartitionScheme(orderDate);

Nonaligned index
An index partitioned independently from its corresponding table. That is, the index has a different partition scheme or is placed on a separate filegroup from the base table.

/* Now let’s create a Nonaligned index. */
Create NonClustered Index IX_OrderNonaligned
    On dbo.SalesOrders(order_id)
    On [Primary];

For both aligned and nonaligned indexes, the memory requirement can be greater if SQL Server is applying degrees of parallelism to the build operation on a multiprocessor computer. This is because the greater the degrees of parallelism, the greater the memory requirement.
Although partitioned indexes can be implemented independently from their base tables but it generally makes sense to design a partitioned table and then create an index on the table.
Whenever you are planning for partitioning a unique nonclustered index then index key must contain the partitioning column but for partitioning a non-unique, nonclustered index, SQL Server adds the partitioning column by default as a nonkey (included) column of the index to make sure the index is aligned with the base table.
How to verify the Partition Index
---- Verify the rows in the different partitions
select partition_id, index_id, partition_number, Rows
FROM sys.partitions
WHERE OBJECT_NAME(OBJECT_ID)='SalesOrders'


We have created and verified a horizontal partitioned table for SalesOrder table and data always falls in these partitioned tables whenever matched the conditions.
Benefits of both Partitioned Tables and Indexes:
  • Faster and easier data loading
  • Faster and easier data deletion or archival
  • Faster queries
  • Sliding windows: A sliding window is basically what was referred to earlier in the discussion about adding new data and then deleting or archiving old data.
 At a Glance simple steps
/* Create a partition function. */
Create Partition Function
    [Fun_YearlySalesPartition] (datetime)
---- Where RIGHT Stands for < or >=
---- LEFT stands for <= and >
    As Range Right
                For Values
    (
                '2012-01-01',
                '2013-01-01',
                '2014-01-01',
                '2015-01-01'
 );
Go

/* Associate the partition function with a partition scheme. */
CREATE PARTITION
SCHEME Sch_YearlySalesPartitionScheme
AS PARTITION
Fun_YearlySalesPartition
---- Optimized file structure
ALL TO ([PRIMARY])

/* Create a partitioned table and column. */
Create Table dbo.SalesOrders
(
      order_id  int Identity(1,1)   Not Null
    , orderDate datetime            Not Null
    , ItemName varchar(30) Not Null
                , SalesPerson varchar(30) Not Null
    , Customer varchar(30) Not Null
    Constraint PK_orders Primary Key Clustered
    (
        order_id
      , orderDate
    )
) On Sch_YearlySalesPartitionScheme(orderDate);
Go

/* Insert records partitioned table and column. */
Insert into dbo.SalesOrders (orderDate,ItemName,SalesPerson, Customer) Values
('2012-01-01','ItemNameA', 'SalesPerson1',' Customer01'),
('2012-10-18','ItemNameB', 'SalesPerson2',' Customer02'),
('2013-03-01','ItemNameA', 'SalesPerson1',' Customer01'),
('2013-04-01','ItemNameB', 'SalesPerson2',' Customer02'),
('2014-03-01','ItemNameA', 'SalesPerson1',' Customer01'),
('2014-06-01','ItemNameB', 'SalesPerson2',' Customer02'),
('2015-01-01','ItemNameA', 'SalesPerson1',' Customer01'),
('2015-01-01','ItemNameB', 'SalesPerson2',' Customer02')


/* Let’s create an aligned, partitioned index. */
Create NonClustered Index IX_OrderAligned
    On dbo.SalesOrders(order_id)
    On Sch_YearlySalesPartitionScheme(orderDate);


/* Now let’s create an Nonaligned index. */
Create NonClustered Index IX_OrderNonaligned
    On dbo.SalesOrders(order_id)
    On [Primary];

---- Verify the rows in the different partitions
select partition_id, index_id, partition_number, Rows
FROM sys.partitions
WHERE OBJECT_NAME(OBJECT_ID)='SalesOrders'


Note: Partitioned tables and indexes are not available in every edition of Microsoft SQL Server. Beginning with SQL Server 2012, statistics are not created by scanning all the rows in the table when a partitioned index is created or rebuilt. Instead, the query optimizer uses the default sampling algorithm to generate statistics.

Type of Indexes in SQL

References:
https://msdn.microsoft.com/en-us/library/ms190787(v=sql.130).aspx
http://sqlfool.com/2008/12/indexing-for-partitioned-tables/

No comments:

Post a Comment