Friday, July 10, 2015

SQL - XML Indexes

XML Indexes
We know that Indexing is the central key for every types of the data management. They are easily applicable in relational tables or even files in a tree of directories (a file system actually resembles an index in many ways).

The XML indexes have been introduced with SQL Server 2005 and they are using in the most of the organizations because of the ubiquity and readability of the XML format. XML instances are stored in xml type columns as the large binary objects (BLOBs) and they can be too large up to 2 GB. 

Absence of an XML index, these binary large objects is shredded at run time to evaluate a query. This shredding can be time-consuming. This run-time shredding can be costly, depending on the size and number of instances stored in the column.

As per the name indication, XML Indexes are applicable on the columns of XML data type. XML Indexes will be applicable on the specific table; the table must have a clustered index on the primary key column.

Types of XML Indexes
XML indexes fall into the following categories:
  1. Primary XML index
  2. Secondary XML index
The first index on the XML type column must be the primary XML index. Using the primary XML index, the following types of secondary indexes are supported: PATH, VALUE, and PROPERTY. Depending on the type of queries, these secondary indexes might help improve query performance.

Primary XML indexes
The primary XML index is a shredded and persisted representation of the XML BLOBs in the xml data type column. For each XML binary large object (BLOB) in the column, the index creates several rows of data. The number of rows in the index is approximately equal to the number of nodes in the XML binary large object.
Just like a table can only have one clustered index, an XML column can have only one primary XML index. They index all tags, values and paths over the XML instances in the column and benefit query performance. 
Queries within XML instances use the primary XML index, and can return scalar values or XML subtrees by using the index itself.

Each row stores the following node information:
  • Tag name such as an element or attribute name.
  • Node value.
  • Node type such as an element node, attribute node, or text node.
  • Document order information, represented by an internal node identifier.
  • Path from each node to the root of the XML tree. This column is searched for path expressions in the query.
  • Primary key of the base table. The primary key of the base table is duplicated in the primary XML index for a back join with the base table, and the maximum number of columns in the primary key of the base table is limited to 15.
Syntax of Primary XML Index
Creating a primary XML index is just as straight-forward as creating a regular single-column index on a table.
USE [AdventureWorks2012]

--- Create Primary XML Index
CREATE PRIMARY XML INDEX IndXML_ProductModel_CatalogDescription_Path
ON [Production].[ProductModel]

Secondary XML indexes
To enhance search performance, you can create secondary XML indexes. A primary XML index must first exist before you can create secondary indexes. 
The secondary XML index still covers the same data as the underlying primary index (i.e. the entire XML object), but you could say that it creates a more specific index, based on the primary index.
The T-SQL syntax for setting up a secondary XML index is very similar to the primary index, with the addition of the “USING XML INDEX” clause, which tells SQL Server that this isn’t a primary index, but is based on one.

Syntax of Secondary XML Index
Creating a Secondary XML index is just as straight-forward as creating a regular single-column index on a table which will be based on primary index.

USE [AdventureWorks2012]

---- Create Secondry XML Index
CREATE XML INDEX [IXML_ProductModel_CatalogDescription] ON [Production].[ProductModel]
USING XML INDEX IndXML_ProductModel_CatalogDescription_Path
        FOR PATH;

There are three different types of secondary indexes (specified with the FOR keyword at the end of the statement). The difference between them is what they index and how this information is arranged in the database based what does it suit your query needs.

PATH secondary indexes
The PATH secondary XML index is optimally designed for looking up paths, for instance when you’re using the exist() method in your queries. The index key of this type of index is the node pathof all nodes (elements, attributes, values) in the XML document.
VALUE secondary indexes
The index key of the VALUE secondary XML index is the node value and the node path, so you may want to consider this type if you’re looking for a specific value, but don’t know the exact path.
PROPERTY secondary indexes
The PROPERTY secondary XML index’s key is the base table’s primary keynode path and node value. This provides the means to perform queries that search specific table records and specific paths in those records’ XML documents, all in a single index scan/seek.

Guidelines for secondary indexes
  1. If your workload uses path expressions significantly on XML columns, the PATH secondary XML index is likely to speed up your workload. The most common case is the use of the exist() method on XML columns in the WHERE clause of Transact-SQL.
  2. If your workload retrieves multiple values from individual XML instances by using path expressions, clustering paths within each XML instance in the PROPERTY index may be helpful. This scenario typically occurs in a property bag scenario when properties of an object are fetched and its primary key value is known.
  3. If your workload involves querying for values within XML instances without knowing the element or attribute names that contain those values, you may want to create the VALUE index. 
Type of Indexes in SQL


No comments:

Post a Comment