Thursday, March 31, 2016

SQL - Statistics in SQL Server

Statistics are very important object in the SQL Server which is used for query optimization to define the query execution plans. They contain statistical information about the distribution of values in one or more columns of a table. 
The optimiser does the smart work and always uses a cost based weighting to choose a best plan, and the statistics are used to estimate the number of rows that will be processed by a given operator, and hence the cost of that operator. 
With the help of statistics information, the optimizer would know the selectivity for the data.
Whenever statistics are created against any table then Database Engine sorts the values of the columns on which the statistics are being built. The query optimizer computes a histogram on the column values in the first key column of the statistics object. A histogram measures the frequency of occurrence for each distinct value in a data set. Histogram has limitation of 200 steps, separated by intervals. 



What Are Statistics Objects?

Each statistics object is created on a list of one or more table columns which includes a histogram displaying the distribution of values in the first column. Statistics objects on multiple columns also store statistical information about the correlation of values among the columns. These correlation statistics, or densities, are derived from the number of distinct rows of column values.


In a very simple manner, we can understand that histogram is nothing but it is something which specifies how many rows exactly match each interval value, how many rows fall within an interval, and a calculation of the density of values, or the incidence of duplicate values, within an interval.
As the data in a column changes, index and column statistics can become out-of-date and cause the query optimizer to make less-than-optimal decisions on how to process a query. However, the query optimizer still considers it to be a good candidate based on the index's outdated distribution statistics that are based on the data before the update.

Multicolumn statisticsJust what they sound like, statistics over multiple columns.  These are created by default whenever we create a composite index. We can also create them ourself without an index using the CREATE STATISTICS command. This will be helpful us when we have queries that filter on multiple columns absence of indexes.
Filtered statistics We can also put a filter on our statistics.  This is useful when we have a very large table but normally only query on a small subset of rows.  If these rows are highly selective, creating filtered statistics will help the optimizer choose the best path.
What are Filtered Statistics?

Filtered statistics are nothing but they play a very important role to improve query performance for SQL data queries that select from well-defined subsets of data. Filtered statistics use a filter predicate to select the subset of data that is included in the statistics. Well-designed filtered statistics can improve the query execution plan compared with full-table statistics.
SQL Server is responsible to maintain statistical information about key value distribution for all b-tree indexes so that it can more accurately estimate the number of qualifying rows. This information is used by the cost-based optimizer to generate the most efficient execution plan for the query, and aid in decisions about whether to use or not use an index for a particular query.


The most important thing is that Statistics do not have the b-tree structures to facilitate locating the data. However, they do provide cardinality statistics that the optimizer can use to develop more accurate row count estimates, resulting in more efficient query plans. Cardinality estimator is only interested in predicates. 

How to view Statistics within Data table?
DBCC SHOW_STATISTICS displays current query optimization statistics for a table or indexed view.  The syntax lets you specify a table or indexed view along with a target index name, statistics name, or column name:

DBCC SHOW_STATISTICS ( table_or_indexed_view_name , target )
[ WITH [ NO_INFOMSGS ] < option > [ , n ] ]
< option > :: =
STAT_HEADER | DENSITY_VECTOR | HISTOGRAM | STATS_STREAM

Arguments
table_or_indexed_view_name: Name of the table or indexed view for which to display statistics information. Name of the table that contains the statistics to display. The table cannot be an external table.
Target: Name of the index, statistics, or column for which to display statistics information. Target is enclosed in brackets, single quotes, double quotes, or no quotes. If target is a name of an existing index or statistics on a table or indexed view, the statistics information about this target is returned
NO_INFOMSGS: Suppresses all informational messages that have severity levels from 0 through 10.
STATS_STREAM is Identified for informational purposes only. Not supported. Future compatibility is not guaranteed.



NOTE: If target is the name of an existing column, and an automatically created statistics on this column exists, information about that auto-created statistic is returned. If an automatically created statistic does not exist for a column target, error message 2767 is returned. In SQL Data Warehouse and Parallel Data Warehouse, target cannot be a column name.
Example:

DBCC SHOW_STATISTICS('[dbo].[Employee]', [_WA_Sys_00000001_0EA330E9]);

DBCC SHOW_STATISTICS('[dbo].[Employee]', [_WA_Sys_00000001_0EA330E9]) WITH STATS_STREAM;









The Different way statistics being created
There are many ways to create statistics on the indexes on tables. Some of them are given below:
  1. The query optimizer creates statistics on key column for indexes on tables when the index is created.
  2. The query optimizer creates statistics for single columns in query predicates.
  3. Composite indexes creates Multi column statistics.
  4. sp_createstats stored procedure.
  5. Create Statistics Statement.
We will explain them one by one here.
References: https://msdn.microsoft.com/en-gb/library/ms174384.aspx

2 comments:

  1. Controls of Mathematics, Statistics, Computer science, and Information innovation adds to their speculations and systems in the foundation of the field of Data Science. ExcelR Data Science Courses

    ReplyDelete
  2. Your music is amazing. You have some very talented artists. I wish you the best of success. 数学家教

    ReplyDelete