Thursday, March 31, 2016

SQL - Statistics in SQL Server

Statistics are very important object in the SQL Server which is used for query optimization to define the query execution plans. They contain statistical information about the distribution of values in one or more columns of a table. 
The optimiser does the smart work and always uses a cost based weighting to choose a best plan, and the statistics are used to estimate the number of rows that will be processed by a given operator, and hence the cost of that operator. 
With the help of statistics information, the optimizer would know the selectivity for the data.
Whenever statistics are created against any table then Database Engine sorts the values of the columns on which the statistics are being built. The query optimizer computes a histogram on the column values in the first key column of the statistics object. A histogram measures the frequency of occurrence for each distinct value in a data set. Histogram has limitation of 200 steps, separated by intervals. 

What Are Statistics Objects?

Each statistics object is created on a list of one or more table columns which includes a histogram displaying the distribution of values in the first column. Statistics objects on multiple columns also store statistical information about the correlation of values among the columns. These correlation statistics, or densities, are derived from the number of distinct rows of column values.

In a very simple manner, we can understand that histogram is nothing but it is something which specifies how many rows exactly match each interval value, how many rows fall within an interval, and a calculation of the density of values, or the incidence of duplicate values, within an interval.
As the data in a column changes, index and column statistics can become out-of-date and cause the query optimizer to make less-than-optimal decisions on how to process a query. However, the query optimizer still considers it to be a good candidate based on the index's outdated distribution statistics that are based on the data before the update.

Multicolumn statisticsJust what they sound like, statistics over multiple columns.  These are created by default whenever we create a composite index. We can also create them ourself without an index using the CREATE STATISTICS command. This will be helpful us when we have queries that filter on multiple columns absence of indexes.
Filtered statistics We can also put a filter on our statistics.  This is useful when we have a very large table but normally only query on a small subset of rows.  If these rows are highly selective, creating filtered statistics will help the optimizer choose the best path.
What are Filtered Statistics?

Filtered statistics are nothing but they play a very important role to improve query performance for SQL data queries that select from well-defined subsets of data. Filtered statistics use a filter predicate to select the subset of data that is included in the statistics. Well-designed filtered statistics can improve the query execution plan compared with full-table statistics.
SQL Server is responsible to maintain statistical information about key value distribution for all b-tree indexes so that it can more accurately estimate the number of qualifying rows. This information is used by the cost-based optimizer to generate the most efficient execution plan for the query, and aid in decisions about whether to use or not use an index for a particular query.

The most important thing is that Statistics do not have the b-tree structures to facilitate locating the data. However, they do provide cardinality statistics that the optimizer can use to develop more accurate row count estimates, resulting in more efficient query plans. Cardinality estimator is only interested in predicates. 

How to view Statistics within Data table?
DBCC SHOW_STATISTICS displays current query optimization statistics for a table or indexed view.  The syntax lets you specify a table or indexed view along with a target index name, statistics name, or column name:

DBCC SHOW_STATISTICS ( table_or_indexed_view_name , target )
[ WITH [ NO_INFOMSGS ] < option > [ , n ] ]
< option > :: =

table_or_indexed_view_name: Name of the table or indexed view for which to display statistics information. Name of the table that contains the statistics to display. The table cannot be an external table.
Target: Name of the index, statistics, or column for which to display statistics information. Target is enclosed in brackets, single quotes, double quotes, or no quotes. If target is a name of an existing index or statistics on a table or indexed view, the statistics information about this target is returned
NO_INFOMSGS: Suppresses all informational messages that have severity levels from 0 through 10.
STATS_STREAM is Identified for informational purposes only. Not supported. Future compatibility is not guaranteed.

NOTE: If target is the name of an existing column, and an automatically created statistics on this column exists, information about that auto-created statistic is returned. If an automatically created statistic does not exist for a column target, error message 2767 is returned. In SQL Data Warehouse and Parallel Data Warehouse, target cannot be a column name.

DBCC SHOW_STATISTICS('[dbo].[Employee]', [_WA_Sys_00000001_0EA330E9]);

DBCC SHOW_STATISTICS('[dbo].[Employee]', [_WA_Sys_00000001_0EA330E9]) WITH STATS_STREAM;

The Different way statistics being created
There are many ways to create statistics on the indexes on tables. Some of them are given below:
  1. The query optimizer creates statistics on key column for indexes on tables when the index is created.
  2. The query optimizer creates statistics for single columns in query predicates.
  3. Composite indexes creates Multi column statistics.
  4. sp_createstats stored procedure.
  5. Create Statistics Statement.
We will explain them one by one here.

Sunday, March 20, 2016

SSRS – Format() Function

The most common fact for the data representation is that this is information being pulled from a stored procedure and we don't really know what the query was and which kind of data formatting is using in the script. This data formatting can be easily done with some very common expression formatting in SSRS. If we are developing an SSRS report to show gross sales which are broken down in order date, cost, gross profit, gross profit %, order count, total sales etc.

A standard principle is to separate data from display, so use the Value property to store the data in its native data type and use the Format property to display it how we want to display the data. This data formatting become very urgent when exporting our report to Excel because if we have the right data type for our data then it will export to Excel as the right data type as we defined in the report.
Most Common Data Representation Formats - As we know that Reporting Services already introduced enhanced rich text functionality to permit mixed formatting, or even mixed tooltips & actions, within a single textbox and different formats could include mixing font sizes, colors, and other things like bold and italics within one textbox. Some very common data formatting are given below:
  1. Various date formats
  2. Currency with 2 decimal places
  3. Number with 4 decimal places
  4. Percentage with decimal places
  5. Negative value format
To understand the report data formatting, we can use the following data to generate the report –
121 AS QTY,
121 * 1.5 AS SALES,
121 * 1.2 AS COST,
121 * 1.5 - 121 * 1.2 AS MARGIN
120 AS QTY,
120 * 1.75 AS SALES,
120 * 1.12 AS COST,
120 * 1.75 - 120 * 1.12 AS MARGIN
32 AS QTY, 32 * 21.75 AS SALES,
32 * 21.12 AS COST,
32 * 21.75 - 32 * 21.12 AS MARGIN 

Report will be look like as given below after using the above script –

To apply the format settings in the existing report column, we need to right click on the column and choose the text box properties as given below –
In the Textbox properties window, choose number to see all the available formats where we can choose the pre-define formats. If you want to add your own formats then we can choose the custom format as given below- 
Or we can use the column/Textbox properties window and fill the Format as given below-

Various date formats - Date formatting in SQL Server Reporting Services (SSRS) is a very common requirement where customers look for a very specific format and it is become very important to understand the basic components of date formats.
Date Format Expression

Currency with 2 decimal places – Currency formatting is very important for the report because every country has its own currency.
Currency Format Expression
Number with 4 decimal places –To display the number with n number decimal points, we can use the following format() settings -
Number Format Expression
Percentage with decimal places –To display the percentage number with n number decimal points, we can use the following format() settings -
Percentage number Format Expression