Wednesday, December 28, 2016

Understanding Query Store components

We know that the Query Store is one of the biggest enhancements to the SQL Server 2016 database engine, completing the performance tuning toolbox of database administrators. The Query Store is extremely helpful, but it does require some management: we must enable it on each user database individually, and we may need to change the space allocated to the Query Store from the default of 100 MB per database.
The Query Store is the combination of two stores known as the Plan Store and the Runtime Stats Store. The most interesting fact is that both stores are in-memory objects that hold query information before it is persisted to disk.


Common scenarios for using the Query Store
There are many common scenarios for using the Query Store features as given below:
  • Quickly find and fix queries that have recently regressed in performance due to execution plan changes by forcing the previous query plan.
  • Determine the number of times a query was executed in a given time window, assisting a DBA in troubleshooting performance resource problems.
  • Identify top n queries (by execution time, memory consumption, etc.) in the past x hours.
  • Audit the history of query plans for a given query.
  • Analyze the resource (CPU, I/O, and Memory) usage patterns for a particular database.

The Query Store consists of a Plan Store that persists the execution plans and a Runtime Stats Store that persists the statistics surrounding query execution, such as CPU, I/O, memory, and other metrics. SQL Server retains this data until the space allocated to the Query Store is full. The number of unique plans that can be stored for a query in the Plan Store is limited by the max_plans_per_query configuration option. To reduce the impact on performance, SQL Server writes information to each of these stores asynchronously. 
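For example, we can raise this limit with T-SQL; a minimal sketch, assuming a database named Demo and an arbitrary illustrative value of 200:

----- Raise the per-query plan limit (Demo is a placeholder database name)
ALTER DATABASE Demo
SET QUERY_STORE ( MAX_PLANS_PER_QUERY = 200 );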


Query Store Catalog Views
There are seven catalog views that can be used to get information such as metadata and query execution history, as given below (a sample query that joins several of them follows the list):
  1. sys.query_store_runtime_stats – This view shows the run-time execution statistics for queries.
  2. sys.query_store_runtime_stats_interval – This view shows the start and end times for the intervals over which run-time execution statistics are collected.
  3. sys.query_store_plan – This view shows the execution plan information for queries.
  4. sys.query_store_query – This view shows the query information and its overall aggregated run-time execution statistics.
  5. sys.query_store_query_text – This view shows the query text as entered by the user, including white space, hints, and comments.
  6. sys.database_query_store_options – Returns the Query Store options for this database.
  7. sys.query_context_settings – Contains information about the semantics-affecting context settings associated with a query.
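As a quick illustration, the following sketch joins four of these views to list the top 10 queries by total duration; the TOP value and the ordering metric are arbitrary choices for the example:

----- Top 10 queries by total duration from the Query Store views
SELECT TOP 10
       q.query_id,
       qt.query_sql_text,
       p.plan_id,
       SUM(rs.count_executions) AS total_executions,
       SUM(rs.avg_duration * rs.count_executions) AS total_duration
FROM sys.query_store_query q
INNER JOIN sys.query_store_query_text qt ON q.query_text_id = qt.query_text_id
INNER JOIN sys.query_store_plan p ON q.query_id = p.query_id
INNER JOIN sys.query_store_runtime_stats rs ON p.plan_id = rs.plan_id
GROUP BY q.query_id, qt.query_sql_text, p.plan_id
ORDER BY total_duration DESC;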
Reviewing information in the query store
After enabling Query Store for a database, you have access to the following four dashboards:
Regressed Queries - We can use this dashboard to review queries that might have regressed because of execution plan changes. The dashboard allows us to view the queries and their plans as well as to select queries based on statistics (total, average, minimum, maximum, and standard deviation) by query metric (duration, CPU time, memory consumption, logical reads, logical writes, and physical reads) for the top 25 regressed queries over the last hour.
Overall Resource Consumption - We can use this dashboard to visualize overall resource consumption during the last month in four charts: duration, execution count, CPU time, and logical reads. You have the option to toggle between a chart view and a grid view of the query store data.
Top Resource Consuming Queries - We can use this dashboard to review queries in the set of top 25 resource consumers during the last hour. We can filter the queries by using the same criteria available in the Regressed Queries dashboard.


Tracked Queries - We can use this dashboard to monitor a specific query.
All the dashboards except Overall Resource Consumption allow you to view the execution plan for a query. In addition, we have the option to force an execution plan at the click of a button in the dashboard, which is one of the most powerful features of the Query Store. However, the plan must still exist in the Query Store to be forced.
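Behind that button, forcing and unforcing a plan map to two system stored procedures; a minimal sketch, where 42 and 7 are placeholder IDs we would first look up in the catalog views:

----- Force a specific plan for a query (IDs are placeholders)
EXEC sp_query_store_force_plan @query_id = 42, @plan_id = 7;

----- Stop forcing the plan for that query
EXEC sp_query_store_unforce_plan @query_id = 42, @plan_id = 7;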
We can use the Query Store in all databases without concerns, even in densely packed elastic pools. All issues related to excessive resource usage that might have occurred when Query Store was enabled for a large number of databases in elastic pools have been resolved.

Monday, December 26, 2016

Query Store in SQL Server 2016

The Query Store is one of the biggest enhancements to the SQL Server 2016 database engine. It completes the performance tuning toolbox of database administrators, simplifies the identification of performance outliers, manages execution plan regression, and allows for easier upgrades between versions of SQL Server. The Query Store also acts as a flight data recorder for the database, capturing query run-time statistics and providing a dashboard to sort queries by resource consumption.
The Query Store gives unprecedented insight into the operations of a database and helps to troubleshoot query performance by capturing a range of information about query usage, CPU, memory consumption, I/O, and execution time, and by retaining every execution plan for analysis.


Query Store Architecture
The Query Store is the combination of two stores known as the Plan Store and the Runtime Stats Store; both are in-memory objects that hold query information, as described below.
What happens when a query is submitted in a Query Store-enabled database? First, the compiled query execution plan is written to the Plan Store section of the Query Store, and then the runtime information of the query is recorded in the Runtime Stats Store section of the in-memory stores.

After a specific interval, the information inside the in-memory stores is asynchronously written to disk to harden it. The Query Store writes the execution plans and runtime statistics into the primary filegroup of the database that we queried against.
Enabling the Query Store
This feature is disabled by default for user databases. The Query Store requires at least the SQL Server 2016 version of Management Studio, and to enable it we need to follow these instructions in SQL Server Management Studio (SSMS):
  1. Open Object Explorer and connect to the database engine.
  2. Navigate to the database for which we want to enable Query Store.
  3. Right-click the database, select Properties, and then click Query Store in the Database Properties dialog box.


Change the Operation Mode (Requested) value from Off to Read Only or Read Write. By selecting Read Write, we enable the Query Store to record the run-time information necessary to make better decisions about queries.


After enabling the Query Store, we can see the Query Store node within our database in Object Explorer.

Besides setting all the options through the Query Store GUI, we can also configure all of these options through T-SQL.
----- Enabling Query Store with T-SQL
ALTER DATABASE Demo
SET QUERY_STORE = ON
(
    OPERATION_MODE = READ_WRITE,                            -- record new data and allow reads
    CLEANUP_POLICY = ( STALE_QUERY_THRESHOLD_DAYS = 5 ),    -- purge queries older than 5 days
    DATA_FLUSH_INTERVAL_SECONDS = 2000,                     -- how often in-memory data is hardened to disk
    MAX_STORAGE_SIZE_MB = 10,                               -- space allocated to the Query Store
    INTERVAL_LENGTH_MINUTES = 10                            -- runtime statistics aggregation interval
);
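To confirm that the settings took effect, we can query the sys.database_query_store_options catalog view; a minimal sketch:

----- Verify the Query Store configuration for the current database
SELECT actual_state_desc,
       current_storage_size_mb,
       max_storage_size_mb,
       interval_length_minutes,
       stale_query_threshold_days
FROM sys.database_query_store_options;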


The Query Store helps us to gain insight into individual queries and the overall performance of a database, and it greatly simplifies performance troubleshooting by helping us quickly find performance differences, even after a server restart or upgrade, because it persists this information inside internal tables of the database and captures the query text as well as the query execution plan. It also helps us to identify queries that have a high runtime variance by comparing them over time.

Monday, December 19, 2016

Analysis Services in SQL Server 2016

SQL Server 2016 is fully loaded with several amazing new features, including more speed, more security, and more value.
Analysis Services is an online analytical data engine, and SQL Server 2016 makes several enhancements to it, including improvements in enterprise decision support and business analytics, the modeling platform, BI tools, SharePoint integration, and hybrid BI.
We know that Analysis Services provides two different ways to support fast reporting and analytics: multidimensional mode and tabular mode. Multidimensional mode has long been a component of SQL Server, whereas tabular mode was introduced in SQL Server 2012. Both modes were the best way to deliver superior query performance before the introduction of columnar storage in the SQL Server database engine.


After the Analysis Services enhancements in SQL Server 2016, we can choose whether to use in-memory OLTP or columnstore indexes in the database engine, or a hybrid mode in which we use ROLAP mode with a multidimensional model or DirectQuery mode with a tabular model to query the database engine directly instead of data stored by Analysis Services.
The Microsoft development team stated that the performance of queries that include an unnatural hierarchy is faster than it was in previous versions, although natural hierarchies remain faster still.
Natural versus unnatural hierarchies in a multidimensional database
A natural hierarchy describes the one-to-many relationships moving between levels from the top down, such as Year, Quarter, and Month, in which one year can have four quarters and one quarter can have three months. A natural hierarchy is always a better structure to design into a dimension because Analysis Services materializes the hierarchy's data in the database. However, we must define attribute relationships for the natural hierarchy to materialize the data.
The most interesting fact is that this optimization requires no configuration or redevelopment of models to improve query performance; we need only deploy our multidimensional model to a SQL Server 2016 Analysis Services multidimensional instance.

Furthermore, a change in memory allocation request handling is available in SQL Server 2016. In previous versions, we had two options for memory allocations: the Windows Low-Fragmentation Heap (LFH) or a custom heap allocator for Analysis Services, with LFH set as the default to improve performance for multi-user workloads. Ordinarily, LFH works well with small memory blocks, but some situations can compromise the efficiency of LFH for Analysis Services by fragmenting memory, which in turn reduces query performance. In SQL Server 2016 Analysis Services, the default option is a new hybrid allocator that strikes a balance by using LFH for small allocations and the custom heap allocator for large allocations.
Remember: Tabular mode uses the same memory settings as multidimensional mode, but it has its own msmdsrv.ini file in which these settings are defined. Tabular mode tends to require larger allocations than multidimensional mode, so the new hybrid allocator should rely more heavily on the custom heap allocator.
Tabular-model performance has always been good because of its use of the same columnar technology as columnstore indexes and in-memory OLTP. However, it’s even better in SQL Server 2016 because of DAX optimizations, storage-engine caching improvements, and changes to the way that T-SQL is generated for DirectQuery mode.

In SQL Server 2016, Analysis Services tabular models simplify solving complex business problems with over 50 new DAX functions, and new relationship types help solve many-to-many issues. One DAX optimization is a reduction in the number of queries sent to the storage engine from Power BI and Power BI Desktop: a single DAX query can now return multiple result sets as intermediate results that Power BI can use across multiple requests.
In SQL Server 2016, the Microsoft development teams focused on enhancements to tabular models in Analysis Services, introducing the new 1200 compatibility level for tabular model databases. Analysis Services 2016 also comes with a number of enhancements to multidimensional models.
This efficiency is a result of changes in the following areas that affect measure execution:
  1. Variables - The use of variables in a DAX expression for measures allows you to reuse logic within the same expression, which can reduce overall execution time.
  2. Conditional expressions - Rather than evaluating each branch of an IF or SWITCH conditional expression, a branch with a false condition no longer generates a storage-engine query.
  3. Nonempty calculations - The number of scans necessary to retrieve a nonempty result set is reduced from many to one.
  4. Multiple measures in same table - A single query now includes all measures from the same table.
  5. Measure grouping - When a query requests a measure at different levels of granularity, such as Month, Quarter, Year, and Total, the query requests only the lowest-level measure value and then derives the higher-level values.
  6. Joins - A storage-engine query now returns both dimension columns and measure values in the same query to eliminate redundant joins, and the order of joins now starts from the most restrictive intermediate table that correlates with the greatest number of other intermediate tables.
  7. Multiple result sets - A storage-engine query now returns both dimension columns and measure values in the same query.
  8. COUNTROWS - This function is optimized to use table heuristics.
  9. Storage-engine caching - The storage engine uses one cache per database rather than one cache per server as it did in prior versions.
DirectQuery is the feature in tabular models that retrieves data from the data source instead of from data that has been imported into memory. Now that MDX queries are supported by DirectQuery, we can use Excel to connect to a DirectQuery-enabled tabular model. The translation to T-SQL is also improved in SQL Server 2016; we need only restore our tabular database to a SQL Server 2016 Analysis Services tabular instance.
Conclusion
Both multidimensional mode and tabular mode benefit from enhancements that improve performance in SQL Server 2016. The Analysis Services multidimensional engine includes an optimization for faster query performance. Memory allocation request handling is also improved: the default option is a new hybrid allocator that strikes a balance by using LFH for small allocations and the custom heap allocator for large allocations. An existing multidimensional model requires no configuration or redevelopment to improve query performance; we need only deploy it to a SQL Server 2016 Analysis Services multidimensional instance.
Referenced Book: Introducing Microsoft SQL Server 2016

Friday, December 16, 2016

Important features of Fact tables in Data Warehouse

Fact tables play a very dynamic role in data warehouses and data marts because they consist of the measurements, metrics, or facts of a business process, e.g., sales revenue by month by product.
A fact table might contain either detail level facts or facts that have been aggregated (fact tables that contain aggregated facts are often instead called summary tables).
Note: Special care must be taken whenever you are handling ratios and percentages, because one good design rule is to never store percentages or ratios in fact tables but to calculate them only in the data access tool. Thus, store only the numerator and denominator in the fact table; these can be aggregated, and the aggregated values can then be used to calculate the ratio or percentage in the data access tool, as the sketch below illustrates.
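A minimal T-SQL sketch of this rule, assuming a hypothetical FactSales table: we store DiscountAmount (the numerator) and SalesAmount (the denominator), and compute the percentage only after aggregation:

----- Store numerator and denominator; compute the ratio after aggregation
----- (FactSales and its columns are hypothetical example names)
SELECT ProductKey,
       SUM(DiscountAmount) AS TotalDiscount,                        -- aggregated numerator
       SUM(SalesAmount)    AS TotalSales,                           -- aggregated denominator
       SUM(DiscountAmount) / NULLIF(SUM(SalesAmount), 0) AS DiscountPct
FROM FactSales
GROUP BY ProductKey;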

Important Features of Fact Tables

  1. They are placed at the centre of a star schema or a snowflake schema, surrounded by dimension tables. Where multiple fact tables are used, these are arranged as a fact constellation schema or galaxy schema. 
  2. A fact table typically has two types of columns: those that contain facts and those that are foreign keys to dimension tables.
  3. The primary key of a fact table is usually a composite key that is made up of all of its foreign keys (see the sketch after this list). 
  4. Fact tables contain the content of the data warehouse and store different types of measures: additive, non-additive, and semi-additive.
  5. Most fact tables contain numerical data (facts) that can be summarized to provide information about the history of the organization's operations.
  6. Each fact table also includes a multi-part index that contains, as foreign keys, the primary keys of related dimension tables, which in turn contain the attributes of the fact records.
  7. Fact tables should not contain descriptive information or any data other than the numerical measurement fields and the index fields that relate the facts to corresponding entries in the dimension tables.
  8. Fact tables provide the (usually) additive measures that act as independent variables by which dimensional attributes are analyzed. 
  9. Fact tables are often defined by their grain. The grain of a fact table represents the most atomic level at which the facts may be defined.
  10. Additive measures allow summary information to be obtained by adding various quantities of the measure, such as the sales of a specific item at a group of stores for a particular time period.
  11. Non-additive measures, such as inventory quantity-on-hand values, can also be used in fact tables, but different summarization techniques must then be used.
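To make features 2 and 3 concrete, here is a minimal sketch of a fact table definition, assuming hypothetical DimDate, DimProduct, and DimStore dimension tables already exist:

----- A hypothetical fact table: foreign keys to dimensions plus numeric measures
CREATE TABLE FactSales
(
    DateKey      INT NOT NULL FOREIGN KEY REFERENCES DimDate(DateKey),
    ProductKey   INT NOT NULL FOREIGN KEY REFERENCES DimProduct(ProductKey),
    StoreKey     INT NOT NULL FOREIGN KEY REFERENCES DimStore(StoreKey),
    SalesAmount  DECIMAL(18,2) NOT NULL,   -- additive measure
    QuantitySold INT NOT NULL,             -- additive measure
    ----- composite primary key made up of the foreign keys (feature 3)
    CONSTRAINT PK_FactSales PRIMARY KEY (DateKey, ProductKey, StoreKey)
);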
Conclusion 
Before designing a fact table, we must decide what business process to model by gathering and understanding business needs and available data: identify a business process for analysis (such as sales), know the granularity level of the facts (that is, describe exactly what a fact table record represents), and determine the dimensions for the fact table. 

Thursday, December 8, 2016

SQL Server - How to Calculate Running Total or Cumulative Sum

A cumulative sum, or running total, is a sequence of partial sums of a given sequence; it is used to display the total of the data as it grows with time or along any other series or progression. It lets us view the total contribution so far of a given measure against a given sequence or time.

In SQL Server, getting running totals in T-SQL is not a hard task because there are different ways of calculating cumulative sums or running totals. To understand this better, we take an employee example in which we accumulate salaries across the company as well as within each department.

Demo data of Employee Master - we use a table variable to hold the sample rows, as given below:
------ Declare table variable
DECLARE @EmpSalary TABLE
(
    Id INT IDENTITY(1,1),
    EmpId INT,
    DeptId INT,
    Salary FLOAT
)

----- Insert values into @EmpSalary
INSERT INTO @EmpSalary(EmpId, DeptId, Salary)
VALUES
(101,10,25000), (102,10,35000),
(103,11,15000), (104,11,18500)

---- Pull result from @EmpSalary
SELECT Id, EmpId, DeptId, Salary
FROM @EmpSalary
Id   EmpId   DeptId   Salary
1    101     10       25000
2    102     10       35000
3    103     11       15000
4    104     11       18500



Expected Output for Cumulative Sum or Running Total
By using T-SQL in SQL Server, we can get the expected result as given below:
Running Salary across the data table
Id   EmpId   DeptId   Salary   RunningSalary
1    101     10       25000    25000
2    102     10       35000    60000
3    103     11       15000    75000
4    104     11       18500    93500

Running Salary across the data table within department
Id   EmpId   DeptId   Salary   RunningSalary   RunningDeptSal
1    101     10       25000    25000           25000
2    102     10       35000    60000           60000
3    103     11       15000    75000           15000
4    104     11       18500    93500           33500



Cumulative Sum or Running Totals in SQL Server 2005 or 2008 R2
In SQL Server 2005 and 2008 R2, aggregate functions such as SUM and AVG cannot yet be combined with ORDER BY in the OVER clause, so we can use a self-join within the table, or nested SELECT statements, to pull the cumulative sum or running total, as given below:

----- Running Salary across the data table
SELECT E.Id, E.EmpId, E.DeptId, E.Salary,
       SUM(S.Salary) AS RunningSalary
FROM @EmpSalary E
---- self-join: match all rows with an equal or smaller Id
INNER JOIN @EmpSalary S ON E.Id >= S.Id
GROUP BY E.Id, E.EmpId, E.DeptId, E.Salary
ORDER BY E.Id

Id   EmpId   DeptId   Salary   RunningSalary
1    101     10       25000    25000
2    102     10       35000    60000
3    103     11       15000    75000
4    104     11       18500    93500

----- Running Salary across the data table within department
SELECT E.Id, E.EmpId, E.DeptId,
       E.Salary, E.RunningSalary,
       SUM(D.Salary) AS RunningDeptSal
FROM
----- Derived data table
(
    ----- Pull running total salary across the data
    SELECT E.Id, E.EmpId, E.DeptId, E.Salary,
           SUM(S.Salary) AS RunningSalary
    FROM @EmpSalary E
    ---- self-join
    INNER JOIN @EmpSalary S ON E.Id >= S.Id
    GROUP BY E.Id, E.EmpId, E.DeptId, E.Salary
) E
---- self-join within the same department
INNER JOIN @EmpSalary D ON E.Id >= D.Id
                       AND E.DeptId = D.DeptId
----- group by all columns of the derived table
GROUP BY E.Id, E.EmpId, E.DeptId,
         E.Salary, E.RunningSalary
ORDER BY E.Id
Id   EmpId   DeptId   Salary   RunningSalary   RunningDeptSal
1    101     10       25000    25000           25000
2    102     10       35000    60000           60000
3    103     11       15000    75000           15000
4    104     11       18500    93500           33500

The queries above calculate a cumulative sum of salary per department, ordered by Id, using the aggregate SUM() function. The rows are self-joined, restricting the join to equal or smaller Id values in the right-hand table.



Cumulative Sum or Running Totals in SQL Server 2012 onwards
From SQL Server 2012 onwards, more complex business problems such as running totals or cumulative sums can be solved without extensive use of cursors or nested SELECT statements. We can use the OVER clause with PARTITION BY and ORDER BY together with aggregate functions to pull running totals or cumulative sums against each row, as given below:
----- SQL Server 2012 onwards
----- Running Salary across the data table
SELECT E.Id, E.EmpId, E.DeptId, E.Salary,
       SUM(E.Salary) OVER (ORDER BY E.Id) AS RunningSalary
FROM @EmpSalary E
ORDER BY E.Id

Id   EmpId   DeptId   Salary   RunningSalary
1    101     10       25000    25000
2    102     10       35000    60000
3    103     11       15000    75000
4    104     11       18500    93500

----- Running Salary across the data table within department
SELECT E.Id, E.EmpId, E.DeptId, E.Salary,
       SUM(E.Salary) OVER (ORDER BY E.Id) AS RunningSalary,
       ---- Use OVER with PARTITION BY and ORDER BY
       SUM(E.Salary) OVER (PARTITION BY E.DeptId ORDER BY E.Id) AS RunningDeptSal
FROM @EmpSalary E
ORDER BY E.Id
Id   EmpId   DeptId   Salary   RunningSalary   RunningDeptSal
1    101     10       25000    25000           25000
2    102     10       35000    60000           60000
3    103     11       15000    75000           15000
4    104     11       18500    93500           33500

Note: This works with SQL Server 2012 and up; SQL Server 2008 has only limited support for window functions (OVER with PARTITION BY for aggregates, but not ORDER BY), so it cannot express running totals this way.

We can see here that the OVER clause allows us to manage the grouping based on the context specified in relation to the current row. With the expansion of the OVER clause to include PARTITION BY and ORDER BY support with aggregates, window functions increased their value substantially from SQL Server 2012 onwards.
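One related performance note: by default, SUM() OVER (ORDER BY ...) uses a RANGE frame; specifying an explicit ROWS frame returns the same running total here and generally performs better because it avoids an on-disk worktable. A minimal sketch against the same table variable:

----- Explicit ROWS frame: same result, typically cheaper than the default RANGE frame
SELECT E.Id, E.EmpId, E.DeptId, E.Salary,
       SUM(E.Salary) OVER (ORDER BY E.Id
                           ROWS UNBOUNDED PRECEDING) AS RunningSalary
FROM @EmpSalary E
ORDER BY E.Id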

Conclusion
There are several ways to pull the cumulative sum or running total against each row, and the best approach depends on your SQL Server version. If the running total needs to be calculated over different partitions of the data, simply add more conditions to the PARTITION BY and ORDER BY clauses within the OVER clause.
