Friday, May 6, 2016

SQL - Queries Tuning and Optimization Techniques

In SQL, it is very difficult to write complex SQL queries involving joins across many (at least 3-4) tables and involving several nested conditions because a SQL statement, once it reaches a certain level of complexity, is basically a little program in and of itself. 
A database index is a data structure that improves the speed of operations on a database table. Indexes can be created using one or more columns of a database table, providing the basis for both rapid random look ups and efficient access of ordered records. Indexing is incredibly important when working with large tables, however, occasionally smaller tables should be indexed, if they are expected to grow.

Try to consistently indent and don't be afraid to use multiple lines. You don't have to write it all at once. Complex queries can sometimes just be a collection of simple queries. You need to follow some basic guidelines and Take the time to think these through such as-

  1. List all of the columns that are to be returned
  2. List all of the columns that are used in the WHERE clause
  3. List all of the columns used in the JOINs (if applicable)
  4. List all the tables used in JOINs (if applicable)
  5. Get the correct records selected first
  6. Save the complex calculations for last
  7. If you do use a Common Table Expression (CTE), be aware that the query only persists until the next query is run, so in some cases where you are using the CTE in multiple queries, it might be better for performance to use a temp table.

Once you have the above information organized into this easy-to-comprehend form, it is much easier to identify those columns that could potentially make use of indexes when executed.
It makes a big difference to really understand how the data is combined, selected, filtered, and output. Here, query Optimization tricks comes into the picture to increase the performance of the program or software. There are a lot of guideline points to tune your query which do work as the boost of the query performance.  These guideline points are mentioned below: 

  1. SET NOCOUNT ON at the beginning of each stored procedure you write. This statement should be included in every stored procedure, trigger, etc. that you write.
  2. The SQL query becomes faster if you use the actual columns names in SELECT statement instead of than '*'.
  3. HAVING clause is used to filter the rows after all the rows are selected if you are using aggregation functions. It is just like a filter. Do not use HAVING clause for any other purposes.
  4. It is the best practice to avoid sub queries in your SQL statement and try to minimize the number of subquery block in your query if possible.
  5. Use operator EXISTS, IN and table joins appropriately in your query. The reason is- Usually IN has the slowest performance 
  6. IN is efficient when most of the filter criteria are in the sub-query.  
  7. EXISTS is efficient when most of the filter criteria is in the main query.
  8. Use EXISTS instead of DISTINCT when using joins which involves tables having one-to-many relationship.
  9. Be careful while using conditions in WHERE clause.
  10. To write queries which provide efficient performance follow the general SQL standard rules.
  11. Use single case for all SQL verbs
  12. Begin all SQL verbs on a new line
  13. Separate all words with a single space 
  14. Right or left aligning verbs within the initial SQL verb
  15. Indexes have the advantages as well as disadvantages as given below-
  16. Do not automatically add indexes on a table because it seems like the right thing to do. Only add indexes if you know that they will be used by the queries run against the table.
  17. Indexes should be measured on all columns that are frequently used in WHERE, ORDER BY, GROUP BY, TOP and DISTINCT clauses.
  18. Do not add more indexes on your OLTP tables to minimize the overhead that occurs with indexes during data modifications.
  19. Drop all those indexes that are not used by the Query Optimizer, generally.
  20. If possible, try to create indexes on columns that have integer values instead of characters. Integer values use less overhead than character values.
  21. To provide up-to-date statistics, the query optimizer needs to make smart query optimization decisions. You will generally want to leave the "Auto Update Statistics" database option on. This helps to ensure that the optimizer statistics are valid, ensuring that queries are properly optimized when they are run.
  22. If you want to boost the performance of a query that includes an AND operator in the WHERE clause, consider the following:
  23. Of the search criteria in the WHERE clause, at least one of them should be based on a highly selective column that has an index.
  24. If at least one of the search criteria in the WHERE clause is not highly selective, consider adding indexes to all of the columns referenced in the WHERE clause.
  25. If none of the columns in the WHERE clause are selective enough to use an index on their own, consider creating a covering index for this query.
  26. Queries that include either the DISTINCT or the GROUP BY clauses can be optimized by including appropriate indexes. Any of the following indexing strategies can be used:
  27. Include a covering, non-clustered index (covering the appropriate columns) of the DISTINCT or the GROUP BY clauses.
  28. Include a clustered index on the columns in the GROUP BY clause.
  29. Include a clustered index on the columns found in the SELECT clause.
  30. Adding appropriate indexes to queries that include DISTINCT or GROUP BY is most important for those queries that run often.
  31. When you need to execute a string of Transact-SQL, you should use the sp_executesql stored procedure instead of the EXECUTE statement.
  32. When calling a stored procedure from your application, it is important that you call it using its qualified name.
  33. Use stored procedures instead of views because they offer better performance and don't include code, variable or parameters that don't do anything.
  34. If possible, avoid using SQL Server cursors. They generally use a lot of SQL Server resources and reduce the performance and scalability of your applications.
  35. Instead of using temporary tables, consider using a derived table instead. A derived table is the result of using a SELECT statement in the FROM clause of an existing SELECT statement. By using derived tables instead of temporary tables, you can reduce I/O and often boost your application's performance.
  36. Don't use the NVARCHAR or NCHAR data types unless you need to store 16-bit character (Unicode) data. They take up twice as much space as VARCHAR or CHAR data types, increasing server I/O and wasting unnecessary space in your buffer cache.
  37. If you use the CONVERT function to convert a value to a variable length data type such as VARCHAR, always specify the length of the variable data type. If you do not, SQL Server assumes a default length of 30.
  38. If you are creating a column that you know will be subject to many sorts, consider making the column integer-based and not character-based. This is because SQL Server can sort integer data much faster than character data.
  39. Don't use ORDER BY in your SELECT statements unless you really need to, as it adds a lot of extra overhead. For example, perhaps it may be more efficient to sort the data at the client than at the server.
  40. Don't return more data (both rows and columns) from SQL Server than you need to the client or middle-tier and then further reduce the data to the data you really need at the client or middle-tier. This wastes SQL Server resources and network bandwidth.

To tune our SQL queries, understanding our database does play the most important role. In SQL, typically each table column has an associated data type. Text, Integer, Varchar, Date, and more, are typically available types for developers to choose from. When writing SQL statements, make sure you choose the proper data type for the column. Sometimes it's easier to break up sub groups into their own select statement. To write a query, we need to know about the actual need of the query and scope of the query also. 


  1. Can you prove this:
    The SQL query becomes faster if you use the actual columns names in SELECT statement instead of than '*'.

    1. Thats the first thing I thought also.

    2. That's the are where common sense comes into play

  2. There are a lot of presumptions in these tips which are not always correct. For example an IN statement is as fast as a exists when the column is not nullable.

  3. I have found that complex queries always involve multiple tables. To get the query correct, I always write a sub query against each table that gives me exactly the data I need, either to return or to select. Sometimes I even define these as views so they can be independently verified and used in multiple places. Then I combine the subqueries/views into a straightforward join query that gives me the data I need. I never worry about performance until a correct result is achieved. I review the query execution plan if needed to see if additional indices are needed.

    I find the authors guidelines misleading because they do not include the use of views or the use of query execution plans and I find both of these to be essential to writing correct, performance queries.

  4. Many of the tips above is prescribing one solution for all, which is not correct.

    Many of the tip can be proven to be wrong from query to query.

    Optimization tips must be based on Execution Plan, not on Query. The tips above are based on Query.


Popular Posts