Microsoft Business Intelligence (Data Tools)|lower isolation level

Showing posts with label lower isolation level. Show all posts

Wednesday, October 14, 2015

SQL – Stored Procedure Vs Ad-Hoc (In-Line)

This would be a valuable decision making between the stored procedures and doing SQL (Ad-Hoc (In-Line) SQL) inside the coding before developing the back end of a database application.

At that time, you need to decide the best approach to accomplish your tasks (insert, update, delete, view and other SQL statements) by using stored procedures or Ad-Hoc (In-Line) SQL. As performance wise both stored procedure or Ad-Hoc (In-Line) SQL, both stored in cache and hardly any difference in new SQL versions. Before make the right decision, we can see the advantages and disadvantages of stored procedures as shown below:

Advantages of Stored procedures:

They are more easily available for code reviewed. After creating once, you can call them n-number of times from the various sources and no need to write you SQL query again and again. This is the biggest relief for a developer.
Stored procedures are stored in a pre-complied form and stores in our database. That is once a Stored procedure is executed, the compiled code is used in subsequent calls whereas Ad-Hoc (In-Line) SQL codes are compiled each and every time whenever you called in application and they stay in your application code. You can tested your stored procedures easily due to less coupled.
By using Stored procedures we can separate all the queries from the Business logic code and more easily tuned them whenever we want them. Stored procedures prevents SQL Injection Errors and provide the security to our database.

Performance is generally better, from the point of view of network traffic - if you have a cursor, or similar, then there aren't multiple trips to the database but this is not applicable for Ad-Hoc (In-Line) SQL.
By using them, we can protect access to the data more easily, remove direct access to the tables, enforce security through the procs - this also allows you to find relatively quickly any code that updates a table.
If there are other services involved (such as Reporting services), you may find it easier to store all of your logic in a stored procedure, rather than in code, and having to duplicate it.
Developers and database designers can work concurrently using stored procedures. While a programmer writes business logic, another one can create stored procedures at the same time. This is not possible with Ad-Hoc (In-Line) SQL.
One more thing is if we can wrong execute store procedure the transaction may rollback but Ad-Hoc (In-Line) SQL can't for many update statement.
Stored procedures keep coding and debugging simpler by dividing the work up into layers.

Disadvantages of Stored procedures

The real problem with stored procedures is that many programmers that use them are inclined to use procedural programming techniques like cursors, loops, recursion which defeats the purpose of a database engine optimized for set based queries.
The only down side of stored procedures are maintenance and redeployment. It's a little annoying to make an update and redeploy it from development to the production platform.

Now, we will discuss them which is better based on the below points-

Performance : Use inline SQL for CRUD operations and there is no need to write a stored procedure if the select statement is based on a single table then Ad-Hoc (In-Line) SQL will perform this task in more better way but remember security should be part of your design phase. Stored Procedures, especially in MS SQL Server have a major performance benefit. One of the reason to use Stored Procedure instead of inline SQL in the program is because of CPU consumption and even though the SQL itself is simple, each time you executed it inline, the SQL engine needs to compile it (check statistics, prepare plan, etc.) before it can be run. Once the inline SQL has been parameterized into a Stored Procedure, the plan will be prepared and thus, will not need to be compiled again when it is executed the hundreds/thousands of times / minute. Stored procedures support to use much less network traffic and again improve scalability.

Security: If you used a secured model of using Stored Procedures and only given access to execute stored procedures to the user then you’re adding a new layer of security which might (or might not) save you from that attack. It is resolved by using parameterized SQL which eliminate the SQL injection possibility. Allowing ad-hoc SQL implies that the calling user context (whether integrated or standard security model is used) has access to the tables themselves. Such a model puts all the security control in the hands of the client when it probably ought to live in multiple layers. (Obviously, not all applications need to be particularly secure.) Restricting a user to the stored procedure interfaces not only abstracts the data, it protects the tables from a malicious user.

Maintainability: You can modify a SQL statement without having to compile and redeploy the application. If the application does not use stored procedures exclusively, there will be SQL fragments in several places in the code and every single fragment must be inspected in detail and followed from its origin to the construction of a complete SQL statement and its handoff to the database - to verify that the application is safe from SQL injection. This is a great deal more work. By using stored procedures and views, you abstract the code from the underlying database structure, and enable yourself to make changes to optimize it.

With simple inline queries, dynamic SQL is very easy to create. If your applications have complex queries, then the lack of a visual SQL creation tool makes them very lumbering and error prone.

Development speed: You can modify a SQL statement without having to compile and redeploy the application. Developers and database designers can work concurrently using stored procedures. While a programmer writes business logic, another one can create stored procedures at the same time. This is not possible with Ad-Hoc (In-Line) SQL.

Conclusion

So it is always better to stick to Ad-Hoc (In-Line) SQL for small requirement and for a medium, secured and complex logic depend on stored procedures. Stored procedures have been limited to batch processes where the ability to process large volumes of data as quickly as possible is essential and they are aligned much more closely to the tenets of design-by-contract, test-driven development, verifiable inputs/outputs, and code readability. Stored procedures are cached and reused a lot more efficiently than dynamic SQL. Ad-hoc SQL provides a lot of run-time flexibility. Disallowing direct table access ensures that correctly written stored procedures can validate correct visibility to underlying data.

There is a very simple fundamental fact that most seem to neglect, T-SQL is not a programming language, it’s a relational language driven by an engine which is very dynamic and complex, that requires maintenance, optimization and monitoring in order to ensure scalability and performance. By taking the code out of the database, you are leaving DBA handicapped from controlling the data flow and ensuring uptime. And while this will work in a lot of environments this is not an enterprise approach, and designs such these require complete redesign.

Learn more on another feature of SQL as:

SQL – Stored Procedure Vs Ad-Hoc (In-Line)
SQL - Merge Statement
SQL - Functions
SQL - Cursors

Mukesh Singh

With over 17 years of experience in the Data Engineering stack across a variety of cloud and on-premises systems, I have successfully delivered more than ten complete business product solutions. My expertise lies in building robust infrastructure and architecture to support data engineering, data analytics, and machine learning processes. These solutions have significantly improved collaboration among cross-functional teams, including data scientists, business analysts, software engineers, and stakeholders. Key Contributions Data Modelling and Integration • Data Modeling: Developed various data models to produce suitable data for business users, data analytics, data science, and data visualization teams. • Legacy Systems and Cloud Technologies: Integrated legacy systems with modern cloud-based technologies (AWS, Azure, GCP), data lakes, and data warehouses. • Streamlined Data Pipelines: Built efficient data pipelines, data warehouses, BI reports, and dashboards to streamline data access and insights.

Sunday, July 19, 2015

SQL - Transaction Isolation Levels

The concept of isolation levels was first introduced in under the name Degrees of Consistency. We know that commercial databases support different isolation levels to allow programmers to trade off consistency for a potential gain in performance.

The most important thing is to know that which appropriate isolation level is set for query to access the result from relational database by avoiding deadlocks, dirty reads, non-repeatable reads and Phantom read or poor performances.

Before the elaboration of Transaction Isolation levels, we will go through the some important topics which play the most important in isolation levels.

Transactions

A transaction is nothing but it is a sequence of database operations with the following properties which are normally known as ACID:

Atomic: Operations of a transaction are executed all-or nothing, and are never left “half-done”.

Consistency: Assume all database constraints are satisfied at the start of a transaction, they should remain satisfied at the end of the transaction.

Isolation: It determines how transaction integrity is visible to other users and systems. Transactions must behave as if they were executed in complete isolation from each other.

Durability: If the DBMS crashes after a transaction commits, all effects of the transaction must remain in the database when DBMS comes back up.

SQL Transactions

A transaction is automatically started whenever a user executes an SQL statement and subsequent statements in the same session are executed as part of this transaction. SQL transactions must have two following commands -

COMMIT command commits the transaction effects are made final and visible to subsequent transactions

ROLLBACK command aborts the transaction effects are undone.

Apart of this, many DBMS support an AUTOCOMMIT feature, which automatically commits every single statement.

Read phenomena

The ANSI/ISO standard SQL 92 refers to three different read phenomena when Transaction 1 reads data that Transaction 2 might have changed.

Dirty reads - A dirty read occurs when a transaction is allowed to read data from a row that has been modified by another running transaction and not yet committed. Dirty reads work similarly to non-repeatable reads; however, the second transaction would not need to be committed for the first query to return a different result. The only thing that may be prevented in the READ UNCOMMITTED isolation level is updates appearing out of order in the results; that is, earlier updates will always appear in a result set before later updates.
Non-repeatable reads - A non-repeatable read occurs, when during the course of a transaction, a row is retrieved twice and the values within the row differ between reads. Non-repeatable reads phenomenon may occur in a lock-based concurrency control method when read locks are not acquired when performing a SELECT, or when the acquired locks on affected rows are released as soon as the SELECT operation is performed. Under the multiversion concurrency control method, non-repeatable reads may occur when the requirement that a transaction affected by a commit conflict must roll back is relaxed.
Phantom reads - A phantom read occurs when, in the course of a transaction, two identical queries are executed, and the collection of rows returned by the second query is different from the first. This can occur when range locks are not acquired on performing a SELECT ... WHERE operation. The phantom reads anomaly is a special case of Non-repeatable reads when Transaction 1 repeats a ranged SELECT ... WHERE query and, between both operations, Transaction 2 creates (i.e. INSERT) new rows (in the target table) which fulfill that WHERE clause.

Transaction Isolation Levels

A lower isolation level or weaker Isolation level increases the ability of many users to access data at the same time, but increases the number of concurrency effects (such as dirty reads or lost updates) users might encounter.

Conversely, a higher isolation level or stronger isolation level reduces the types of concurrency effects that users may encounter, but requires more system resources and increases the chances that one transaction will block another.
We can change the level of isolation that a particular connection is operating in by using the SET TRANSACTION ISOLATION LEVEL command.

Transaction Isolation levels are responsible to controls the locking and row versioning behavior of Transact-SQL statements issued by a connection to SQL Server. Transaction Isolation has the following isolation levels which behave differently with dirty reads, non-repeatable reads and Phantom reads-

Weaker Isolation Levels

Read Uncommitted

Read Committed – This is the default isolation level for all SQL Server databases.

Repeatable Read

Stronger Isolation Levels

Serializable

Snapshot

Isolation level	Dirty Reads	Non-repeatable Reads	Phantoms
Read uncommitted	Possible	Possible	Possible
Read committed	Impossible	Possible	Possible
Repeatable Read	Impossible	Impossible	Possible
Serializable	Impossible	Impossible	Impossible
Snapshot	Impossible	Impossible	Impossible

Note: Concurrency issues such as dirty reads and phantom reads, as their names suggest, apply to read operations, not write.

Read Uncommitted: This is a lower or weaker isolation level. If in the same time, two transactions request are running on the database. One transaction request can read data modified within another transaction but still not committed. This is because database engine does not apply shared locks when Read Uncommitted is specified, making this the least restrictive of the isolation levels.

When this option is set, it is possible to read uncommitted modifications, which are called dirty reads. It’s also possible for data to be modified by another transaction between issuing statements within the current transaction, resulting in non-repeatable reads or phantom reads. Values in the data can be changed and rows can appear or disappear in the data set before the end of the transaction.

This option has the same effect as setting NOLOCK on all tables in all SELECT statements in a transaction.
See the explanation of Read Uncommitted functionality here.

Read committed: This is also a lower or weaker isolation level. If in the same time, two transactions request are running on the database. One transaction request cannot read data modified within another transaction but still not committed. So, dirty data will not appear in the result set. Data can be changed by other transactions between individual statements within the current transaction, resulting in non-repeatable reads or phantom data.

The isolation level uses shared locking or row versioning to prevent dirty reads, depending on whether the READ_COMMITTED_SNAPSHOT database option is enabled. Read Committed is the default isolation level for all SQL Server databases.

See the explanation of Read Committed functionality here.

Repeatable Read: This is also support a lower or weaker isolation level. If in the same time, two transactions request are running on the database. One transaction request cannot read data modified within another transaction but still not committed, thus preventing dirty reads.

Shared locks are placed on all data read by each statement in the transaction and are held until the transaction completes. This prevents other transactions from modifying any rows that have been read by the current transaction.

However, if another transaction inserts new rows that match the search condition in the current transaction, in between the current transaction accessing the same data twice, phantom rows can appear in the second read.
See the explanation of Repeatable Read here.

Serializable: This isolation supports to a higher or stronger isolation level. If in the same time, two transactions request are running on the database. One transaction request cannot read data modified within another transaction but still not committed. No other transaction can modify data being read by the current transaction until it completes, and no other transaction can insert new rows that would match the search condition in the current transaction until it completes.

Range locks are placed in the range of key values that match the search conditions of each statement executed in a transaction. This blocks other transactions from updating or inserting any rows that would qualify for any of the statements executed by the current transaction.

As a result, the Serializable isolation level prevents dirty reads, non-repeatable reads, and phantom reads. However, it can have the biggest impact on performance, compared to the other isolation levels.

See the explanation of Serializable here.

Snapshot: This isolation supports to a higher or stronger isolation level. If in the same time, two transactions request are running on the database. One transaction request cannot read data modified within another transaction but still not committed. If another transaction modifies data after the start of the first transaction, the data is not visible to the first transaction.

The first transaction works with a snapshot of the data as it existed at the beginning of that transaction. Snapshot transactions do not request locks when reading data, nor do they block other transactions from writing data. In addition, other transactions writing data do not block the current transaction for reading data. As with the Serializable isolation level, the Snapshot level prevents dirty reads, non-repeatable reads and phantom reads. However, it is susceptible to concurrent update errors.

See the explanation of Snapshot here.

Points to be remember:

· Read Committed is the default isolation level for all SQL Server databases.

· An Isolation level of Serializable is the most restrictive of all isolation levels.

· NOLOCK is a query hint and as such only applies to the specific table within the query in which it is specified.

· NOLOCK hint is a dirty reader, reads uncommitted data from the table.

· READ COMMITTED, corresponds to two isolation levels (READ committed and READ COMMITTED SNAPSHOT).

Conclusion
Only one type of the isolation level options can be set at a time which will be remained set for that connection until it is explicitly changed. All read operations performed within the transaction operate under the rules for the specified isolation level unless a table hint in the FROM clause of a statement specifies different locking or versioning behavior for a table.
When you change a transaction from one isolation level to another, resources that are read after the change are protected according to the rules of the new level. Resources that are read before the change continue to be protected according to the rules of the previous level.

References:

https://msdn.microsoft.com/en-IN/library/ms173763.aspx

Mukesh Singh

Wednesday, October 14, 2015

SQL – Stored Procedure Vs Ad-Hoc (In-Line)

Sunday, July 19, 2015

SQL - Transaction Isolation Levels

Popular Posts