Saturday, October 1, 2016

Sparse Column in SQL Server

Sparse Column feature was introduced in SQL Server 2008. They are just ordinary columns that have an optimized storage for null values. We can consider sparse columns to reduce (20 percent to 40 percent) the space requirements for null values at the cost of more overhead to retrieve non-null values and provide more extensible schemas.
Sparse columns are defined on the table by using the CREATE TABLE or ALTER TABLE statements and can be used with column sets and filtered indexes. In filter indexes, they are especially appropriate because sparse columns have many null-valued rows. In this way, a filtered index on a sparse column can index only the rows that have populated values.
Characteristics of Sparse Column
  1. Sparse columns are very useful to optimize storage for null values and design to allow a relational database to store and process relatively unstructured data, where any individual entity may have a modest selection from a very wide set of attributes.
  2. SPARSE keyword within column definition is to optimize the storage of values in that column and the values require no storage in case of null value in that column.
  3. The sys.columns catalog view contains a row for each column in the table and includes a column set if one is defined.
  4. Sparse columns come into storage layer, rather than the logical table. Therefore a SELECT…INTO statement does not copy over the sparse column property into a new table.
  5. The COLUMNS_UPDATED function returns a varbinary value to indicate all the columns that were updated during a DML action.

To understand Sparse Columns importance, we are taking some examples. In these examples, we will use an ordinary data table without sparse columns and another table with sparse columns such as given below:
Use Demo
Go
----- Create table without SPARSE
CREATE TABLE UnSparsedEmployee
(
EmpID INT IDENTITY(1,1),
FirstName Varchar(20) Null,
MiddleName VARCHAR(20) Null,
LastName VARCHAR(20) Null
)

----- Create table with SPARSE columns
CREATE TABLE SparsedEmployee
(
EmpID INT IDENTITY(1,1),
FirstName Varchar(20) Null,
MiddleName VARCHAR(20) SPARSE,
LastName VARCHAR(20) Null
)
GO

----- View the storage of each table before
----- Insert any records
sp_spaceused 'UnSparsedEmployee'
go
sp_spaceused 'SparsedEmployee'
go


We have created two set of data table and also observed the total occupied storage space on the disk also. Now, we are inserting some records in the both tables and will observe the total occupied storage space on the disk again as given below:
Use Demo
Go
---- local varible to set counter for records
declare @ictr int = 0
while @ictr<3000
begin
---- Insert same values in both tables
INSERT INTO UnSparsedEmployee (FirstName,MiddleName,LastName)
Values('Ryan',Null,'Arjun'),
('Kimmg',Null,'Wang'), ('Bill',Null,'Johnson')

---- Insert same values in both tables
INSERT INTO SparsedEmployee (FirstName,MiddleName,LastName)
Values('Ryan',Null,'Arjun'),
('Kimmg',Null,'Wang'), ('Bill',Null,'Johnson')

set @ictr=@ictr+1;
End
GO
----- View the storage of each table after
----- Insert  records
sp_spaceused 'UnSparsedEmployee'
go
sp_spaceused 'SparsedEmployee'
Go


Now, we can observe that unused space is 48Kb for sparse column based data table which is greater than un-sparsed data table. This difference will increase if we can set the sparse columns only on those columns which are basically contain null values such as address columns, alternate mobile numbers and billing addresses also.

Estimated Space Savings by Data Type
They require more storage space in case of column does not non-null value means they requires required for identical data that is not marked SPARSE. The NULL Percentage column indicates what percent of the data must be NULL for a net space savings of 40 percent. Table containing SPARSE column can have maximum size of 8018 bytes instead of regular 8060 bytes. A table operation which involves SPARSE column takes performance hit over regular column.

Data types cannot be recognized as SPARSE
Sparse columns are feasible with most of the data types in SQL Server but there are some data type which cannot be recognized as SPARSE such as geography, text, geometry, timestamp, image, ntext and user-defined data types.
Boundaries for Using Sparse Columns
Even that sparse column is very useful to optimize the storage of values for null columns but they also contains some draw-backs also which are given below:
  • Changing a column from sparse to non-sparse or non-sparse to sparse requires changing the storage format of the column. The SQL Server Database Engine uses the following procedure to accomplish this change:
  1. We should require adding a new column to the table in the new storage size and format.
  2. For each row in the table, updates and copies the value stored in the old column to the new column.
  3. After adding and updating new column, we should require removing the old column from the table schema.
  4. Rebuilds the table (if there is no clustered index) or rebuilds the clustered index to reclaim space used by the old column.
  • They are neither feasible for default value nor for bound to a rule. They can be used in the computed column but a computed column never be marked as sparse.
  • SPARSE column does not have IDENTITY or ROWGUIDCOL property. So, a sparse column cannot be part of a clustered index or a unique primary key index and cannot be used as a partition key of a clustered index or heap. 
  • Merge replication and Data compression doesn't work for Using Sparse Columns.


Conclusion
SPARSE column are better at managing NULL and ZERO values in SQL Server but they take extra space in database for non-null values. We do need to specify SPARSE, it shouldn't be automatic! When we say SPARSE, the column is not stored at each row; it is stored in separate data pages attached to the table. This is why a non-null value stored in a sparse column is 4 bytes longer than its datatype, to provide for this storage overhead. Update and select statements of this sparse column would be slower (access row, then also access the separate storage for sparse columns), plus we can actually waste more space if your null columns are actually densely populated, so it shouldn't be automatically enabled for all null columns. As MS suggests, pick the right compromise between the calculated storage savings and potential performance hit.
Referenceshttps://msdn.microsoft.com/en-in/library/cc280604(v=sql.120).aspx

No comments:

Post a Comment