Sparse
Column feature was introduced in SQL Server 2008. They are just ordinary
columns that have an optimized storage for null values. We can consider sparse
columns to reduce (20 percent to 40 percent) the space requirements for null
values at the cost of more overhead to retrieve non-null values and provide
more extensible schemas.
Sparse
columns are defined on the table by using the CREATE TABLE or ALTER TABLE statements and can be used
with column sets and filtered indexes. In filter indexes, they are especially
appropriate because sparse columns have many null-valued rows. In this way, a
filtered index on a sparse column can index only the rows that have populated
values.
Characteristics of Sparse Column
- Sparse columns are very useful to optimize storage for null values and design to allow a relational database to store and process relatively unstructured data, where any individual entity may have a modest selection from a very wide set of attributes.
- SPARSE keyword within column definition is to optimize the storage of values in that column and the values require no storage in case of null value in that column.
- The sys.columns catalog view contains a row for each column in the table and includes a column set if one is defined.
- Sparse columns come into storage layer, rather than the logical table. Therefore a SELECT…INTO statement does not copy over the sparse column property into a new table.
- The COLUMNS_UPDATED function returns a varbinary value to indicate all the columns that were updated during a DML action.
To understand Sparse Columns importance, we are taking some examples. In these examples, we will use an ordinary data table without sparse columns and another table with sparse columns such as given below:
Use Demo
Go
----- Create table without SPARSE
CREATE TABLE
UnSparsedEmployee
(
EmpID INT IDENTITY(1,1),
FirstName Varchar(20) Null,
MiddleName VARCHAR(20) Null,
LastName VARCHAR(20) Null
)
----- Create table with SPARSE columns
CREATE TABLE SparsedEmployee
(
EmpID INT IDENTITY(1,1),
FirstName Varchar(20) Null,
MiddleName VARCHAR(20) SPARSE,
LastName VARCHAR(20) Null
)
GO
----- View the storage of each table before
----- Insert any records
sp_spaceused 'UnSparsedEmployee'
go
sp_spaceused 'SparsedEmployee'
go
|
We have created two set of data table and also observed the total occupied storage space on the disk also. Now, we are inserting some records in the both tables and will observe the total occupied storage space on the disk again as given below:
Use Demo
Go
---- local varible to set counter for records
declare @ictr int = 0
while @ictr<3000
begin
---- Insert same values in both tables
INSERT INTO
UnSparsedEmployee (FirstName,MiddleName,LastName)
Values('Ryan',Null,'Arjun'),
('Kimmg',Null,'Wang'), ('Bill',Null,'Johnson')
---- Insert same values in both tables
INSERT INTO SparsedEmployee (FirstName,MiddleName,LastName)
Values('Ryan',Null,'Arjun'),
('Kimmg',Null,'Wang'), ('Bill',Null,'Johnson')
set @ictr=@ictr+1;
End
GO
----- View the storage of each table after
----- Insert
records
sp_spaceused 'UnSparsedEmployee'
go
sp_spaceused 'SparsedEmployee'
Go
|
Now, we can observe that unused space is 48Kb for sparse column based data table which is greater than un-sparsed data table. This difference will increase if we can set the sparse columns only on those columns which are basically contain null values such as address columns, alternate mobile numbers and billing addresses also.
Estimated Space Savings by Data Type
They require
more storage space in case of column does not non-null value means they requires
required for identical data that is not marked SPARSE. The NULL Percentage
column indicates what percent of the data must be NULL for a net space savings
of 40 percent. Table containing SPARSE column can have maximum size of 8018 bytes
instead of regular 8060 bytes. A table operation which involves SPARSE column
takes performance hit over regular column.
Data types cannot be recognized as SPARSE
Sparse
columns are feasible with most of the data types in SQL Server but there are
some data type which cannot be recognized as SPARSE such as geography, text,
geometry, timestamp, image, ntext and user-defined data types.
Boundaries for Using Sparse Columns
Even that
sparse column is very useful to optimize the storage of values for null columns
but they also contains some draw-backs also which are given below:
- Changing a column from sparse to non-sparse or non-sparse to sparse requires changing the storage format of the column. The SQL Server Database Engine uses the following procedure to accomplish this change:
- We should require adding a new column to the table in the new storage size and format.
- For each row in the table, updates and copies the value stored in the old column to the new column.
- After adding and updating new column, we should require removing the old column from the table schema.
- Rebuilds the table (if there is no clustered index) or rebuilds the clustered index to reclaim space used by the old column.
- They are neither feasible for default value nor for bound to a rule. They can be used in the computed column but a computed column never be marked as sparse.
- SPARSE column does not have IDENTITY or ROWGUIDCOL property. So, a sparse column cannot be part of a clustered index or a unique primary key index and cannot be used as a partition key of a clustered index or heap.
- Merge replication and Data compression doesn't work for Using Sparse Columns.
Conclusion
References- https://msdn.microsoft.com/en-in/library/cc280604(v=sql.120).aspx
No comments:
Post a Comment