Saturday, June 18, 2016

Usage of Sparsed columns

Found interesting SQL Server feature "Sparsed columns" (see BOL: https://msdn.microsoft.com/en-us/library/cc280604%28v=sql.105%29.aspx?f=255&MSPPError=-2147217396), which has been with SQL Server for a long while since 2008R2.

What is "Sparsing"?
"Sparsing" is the way SQL Server optimizes spacing for NULL values at the cost of overhead for non-NULL values.
In other words, if you expect having in your column more nulls than not nulls you can SPARSE that column to optimize the space.

I've seen the situations when a lot of columns in Data Mart tables were almost completely filled with NULLS and I started wondering if "SPARSE" can be a good tool to gain some space.

I've tested it on a small table:


CREATE TABLE tbl_TestSparse (ID INT IDENTITY(1,1)
       CONSTRAINT PK_ID PRIMARY KEY WITH (FILLFACTOR=100)
       , Sparsed_Column INT SPARSE NULL);


I inserted million rows into that table and started my testing by changing "Sparsed_Column" type and percentage of nulls in that column. Here are results in Megabytes I've got:


As you can see, SPARSE become profitable only for NVARCHAR(10) column when 50% of it's values were NULLs. I'd argue on this gain. If you know you'll have too many nulls in text columns - make them VARCHAR/NVARCHAR because as you probably noticed, those data types are handled by SQL Server so good that SPARSE does not any positive impact on them.

The only case of usage for SPARSE I see for really large tables where fixed size columns, such as INT, BIGINT,  MONEY, DATETIME are mostly having NULL values.

Would say our table had Billion records, then by Sparsing empty INT column we could potentially save 3 GB - 3.5 GB. It might be not very big percentage from the entire table, but still pretty big number from the Disk Space perspective.

If you are not sure how many NULLs you'll have in a column, do not use "SPARSE". It will add the complexity and might negatively impact performance in the future.

In any case, if you see a column, where 99% values are NULLs it could be the sign of bad design or bad data.




No comments:

Post a Comment