Friday, September 11, 2015
How many Clustered Indexes you may have?
Have you ever got that question on a job interview?
You've been smart and answered: "- Only One!"
Is that the right answer?
I like tricky questions and that one is one of them.
At first, you might have no clustered indexes at all and at second, question did not specify "a single table". That means you can have as many clustered indexes in your database as many tables you already have.
Even more! Do not forget about views, you can have clustered indexes on views as well.
So, the really true answer is "- As many as you want, but no more than one per table or view."
Why is that?
Let's look at the nature of Clustered Index. Just forget everything what you know about indexes for a second and just imagine Clustered Index as your table data set sorted in a certain order. Would say you sorted your data by Sale's Person ID and Sale Date and data lay in that order - that will be your Clustered Index. You physical can't have the same data set sorted in more than one way, otherwise it will be different data set.
That the reason why we can't have more than one Clustered index per Table or view.
Can we live without clustered index at all?
Sure we can. Table without clustered index is called Heap.
Do we always have to have Clustered Index?
Not at all. For huge tables with multiple indexes and a lot of lookup operations it might be beneficial is not having Clustered Index!
Look at two same size tables in the sample AdventureWorks2014 database:
- Production.ProductProductPhoto - 504 records without Clustered Index
- Production.Product - 504 records with Clustered Index
One of them has Clustered Index and another do not.
Run simple selects against these tables:
They have almost identical execution plans with only difference that Heap using RID (Row Identifier) Lookup and Clustered table uses Key Lookup.
When we look in "Messages" tab we can see how many IO operations were used for each select:
Select against table with clustered index took 4 IO reads (2 reads for index + 2 reads for Clustered Index) and 3 IO reads for Heap (2 reads for index + ONLY 1 read for Heap!)
Those are extremely small tables and if you have 6-7 Clustered Index levels you will have all these extra reads vs only one read for Heap.
Did I just say Heap is better than Clustered Index? - Absolutely NOT!
I've said that you should not fallow "Must Have Clustered Index" strategy and be more flexible.
Heaps also have their own caveats. The biggest one is that if you need and have only one index on a table better to have it Clustered.
In this case you will avoid extra lookup operations and won't fail in a nightmare of RID Lookups.
In addition to this there is not very well known Heap issue as Extra Spacing.
Run a script, which creates a simple table with only one varchar column and inserts there 700 records:
Now will add new Integer ID column, which is supposed to add some space to the table and build Clustered Index on it:
Tables with Clustered Indexes could be little bit smaller than Heaps.
Clustered Indexes have their own problems. Large, rapidly growing tables can fail into a fragmentation and page split issues unless the primary field of the index is growing DATETIME, IDENTITY or any other growing over the time value. At the same time that exact solution can generate page contention problem, which could lead to locks and even deadlocks.
Do you see how many different problems and questions bring Clustered Indexes?
Do you still want to have them for every table even if they will save space?