Wednesday 8 November 2017

sql - Clustered vs Non-Clustered

itemprop="text">

My lower level knowledge of SQL
(Server 2008) is limited, and is now being challanged by our DBAs. Let me explain (I
have mentioned obvious statements in the hope that I am right, but if you see something
wrong, please tell me) the scenario:



We have a
table which holds 'Court Orders' for people. When I created the table, (Name:
CourtOrder), I created it
like:



CREATE TABLE
dbo.CourtOrder
(

CourtOrderID INT NOT NULL
IDENTITY(1,1), (Primary Key)
PersonId INT NOT NULL,
+ around 20
other fields of different
types.
)


I
then applied a non-clustered index to the primary key (for efficiency). My reasons is
that it is a unique field (primary key), and should be indexed, mainly for selection
purposes, as we often Select from table where primary key =
...



I then applied a CLUSTERED
index on PersonId. The reason was to group orders for a particular person physically, as
the vast majority of work is getting orders for a person. So, select from
mytable where personId =
...




I have been pulled
up on this now. I have been told that we should put the clustered index on the primary
key, and the normal index on the personId. That seems very strange to me. First off, why
would you put a clustered index on a unique column? what is it clustering? Surely that's
a waste of the clustered index? I'd have believed a normal index would be used on a
unique column. Also, clustering the index would mean we can't cluster a different column
(One per table, right?).



The reasoning for me
being told I have made a mistake is that they believe putting a clustered index on the
PersonId would make inserts slow. For the 5% gain in speed of a select, we would be
getting a 95% degradation in speed on inserts and updates. Is that correct and
valid?



They say that because we cluster the
personId, SQL Server has to rearrange data when ever we insert or make a change to the
PersonId.



So then I have asked, why would SQL
have the concept of a CLUSTERED INDEX, if it's so slow? Is it as slow as they're saying?
How should I have setup my indexes to achieve optimum performance? I'd have thought
SELECT is used more than INSERT... but they say that we're having locking issues on
INSERTS...



Hope someone can help
me.



Answer





The distinction between a
clustered vs. non-clustered index is that the clustered index determines the
physical order of the rows in the database
. In other words, applying the
clustered index to PersonId means that the rows will be
physically sorted by PersonId in the table, allowing an index
search on this to go straight to the row (rather than a non-clustered index, which would
direct you to the row's location, adding an extra
step).



That said, it's
unusual for the primary key not to be the clustered index, but not
unheard of. The issue with your scenario is actually the opposite of what you're
assuming: you want unique values in a clustered index, not
duplicates. Because the clustered index determines the physical order of the row, if the
index is on a non-unique column, then the server has to add a background value to rows
who have a duplicate key value (in your case, any rows with the same
PersonId) so that the combined value (key + background value)
is unique.



The only thing I would suggest is
not using a surrogate key (your
CourtOrderId) column as the primary key, but instead use a
compound primary key of the PersonId and some other
uniquely-identifying column or set of columns. If that's not possible (or not
practical), though, then put the clustered index on
CourtOrderId.


No comments:

Post a Comment

php - file_get_contents shows unexpected output while reading a file

I want to output an inline jpg image as a base64 encoded string, however when I do this : $contents = file_get_contents($filename); print &q...