There is more to creating indexes than just
knowing the syntax and following tips and recommendations that you find
in various sources. As a database administrator, you must ensure that
the indexes created on your system improve performance rather than
hinder it.
Understanding the usage
patterns of your applications significantly improves your decision
making when you determine which indexes to create on your system. Because
of the volume of queries executed on a system, creating an index tailored
to every query is usually not the best approach. When you understand
the usage patterns of an application, you can make better decisions
about which frequently executed queries matter most.
Determining the indexes to create on your system then becomes an easier
task.
1. Creating Clustered and Nonclustered Indexes
Creating clustered
and nonclustered indexes in SQL Server is one of the fundamental tasks
of database administrators. The general syntax is the same for creating
each type. However, the issues to think about when you're in the
planning stage can be different.
1.1. Issues When Creating Clustered Indexes
As discussed earlier,
clustered indexes determine the order of the data for each table and are
accessed frequently. When choosing your clustered indexes, think about
the following points:
Data accessibility:
Think about how the data within the table is utilized and accessed. Are
you going to be adding, updating, and deleting records often? Are you
going to be bulk loading data daily or monthly and then retrieving data
from the table all day long? Understanding the accessibility of the data
will help you determine the key value.
Narrow keys:
Remember that every nonclustered index contains the clustered index
key. A wide clustered index key therefore makes every nonclustered index
larger, so fewer index rows fit on each page and more pages must be read
and maintained. Keep narrow keys in mind while determining the clustered
index key.
Uniqueness:
Unique clustered index keys make queries that use the clustered index
more efficient, as well as queries that use a nonclustered index and
need to look up the data row through the row locator. When a clustered
index is not declared unique, SQL Server has to enforce uniqueness
internally by adding a hidden uniquifier to rows with duplicate key
values, which costs space, IO, and processing time. Creating a unique
key yourself is the preferred method.
Sequential keys:
Consider choosing a clustered index key that is sequential in terms of
time or numbers. If your clustered key values are ever-increasing, new
rows are always added at the end of the index, which minimizes page
splits and avoids the cost of reordering your existing data.
Static keys:
Choose clustered index keys that will not be modified. If a clustered
index key is modified, every nonclustered index on the table must also be
updated, because each one stores that key. The row may also have to move
to a different page if its new key value belongs elsewhere in the order.
You can see how costly this operation would be on your system if such
updates happened frequently.
Order By columns: Columns that are often used in ORDER BY
clauses may be candidates for clustered indexes. Remember, the data
is ordered by the clustered index key.
JOIN clauses:
In the primary table of a join, the column used to join it to other
tables may be a good candidate for the clustered index. This
option really coincides with understanding your data and usage patterns.
Think seriously about
these issues when creating a clustered index because the performance of
your application depends on your making correct and reasonable choices.
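To make the checklist concrete, here is a minimal sketch of a table whose clustered index key follows the narrow, unique, sequential, and static guidelines. The table dbo.AccountActivity and its columns are hypothetical, chosen only for illustration; run it in a scratch database.

```sql
-- Hypothetical table for illustration; create it in a scratch database.
CREATE TABLE dbo.AccountActivity
(
    -- Clustered key: narrow (4-byte int), unique, ever-increasing (IDENTITY),
    -- and static, because IDENTITY values cannot be updated.
    ActivityId   int IDENTITY(1,1) NOT NULL,
    AccountId    int               NOT NULL,
    ActivityDate datetime2(0)      NOT NULL,
    Amount       decimal(19, 4)    NOT NULL,
    CONSTRAINT PK_AccountActivity PRIMARY KEY CLUSTERED (ActivityId)
);
GO
```

By contrast, a key generated with NEWID() would be unique and static but 16 bytes wide and random, so every nonclustered index would carry the wider key and new rows would land throughout the index rather than at its end.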
1.2. Issues When Creating Nonclustered Indexes
There are a number of items
to consider before creating nonclustered indexes. Nonclustered indexes
are just as important as the clustered indexes that we just discussed.
In fact, you might find that you rely on nonclustered indexes more than
on your clustered indexes to fulfill the requests of queries. Following
are some things to think about:
Data accessibility:
Yes, accessibility is important for nonclustered indexes, too. Think
about your data access patterns a step further than you did with the
clustered indexes. Clustered indexes focus on the structure of your
data. Nonclustered indexes should focus on the various types of
questions that the data will answer. For example, how many accounts were
opened in the past week?
Priority queries:
Make sure you start creating indexes for the highest priority, most
frequently accessed queries first. If you focus on the highest priority
queries first, you will ensure that the response time of the application
is sufficient while you work on the other queries over time.
Cover your queries:
When determining the index keys for your high-priority queries, think
about covering your queries. Depending on the query, you may want to
cover the SELECT and WHERE
clauses of those queries. Spend some time analyzing your queries and
determining what the best strategy is to cover the queries.
Don't cover everything:
Although SQL Server allows you to cover every column in a query, that
doesn't mean you should. Improving the performance of one query this way
can still slow down the other queries that write to that table.
Don't over index:
Remember, all the data for the key values that you choose will be
stored in the nonclustered index as well as in the table. Every insert
into or delete from the table must also maintain each of its nonclustered
indexes, and every update must maintain each nonclustered index that
contains a modified column. Be careful not to over index a table to the
point that write performance suffers.
Uniqueness:
Try to create nonclustered indexes on key columns where the cardinality
or selectivity is high. The query optimizer will be more likely to use
those nonclustered indexes instead of doing a table scan.
JOIN clauses: Pay attention to columns listed in JOIN
clauses. Parent/child joins are common. Parent tables are typically
accessed through their primary keys, which often correspond to their
clustered indexes. What is often overlooked, though, are the foreign key
values in the child tables. Consider indexing those using nonclustered
indexes. For example, you might create a nonclustered index on the
order_number column in a line_items table.
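The foreign key advice under JOIN clauses can be sketched as follows. The dbo.orders and dbo.line_items tables are hypothetical stand-ins for the order_number/line_items example above. SQL Server does not create indexes on foreign key columns automatically, so the child-side index has to be added explicitly.

```sql
-- Hypothetical parent/child tables, named after the line_items example.
CREATE TABLE dbo.orders
(
    order_number int  NOT NULL CONSTRAINT PK_orders PRIMARY KEY CLUSTERED,
    order_date   date NOT NULL
);
CREATE TABLE dbo.line_items
(
    line_item_id int IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_line_items PRIMARY KEY CLUSTERED,
    order_number int NOT NULL
        CONSTRAINT FK_line_items_orders REFERENCES dbo.orders (order_number),
    product_id   int NOT NULL,
    quantity     int NOT NULL
);
GO
-- The foreign key constraint does not index order_number, so add a
-- nonclustered index; the included columns cover a typical
-- "line items for this order" query.
CREATE NONCLUSTERED INDEX ix_line_items_order_number
    ON dbo.line_items (order_number)
    INCLUDE (product_id, quantity);
GO
```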
Nonclustered
indexes will be utilized frequently within your application and can
provide significant performance improvements. Just keep in mind that
adding nonclustered indexes incorrectly could potentially cause
performance problems. So plan your indexes before implementation.
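To decide which queries deserve priority and whether a table is over indexed, the dynamic management views available in SQL Server 2005 and later can help. The following is one possible sketch, not the only approach: the first query lists the most frequently executed statements in the plan cache, and the second compares reads and writes for each index in the current database. Both sets of numbers reset when the instance restarts (and, for the first query, when a plan leaves the cache), and indexes that have not been used since the restart do not appear in the second result at all, so treat the output as a sample of recent activity.

```sql
-- Top 10 cached statements by execution count.
SELECT TOP (10)
    qs.execution_count,
    qs.total_logical_reads,
    -- Offsets are in bytes of Unicode text, hence the division by 2.
    SUBSTRING(st.text,
              (qs.statement_start_offset / 2) + 1,
              ((CASE qs.statement_end_offset
                    WHEN -1 THEN DATALENGTH(st.text)
                    ELSE qs.statement_end_offset
                END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.execution_count DESC;

-- Reads versus writes for each used index in the current database.
SELECT OBJECT_NAME(us.object_id)                        AS table_name,
       i.name                                           AS index_name,
       us.user_seeks + us.user_scans + us.user_lookups  AS user_reads,
       us.user_updates                                  AS user_writes
FROM sys.dm_db_index_usage_stats AS us
JOIN sys.indexes AS i
  ON i.object_id = us.object_id AND i.index_id = us.index_id
WHERE us.database_id = DB_ID()
ORDER BY user_writes DESC;
```

An index with many writes and few reads is a candidate for review under the "Don't over index" guideline.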
1.3. Creating an Index
Finally, we will actually
create some indexes. The syntax for index creation is fairly
straightforward. The following script shows the CREATE INDEX syntax:
CREATE [CLUSTERED | NONCLUSTERED] INDEX index_name
ON <object> (column [ASC | DESC] [,...n])
[INCLUDE (column_name [,...n])]
[WITH (relational_index_option [,...n])]
The preceding code
shows you how to build composite indexes, which are indexes with
multiple columns. You can also see the syntax for specifying the
relational index options that you want to use. Let's go ahead and use
the syntax in some examples. The first two examples create a clustered
and nonclustered index with the default option values:
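USE AdventureWorks2008;
GO
-- Clustered index: the table's rows are ordered by bookId.
CREATE CLUSTERED INDEX ix_bookId
ON apWriter.Books(bookId);
-- Nonclustered index on Title, using the default option values.
CREATE NONCLUSTERED INDEX ix_Title
ON apWriter.Books(Title);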
As you create more
indexes, you should take advantage of relational index options. When you
create an index, you can specify the following:
ALLOW_ROW_LOCKS: Allows the Database Engine to use row locks if it deems them necessary.
ALLOW_PAGE_LOCKS: Allows the Database Engine to use page locks if necessary.
DATA_COMPRESSION:
Identifies the type of compression you want used for the clustered and
nonclustered indexes. The available options are as follows: NONE (indicating that you don't want the data compressed), ROW (to compress data row by row), and PAGE (to compress entire pages at a time).
DROP_EXISTING:
Drops and rebuilds an existing index of the same name in a single
statement. The index names must be identical, although you can change the
definition of the index. We have mixed feelings about using this option.
If you need to change the definition of an index, you can instead create
another index and drop the previous one once you are done, which leaves
the original index in place until you are satisfied with its
replacement. On the other hand, this option is
helpful if you are re-creating an index mentioned in any hints that your
application places into its queries. Obviously, you would want your
newly rebuilt index to have the same name in that case. Regardless of
our opinion, the choice exists for you to use.
FILLFACTOR:
Determines how full the Database Engine makes each leaf-level page, and
therefore how much free space remains, when it creates or rebuilds an
index. The default value is 0, which is treated the same as 100 (pages
filled completely). We generally specify a lower FILLFACTOR on indexes
that will have records inserted in the middle of the index instead of at
the end. That's because writes that add index entries to the middle of a
full page cause page splits. Frequent page splits hurt performance
because of the cost of the split itself and the fragmentation it creates
within the index.
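IGNORE_DUP_KEY:
Applies to unique indexes. When enabled, a multirow insert discards,
with a warning, only the rows that would create a duplicate key instead
of failing as a whole. Without this option, a single row that violates
the unique index causes the entire INSERT statement to fail, so none of
its rows are written to the table. The option affects inserts only; an
update that would create a duplicate key still fails.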
MAXDOP:
Overrides the server-wide max degree of parallelism setting for the
index operation. The available values are 1 (suppresses parallel
execution), a number greater than 1 (limits the operation to at most
that many processors, up to 64), and 0 (uses the actual number of
processors or fewer, based on the current workload of the system).
ONLINE:
Allows you to create, rebuild, or drop indexes without preventing user
access to the data in the underlying table. By default, this option is
set to OFF, which causes the underlying table to be locked for the
duration of the operation, preventing user access. A clustered index on a
table that contains large object (LOB) data types such as varchar(max),
nvarchar(max), varbinary(max), and xml cannot be built or rebuilt online,
and neither can a nonclustered index that includes LOB columns. A few
other conditions also prevent online index maintenance. Consider using
ONLINE where possible to limit the impact of index maintenance on your
application users. Online index operations are Enterprise Edition
features only.
PAD_INDEX: When specified with FILLFACTOR, determines the amount of free space left on the intermediate-level pages of an index, using the same percentage specified in the FILLFACTOR option. An intermediate-level page always holds at least two rows; if the FILLFACTOR percentage would leave room for fewer, the Database Engine overrides it internally.
SORT_IN_TEMPDB: Identifies the location where the temporary sorting of the index will take place. If tempdb
is stored on a separate physical drive from the data, then the index
creation process should complete in a shorter amount of time. Bear in
mind, though, that sorting the data in a separate database requires that
SQL Server move the data to that target database. For that reason,
sorting in tempdb increases the amount of disk space needed over the default behavior.
STATISTICS_NORECOMPUTE:
Lets you turn off automatic recomputation of the statistics associated
with the index; SQL Server still creates those statistics when the index
is built. The default value is OFF, which leaves automatic statistics
updates enabled. You may want to experiment with this option if
automatic statistics updates (the AUTO_UPDATE_STATISTICS database option)
are giving you a problem, specifically on larger tables.
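As a sketch of how several of these options combine in one statement, the following creates an index on the Sales.SalesOrderDetail table in AdventureWorks2008. The index name ix_SalesOrderDetail_ProductID_demo is hypothetical, and the option values are illustrative rather than recommendations. ONLINE is left out so that the statement also runs on editions other than Enterprise.

```sql
USE AdventureWorks2008;
GO
-- Hypothetical demonstration index; the table and columns exist in AdventureWorks2008.
CREATE NONCLUSTERED INDEX ix_SalesOrderDetail_ProductID_demo
    ON Sales.SalesOrderDetail (ProductID)
    INCLUDE (OrderQty, UnitPrice)
    WITH (
        FILLFACTOR = 80,              -- leave 20% free space on leaf pages
        PAD_INDEX = ON,               -- apply the same 80% to intermediate pages
        SORT_IN_TEMPDB = ON,          -- store intermediate sort results in tempdb
        STATISTICS_NORECOMPUTE = OFF, -- keep automatic statistics updates (default)
        ALLOW_ROW_LOCKS = ON,         -- default
        ALLOW_PAGE_LOCKS = ON,        -- default
        MAXDOP = 1                    -- build the index without parallelism
    );
GO
```

Drop the demonstration index afterward with DROP INDEX ix_SalesOrderDetail_ProductID_demo ON Sales.SalesOrderDetail.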
Now let's create some covering
indexes with composite keys and included columns, and experiment with
some of the relational index options. The following code example
creates an index with a composite key, sets the FILLFACTOR option to 75%,
turns online operations on, and limits the operation to two processors
with MAXDOP. The FILLFACTOR is set for this index because new people are
regularly added, and the index has to make room for their last names on
the respective pages. The goal is to minimize page splits every time a
new person is added, so we leave some free space on the leaf-level pages
at creation time.
USE AdventureWorks2008
GO
CREATE NONCLUSTERED INDEX ix_peronName
ON person.Person(LastName, FirstName, MiddleName)
WITH (FILLFACTOR = 75, ONLINE = ON, MAXDOP = 2)
Now, let's say you have
decided that you no longer want the middle name as a key value in the
previously created index. Because the middle name is returned by most of
the queries, you keep it in the index as an included column so that
those queries can still avoid lookups into the clustered index (the
primary key). You also don't want to break the index hints that are in
place, so you keep the same name. The following code shows an example of
the DROP_EXISTING option with an INCLUDE option:
USE AdventureWorks2008
GO
CREATE NONCLUSTERED INDEX ix_peronName
ON person.Person(LastName, FirstName)
INCLUDE (MiddleName)
WITH (FILLFACTOR = 75, ONLINE = ON, MAXDOP = 2, DROP_EXISTING = ON)
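To confirm that DROP_EXISTING produced the intended definition, you can query the catalog views. This sketch lists the key and included columns of ix_peronName (the name used in the examples) along with its fill factor; is_included_column = 1 marks MiddleName as an included column rather than a key column.

```sql
USE AdventureWorks2008;
GO
-- List the key and included columns of the re-created index.
SELECT i.name  AS index_name,
       c.name  AS column_name,
       ic.key_ordinal,          -- 0 for included columns
       ic.is_included_column,
       i.fill_factor
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
  ON ic.object_id = i.object_id AND ic.index_id = i.index_id
JOIN sys.columns AS c
  ON c.object_id = ic.object_id AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID(N'Person.Person')
  AND i.name = N'ix_peronName'
ORDER BY ic.is_included_column, ic.key_ordinal;
```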
Lastly, we want to show you an
example of using data compression with your index creation statement.
The following code creates an index with page data compression enabled:
USE AdventureWorks2008
GO
CREATE NONCLUSTERED INDEX ix_peronName6
ON person.Person(LastName, FirstName, MiddleName)
INCLUDE (Suffix,Title)
WITH (FILLFACTOR = 75, ONLINE = ON, DATA_COMPRESSION = PAGE)
Compression not only saves
disk space, but it can sometimes increase performance. That's
because compression lets more entries fit on a page, so queries read
fewer pages and perform less IO, at the cost of extra CPU time to
compress and decompress the data.
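Before compressing a large index, you can estimate the benefit with the sp_estimate_data_compression_savings system procedure introduced in SQL Server 2008. This sketch requests PAGE compression estimates for every index on Person.Person; compare the current and requested size columns in the result. In SQL Server 2008 the procedure, like data compression itself, is available only in the Enterprise and Developer editions.

```sql
USE AdventureWorks2008;
GO
-- Estimate PAGE compression savings for all indexes and partitions of Person.Person.
EXEC sp_estimate_data_compression_savings
    @schema_name      = N'Person',
    @object_name      = N'Person',
    @index_id         = NULL,     -- NULL means all indexes
    @partition_number = NULL,     -- NULL means all partitions
    @data_compression = N'PAGE';
GO
```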
2. Creating Unique and Primary Key Indexes
Unique and primary
key indexes are ways of ensuring distinctness within key columns.
Remember that a table can have only one primary key but multiple unique
indexes. Unique indexes can exist as both clustered and nonclustered
indexes. Frequently, the primary key is the clustered index on a table.
Creating unique indexes requires you to understand the data. The
following list provides some things you should consider before creating
primary keys and unique indexes:
Uniqueness within the data:
Make sure the keys of an index will truly be unique within the context
of the data. Think through as many scenarios as possible prior to
implementing a unique index. When designing a database environment, some
things sound like they should be unique when in actuality opportunities
for duplication exist. For example, you may decide that every Social
Security number will be unique within your environment. That sounds
great until you get a duplicated Social Security number. (Trust us, it
happens.) Make sure you think through your unique keys so that you don't
create a constraint that will come back to haunt you down the road.
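Nulls in key columns:
Keep in mind that primary keys enforce uniqueness and don't allow nulls
in the key columns, whereas unique indexes do allow nulls. However, a
unique index treats nulls as equal to one another, so a single-column
unique index accepts only one row with a null key.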
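Updates to keys: You can update the values in the key columns of both primary keys and unique indexes. However, an update that changes a primary key value still referenced by rows in another table through a foreign key fails, unless that foreign key is defined with ON UPDATE CASCADE.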
Query optimizations:
Don't overlook possible query optimizations that can come about from
choices you make at index creation time. For example, when you are
creating an index on one or more columns and your data is such that it
is valid to create that index as a unique index, then do so. A unique
index helps the query optimizer by letting it know that the data
within the key will be unique. Take advantage of unique index creation
opportunities; they are an easy way to improve the performance of your
system.
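Because a plain unique index accepts only one null key value, SQL Server 2008 filtered indexes offer a way to enforce uniqueness on an optional column only when it has a value, as with the Social Security number example. The dbo.Employee table below is hypothetical. Filtered indexes require ANSI_NULLS, QUOTED_IDENTIFIER, and a few related SET options to be ON, which is the default in SQL Server Management Studio.

```sql
-- Hypothetical table for illustration.
CREATE TABLE dbo.Employee
(
    EmployeeId int IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_Employee PRIMARY KEY CLUSTERED,  -- primary key: no nulls allowed
    NationalId char(11) NULL                           -- optional, but unique when present
);
GO
-- The filter excludes nulls, so any number of rows can leave NationalId empty
-- while non-null values must still be unique.
CREATE UNIQUE NONCLUSTERED INDEX ix_Employee_NationalId
    ON dbo.Employee (NationalId)
    WHERE NationalId IS NOT NULL;
GO
-- Succeeds: two nulls and one distinct value.
INSERT INTO dbo.Employee (NationalId) VALUES (NULL), (NULL), ('123-45-6789');
```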
The syntax for creating a
unique clustered or nonclustered index is similar to that for creating
clustered and nonclustered indexes. The only difference is the
keyword UNIQUE. By default,
primary key constraints create unique clustered indexes (unless the
table already has a clustered index), and unique constraints create
unique nonclustered indexes. However, you are the one in control.
If you want the primary key represented by a unique nonclustered index,
then you can create it that way. The following code shows the syntax
for creating unique indexes:
CREATE UNIQUE [CLUSTERED | NONCLUSTERED] INDEX index_name
ON <object> (column [ASC | DESC] [,...n])
[INCLUDE (column_name [,...n])]
[WITH (relational_index_option [,...n])]
Now that you understand the syntax, review the following code and create a unique clustered and nonclustered index:
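USE AdventureWorks2008;
GO
-- Re-create ix_bookId as a unique clustered index, leaving 5% free space on leaf pages.
CREATE UNIQUE CLUSTERED INDEX ix_bookId
ON apWriter.Books2(bookId)
WITH(ONLINE = ON, FILLFACTOR = 95, DROP_EXISTING = ON);
GO
-- Re-create ix_AuthorTitle with Title as its only key column; duplicate inserts are discarded.
CREATE UNIQUE NONCLUSTERED INDEX ix_AuthorTitle
ON apWriter.Books2(Title)
WITH(IGNORE_DUP_KEY = ON, ONLINE = ON, DROP_EXISTING = ON);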
The first code example re-creates a clustered index on the example database, specifying the FILLFACTOR
and an online index operation. The second example re-creates an index,
using the same name but removing a key column. The re-creation is done
with the index online. The IGNORE_DUP_KEY option is enabled, so rows that violate the unique index are discarded with a warning instead of causing the entire insert statement to fail.
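The effect of IGNORE_DUP_KEY is easy to see with a scratch temporary table. In this sketch the table, index, and title values are hypothetical. The duplicate row is discarded with a "Duplicate key was ignored." warning and the remaining rows are inserted, so the final count is 2.

```sql
-- Scratch table and index for illustration.
CREATE TABLE #Titles (Title nvarchar(100) NOT NULL);
CREATE UNIQUE NONCLUSTERED INDEX ix_Titles_Title
    ON #Titles (Title)
    WITH (IGNORE_DUP_KEY = ON);

-- The second 'Title A' row is discarded with a warning; the statement does not fail.
INSERT INTO #Titles (Title)
VALUES (N'Title A'), (N'Title A'), (N'Title B');

SELECT COUNT(*) AS rows_inserted FROM #Titles;  -- 2

DROP TABLE #Titles;
```

Without IGNORE_DUP_KEY = ON, the same INSERT would fail with a duplicate key error and leave the table empty.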