SQL Server 2008 : Creating Indexes via T-SQL (part 1) - Creating Clustered and Nonclustered Indexes & Creating Unique and Primary Key Indexes

6/17/2011 11:38:48 AM

There is more to creating indexes than just knowing the syntax and following tips and recommendations that you find from various sources. As a database administrator, you must ensure that the indexes created on your system improve performance rather than hinder it.

Understanding the usage patterns of your applications significantly improves your decision making when determining the indexes you create on your system. Because of the volume of queries executed on a system, covering every query with an exact index is usually not the best thing to do. When you understand the usage patterns of an application, you can make better decisions in terms of prioritizing the importance of frequently executed queries. Determining the indexes to create on your system then becomes an easier task.

1. Creating Clustered and Nonclustered Indexes

Creating clustered and nonclustered indexes in SQL Server is one of the fundamental tasks of database administrators. The general syntax is the same for creating each type. However, the issues to think about when you're in the planning stage can be different.

1.1. Issues When Creating Clustered Indexes

As discussed earlier, clustered indexes determine the order of the data for each table and are accessed frequently. When choosing your clustered indexes, think about the following points:

Data accessibility: Think about how the data within the table is utilized and accessed. Are you going to be adding, updating, and deleting records often? Are you going to be bulk loading data daily or monthly and then retrieving data from the table all day long? Understanding the accessibility of the data will help you determine the key value.
Narrow keys: Remember that every nonclustered index contains the clustered index key. Wide clustered index keys therefore make every nonclustered index larger, so fewer entries fit on each page and more pages must be read and maintained. Keep narrow keys in mind while determining the clustered index key.
Uniqueness: Unique values for clustered index keys make queries that use the clustered index (as well as queries that use a nonclustered index and need to look up the data associated with the data row locator) more efficient. If the clustered index key is not unique, SQL Server has to force uniqueness itself by adding a hidden uniqueifier to duplicate key values, which costs I/O and processing time. Creating a unique key yourself is the preferred method.
Sequential keys: Consider choosing a clustered index key that is sequential in terms of time or numbers. If the order of your clustered key is sequential, then inserting data will always occur in a fashion that minimizes page splits. The data is added to the end of pages, minimizing the cost of ordering all of your data.
Static keys: Choose clustered index keys that will not be modified. If the modification of the clustered index key occurs, then all of the nonclustered indexes associated with the key will also require updates. The table will also have to reorder the data if the key value moves to another page. Clearly, you can see how costly this operation would be on your system if updates happened frequently.
Order By columns: Columns that are often used in ORDER BY clauses may be candidates for clustered indexes. Remember, the data will be ordered based on the key values in the clustered index creation.
JOIN clauses: Columns that are used to join the primary table to other tables may be good candidates for the clustered index key. This option really coincides with understanding your data and usage patterns.

Think seriously about these issues when creating a clustered index because the performance of your application depends on your making correct and reasonable choices.
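The points above can be sketched with a small example. The temporary table #OrderHeader and the index name ix_OrderHeader_OrderId are hypothetical names chosen for illustration only; they are not part of AdventureWorks2008. The sketch assumes an integer IDENTITY column is an acceptable key for the data, which gives a key that is narrow, unique, sequential, and static.

```sql
-- Hypothetical temp table used only for illustration.
IF OBJECT_ID('tempdb..#OrderHeader') IS NOT NULL
    DROP TABLE #OrderHeader;

CREATE TABLE #OrderHeader
(
    OrderId     INT IDENTITY(1,1) NOT NULL,  -- narrow (4 bytes), ever-increasing, never updated
    CustomerRef UNIQUEIDENTIFIER  NOT NULL,  -- wide and random: a poor clustered key candidate
    OrderDate   DATETIME          NOT NULL
);

-- Unique, narrow, sequential, and static: new rows are added at the end of the index.
CREATE UNIQUE CLUSTERED INDEX ix_OrderHeader_OrderId
ON #OrderHeader (OrderId);

INSERT INTO #OrderHeader (CustomerRef, OrderDate)
VALUES (NEWID(), GETDATE()),
       (NEWID(), GETDATE());

-- Rows are stored in OrderId order, which matches this ORDER BY.
SELECT OrderId, OrderDate
FROM #OrderHeader
ORDER BY OrderId;
```

Clustering on CustomerRef instead would place new rows at random positions in the index, which is the page-split pattern the sequential-key point warns about.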

1.2. Issues When Creating Nonclustered Indexes

There are a number of items to consider before creating nonclustered indexes. Nonclustered indexes are just as important as the clustered indexes that we just discussed. In fact, you might find that you rely on nonclustered indexes more than on your clustered indexes to fulfill the requests of queries. Following are some things to think about:

Data accessibility: Yes, accessibility is important for nonclustered indexes, too. Think about your data access patterns a step further than you did with the clustered indexes. Clustered indexes focus on the structure of your data. Nonclustered indexes should focus on the various types of questions that the data will answer. For example, how many accounts were opened in the past week?
Priority queries: Make sure you start creating indexes for the highest priority, most frequently accessed queries first. If you focus on the highest priority queries first, you will ensure that the response time of the application is sufficient while you work on the other queries over time.
Cover your queries: When determining the index keys for your high-priority queries, think about covering your queries. Depending on the query, you may want to cover the SELECT and WHERE clauses of those queries. Spend some time analyzing your queries and determining what the best strategy is to cover the queries.
Don't cover everything: Although SQL Server allows you to cover every column in a query, that doesn't mean you should. Just because you increase the performance of one query does not mean you will not impact the other queries that write to that table.
Don't over index: Remember, all the data for the key values that you choose will be stored in the nonclustered index as well as in the table. Every insert or delete on the table has to maintain the nonclustered indexes as well, and every update has to maintain each nonclustered index that contains a modified column. Be careful not to over index a table to the point that write performance suffers.
Uniqueness: Try to create nonclustered indexes on key columns where the cardinality or selectivity is high. The query optimizer will be more likely to use those nonclustered indexes instead of doing a table scan.
JOIN clauses: Pay attention to columns listed in JOIN clauses. Parent/child joins are common. Parent tables are typically accessed through their primary keys, which often correspond to their clustered indexes. What is often overlooked, though, are the foreign key values in the child tables. Consider indexing those using nonclustered indexes. For example, you might create a nonclustered index on the order_number column in a line_items table.

Nonclustered indexes will be utilized frequently within your application and can provide significant performance improvements. Just keep in mind that adding nonclustered indexes incorrectly could potentially cause performance problems. So plan your indexes before implementation.
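As a minimal sketch of the JOIN and covering points above, the following script indexes the foreign key value in a child table. The temporary table #line_items, its columns other than order_number, and the index name are assumptions made for illustration.

```sql
-- Hypothetical child table; only order_number comes from the discussion above.
IF OBJECT_ID('tempdb..#line_items') IS NOT NULL
    DROP TABLE #line_items;

CREATE TABLE #line_items
(
    line_item_id INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    order_number INT         NOT NULL,  -- foreign key value pointing at the parent table
    product_code VARCHAR(20) NOT NULL,
    quantity     INT         NOT NULL,
    unit_price   MONEY       NOT NULL
);

-- The key column supports the JOIN/WHERE predicate;
-- the INCLUDE columns cover the SELECT list without widening the key.
CREATE NONCLUSTERED INDEX ix_line_items_order_number
ON #line_items (order_number)
INCLUDE (quantity, unit_price);

-- A query that this index can answer without touching the clustered index.
SELECT order_number, SUM(quantity * unit_price) AS order_total
FROM #line_items
WHERE order_number = 1001
GROUP BY order_number;
```

Only the columns that the high-priority query needs are included; product_code is left out on purpose, in line with the advice not to cover everything.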

1.3. Creating an Index

Finally, we will actually create some indexes. The syntax for index creation is fairly straightforward. The following script shows the create index syntax:

CREATE [CLUSTERED | NONCLUSTERED] INDEX index_name
ON <object> (column [ASC | DESC] [,...n])
[INCLUDE (column_name [,...n])]
[WITH (relational_index_options [,...n])]

The preceding code shows you how to build composite indexes, which are indexes with multiple columns. You can also see the syntax for specifying the relational index options that you want to use. Let's go ahead and use the syntax in some examples. The first two examples create a clustered and nonclustered index with the default option values:

USE AdventureWorks2008
GO

CREATE CLUSTERED INDEX ix_bookId
ON apWriter.Books(bookId)

CREATE NONCLUSTERED INDEX ix_Title
ON apWriter.Books(Title)
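To confirm what was created, you can query the sys.indexes catalog view. This is a small sketch that assumes the apWriter.Books table from the example above exists in the current database.

```sql
-- Lists each index on the table with its type and uniqueness.
SELECT i.name,
       i.type_desc,   -- CLUSTERED or NONCLUSTERED (HEAP if no clustered index exists)
       i.is_unique
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID('apWriter.Books');
```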

As you create more indexes, you should take advantage of relational index options. When you create an index, you can specify the following:

ALLOW_ROW_LOCKS: Allows the Database Engine to use row locks if it deems them necessary.
ALLOW_PAGE_LOCKS: Allows the Database Engine to use page locks if necessary.
DATA_COMPRESSION: Identifies the type of compression you want used for the clustered and nonclustered indexes. The available options are as follows: NONE (indicating that you don't want the data compressed), ROW (to compress data row by row), and PAGE (to compress entire pages at a time).
DROP_EXISTING: Allows the dropping of a named index prior to rebuilding the index. The index names must be identical, even though you can change the definition of the index. We have mixed feelings about using this option. It removes the benefit of online index operations, which allow the index to still be used during the rebuild process. If you need to change the definition of an index, you can create another index and drop the previous one once you are done. On the other hand, this option is helpful if you are recreating an index mentioned in any hints that your application places into its queries. Obviously, you would want your newly rebuilt index to have the same name in that case. Regardless of our opinion, the choice exists for you to use.
FILLFACTOR: Determines how full each leaf level page is made when creating and rebuilding indexes, and therefore how much free space remains. The default value is 0, which is equivalent to 100% full. We generally specify a lower FILLFACTOR option on indexes that are going to have records inserted within the index instead of at the bottom or end of the page. That's because writes that result in new index entries in the middle of a page can lead to page splits. Frequent page splits will influence performance because of the cost of the split and fragmentation created within the index.
IGNORE_DUP_KEY: Prevents records that violate a unique index from causing an entire batch insert to fail. With the option enabled, only the duplicate records are discarded (with a warning) and the remaining records are inserted. Without this option, one record that violates the unique index causes the whole insert statement to fail, so none of its records are written to the table. The option affects insert operations only.
MAXDOP: Gives you the opportunity to override the server setting for the maximum degree of parallelism used for the index operations. The available options are as follows: 1 (prevents parallel execution), any number greater than 1 (specifies the number of parallel executions allowed up to 64), and 0 (uses the appropriate number of processors based on the current load of the system).
ONLINE: Allows you to create, rebuild, or drop indexes without preventing user access to the data in the underlying table. By default, this option is set to off, which causes the underlying table to be locked, thereby preventing user access. Indexes that contain large object (LOB) data types like varchar(max), varbinary(max), and xml cannot be rebuilt while online. There are also a couple of other conditions that prevent online index maintenance. Consider using ONLINE where possible to limit the impact of index maintenance on your application users. Online index operations are Enterprise Edition features only.
PAD_INDEX: When specified with FILLFACTOR, determines the amount of free space stored on the intermediate level pages of an index. The PAD_INDEX option will use the same percentage specified in the FILLFACTOR option. The intermediate level page has to be large enough to store at least two records, and if the FILLFACTOR is not large enough, then the Database Engine will override the FILLFACTOR percentage internally.
SORT_IN_TEMPDB: Identifies the location where the temporary sorting of the index will take place. If tempdb is stored on a separate physical drive from the data, then the index creation process should complete in a shorter amount of time. Bear in mind, though, that sorting the data in a separate database requires that SQL Server move the data to that target database. For that reason, sorting in tempdb increases the amount of disk space needed over the default behavior.
STATISTICS_NORECOMPUTE: Gives you the option of turning off the automatic recomputation of the index's statistics when they become out of date. The default value is OFF, which leaves automatic statistics updates enabled. You may want to experiment with this option if AUTO_UPDATE_STATISTICS is giving you a problem, specifically on larger tables.
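Several of the options in this list can be combined in one WITH clause, and the resulting settings can be read back from sys.indexes. The following is a sketch only; the temporary table #Accounts and the index name are hypothetical, and the option values are arbitrary examples rather than recommendations.

```sql
-- Hypothetical temp table used only for illustration.
IF OBJECT_ID('tempdb..#Accounts') IS NOT NULL
    DROP TABLE #Accounts;

CREATE TABLE #Accounts
(
    AccountId  INT IDENTITY(1,1) NOT NULL,
    LastName   NVARCHAR(50)      NOT NULL,
    OpenedDate DATETIME          NOT NULL
);

CREATE NONCLUSTERED INDEX ix_Accounts_LastName
ON #Accounts (LastName)
WITH (FILLFACTOR = 80,               -- fill leaf pages to 80%
      PAD_INDEX = ON,                -- apply the same percentage to intermediate pages
      SORT_IN_TEMPDB = ON,           -- do the build sort in tempdb
      STATISTICS_NORECOMPUTE = OFF,  -- keep automatic statistics updates (the default)
      ALLOW_ROW_LOCKS = ON,
      ALLOW_PAGE_LOCKS = ON);

-- Read the stored settings back from the catalog (temp tables live in tempdb).
SELECT name, fill_factor, is_padded, allow_row_locks, allow_page_locks
FROM tempdb.sys.indexes
WHERE object_id = OBJECT_ID('tempdb..#Accounts')
  AND name = 'ix_Accounts_LastName';
```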

Now let's create some covering indexes with composite keys and include columns. You can also play with some of the relational index options. The following code example demonstrates the creation of a composite key with the FILLFACTOR option set to 75% and online operations turned on. The FILLFACTOR for this index is set because you can easily see new people being added and the index having to make room for their last names to fit on the respective pages. The goal is to minimize the page splits every time a new person is added, so we leave some free space on the leaf level pages at creation time.

USE AdventureWorks2008
GO

CREATE NONCLUSTERED INDEX ix_peronName
ON person.Person(LastName, FirstName, MiddleName)
WITH (FILLFACTOR = 75, ONLINE = ON, MAXDOP = 2)

Now, let's say you have decided that you no longer want the middle name as a key value for the previously created index. Since the middle name is still returned in most of the queries, you decide to include the middle name as a nonkey column to avoid lookups against the clustered index. You also don't want to break the index hints that are in place, so you keep the same name. The following code shows an example of the DROP_EXISTING option with an INCLUDE option:

USE AdventureWorks2008
GO

CREATE NONCLUSTERED INDEX ix_peronName
ON person.Person(LastName, FirstName)
INCLUDE (MiddleName)
WITH (FILLFACTOR = 75, ONLINE = ON, MAXDOP = 2, DROP_EXISTING = ON)
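One way to check that MiddleName moved from the key to the included columns is to query sys.index_columns. This is a sketch that assumes the index above was created under the name shown in the script.

```sql
USE AdventureWorks2008
GO

-- Key columns have is_included_column = 0 and a key_ordinal of 1 or higher;
-- included columns have is_included_column = 1 and a key_ordinal of 0.
SELECT c.name AS column_name,
       ic.key_ordinal,
       ic.is_included_column
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
    ON ic.object_id = i.object_id
   AND ic.index_id  = i.index_id
JOIN sys.columns AS c
    ON c.object_id = ic.object_id
   AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID('Person.Person')
  AND i.name = 'ix_peronName'
ORDER BY ic.is_included_column, ic.key_ordinal;
```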

Lastly, we want to show you an example of using data compression with your index creation statement. The following code creates an index with page data compression enabled:

USE AdventureWorks2008
GO
CREATE NONCLUSTERED INDEX ix_peronName6
ON person.Person(LastName, FirstName, MiddleName)
INCLUDE (Suffix,Title)
WITH (FILLFACTOR = 75, ONLINE = ON, DATA_COMPRESSION = PAGE)

Compression not only saves disk space, but sometimes it can actually increase performance. That's because using compression means more entries can fit on a page, resulting in fewer pages of IO.
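Before compressing an index, you can ask SQL Server to estimate the savings with the sp_estimate_data_compression_savings system stored procedure. The following sketch assumes an edition that supports data compression; passing NULL for the index and partition requests an estimate for every index and partition on the table.

```sql
USE AdventureWorks2008
GO

-- Estimates current versus compressed size for each index on Person.Person.
EXEC sp_estimate_data_compression_savings
     @schema_name      = 'Person',
     @object_name      = 'Person',
     @index_id         = NULL,    -- NULL = all indexes on the table
     @partition_number = NULL,    -- NULL = all partitions
     @data_compression = 'PAGE';  -- 'NONE', 'ROW', or 'PAGE'
```

The procedure works from a sample of the data, so treat the numbers as an estimate rather than a guarantee.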

2. Creating Unique and Primary Key Indexes

Creating unique and primary key indexes is a method of ensuring distinctness within key columns. Remember that a table can have only one primary key but multiple unique indexes. Unique indexes can exist as both clustered and nonclustered indexes. Frequently, the primary key is the clustered index on a table. Creating unique indexes requires you to understand the data. The following list provides some things you should consider before creating primary keys and unique indexes:

Uniqueness within the data: Make sure the keys of an index will truly be unique within the context of the data. Think through as many scenarios as possible prior to implementing a unique index. When designing a database environment, some things sound like they should be unique when in actuality opportunities for duplication exist. For example, you may decide that every Social Security number will be unique within your environment. That sounds great until you get a duplicated Social Security number. (Trust us, it happens.) Make sure you think through your unique keys so that you don't create a constraint that will come back to haunt you down the road.
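Nulls in key columns: Keep in mind that primary keys force uniqueness and don't allow nulls in the key columns, whereas unique indexes do allow nulls. A unique index treats nulls as equal to one another, so it accepts only one null per key value combination.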
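Updates to keys: SQL Server lets you update the key values of both unique indexes and primary keys, as long as the new values remain unique. Primary key values are often referenced by foreign keys, however, so treat them as static wherever possible.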
Query optimizations: Don't overlook possible query optimizations that can come about from choices you make at index creation time. For example, when you are creating an index on one or more columns and your data is such that it is valid to create that index as a unique index, then do so. A unique index helps the query optimizer by letting it know that the data within the key will be unique. Take advantage of unique index creation opportunities to improve the performance of your system.
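The point about nulls can be illustrated with a short script. The temporary table #Members, its columns, and the index name are hypothetical and exist only for this sketch.

```sql
-- Hypothetical temp table used only for illustration.
IF OBJECT_ID('tempdb..#Members') IS NOT NULL
    DROP TABLE #Members;

CREATE TABLE #Members
(
    MemberId INT     NOT NULL PRIMARY KEY,  -- primary key columns cannot be nullable
    TaxId    CHAR(9) NULL                   -- nullable column under a unique index
);

CREATE UNIQUE NONCLUSTERED INDEX ix_Members_TaxId
ON #Members (TaxId);

INSERT INTO #Members (MemberId, TaxId) VALUES (1, '111223333');
INSERT INTO #Members (MemberId, TaxId) VALUES (2, NULL);  -- accepted: first NULL in the index

BEGIN TRY
    -- A second NULL is treated as a duplicate key and is rejected.
    INSERT INTO #Members (MemberId, TaxId) VALUES (3, NULL);
END TRY
BEGIN CATCH
    SELECT ERROR_NUMBER() AS error_number, ERROR_MESSAGE() AS error_message;
END CATCH;
```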

The syntax for creating a unique clustered or nonclustered index is similar to that for creating clustered and nonclustered indexes. The only difference comes from the keyword UNIQUE. By default, primary key constraints create unique clustered indexes (provided the table does not already have a clustered index), and unique constraints create unique nonclustered indexes. However, you are the one in control. If you want the primary key represented by a unique nonclustered index, then you can create the index that way. The following code shows the syntax for creating unique indexes:

CREATE UNIQUE [CLUSTERED | NONCLUSTERED] INDEX index_name
ON <object> (column [ASC | DESC] [,...n])
[INCLUDE (column_name [,...n])]
[WITH (relational_index_options [,...n])]

Now that you understand the syntax, review the following code and create a unique clustered and nonclustered index:

USE AdventureWorks2008
GO

CREATE UNIQUE CLUSTERED INDEX ix_bookId
ON apWriter.Books2(bookId)
WITH(ONLINE = ON, FILLFACTOR = 95, DROP_EXISTING = ON)

USE AdventureWorks2008
GO
CREATE UNIQUE NONCLUSTERED INDEX ix_AuthorTitle
ON apWriter.Books2(Title)
WITH(IGNORE_DUP_KEY = ON, ONLINE = ON, DROP_EXISTING = ON)

The first code example re-creates a clustered index on the example database, specifying the FILLFACTOR and an online index operation. The second example re-creates an index, using the same name but removing a key column. The re-creation is done with the index online. The option IGNORE_DUP_KEY is enabled, preventing records that violate the unique index from causing an entire batch insert to fail.
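The effect of IGNORE_DUP_KEY can be seen with a small batch insert. This is a sketch against a hypothetical temporary table, #Titles, rather than the apWriter.Books2 table used above.

```sql
-- Hypothetical temp table used only for illustration.
IF OBJECT_ID('tempdb..#Titles') IS NOT NULL
    DROP TABLE #Titles;

CREATE TABLE #Titles (Title VARCHAR(100) NOT NULL);

CREATE UNIQUE NONCLUSTERED INDEX ix_Titles_Title
ON #Titles (Title)
WITH (IGNORE_DUP_KEY = ON);

-- The third row duplicates the first. With IGNORE_DUP_KEY = ON the duplicate
-- is discarded with a warning and the other rows are still inserted.
INSERT INTO #Titles (Title)
VALUES ('Indexing Basics'),
       ('Query Tuning'),
       ('Indexing Basics');

SELECT Title
FROM #Titles
ORDER BY Title;
```

With IGNORE_DUP_KEY = OFF, the same INSERT statement would fail with a duplicate key error and leave the table empty.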
