SQL Server 2008 : Creating Indexes via T-SQL (part 2) - Creating Filtered Indexes & Creating XML Indexes

6/17/2011 11:40:35 AM

3. Creating Filtered Indexes

With the introduction of filtered indexes in SQL Server 2008, you can create indexes for subsets of data. The data stored within a filtered index is restricted only to rows meeting the WHERE clause that you specify. Consider filtered indexes as nonclustered indexes optimized for performance. With a smaller subset of data, retrievals from a filtered index will be faster, storage of the index on disk will be smaller, and maintenance of the index will cost less and happen less frequently because writes only occur when data meeting the filter specification is modified. The following list describes some things that you should consider before creating filtered indexes.

Data accessibility: When creating filtered indexes, understanding your data is even more important. Make sure that your filtered indexes will be for subsets of data that are meaningful for your application. If you frequently query only Ford automobiles by the number of seating positions, then a filtered index on number_of_seats specifying WHERE make = 'Ford' might make sense. Specifying WHERE make = 'GM' in that case would not help those queries at all.
Choose subsets wisely: When deciding to create a filtered index on a subset of data, make sure the query optimizer will find the filtered index useful. Think about using filtered indexes to help queries that filter through unpopulated key values or useless values that the application does not care about. Have you ever supported an application that stopped using a particular value in a field? The primary type of data queried used to be 1-3, but now the new version of the application only populates and retrieves values 5-7. Filtered indexes are useful in such scenarios because they keep currently unused key values out of your nonclustered indexes.
Cover your queries: Make sure the query optimizer utilizes your filtered indexes by ensuring that they cover the intended queries. Limit the number of includes and key columns that exist in the filtered indexes. If the performance of your queries is fast enough just from creating filtered indexes alone, then you may not even have to add any include columns to those indexes. The bottom line: Make sure unused columns are not added to your indexes.
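The considerations above can be sketched with the Ford example. This is only an illustration: the table dbo.Automobile, its columns, and the index name ix_Automobile_FordSeats are hypothetical and do not come from the original scenario.

```sql
-- Hypothetical table, for illustration only
CREATE TABLE dbo.Automobile
(
 AutomobileId INT IDENTITY(1,1) PRIMARY KEY,
 make VARCHAR(50) NOT NULL,
 model VARCHAR(50) NOT NULL,
 number_of_seats TINYINT NOT NULL
)
GO

-- Filtered index: keyed on the searched column, restricted to the
-- subset the application actually queries (Ford rows only)
CREATE NONCLUSTERED INDEX ix_Automobile_FordSeats
ON dbo.Automobile(number_of_seats)
INCLUDE (model)
WHERE make = 'Ford'
GO

-- A query whose predicate matches the filter and whose columns
-- are all present in the index, so the index can cover it
SELECT model, number_of_seats
FROM dbo.Automobile
WHERE make = 'Ford'
  AND number_of_seats >= 7
GO
```

The index carries only one key column and one included column, in line with the advice to keep unused columns out. Whether the optimizer picks the index still depends on your data and statistics, so check the execution plan rather than assuming it.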

To create a filtered index, write a WHERE clause into your creation statement. The following syntax shows how and where to do that:

CREATE [NONCLUSTERED] INDEX index_name
ON <object> (column [ASC | DESC] [,...n])
[INCLUDE (column_name [,...n])]
WHERE <filter_predicate>
[WITH (relational_index_options [,...n])]

One example to demonstrate the usefulness of filtered indexes comes from an invoice system. Companies often provide goods and services for their customers and bill them later. These companies keep track of the invoices that they mail their customers and often generate reports or retrieve information pertaining to the unpaid invoices. With filtered indexes, you can create an index that only contains unpaid invoices. Such an index might be very useful when created on a table containing all invoices. Imagine a collection group that queries the table a number of times to contact customers for payment. Having a small index specifically covering customers with amounts due will tend to make the collection group's queries run faster than they would if they each had to slog through an index including all of the older, paid-for orders.

The following code is an example of a script that creates a table and the appropriate indexes for the scenario we've just described:

USE AdventureWorks2008
GO

CREATE TABLE dbo.Invoice
(
 InvoiceId INT IDENTITY(1,1),
 CompanyName VARCHAR(200),
 isPaid SMALLINT
)
GO

CREATE CLUSTERED INDEX ix_Invoice
ON dbo.Invoice(InvoiceId)
GO

-- Filtered index: contains only the unpaid invoices (isPaid = 0)
CREATE NONCLUSTERED INDEX ix_FilterUnPaid
ON dbo.Invoice(isPaid) INCLUDE(CompanyName)
WHERE isPaid = 0
GO

Reviewing this code, you can see that it creates an invoice table, a clustered index, and a nonclustered index. The nonclustered index is the filtered index that restricts data to rows where isPaid = 0. If you want to see the indexes work, use the following code to add data to the table:

INSERT INTO dbo.Invoice
(CompanyName,isPaid)
VALUES('Apress',0),
('Apress1',0),
('Sylvester123',1)

Once you have inserted the data, execute the query in Listing 1. Results of the query are shown in Figure 1.

Listing 1. SQL Query Used to Retrieve Information from the Invoice Table with isPaid = 1

SELECT CompanyName
FROM dbo.Invoice
WHERE isPaid = 1

Figure 1. The results of Listing 1

To view the indexes utilized during execution of the query, turn on the option to Include the Actual Execution Plan and re-run the preceding query. Figure 2 shows the execution plan used for the query.

Figure 2. The execution plan from Listing 1
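If you prefer to capture the actual plan from T-SQL instead of the Management Studio option, one possible approach is SET STATISTICS XML. The sketch below assumes the dbo.Invoice table and sample rows created earlier; it returns the plan as an XML document alongside the query results.

```sql
-- Return the actual execution plan as XML for each following statement
SET STATISTICS XML ON
GO

SELECT CompanyName
FROM dbo.Invoice
WHERE isPaid = 1
GO

-- Switch plan capture off again
SET STATISTICS XML OFF
GO
```

The index names used by the query appear in the plan XML, so you can read off whether the clustered index or the filtered index was chosen.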

The plan in Figure 2 shows that the query executed using the clustered index on the invoice table. The filtered index was not used because it only covers the case where isPaid = 0, not where isPaid = 1, so the scan has to be performed on the clustered index.

Now let's see what happens when you modify the query to retrieve invoices that have not been paid. Execute the query in Listing 2 and see the results in Figure 3.

Listing 2. SQL Statement to Retrieve the Invoices from the Invoice Table Where isPaid = 0

SELECT CompanyName
FROM dbo.Invoice
WHERE isPaid = 0

Figure 3. The results of Listing 2

Now, review the execution plan of Listing 2, as shown in Figure 4. This time it shows that the optimizer was able to use the filtered index.

Figure 4. The execution plan for Listing 2

Filtered indexes can be very powerful. Don't go crazy with them, but it's worth researching whether you have an opportunity to use them in your own environment.
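When researching your own environment, it can help to see which indexes on a table are filtered and what their predicates are. The following sketch queries the sys.indexes catalog view for the dbo.Invoice table used in this section; substitute your own table name as needed.

```sql
-- List the indexes on dbo.Invoice together with any filter predicate
SELECT i.name,
       i.type_desc,
       i.has_filter,
       i.filter_definition
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID(N'dbo.Invoice')
GO
```

Rows with has_filter = 1 are filtered indexes, and filter_definition holds the text of the WHERE clause that was specified when the index was created.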

4. Creating XML Indexes

SQL Server 2008 allows you to create two types of indexes on columns of the XML data type: a primary XML index and a secondary XML index. Combined, the two index types cover the paths, values, properties, and tags within your XML column. Before creating XML indexes on a table, the table must have a clustered primary key. SQL Server uses that key to correlate rows in the XML index with rows in the base table, and it ensures that XML indexes can fit into existing partition schemes. Keep in mind that XML columns can be large, as large as 2 GB, and that they are stored as binary large objects (BLOBs). If your application is utilizing XML columns frequently, you may want to spend some time understanding how XML indexes can benefit your system. The syntax for creating a primary and secondary index is straightforward:

CREATE [PRIMARY] XML INDEX index_name
     ON <object> (xml_column_name)
   [USING XML INDEX xml_index_name
    [ FOR { VALUE | PATH | PROPERTY } ] ]
[ WITH ( <xml_index_option> [ ,...n ] ) ]

You will notice that an option exists to specify whether you are creating a primary XML index. When creating a secondary XML index, you simply employ the USING XML INDEX option to identify the primary XML index with which the secondary index is associated. Then specify the FOR option to say whether your secondary index will be a value, path, or property index.

Let's create a table that stores XML execution plans. We'll use that table as the basis for creating primary and secondary XML indexes. Remember, we need a clustered primary key on the table in order to create a primary XML index. Here's the table creation statement:

CREATE TABLE execPlans
( execPlanId INT IDENTITY(1,1),
 executionPlan XML,

CONSTRAINT [PK_execPlans] PRIMARY KEY CLUSTERED
(
 [execPlanId] ASC
))
GO
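To have some data to index, you could populate the table from the plan cache. This is only one possible way to do it and is not part of the original walkthrough; it assumes you hold the VIEW SERVER STATE permission, and the TOP value is an arbitrary choice to keep the sample small.

```sql
-- Copy a handful of cached execution plans into the table.
-- sys.dm_exec_query_plan returns each plan as an XML value.
INSERT INTO execPlans (executionPlan)
SELECT TOP (25) qp.query_plan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
WHERE qp.query_plan IS NOT NULL
GO
```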

4.1. Primary XML Indexes

You must create a primary XML index before creating any secondary indexes. A primary XML index arranges all tags, paths, and values of the data stored in the XML column. A primary XML index breaks an XML string down into multiple rows that represent the nodes of the XML BLOB. The values of the tags are returned when the XML column is queried. Because primary XML indexes contain the values of the tags, using primary XML indexes increases the performance of your queries when looking for values within the XML column.

The following example shows you how to create a primary XML index on the recently created table:

CREATE PRIMARY XML INDEX pindexExecPlan
ON execPlans(executionPlan)

GO

4.2. Secondary XML Indexes

After you have created a primary XML index on a column, you have the option of creating one or more secondary XML indexes on the same column. There are three types of secondary XML indexes that you can create: path, value, and property. The type of queries executed against the column should drive the secondary index types that you create. For example:

Creating a path index may increase the performance of application queries searching for paths within an XML document.
A secondary index on a specific property can help queries that look at specific property values within one or more XML tags.
Create value indexes to support queries that look at specific values enclosed by XML tags.
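As a sketch of the other two secondary types, the statements below create a value index and a property index on the same column. They assume the primary XML index pindexExecPlan already exists; the names sindexExecPlanValue and sindexExecPlanProperty are hypothetical.

```sql
-- Secondary VALUE index: aimed at queries that search for specific values
CREATE XML INDEX sindexExecPlanValue
ON execPlans(executionPlan)
 USING XML INDEX pindexExecPlan
 FOR VALUE
GO

-- Secondary PROPERTY index: aimed at queries that retrieve property values
CREATE XML INDEX sindexExecPlanProperty
ON execPlans(executionPlan)
 USING XML INDEX pindexExecPlan
 FOR PROPERTY
GO
```

Each secondary index adds storage and maintenance cost, so create only the types that match the queries you actually run.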

Now, let's create a secondary index that is a path index because we frequently look for missing index tags within the XML string. Here is our secondary index creation statement:

CREATE XML INDEX sindexExecPlan
ON execPlans(executionPlan)
 USING XML INDEX pindexExecPlan
 FOR PATH
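A query that such a path index is meant to help might look like the following sketch. It assumes the stored documents are showplan XML, which uses the namespace shown below, and it looks for plans that contain a MissingIndexes element. Whether the optimizer actually uses the secondary index depends on your data, so verify with the execution plan.

```sql
-- Find stored plans that contain missing-index information
WITH XMLNAMESPACES
 (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT execPlanId
FROM execPlans
WHERE executionPlan.exist('//MissingIndexes') = 1
GO
```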
