The first step in query optimization is to analyze each table in the query to identify all search arguments (SARGs), OR clauses, and join clauses. The SARGs, OR clauses, and join clauses are used in the second step, index selection, to select useful indexes to satisfy a query.
Identifying Search Arguments
A SARG is defined as a WHERE clause that compares a column to a constant. The format of a SARG is as follows:
Column operator constant_expression [and...]
SARGs provide a way for the Query Optimizer to limit the rows searched to satisfy a query. The general goal is to match a SARG with an index to avoid a table scan.
Valid operators for a SARG are =, >, <, >=, <=, BETWEEN, and LIKE. Multiple SARGs can be combined with AND. (A single index might match some or all of the SARGs ANDed together.) Following are examples of SARGs:
flag = 7
salary > 100000
city = 'Saratoga' and state = 'NY'
price between $10 and $20 (the same as price >= $10 and price <= $20)
100 between lo_val and hi_val (the same as lo_val <= 100 and hi_val >= 100)
au_lname like 'Sm%' (the same as au_lname >= 'Sm' and au_lname < 'Sn')
In some cases, the column in a SARG might be compared with a constant expression rather than a single constant value. The constant expression can be an arithmetic operation, a built-in function, a string concatenation, a local variable, or a subquery result. As long as the left side of the SARG contains a column, it’s considered an optimizable SARG.
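The kinds of constant expression listed above can be sketched as follows. This is only an illustration, assuming the pubs sample database is installed; the variables @min_price and @prefix and their values are made up for the example.

```sql
use pubs;

-- Illustrative variables (not from the original text).
declare @min_price money, @prefix varchar(10);
set @min_price = 10;
set @prefix = 'S';

-- Arithmetic operation on the constant side
select title_id, price from titles where price > @min_price * 2;

-- Built-in function
select title_id, pubdate from titles where pubdate >= dateadd(year, -1, getdate());

-- String concatenation
select au_id, au_lname from authors where au_lname like @prefix + 'm%';

-- Local variable
select title_id, price from titles where price >= @min_price;

-- Subquery result
select title_id, price from titles where price > (select avg(price) from titles);
```

In each query the column stands alone on the left side of the operator, and everything on the right side can be evaluated without reading the row being tested.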
Identifying OR Clauses
Next, the Query Optimizer looks for OR clauses. OR clauses are SARGable expressions combined with an OR condition rather than an AND condition, and they are treated differently from standard SARGs. The format of an OR clause is
SARG or SARG [or ...]
with all columns involved in the OR belonging to the same table.
This IN statement
column in ( constant1, constant2, ...)
is also treated as an OR clause, becoming this:
column = constant1 or column = constant2 or ...
Some examples of OR clauses are as follows:
where au_lname = 'Smith' or au_fname = 'Fred'
where (type = 'business' and price > $25) or pub_id = '1234'
where au_lname in ('Smith', 'Jones', 'N/A')
An OR clause is a disjunction; all rows matching either of the two criteria appear in the result set. Any row matching both criteria should appear only once.
The main issue is that an OR clause cannot be satisfied by a single index search. Consider the first example just presented:
where au_lname = 'Smith' or au_fname = 'Fred'
An index on au_lname and au_fname helps SQL Server find all the rows where au_lname = 'Smith' AND au_fname = 'Fred', but searching the index tree does not help SQL Server efficiently find all the rows where au_fname = 'Fred' and the last name is any value. Unless an index on au_fname exists as well, the only way to find all rows with au_fname = 'Fred' is to search every row in the table or scan every row in a nonclustered index that contains au_fname as a nonleading index key.
An OR clause can typically be resolved either by a table scan or by the OR strategy. Using a table scan, SQL Server reads every row in the table and applies each OR criterion to each row. Any row that matches any one of the OR criteria is put into the result set.
A table scan is an expensive way to process a query, so the Query Optimizer looks for an alternative for resolving an OR. If an index can be matched against all SARGs involved in the OR clause, SQL Server evaluates the possibility of applying the index union strategy.
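The logic behind combining one index search per SARG can be approximated by hand as a UNION of single-SARG queries. The following is a sketch under stated assumptions, not a description of what the optimizer does internally: it assumes the pubs sample database, and the index names ix_authors_lname and ix_authors_fname are hypothetical.

```sql
use pubs;

-- Hypothetical single-column indexes, one for each SARG in the OR clause.
create index ix_authors_lname on authors (au_lname);
create index ix_authors_fname on authors (au_fname);

-- The original OR clause.
select au_id, au_lname, au_fname
from authors
where au_lname = 'Smith' or au_fname = 'Fred';

-- A logically equivalent form: each branch has a single SARG that can be
-- matched to its own index, and UNION removes the row that would otherwise
-- appear twice when an author matches both criteria.
select au_id, au_lname, au_fname from authors where au_lname = 'Smith'
union
select au_id, au_lname, au_fname from authors where au_fname = 'Fred';
```

The select list includes au_id, the table's primary key, so the duplicate removal performed by UNION cannot collapse two distinct authors into one row.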
Identifying Join Clauses
The next type of clause the Query Optimizer looks for during the query analysis phase is the join clause. A join condition is specified in the FROM clause using the JOIN keyword, as follows:
FROM table1 JOIN table2 on table1.column = table2.column
Alternatively, join conditions can be specified in the WHERE clause using the old-style join syntax, as shown in the following example:
Table1.Column Operator Table2.Column
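As an illustration of the two syntaxes, the following pair of queries expresses the same join condition both ways. It is a minimal sketch that assumes the pubs sample database, whose titles and publishers tables share a pub_id column.

```sql
use pubs;

-- Join condition in the FROM clause using the JOIN keyword.
select t.title, p.pub_name
from titles t
join publishers p on t.pub_id = p.pub_id;

-- The same join condition in the WHERE clause using the old-style syntax.
select t.title, p.pub_name
from titles t, publishers p
where t.pub_id = p.pub_id;
```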
A join clause always involves two tables, except in the case of a self-join, but even in a self-join, you must specify the table twice in the query. Here’s an example:
select employee = e.LastName + ', ' + e.FirstName,
manager = m.LastName + ', ' + m.FirstName
from Northwind..Employees e left outer join Northwind..Employees m
on e.ReportsTo = m.EmployeeID
order by 2, 1
SQL Server treats a self-join just like a normal join between two different tables.
In addition to join clauses, the Query Optimizer also looks for subqueries, derived tables, and common table expressions and determines whether they need to be flattened into joins or processed using a different strategy.
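To show what flattening a subquery into a join means logically, the following sketch writes the same request both ways. It assumes the pubs sample database. Whether the optimizer actually flattens a particular subquery depends on the query, so the second statement illustrates only the equivalent join form, not a guaranteed plan.

```sql
use pubs;

-- Subquery form: publishers that have at least one business title.
select pub_id, pub_name
from publishers
where pub_id in (select pub_id from titles where type = 'business');

-- Join form returning the same rows. DISTINCT collapses the duplicates that
-- arise when a publisher has more than one business title.
select distinct p.pub_id, p.pub_name
from publishers p
join titles t on t.pub_id = p.pub_id
where t.type = 'business';
```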