Defining, building, and maintaining an accurate and complete index is the cornerstone of accurate and complete search results.
This section covers the user
interface (UI) management tools for administering the crawl process,
including creating the Search application.
1. Search Support Staff
Most SharePoint farms and
support teams are small and do not have the dedicated Search
administrator that the product really requires and is designed to
support.
In a larger environment or one
in which the importance of search is acknowledged or is the pre-eminent
purpose of the SharePoint farm, search administration would be
accomplished by teams dedicated to
2. Farm-Wide Search Settings
Although most
configurations are unique to the search service instance, the farm-wide
settings are followed by all crawlers. Some settings that are identified
as farm settings are only defaults that can be overridden by the
settings of an individual service. The settings page shown in Figure 1
can be accessed from the Central Administration page by clicking
General Application Settings and then Farm Search Administration under
the Search section.
The proxy settings are
configured the same way as in Internet Explorer, with the addition of an
option that directs federated queries to use the same settings. The
connection timeouts, each 60 seconds by default, apply to connecting to
content sources and to waiting for request acknowledgments. If the option
Ignore SSL Warnings is selected, the crawler will treat sites as
legitimate even if their certificate name does not exactly match. If
this setting is not selected, a site with a faulty SSL certificate will
not be crawled.
All search service applications
for the farm will be listed in the lower section of the page. The
Search Administration page for the application can be accessed using the
hyperlinked name. The link to Modify Topology opens the same page as
the link provided on the Search Service Administration page.
3. Managing Crawler Impact Rules
Crawler impact rules are an
optional mechanism to control the rate at which the crawler indexes a
source. These settings are also farm-wide configurations but are applied
individually to each start address within a content source. The
management page can be accessed from the Central Administration page by
clicking General Application Settings and then Crawler Impact Rules,
under the Search section. It can also be opened from Search
Administration for any search service, but the configurations are always
farm-wide. On the Crawler Impact Rules page, click Add Rule to open the
Add Crawler Impact Rule page, shown in Figure 2, or select an existing rule to edit.
Valid crawler impact rules do not
define the protocol (http://, https://, or file://) because the rule
applies to all connectors. Following are some examples.
If you want to
change the number of simultaneous requests, you can replace the default
of 8 with 1, 2, 4, 16, 32, or 64. For example, if you set the value to 16,
you are instructing the crawler to request the next 16 documents for
each start address once the indexer has finished processing the previous
documents. So, if you have four start addresses, the crawler will connect
and download 64 documents (16 from each start address) simultaneously.
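The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the multiplication described in the text, not part of SharePoint; the function and constant names are hypothetical.

```python
# Values taken from the text; all names here are hypothetical.
ALLOWED_SIMULTANEOUS_REQUESTS = (1, 2, 4, 8, 16, 32, 64)
DEFAULT_SIMULTANEOUS_REQUESTS = 8


def total_simultaneous_downloads(start_addresses: int,
                                 requests_per_address: int = DEFAULT_SIMULTANEOUS_REQUESTS) -> int:
    """Return how many documents may be downloading at once across all start addresses."""
    if requests_per_address not in ALLOWED_SIMULTANEOUS_REQUESTS:
        raise ValueError(f"unsupported request count: {requests_per_address}")
    if start_addresses < 0:
        raise ValueError("start_addresses must not be negative")
    return start_addresses * requests_per_address


print(total_simultaneous_downloads(4, 16))  # 64, matching the example in the text
print(total_simultaneous_downloads(4))      # 32 with the default of 8
```

Because the load grows with every start address, a modest per-address setting can still produce a large number of concurrent requests against the content servers.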
Note:
BEST PRACTICES
Even though you can fill a content source with up to 50 start
addresses, best practice is to keep that number much lower. The
optimal number of start addresses depends on the server resources
available for indexing and on the bandwidth available between
your indexing servers and the content sources. You can find a suitable
number by combining performance monitoring with measurements of
the speed at which your indexes can be built.
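You can also configure the
crawler to request one document at a time and wait a specified number of
seconds between requests. There is a large difference between 1
simultaneous request and a 1-second delay: with 1 simultaneous request,
the crawler asks for the next document as soon as the previous one is
done, whereas with a delay it pauses before every request. Rarely will
you need to set the delay greater than 1 second.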
Note:
Reducing the crawl rate can extend the crawl time so much that the crawl does not complete before it’s time to start again.
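
To see how quickly a delay adds up, here is a rough back-of-the-envelope estimate in Python. It is a deliberately simplified model, not how the crawler actually schedules work: it assumes a single start address, a constant fetch time per document, and documents fetched in fixed-size batches. The function names, the document count, the 0.2-second fetch time, and the 24-hour crawl interval are all assumptions chosen for illustration.

```python
import math


def estimated_crawl_seconds(documents: int,
                            fetch_seconds: float,
                            simultaneous_requests: int = 1,
                            delay_seconds: float = 0.0) -> float:
    """Rough crawl-time estimate for one start address (simplified model)."""
    if simultaneous_requests < 1:
        raise ValueError("simultaneous_requests must be at least 1")
    # Documents are fetched in batches; each batch costs one fetch plus any delay.
    batches = math.ceil(documents / simultaneous_requests)
    return batches * (fetch_seconds + delay_seconds)


def fits_schedule(crawl_seconds: float, interval_hours: float) -> bool:
    """True if the crawl would finish before the next scheduled start."""
    return crawl_seconds < interval_hours * 3600


# Hypothetical workload: 100,000 documents at 0.2 seconds each, crawled daily.
scenarios = {
    "8 simultaneous requests": estimated_crawl_seconds(100_000, 0.2, simultaneous_requests=8),
    "1 simultaneous request": estimated_crawl_seconds(100_000, 0.2),
    "1 request, 1-second delay": estimated_crawl_seconds(100_000, 0.2, delay_seconds=1.0),
}
for name, seconds in scenarios.items():
    print(f"{name}: {seconds / 3600:.1f} h, fits 24 h schedule: {fits_schedule(seconds, 24)}")
```

Under these assumed numbers, only the 1-second delay pushes the crawl past the daily window, which is the situation the note warns about.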