The index architecture can be scaled out to handle
around 100 million items, custom connectors can be built through the BCS
so the index can access custom content sources, and the user interface
can be customized as needed to help users connect with content. While
the capabilities of SharePoint 2010 Search are extensive, some
organizations may need functionality beyond that which is supported by
the native platform. Depending on the needs, custom development may be
the most appropriate option. If custom development does not match an
organization's time requirements, budget, or available skillset,
third-party tools can be added to build out SharePoint 2010's
functionality. There are, however, certain cases where these options
still do not address the needs of an organization. In these cases, it
may be necessary to replace SharePoint 2010's search architecture.
Front-end limitations include the
number of suggestions that can be presented for each refiner category, a
lack of document previews, and the limited ability to customize search
interfaces to user context. These limitations can be overcome through
commercially available SharePoint extensions. Back-end limitations, by
contrast, are directly tied to the core architecture of the search
solution. Back-end limitations include the number of items
that can be indexed, the ease of access into content sources, and
manipulation of relevancy. A summary of the major back-end limitations
can be found at the end of this section in Table 1.
Unlike the front-end user-side features, which can be enhanced through
simple customization or commercial extensions, back-end limitations
require complete replacement of the search architecture and index
pipeline.
1. Replacement Considerations
The decision to replace the
search components in SharePoint 2010 is not one that should be taken
lightly. While there are many search engines available that can
integrate into SharePoint 2010, none does so without consequence. All
search engine replacements will require additional time to set up,
configure, and manage. They can bring advantages; but for many
organizations the disadvantages brought by the complexities of mixing
technologies in one farm do not justify the change. When analyzing
potential replacements for SharePoint 2010 Search, it is important to
fully understand the answers to the following questions.
Do I need to index more than 100 million items?
SharePoint 2010 is capable of handling up to 100 million items if
properly scaled. There are few organizations in the world that break
this limit. For organizations that need to index more content than
SharePoint 2010 is capable of supporting, replacing SharePoint 2010's
native search pipeline is necessary. This is the single most compelling
reason to replace SharePoint 2010 Search.
Do I have an enterprise agreement for SharePoint?
Currently, Microsoft's replacement for the native SharePoint 2010 search
engine, FAST Search Server for SharePoint 2010, is available only to
deployments of the enterprise version of SharePoint. Organizations that
purchase the extension must be at the ECAL level.
Do I have the time to manage a more complex enterprise search engine?
While some enterprise search engine replacements are marketed to be
easier to manage, all bring the inherent complexity of another major
solution to manage. Unlike a search extension, which adds onto
SharePoint, replacing the search engine requires management of a second
set of search index architecture and software. The time it will take to
rebuild sites, manage metadata, secure permissions, and maintain
additional physical servers should be taken into account.
Can the additional user interface features I need be achieved through an
extension of SharePoint, or do I need to replace the core search engine?
It makes no sense to replace the entire search engine just to achieve
deep numbered refiners. Comparatively inexpensive commercial solutions
are available to meet these more basic needs. Unfortunately, countless
enterprise search engine replacement projects are started on this
illogical premise. Well-managed organizations don't pay hundreds of
thousands of dollars for the complexity of a search engine replacement
when a simple Web Part will fix the need. Before determining that
SharePoint search needs to be replaced, first look at how it can be
enhanced. Enhancements are generally much less expensive,
time-consuming, and disruptive to users.
Do I have the budget?
Enterprise search engines are expensive. Be prepared to allocate a
budget starting at US$30,000 for a minimal search engine replacement
such as a single-server Google Search Appliance. For enterprise-level
search engines such as Autonomy, Endeca, or FAST Search Server for
SharePoint 2010, expect software costs that start at
US$100,000. Then be sure to appropriately budget for additional
hardware, professional services, maintenance, and training.
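The questions above can be read as a rough triage. The following Python sketch is one hypothetical way to encode them, not a prescribed method: the class name `SearchNeeds`, the function `recommend`, and the outcome labels are illustrative assumptions, and the thresholds are simply the approximate figures quoted in this section.

```python
from dataclasses import dataclass

# Approximate figures quoted in this section; treat them as rough guides.
NATIVE_INDEX_LIMIT = 100_000_000     # items SharePoint 2010 can index when scaled out
MIN_REPLACEMENT_BUDGET_USD = 30_000  # entry point for a minimal replacement


@dataclass
class SearchNeeds:
    """Hypothetical summary of an organization's answers to the questions."""
    items_to_index: int
    needs_only_ui_features: bool  # e.g. deeper refiners or document previews
    budget_usd: int
    has_admin_capacity: bool      # time to manage a second search platform


def recommend(needs: SearchNeeds) -> str:
    """Return a coarse recommendation: 'replace', 'extend', or 'evaluate vendors'."""
    # Exceeding the native index limit is the single most compelling reason to replace.
    if needs.items_to_index > NATIVE_INDEX_LIMIT:
        return "replace"
    # Front-end gaps alone are better served by customization or an extension.
    if needs.needs_only_ui_features:
        return "extend"
    # Without the budget or the staff time, a replacement is not realistic.
    if needs.budget_usd < MIN_REPLACEMENT_BUDGET_USD or not needs.has_admin_capacity:
        return "extend"
    # Otherwise a replacement may be justified; compare the vendors in detail.
    return "evaluate vendors"
```

For example, `recommend(SearchNeeds(250_000_000, False, 500_000, True))` returns `"replace"`, because the item count alone exceeds what the native index supports. The licensing question is left out of the sketch because it only determines whether FAST Search Server for SharePoint 2010 is among the candidates.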
If, after analyzing answers to
these questions, the organization still needs to replace SharePoint
2010 Search, there are many solutions to consider. Enterprise search
providers such as Endeca, Autonomy, Coveo, Google, and FAST all offer
solutions that can replace SharePoint 2010's search architecture. The
enterprise search engines offered by these companies are designed to be
extremely scalable and crawl a broad range of content sources. In
addition, they allow administrators significantly more influence over
the index pipeline and relevancy. Each enterprise search provider caters
to a slightly different set of needs. Considering the significant
investment and impact, if an organization needs to replace SharePoint
2010 Search, it is best to contact each vendor and conduct a thorough
analysis of the available options.
The most popular replacements for SharePoint 2010's native search are the solutions offered by Google and Microsoft.
2. Google Search Appliance
Google's offering, the
Google Search Appliance (GSA), is designed to be a straightforward
plug-and-play solution to enterprise search. It is a packaged
combination of a standard rack-mounted server and administrative
software that can be plugged into a server rack to provide an instantly
scalable on-premise search solution for web sites and intranets.
The GSA rose in
popularity during the MOSS 2007 era for several reasons. Sadly, the most noteworthy
reason for the popularity of the GSA is its brand presence, since it is
offered by the world's leading global search provider. Implementations
on SharePoint also increased due to new migrations from
file shares to MOSS 2007, which saw a drastic jump in popularity over
its predecessors. Many organizations owned Google Search Appliances for
their web sites or file shares, and found a way to justify them in MOSS
2007. Its uptake also greatly benefited from the limited search
architecture scaling available in MOSS 2007. The GSA was able to take
advantage of a brute-force approach to searching massive amounts of
documents spread across many content sources. Simply adding another GSA
decreased crawl times and increased the maximum index size as well as
search speed. It provided benefits over MOSS's search user interface,
including dynamic navigation, advanced query syntax, query
suggestions, automatic spellcheck, and result groupings based on topic.
It also opened up a broad range of reporting and analysis features
through Google Analytics not available in MOSS 2007. Some of these
analytics can be seen in Figure 1.
With SharePoint 2010, it is
apparent that Microsoft took note of the loss of market share around
search. The majority of the features that made the Google Search
Appliance stand out in the MOSS 2007 era were integrated into native
SharePoint 2010. The search user interface features such as related
queries, query syntax, and refiners have all been integrated into
SharePoint 2010. Figure 2
shows the GSA search experience on the platform, and it is helpful for
understanding the basic user interface differences. In addition to the
decreased gap in the user interface, with the capabilities of the BCS,
developers can more easily connect SharePoint to a wide range of content
sources. SharePoint 2010's native search architecture is also
significantly more scalable than MOSS 2007.
While the improvements in
SharePoint 2010 Search greatly reduce the feature gap between the
Google Search Appliance and SharePoint, there are some remaining
benefits to Google. Depending on the GSA model an organization chooses
to implement, the index is marketed as scalable to
billions of items. By contrast, SharePoint 2010's index is capped at around
100 million items. The GSA also supports more content sources out of the
box without knowledge of the connector framework. This is important for
organizations that want to easily connect to content located in EMC
Documentum, IBM FileNet, Hummingbird, Lotus Notes, Oracle Content
Server, and SAP KM.
It is also worth noting
that Google's relevancy is primarily beneficial in the public
domain. The factor of relevancy combined with a questionable history of
security is partly the reason most GSA deployments can be found on
public sites and not intranets. The technique of crawling global web
sites is quite different from the techniques used to provide relevant
search results on an intranet. Global search engines are used to connect
people with general information scattered around the Web. They function
similarly to the yellow pages, in that people are frequently searching
for general concepts and not specific items. For example, like the
yellow pages, on a global search engine, users may search for general
concepts, such as shoe stores in their city. They are not frequently
searching for a specific pair of shoes located at a specific branch of a
store. If users want to find a specific pair of shoes at a specific
store, they do a global search to find the store's web site, and then
call or search again within the web site using the site's search engine.
The user experience when searching within intranets is quite the
opposite. Users are generally looking for a specific item, authored by a
specific person, within a specific site. The ability to present relevant
results based on this specificity is what makes SharePoint's relevancy
shine on intranets.
Although the Google Search
Appliance is one of the least expensive options for replacing
SharePoint 2010 Search, it is still not cheap. Management of search
still requires time and attention. Setup time is slightly less than the
amount of time necessary to set up search in SharePoint 2010, but not
drastically reduced. Pricing for the appliance, which includes both
hardware and software, is based on the number of indexed documents. At
the end of 2010, pricing for the basic appliance model started around
US$30,000 for a two-year contract and the ability to index 500,000
items. The basic model can be scaled to support up to 10 million items,
and the more powerful model can support up to 30 million items per
appliance. Unlike most enterprise search platforms, which charge a
one-time license fee and annual support, Google licenses the GSA in two-
or three-year leases. When the contract period expires, the unit stops
serving data. It can then be returned or replaced with initiation of a
new contract.
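As a rough sizing aid, the per-appliance capacities quoted above can be turned into an appliance count. This Python sketch assumes that capacity adds up linearly across units, which is a simplification; the function name `appliances_needed` and the constant names are hypothetical, and real sizing should be confirmed with the vendor.

```python
# Per-appliance capacities quoted in this section (end-of-2010 figures).
BASIC_MODEL_MAX_ITEMS = 10_000_000
LARGE_MODEL_MAX_ITEMS = 30_000_000


def appliances_needed(items: int, per_appliance: int = LARGE_MODEL_MAX_ITEMS) -> int:
    """Estimate how many appliances are required to index `items` documents.

    Assumes each added appliance contributes its full rated capacity.
    """
    if items <= 0 or per_appliance <= 0:
        raise ValueError("items and per_appliance must be positive")
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-items // per_appliance)
```

Under this assumption, an index of 100 million items on the larger model would call for `appliances_needed(100_000_000)`, which evaluates to 4 units, each subject to its own lease term.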
NOTE
Google's search
appliance is not the same as Google Mini. Google Mini is a simple search
engine for use with fewer than 100,000 items, and is primarily designed
for web sites. SharePoint 2010's search capabilities are significantly
more advanced than those of Google Mini, so the Mini would not be a viable
replacement for SharePoint search.