SharePoint 2010 cannot crawl every file type out of the box. To expand
the supported file types, Microsoft developed iFilters, which act as
plug-ins for the Windows operating system. An iFilter lets the
SharePoint index understand a file's format, filter out embedded
formatting, extract the text from the file, and return it to the search
engine. Without an appropriate iFilter, SharePoint cannot read the
content of a file, so that content is not searchable.
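The role an iFilter plays can be illustrated with a small conceptual sketch. This is only an analogy in Python, not the actual Windows iFilter interface (real iFilters are COM components registered with the operating system). The names `FILTERS`, `register_filter`, `html_filter`, `plain_text_filter`, and `extract_for_index` are hypothetical and exist only for this illustration. The sketch assumes a registry keyed by file extension, where each entry strips formatting and hands plain text back to the indexer.

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical registry: file extension -> function that extracts plain text.
FILTERS: Dict[str, Callable[[bytes], str]] = {}


def register_filter(extension: str, extractor: Callable[[bytes], str]) -> None:
    """Associate a text extractor with a file extension such as '.html'."""
    FILTERS[extension.lower()] = extractor


def html_filter(raw: bytes) -> str:
    """Strip markup (the 'embedded formatting') and return the visible text."""
    text = raw.decode("utf-8", errors="replace")
    text = re.sub(r"<[^>]+>", " ", text)  # replace each tag with a space
    return " ".join(text.split())         # collapse runs of whitespace


def plain_text_filter(raw: bytes) -> str:
    """Plain text needs no stripping, only whitespace normalization."""
    return " ".join(raw.decode("utf-8", errors="replace").split())


def extract_for_index(filename: str, raw: bytes) -> Optional[str]:
    """Return indexable text, or None when no filter handles the file type."""
    _, dot, ext = filename.rpartition(".")
    extractor = FILTERS.get("." + ext.lower()) if dot else None
    if extractor is None:
        return None  # no filter installed: the content stays unsearchable
    return extractor(raw)


register_filter(".html", html_filter)
register_filter(".txt", plain_text_filter)
```

With only these two filters registered, `extract_for_index("page.html", b"<p>Hello <b>world</b></p>")` returns `"Hello world"`, while `extract_for_index("report.pdf", b"%PDF-1.4")` returns `None`, mirroring how a file type without an installed iFilter contributes no searchable content.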
iFilters are available for
most major file types from a variety of vendors, and several vendors
often offer iFilters for the same file type. Not all iFilters perform
equally. When a large share of the crawled content requires an iFilter,
crawl performance can differ drastically depending on which iFilter is
installed. A slower iFilter lengthens the crawl because SharePoint's
index takes longer to extract the content of each file. The PDF iFilter,
for example, is probably the most frequently implemented iFilter. The
crawl times for the three most popular PDF iFilters are shown in Figure 1.
While Adobe does offer a PDF
iFilter at no cost, it is not the most efficient solution. Several
third-party vendors, such as IFilter Shop, Foxit, and PDFlib, offer
their own versions of a PDF iFilter. The most popular is most likely the
Foxit PDF iFilter 2.0, which, according to Microsoft speed tests, works
39 times faster than the free Adobe offering. The most significant
crawl speed differences appear on machines with multi-core processors,
because Foxit's iFilter makes efficient use of multi-threading. Unlike
Adobe, Foxit also provides 24/7 technical support for its product. And
while Adobe's PDF iFilter can index page contents and file attributes,
Foxit's PDF iFilter can additionally index PDF bookmarks and PDF
attachments. A license for the Foxit PDF iFilter 2.0 costs around
US$700 per production server, plus US$100 per server for annual
maintenance. Non-production servers cost US$450 per server. A feature
comparison of the PDF iFilters available from Adobe, PDFlib, and Foxit
is shown in Table 1.
Table 1. PDF iFilter Feature Comparison
Product Feature | Foxit PDF iFilter | PDFlib TET PDF iFilter | Adobe iFilter |
---|---|---|---|
Extract PDF content | X | X | X |
Extract PDF attributes | X | X | X |
Extract PDF bookmarks | X | X | |
Extract PDF attachments | X | X | |
Add log settings | X | X | |
Extract PDF metadata | Some | Yes | Some |
Index XMP image metadata | | X | |
Performance | Fastest | Faster | Slow |
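The Foxit licensing figures quoted earlier can be turned into a rough budget estimate. The following is a minimal sketch using the approximate prices from the text. It assumes that annual maintenance applies only to production servers, which the text does not state explicitly, and the function name `estimate_foxit_cost` is hypothetical. Actual pricing should be confirmed with the vendor.

```python
# Approximate prices quoted in the text, in US dollars.
PRODUCTION_LICENSE_USD = 700
ANNUAL_MAINTENANCE_USD = 100      # assumed to apply per production server only
NON_PRODUCTION_LICENSE_USD = 450


def estimate_foxit_cost(production_servers: int,
                        non_production_servers: int,
                        years: int) -> int:
    """Estimate license plus maintenance cost over a number of years."""
    licenses = (production_servers * PRODUCTION_LICENSE_USD
                + non_production_servers * NON_PRODUCTION_LICENSE_USD)
    maintenance = production_servers * ANNUAL_MAINTENANCE_USD * years
    return licenses + maintenance


# Two production servers and one non-production server over three years:
# licenses 2*700 + 1*450 = 1850, maintenance 2*100*3 = 600
print(estimate_foxit_cost(2, 1, 3))  # 2450
```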
NOTE
More information about the PDF iFilters referenced in this section can be found at the following locations.
www.foxitsoftware.com/
www.adobe.com/support/downloads/detail.jsp?ftpID=4025
www.pdflib.com/products/tet-pdf-ifilter/?gclid=CPrc6cbv5aUCFYXD7QodJHeC1A
www.ifiltershop.com/
Beyond PDF files,
organizations may need to index many other document types. Companies
that focus on engineering, manufacturing, or design may need to index
DWG-format CAD files, for example. Vendors such as IFilter Shop and
Ransdell & Brown, Inc. offer iFilters that support this format. IFilter
Shop also offers a wide range of additional iFilters for formats such as
ASPX, MSG, Microsoft Project, PostScript, RAR & ZIP archives, vCard,
Windows Media Audio and Video, and Adobe XMP. These are just a few of
the vendors that offer iFilters, and an entire community of developers
and consultants is dedicated to enhancing Microsoft technologies. Ask
for their advice, consult the forums of the company that produces the
file type, and compare solutions. Whatever the environment's content
needs, a solution is most likely available.
NOTE
More information about the iFilters referenced in this section can be found at the following locations.
www.dwgindex.com/DWGFilter.html
www.ifiltershop.com/