Windows IT Pro is the authoritative and independent resource for Windows NT, Windows 2000, Windows 2003, and Windows XP. It features a collection of resources and magazines for Windows IT professionals.
  
  


July 24, 2006

Making Sense of SharePoint Search

Learn the finer points of search architecture to make informed deployment decisions

In a SharePoint environment, users can perform searches from team sites hosted by Windows SharePoint Services and from portal sites hosted by Microsoft Office SharePoint Portal Server 2003. (For definitions of team sites, portal sites, and other types of sites, see the sidebar "Types of SharePoint Sites.") SharePoint Services and Portal Server use the same search engine, but the different ways they use that engine result in an inconsistent search experience for end users.

Learning about the two products' different search architectures can help you decide whether your users would benefit most from SharePoint Services, Portal Server, or both together. If you haven't yet deployed SharePoint, understanding its search architecture will help you plan for an eventual deployment.

Major Search Components
Microsoft Search Service (MSSearch) is the generic search service that Microsoft products, including SharePoint Services and Portal Server, use to varying degrees. It supports three query languages—Query Dialect 1, SQL full-text extensions, and Query Dialect 2—which let users perform varied forms of queries. MSSearch creates full-text indexes on content and properties of structured and semistructured data and allows fast linguistic searches on this data.

For an index to be truly useful, however, it must support the content sources that users are interested in (e.g., file shares, Web content, Exchange Server folders, SharePoint-based sites) and the types of data within those content sources, and it must offer rich retrieval capabilities. The applications that use the index need to provide the content to be indexed, format user queries, and return the results of searches to the requesting clients. Consequently, these applications in effect control the overall search experience, and some exercise that control much better than others. Portal Server, for example, provides content to be indexed from many sources, such as Exchange public folders and IBM Lotus Notes databases, whereas SQL Server full-text indexing provides content only from SQL Server tables.

MSSearch relies on several major components to support full-text indexing. Figure 1 shows the following components:

  • Protocol handlers—access data over a particular protocol or from a particular store. Common protocol handlers can access data from file shares, Web sites, public folders, Lotus Notes, and SQL Server databases. The protocol handler processes URLs passed to it by the gatherer.
  • Gatherer—maintains the queue of URLs to be accessed across protocols. For each document accessed, the gatherer fetches the stream of content from the protocol handler and passes it to the appropriate filter.
  • Filters, also known as IFilters—extract textual information from a specific document format and pass strings of text and property/value pairs to the indexing engine. For example, the Microsoft Office IFilter extracts terms from Microsoft Word, Excel, and PowerPoint files. Other IFilters work with HTML or email messages. Third-party providers offer other specialized filters; for example, Adobe Systems provides a PDF IFilter. Without an IFilter, the text within the file can't be indexed.
  • Word breakers and stemmers—word breakers determine where the word boundaries are in the stream of characters in the query or the document being crawled, and they break up compound words and phrases for the full-text index. A stemmer extracts the root form of a given word. In some languages, the stemmer also expands the root form of a word to alternative forms (for example, providing running, ran, and runner from the root word run).
  • Indexer—prepares an inverted index of content. An inverted index is a data structure with a row for each term. Each row contains information about the documents in which the term appears and the number of occurrences and relative position of the term within each document. The inverted index provides MSSearch with the ability to apply statistical and probabilistic formulas to quickly compute the relevance of documents.
  • Alerts—notify users when new or updated content matches any queries they've stored, as well as when changes have been made to documents, sites, lists, and libraries.
  • AutoCat—provides a way for items in a portal site to be automatically cataloged with existing items in other areas. AutoCat uses a Topic Assistant, which suggests alternative locations where items could logically be added to lists or document libraries.
  • Catalogs—store full-text indexes within the file system on the server that runs the MSSearch service. Applications interact with MSSearch to maintain the indexes and the catalogs.
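The way a gatherer, protocol handler, filter, word breaker, and indexer fit together can be sketched in a few lines of Python. This is only an illustrative toy, not MSSearch's implementation: the in-memory store, the URLs, and every function name below are assumptions made up for the example, and the "filter" merely strips HTML-like tags.

```python
import re
from collections import defaultdict

# Hypothetical content store standing in for file shares and Web sites.
FAKE_STORE = {
    "file://share/report.txt": "Quarterly sales report. Sales grew.",
    "http://intranet/news.html": "<p>Sales team news</p>",
}

def protocol_handler(url):
    """Fetch the raw content for a URL (here, just a dictionary lookup)."""
    return FAKE_STORE[url]

def plain_text_filter(raw):
    """Toy filter: replace HTML-like tags with spaces and return plain text."""
    return re.sub(r"<[^>]+>", " ", raw)

def word_breaker(text):
    """Split text into lowercase words on non-alphanumeric boundaries."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(urls):
    """Gatherer plus indexer: map each term to {url: [word positions]}."""
    index = defaultdict(dict)
    for url in urls:  # the gatherer's queue of URLs
        words = word_breaker(plain_text_filter(protocol_handler(url)))
        for position, word in enumerate(words):
            index[word].setdefault(url, []).append(position)
    return index

index = build_inverted_index(list(FAKE_STORE))

# Each row of the index records where a term occurs; the number of
# occurrences in a document is the length of its position list.
for url, positions in index["sales"].items():
    print(url, len(positions), positions)
```

Storing positions rather than only counts is what would let an engine reason about the relative placement of terms within a document, which is the kind of information the indexer bullet above describes.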

Individual products also contribute to MSSearch's search architecture. For example, Portal Server adds Alerts to the search architecture, and third parties can provide filters and protocol handlers. However, not all products benefit from all of MSSearch's search components. For example, only Portal Server uses Alerts, but any product that uses the MSSearch architecture will benefit from IFilters.

Now that we understand a little about the search architecture, let's see how SharePoint Services and Portal Server differ in the way they use the architecture. We'll also look at how this difference results in different outcomes for end users searching SharePoint sites.

Searching with SharePoint Services
SharePoint Services relies on SQL Server full-text indexing, which in turn interacts with MSSearch. Thus, the search capabilities that SharePoint Services can offer to team sites are limited by what SQL Server full-text indexing can do. One prerequisite for enabling search within a team site is that SharePoint Services must use SQL Server rather than Windows MSDE (WMSDE) as its storage engine. By default, full-text indexing is disabled, but you can enable it by using the SharePoint Central Administration page. No search links will appear on team sites unless you enable full-text indexing.

SQL Server full-text indexing provides a protocol handler for MSSearch that understands only the content of SQL Server databases and tables. Therefore, the results of a search initiated from a team site include content only from SQL Server. When you enable searching in SharePoint Services, you enable it for all team sites because content from different team sites isn't separated into different tables. For example, the Docs table in a content database contains documents stored in all libraries within all team sites that use that content database.

A search from a team site supports stemming and inflectional forms (i.e., different forms of a word) of the supplied query terms. In SharePoint Services, you're limited to simple keyword queries; neither phrase nor Boolean searches are supported. Additionally, you can't narrow a search to within a previous result set, and you won't see any indication of the number of hits. Although you can limit a search's scope to the contents of an individual list, the search is performed against the current view of that list and won't return results for items that are filtered out of that view.
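Two of these behaviors, matching on inflectional forms and searching only the current view of a list, can be illustrated with a small Python sketch. It does not reflect how SharePoint Services or SQL Server actually implements either one: the inflection table, the sample task list, and all function names are hypothetical and exist only for this example.

```python
# Hypothetical inflection table; real stemmers are language-specific.
INFLECTIONS = {"run": {"run", "runs", "running", "ran"}}

def expand(term):
    """Return the set of known inflectional forms that includes the term."""
    term = term.lower()
    for forms in INFLECTIONS.values():
        if term in forms:
            return forms
    return {term}

def search_list(items, view_filter, term):
    """Search a single keyword against only the items the view shows."""
    forms = expand(term)
    visible = [item for item in items if view_filter(item)]
    return [
        item["title"]
        for item in visible
        if forms & set(item["body"].lower().split())
    ]

def active_only(item):
    """A view that hides closed items."""
    return item["status"] == "Active"

def all_items(item):
    """A view that shows every item."""
    return True

tasks = [
    {"title": "Marathon plan", "body": "Running schedule for May", "status": "Active"},
    {"title": "Old results", "body": "She ran the 2005 race", "status": "Closed"},
    {"title": "Budget", "body": "Travel costs", "status": "Active"},
]

print(search_list(tasks, active_only, "run"))  # ['Marathon plan']
print(search_list(tasks, all_items, "run"))    # ['Marathon plan', 'Old results']
```

The closed item matches the query through the form "ran" but never appears while the filtered view is active, which mirrors the limitation described above.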

SharePoint Services' search capabilities are fairly inflexible and not designed for enterprise use. SharePoint Services indexes document content only, not any properties that might further describe the document's contents. Even if the search engine did index document properties, the query interface doesn't support advanced search, so users wouldn't find that information. Users are limited to doing a full-text search on content only. Furthermore, SharePoint Services' search capabilities are limited in scale, severely restricting their use in large server farms, which could conceivably contain millions of documents across thousands of team sites.







 
 Windows IT Pro is a Division of Penton Media Inc.
 Copyright © 2008 Penton Media, Inc. All rights reserved.