Microsoft Search Server
Created by Orckestra

Index various types of commonly used files on Microsoft Search Server

If you want visitors to your website to be able to search the contents of your PDF or DOCX files, set up indexing of these file types as described below.

In brief, you should:

  1. Install the latest Microsoft Search Server package.
  2. Set up the Media Remapper (part of the Microsoft Search Server package) so that links to media files include file names and extensions (required only for Composite C1 versions earlier than 3.0).
  3. Install the program or module that handles filtering of a specific file type.
  4. Create a specific crawl rule.
  5. Add the required file type on the Search Server.
  6. Register file type filters for the search and web servers by editing the System Registry.
  7. Restart the search service.

Remapping links to media files

Note: This step is only required for Composite C1 versions earlier than 3.0. If you use Composite C1 3.0 or later, skip this step.

In Composite C1 2.1.1 and earlier, a link to a media file looks like http://<server_name>/Renderers/ShowMedia.ashx?id=32c783df-416a-4189-a7c5-7a90a2992b1b, whether it points to a PDF, DOCX, JPG or ZIP file. When you set up indexing of specific file types such as PDF or DOCX on Microsoft Search Server, you explicitly specify the file extension by which such files are identified and indexed.

Because the default link contains no file name or extension, you need to remap media file links so that they include the file name and extension.
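The following minimal Python sketch illustrates the problem. The helper `url_extension` is hypothetical, and it assumes that the file type is taken from the extension at the end of the URL path, with the query string ignored. Under that assumption, the default link yields "ashx" rather than "pdf".

```python
from posixpath import splitext
from urllib.parse import urlsplit


def url_extension(url: str) -> str:
    """Return the lower-case extension of the URL path, without the dot."""
    path = urlsplit(url).path   # drops the ?id=... query string
    ext = splitext(path)[1]     # e.g. ".ashx" or ".pdf"
    return ext.lstrip(".").lower()


legacy = "http://server/Renderers/ShowMedia.ashx?id=32c783df-416a-4189-a7c5-7a90a2992b1b"
print(url_extension(legacy))  # prints: ashx
```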

The Microsoft Search Server package includes remapping functionality that turns media file links like the one above into links that include the file name and extension:

http://<server_name>/Renderers/ShowMedia.ashx/SimplePDFfile.pdf?id=32c783df-416a-4189-a7c5-7a90a2992b1b
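To make the remapped format concrete, here is a Python sketch that builds such a link from a media ID and a file name. The function `remapped_media_link` is hypothetical and assumes the "http" scheme: it only reproduces the URL shape shown above, while the actual rewriting is done on the website by the remapper module configured in the next steps.

```python
from urllib.parse import quote, urlencode


def remapped_media_link(server: str, media_id: str, filename: str) -> str:
    """Build .../Renderers/ShowMedia.ashx/<filename>?id=<media_id> (illustration only)."""
    # quote() percent-encodes characters such as spaces in the file name.
    return (
        f"http://{server}/Renderers/ShowMedia.ashx/"
        f"{quote(filename)}?{urlencode({'id': media_id})}"
    )


print(remapped_media_link("server", "32c783df-416a-4189-a7c5-7a90a2992b1b", "SimplePDFfile.pdf"))
# prints: http://server/Renderers/ShowMedia.ashx/SimplePDFfile.pdf?id=32c783df-416a-4189-a7c5-7a90a2992b1b
```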

To turn the remapping on:

  1. Open the Web.config file of your C1 website.
  2. Add the following element under the <httpModules> element and, for IIS 7.0 or later, also under the <modules> element:

<add name="MssRemaper" type="Composite.Search.MicrosoftSearchServer.RemapperHttpModule" />
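If you prefer to script this change, the following Python sketch (standard library only; `register_remapper` is a hypothetical helper) adds the element to <system.web>/<httpModules> and <system.webServer>/<modules>, creating those elements if missing and skipping a section that already contains the module. It assumes the <configuration> element has no XML namespace. ElementTree drops XML comments, so run it on a copy of Web.config.

```python
import xml.etree.ElementTree as ET

NAME = "MssRemaper"
TYPE = "Composite.Search.MicrosoftSearchServer.RemapperHttpModule"


def _child(parent: ET.Element, tag: str) -> ET.Element:
    """Return the first direct child with this tag, creating it if missing."""
    for element in parent:
        if element.tag == tag:
            return element
    return ET.SubElement(parent, tag)


def register_remapper(config_xml: str) -> str:
    """Add the remapper <add> element to both module sections if not present."""
    root = ET.fromstring(config_xml)  # the <configuration> element
    # Classic pipeline: system.web/httpModules; IIS 7.0+: system.webServer/modules.
    for section, container in (("system.web", "httpModules"),
                               ("system.webServer", "modules")):
        modules = _child(_child(root, section), container)
        if not any(add.get("name") == NAME for add in modules.findall("add")):
            ET.SubElement(modules, "add", name=NAME, type=TYPE)
    return ET.tostring(root, encoding="unicode")
```

Running the function twice on the same text adds nothing the second time, so it is safe to re-run.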

To see how the media file links will look when remapped, open a page containing such links in a web browser and append "?MicrosoftSearchServer=true" to the page's URL:

http://<server_name>/Home/Documentation.aspx?MicrosoftSearchServer=true
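When a page URL already has a query string, the flag must be joined with "&" rather than "?". A small Python sketch (the name `preview_url` is hypothetical) that adds the flag correctly in both cases:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def preview_url(page_url: str) -> str:
    """Return page_url with MicrosoftSearchServer=true in its query string."""
    parts = urlsplit(page_url)
    # Keep existing parameters (including blank ones) and replace any old flag.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "MicrosoftSearchServer"]
    query.append(("MicrosoftSearchServer", "true"))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(preview_url("http://server/Home/Documentation.aspx"))
# prints: http://server/Home/Documentation.aspx?MicrosoftSearchServer=true
```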

(In Composite C1 3.0, links to media files already include paths and file extensions, for example:

http://<server_name>/media/95165d34-1ddb-422a-94ff-2e84db7d6221/Documentation/PDF/SimplePDFfile.PDF)

Setting up Adobe PDF iFilter to make PDF files searchable (Microsoft Search Server 2008 Express)

1. Download and install Adobe Reader 9.1 (http://www.adobe.com/products/acrobat).

2. Make sure the "pdf" file extension is in the list of file types crawled by Microsoft Search Server. If it is not, add it:

  • Open http://<Your Server Name>/ssp/admin/_layouts/managefiletypes.aspx
  • Click New File Type
  • Add "pdf" as the extension

3. Make sure the "ashx" file extension is in the list of file types crawled by Microsoft Search Server. If it is not, add it:

  • Open http://<Your Server Name>/ssp/admin/_layouts/managefiletypes.aspx
  • Click New File Type
  • Add "ashx" as the extension

4. To enable indexing of PDF files uploaded to the Media archive, add a new crawl rule "http://*/ShowMedia.ashx*":

  • Open http://<Your Server Name>/ssp/admin/_layouts/managecrawlrules.aspx
  • Click New Crawl Rule
  • Click to select "Include all items in this path"
  • Click to select "Crawl complex URLs (URLs that contain a question mark (?))"

5. Composite C1_1.2.3321.23866 and earlier versions may require changes to the ~\Renderers\ShowMedia.ashx file. Back up the original file before making changes, then add the following code after line 59:

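if (!Composite.Security.UserValidationFacade.IsLoggedIn())
{
    // Applies only when no Composite C1 user is logged in (as for the crawler):
    // let the client, but not shared proxy caches, cache the response for 60 minutes.
    context.Response.Cache.SetExpires(DateTime.Now.AddMinutes(60));
    context.Response.Cache.SetCacheability(HttpCacheability.Private);
}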

6. Make sure that the Default value of the following System Registry keys is set to the CLSID of the Adobe iFilter:

  • Locate or create HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf
  • Set the Default value to {E8978DA6-047F-4E3D-9C78-CDBE46041603}
  • Repeat this step for the key HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf

7. Add the installation directory of Adobe Reader 9.1 to the system Path variable:

  • Start > Settings > Control Panel > System > Advanced tab > Environment Variables
  • Double-click Path in the System Variables area
  • In the Edit System Variable dialog box, put the cursor at the end of the text in the Variable value box, and then type:
    ;<Your Drive Letter>:\Program Files\Adobe\Reader 9.0\Reader

    Note: On 64-bit (x64) versions of Windows, the Adobe Reader path is "<Your Drive Letter>:\Program Files (x86)\Adobe\Reader 9.0\Reader".
  • Click OK in each open dialog box to save your changes and close it.

8. Recycle the search service from the command line:

net stop oSearch
net start oSearch

9. Start a full crawl of your content sources.
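To check which links a crawl rule such as "http://*/ShowMedia.ashx*" (step 4) covers, you can translate it into a regular expression. The following Python sketch assumes that "*" is the only wildcard and that matching ignores case; both are assumptions about Search Server's rule matching rather than documented behavior, and `matches_crawl_rule` is a hypothetical name.

```python
import re


def matches_crawl_rule(url: str, rule: str) -> bool:
    """Return True if url matches rule, where '*' stands for any run of characters."""
    # Escape everything except '*'; each '*' may match anything, including '/' and '?'.
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    # Assumption: rule matching is case-insensitive.
    return re.fullmatch(pattern, url, flags=re.IGNORECASE | re.DOTALL) is not None


legacy = "http://server/Renderers/ShowMedia.ashx?id=32c783df-416a-4189-a7c5-7a90a2992b1b"
print(matches_crawl_rule(legacy, "http://*/ShowMedia.ashx*"))  # prints: True
```

Under these assumptions, both the default and the remapped ShowMedia.ashx links match, while Composite C1 3.0 links under /media/ do not. The broader rule "http://*/ShowMedia*" used in the DOCX procedure matches the same ShowMedia links.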

Setting up Adobe PDF iFilter to make PDF files searchable (Microsoft Search Server 2010 Express)

1. Download and install Adobe PDF iFilter 9 for 64-bit platforms (http://www.adobe.com/support/downloads/detail.jsp?ftpID=4025).

2. Make sure the "pdf" file extension is in the list of file types crawled by Microsoft Search Server. If it is not, add it:

  • Open http://<Server:Port>/_admin/search/managefiletypes.aspx?appid=<ApplicationGuid>
  • Click New File Type
  • Add "pdf" as the extension

3. Make sure the "ashx" file extension is in the list of file types crawled by Microsoft Search Server. If it is not, add it:

  • Open http://<Server:Port>/_admin/search/managefiletypes.aspx?appid=<ApplicationGuid>
  • Click New File Type
  • Add "ashx" as the extension

4. To enable indexing of PDF files uploaded to the Media archive, add a new crawl rule "http://*/ShowMedia.ashx*":

  • Open http://<Server:Port>/_admin/search/managecrawlrules.aspx?appid=<ApplicationGuid>
  • Click New Crawl Rule
  • Click to select "Include all items in this path"
  • Click to select "Crawl complex URLs (URLs that contain a question mark (?))"

5. Composite C1_1.2.3321.23866 and earlier versions may require changes to the ~\Renderers\ShowMedia.ashx file. Back up the original file before making changes, then add the following code after line 59:

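if (!Composite.Security.UserValidationFacade.IsLoggedIn())
{
    // Applies only when no Composite C1 user is logged in (as for the crawler):
    // let the client, but not shared proxy caches, cache the response for 60 minutes.
    context.Response.Cache.SetExpires(DateTime.Now.AddMinutes(60));
    context.Response.Cache.SetCacheability(HttpCacheability.Private);
}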

6. Make sure that the following System Registry keys are set:

  • Locate or create HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\14.0\Search\Setup\Filters\.pdf
  • Create or update the following values:
      • <REG_SZ> Extension = pdf
      • <REG_DWORD> FileTypeBucket = 1
      • <REG_SZ> MimeTypes = application/pdf
  • Repeat this step for the key HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\14.0\Search\Setup\Filters\.pdf

7. Make sure that the Default value of the following System Registry keys is set to the CLSID of the Adobe iFilter:

  • Locate or create HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\14.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf
  • Set the Default value to {E8978DA6-047F-4E3D-9C78-CDBE46041603}
  • Repeat this step for the key HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\14.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf

8. Add the installation directory of Adobe PDF iFilter 9 for 64-bit platforms to the system Path variable:

  • Start > Settings > Control Panel > System > Advanced tab > Environment Variables
  • Double-click Path in the System Variables area
  • In the Edit System Variable dialog box, put the cursor at the end of the text in the Variable value box, and then type:
    ;<Your Drive Letter>:\Program Files\Adobe\Adobe PDF iFilter 9 for 64-bit platforms\bin\

  • Click OK in each open dialog box to save your changes and close it.

9. Recycle the search service from the command line:

net stop oSearch14
net start oSearch14

10. Restart IIS from the command line:

iisreset

11. Start a full crawl of your content sources.
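The two editions use different service names (oSearch for 2008, oSearch14 for 2010), and only the 2010 procedure also restarts IIS. The following Python sketch is hypothetical: it runs only the commands listed in this article, must be run on the search server from an elevated prompt, and keeps the command list in a separate function so it can be checked without running anything.

```python
import subprocess
from typing import List

# Search service names used in this article for each Search Server edition.
SEARCH_SERVICES = {"2008": "oSearch", "2010": "oSearch14"}


def recycle_commands(edition: str) -> List[List[str]]:
    """Return the commands that recycle the search service for an edition."""
    service = SEARCH_SERVICES[edition]  # KeyError for an unknown edition
    commands = [["net", "stop", service], ["net", "start", service]]
    if edition == "2010":
        commands.append(["iisreset"])  # only the 2010 procedure restarts IIS
    return commands


def recycle(edition: str) -> None:
    """Run the commands in order (Windows only, elevated prompt)."""
    for cmd in recycle_commands(edition):
        # check=True stops at the first failing command,
        # e.g. "net stop" when the service is not running.
        subprocess.run(cmd, check=True)
```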

Having DOCX files indexed by Search Server

Note: Perform this procedure only on Microsoft Search Server 2008 Express. On Microsoft Search Server 2010 Express, you do not need to install the Filter Pack or set up indexing of DOCX files; it is already preconfigured.

1. Install the Microsoft Filter Pack.

2. Make sure the "docx" file extension is in the list of file types crawled by Microsoft Search Server. If it is not, add it:

  • Open http://<Your Server Name>/ssp/admin/_layouts/managefiletypes.aspx
  • Click New File Type
  • Add "docx" as the extension

3. Make sure the "ashx" file extension is in the list of file types crawled by Microsoft Search Server. If it is not, add it:

  • Open http://<Your Server Name>/ssp/admin/_layouts/managefiletypes.aspx
  • Click New File Type
  • Add "ashx" as the extension

4. To enable indexing of DOCX files uploaded to the Media archive, add a new crawl rule "http://*/ShowMedia*":

  • Open http://<Your Server Name>/ssp/admin/_layouts/managecrawlrules.aspx
  • Click New Crawl Rule
  • Click to select "Include all items in this path"
  • Click to select "Crawl complex URLs (URLs that contain a question mark (?))"

5. Modify the following System Registry keys by changing their Default value to the new CLSID:

  • Locate or create HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.docx
  • Set the Default value to {5A98B233-3C59-4B31-944C-0E560D85E6C3}
  • Repeat this step for the key HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.docx

6. Recycle the search service from the command line:

net stop oSearch
net start oSearch

7. Start a full crawl of your content sources.
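The Default-value changes in the registry steps above (step 5 here, step 6 for PDF on Search Server 2008 and step 7 on Search Server 2010) can also be written as a .reg file. The following Python sketch generates one. The function name is hypothetical; the CLSIDs and key paths are the ones given in this article, and the version argument is assumed to be "12.0" for Search Server 2008 or "14.0" for 2010. It does not cover the Extension, FileTypeBucket and MimeTypes values of step 6 on Search Server 2010.

```python
# CLSIDs given in the registry steps of this article.
CLSIDS = {
    "pdf": "{E8978DA6-047F-4E3D-9C78-CDBE46041603}",   # Adobe PDF iFilter
    "docx": "{5A98B233-3C59-4B31-944C-0E560D85E6C3}",  # CLSID used for .docx above
}

# The two registry roots; the version ("12.0" or "14.0") follows each one.
BASES = (
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server",
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions",
)
SUBKEY = r"Search\Setup\ContentIndexCommon\Filters\Extension"


def filter_reg_file(version: str, extension: str) -> str:
    """Return .reg text that sets the Default value of both Extension keys."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for base in BASES:
        key = "\\".join([base, version, SUBKEY, "." + extension])
        # '@' denotes the key's Default value in .reg syntax.
        lines += [f"[{key}]", f'@="{CLSIDS[extension]}"', ""]
    return "\r\n".join(lines)
```

For example, write filter_reg_file("12.0", "docx") to a file with open("docx-filter.reg", "w", encoding="utf-16", newline="") (newline="" keeps the \r\n line endings as they are), back up the registry, and import the file from an elevated command prompt with reg import docx-filter.reg.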

Having other file types indexed by Search Server

You can also have other file types supported by the Microsoft Filter Pack indexed by Microsoft Search Server. The procedure is similar to the one described for DOCX above.

Note: Perform this procedure only on Microsoft Search Server 2008 Express. On Microsoft Search Server 2010 Express, you do not need to install the Filter Pack or set up indexing of DOCX files; it is already preconfigured.

Important Notes

  1. On 64-bit (x64) platforms, the Adobe Reader path for Microsoft Search Server 2008 Express is "<Your Drive Letter>:\Program Files (x86)\Adobe\Reader 9.0\Reader".
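Typing a directory into the Path variable by hand makes it easy to add a duplicate or forget the separating semicolon. The following Python sketch (a hypothetical helper, not part of any Microsoft or Adobe tool) computes the new Variable value from the current one. It appends the directory only if it is not already listed, comparing entries case-insensitively and ignoring a trailing backslash, and it does not change any system setting itself.

```python
def append_to_path(path_value: str, directory: str) -> str:
    """Return path_value with directory appended, unless it is already listed."""
    entries = [entry for entry in path_value.split(";") if entry]  # skip empty segments
    # Windows paths are case-insensitive; also ignore a trailing backslash.
    wanted = directory.rstrip("\\").lower()
    if any(entry.rstrip("\\").lower() == wanted for entry in entries):
        return path_value  # already present: leave the value as it is
    return ";".join(entries + [directory])
```

Pass the current Variable value and the directory for your edition, then paste the result back into the Edit System Variable dialog box.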
