Meta Tags-MS.Category

Example: 
 
Recommendation:
 
Complete Syntax: 
 
Length:  Minimum     n/a                     Maximum     n/a                                     Recommended    n/a
Usage:
 
Description:
 
Comments:
 
Examples:
 
Google-Comments:
Yahoo-Comments:
MSN-Comments:
AOL-Comments:
Ask Jeeves-Comments:
AltaVista-Comments:
Excite-Comments:
HotBot-Comments:
Inktomi-Comments:
InfoSeek-Comments:
Lycos-Comments:
NorthernLight-Comments:
 
USA  Usage/Comments:
UK    Usage/Comments:
CDN Usage/Comments:
DCMI Usage/Comments:
Other International/Comments:
 
Commercial Usage/Comments:
Governmental Usage/Comments:
Education Usage/Comments:
Non-profit Usage/Comments:
 
HTML 1.0
HTML 2.0
HTML 3.2
HTML 4.0
XHTML
DHTML
eGMS
PICS
DCMI
W3C
 
 
 
 
From: Kristie
Date: 09/17/05 17:21:53
Subject:
 
 Microsoft Index Server: Filter DLLs
... property set. <meta name="ms.category" content="fiction">, The category property
in the document summary information property set. The ...
massmind.org/techref/language/asp/ix/ixfildll.htm - 27k - Supplemental Result -

Filter DLLs

A filter DLL “understands” one or more document formats and is capable of extracting text and properties out of those document types. A filter DLL implements the IFilter ActiveX interface. The CiDaemon process uses the IFilter interface to extract the text out of a document. To track down a problem with a filter DLL, an administrator needs to know where to look to determine the filter DLL for a particular document. Editing the registry is also a good way to avoid filtering documents with no useful content.

Pre-installed Filters

The list of document types for which filters are pre-installed is given below:

  • HTML version 3.0 or lower
  • Microsoft® Word
  • Microsoft® Excel
  • Microsoft® PowerPoint®
  • Plain Text
  • Binary Files

HTML Filter

The HTML filter will not index any of the contents or properties of an HTML file if the HTML file contains the following meta tag:

<meta name="robots" content="noindex">

A Webmaster can add this meta tag to selectively avoid indexing certain HTML files.

If an HTML file contains the following meta tag, the content field specifies the language code:

<meta name="ms.locale" content="EN">

The file is filtered by the language resources for that particular language (if available).

The content field in the tag can also specify the locale by a decimal number, such as 1033, which is the locale ID for U.S. English.
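The two meta tags above can be read with a few lines of Python. This is only a sketch of the described behavior, not the HTML filter's actual code: the class and function names are hypothetical, and treating any robots content that contains "noindex" as a match is an assumption.

```python
from html.parser import HTMLParser


class MetaDirectiveParser(HTMLParser):
    """Collects <meta name=... content=...> pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.meta = {}  # lower-cased meta name -> content string

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {key.lower(): (value or "") for key, value in attrs}
        name = attr_map.get("name", "").strip().lower()
        if name:
            self.meta[name] = attr_map.get("content", "").strip()


def indexing_directives(html_text):
    """Return (should_index, locale) for the robots and ms.locale meta tags."""
    parser = MetaDirectiveParser()
    parser.feed(html_text)
    # Assumption: any robots value containing "noindex" disables indexing.
    should_index = "noindex" not in parser.meta.get("robots", "").lower()
    # The locale is a language code such as "EN" or a decimal ID such as "1033".
    locale = parser.meta.get("ms.locale")  # None when the tag is absent
    return should_index, locale
```

A caller would skip the file entirely when should_index is False, and otherwise pick language resources from the returned locale if one is present.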

Some meta tag properties are mapped onto the Microsoft® Office property sets so that users can mark HTML pages with the same properties found in the Office property sets. The following properties are mapped:

Property example                                   Mapped to
<meta name="author" content="ruth">                The author property in the summary information property set.
<meta name="subject" content="word processing">    The subject property in the summary information property set.
<meta name="keywords" content="fonts, serif">      The keyword property in the summary information property set.
<meta name="ms.category" content="fiction">        The category property in the document summary information property set.
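The mapping in the table above can be written down as a small lookup. This is a sketch for illustration only: the dictionary and function names are hypothetical, and matching the meta name case-insensitively is an assumption.

```python
# Meta tag name -> (Office property set, property), as listed in the table above.
META_TO_OFFICE_PROPERTY = {
    "author": ("summary information", "author"),
    "subject": ("summary information", "subject"),
    "keywords": ("summary information", "keyword"),
    "ms.category": ("document summary information", "category"),
}


def office_property_for(meta_name):
    """Return the (property set, property) pair for a meta name, or None."""
    # Assumption: names are compared without regard to case or padding.
    return META_TO_OFFICE_PROPERTY.get(meta_name.strip().lower())
```

Note that ms.category is the only one of the four that lands in the document summary information property set rather than the summary information property set.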

The HTML filter extracts text from the content field of a meta element. For example, if an HTML file has this line:

<META NAME="DESCRIPTION" CONTENT="Sample query form for Microsoft Index Server">

Then a user can query the information in the content field, namely “Sample query form for Microsoft Index Server”, by using the HTML meta property. The GUID for the meta property is D1B5D3F0-C0B3-11CF-9A92-00A0C908DBF1 and the property name is specified by the name field, or the HTTP-EQUIV field. In the above example, the property name is DESCRIPTION. Thus a friendly name, for example MetaDescription, for the meta property can be defined as

MetaDescription(DBTYPE_WSTR|DBTYPE_BYREF) = D1B5D3F0-C0B3-11CF-9A92-00A0C908DBF1 description

The GUID for meta property is a registry parameter located at

HKEY_LOCAL_MACHINE
 \System
  \CurrentControlSet
   \Control\HtmlFilter
    \MetaTagClsid

The HTML filter emits scripting code embedded in an HTML page as a script property with the GUID 31F400A0-FD07-11CF-B9BD-00AA003DB18E. The property name of the script is specified by the language field of the script tag, for example:

<script language="vbscript">

In this example, the property name is vbscript. If no language field is specified, then the language field of an earlier script tag in the HTML page is used. If no earlier script tag is specified, then the property name defaults to javascript. The GUID for the script property is a registry parameter located at

HKEY_LOCAL_MACHINE
 \System
  \CurrentControlSet
   \Control\HtmlFilter
    \ScriptTagClsid
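The naming rule for script properties (use the language field, else the language of an earlier script tag, else javascript) can be sketched as follows. The class and function names are hypothetical, and two details are assumptions: that "an earlier script tag" means the most recent one that named a language, and that names are lower-cased.

```python
from html.parser import HTMLParser


class ScriptPropertyNames(HTMLParser):
    """Records the property name described in the text for each <script> tag."""

    DEFAULT_LANGUAGE = "javascript"

    def __init__(self):
        super().__init__()
        self.names = []             # one property name per <script> tag, in order
        self._last_language = None  # language of the latest tag that named one

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        language = dict(attrs).get("language")
        if language:
            self._last_language = language.strip().lower()
        # No language field: reuse the earlier tag's language, else the default.
        self.names.append(self._last_language or self.DEFAULT_LANGUAGE)


def script_property_names(html_text):
    """Return the property name for every script tag in document order."""
    parser = ScriptPropertyNames()
    parser.feed(html_text)
    return parser.names
```

For a page with an unlabeled script, then a vbscript one, then another unlabeled one, the sketch yields javascript, vbscript, vbscript.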

Document types and the associated filter DLL entries are specified in the registry under the \HKEY_LOCAL_MACHINE\Software\Classes tree. To find out the filter DLL associated with a particular document type, navigate through the registry entries in the \HKEY_LOCAL_MACHINE\Software\Classes tree.

Binary Files — NULL Filter

When a registered binary file is encountered, the NULL filter is used. The NULL filter retrieves only the system properties. The contents of a binary file are not filtered. Examples of system properties are the FileName, last Write time, file Size, Attributes, and so on.

For more information about binary files, see Registering File Types as Binary Files.

Default Filter

In Index Server, a default filter filters both the system properties (such as file name) and the contents of a file. The default filter does not “understand” any document formats; when filtering the contents of a file, it treats the file as a sequence of characters. Index Server uses the default filter when the file-name extension of a file has no association in the registry, and if the value of the registry setting FilterFilesWithUnknownExtensions is 1.

Note   The default filter filters plain text and files of unknown origin. It assumes all text to be in the default code page of the server.

Corrupted Files

If a file is corrupted, the filter may not be able to properly interpret the contents of that file. To learn how to get a list of files that could not be filtered, see Unfiltered Files. An event is also written to the event log. Sometimes a file cannot be filtered because of a defective third-party filter. After verifying the contents of a file, an administrator should report the problems to the filter vendor. Files protected by passwords are not filtered.

Maximum Retries

If a document cannot be filtered, it will be retried a certain maximum number of times. If the document still cannot be filtered, then it will be considered to be an unfiltered file. The registry key FilterRetries controls the maximum number of retries for a document.

To get a list of all the files that could not be filtered

 

  1. Click Start, point to Programs, point to Windows NT 4.0 Option Pack, point to Microsoft Index Server, and click Index Server Manager (HTML).
  2. In the View unfiltered documents field, click Start.

Unknown Extensions

A file with an extension that does not have an association in the registry is treated as an Unknown Extension. The behavior of Index Server depends upon the registry setting FilterFilesWithUnknownExtensions. If this value is set to 0, then the NULL Filter is used to filter those files. Otherwise, the default filter is used to filter the contents.
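The choice between the NULL filter, the default filter, and a registered filter DLL, as described in the sections above, can be summarized in one function. This is a sketch of the stated rules only; the function name, its parameters, and the returned labels are hypothetical.

```python
def choose_filter(has_registry_association, registered_as_binary,
                  filter_files_with_unknown_extensions):
    """Pick the kind of filter the text describes for a file.

    Returns "null", "registered", or "default".
    """
    if registered_as_binary:
        return "null"        # system properties only, contents not filtered
    if has_registry_association:
        return "registered"  # the filter DLL found through the registry
    # Unknown extension: FilterFilesWithUnknownExtensions decides.
    if filter_files_with_unknown_extensions == 0:
        return "null"
    return "default"         # contents treated as a sequence of characters
```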

Filtering Directories

By default, directories are not filtered and will not appear in query results. To filter directories, set the registry key FilterDirectories to 1. When directories are filtered, their system properties are filtered.

Characterization

The CiDaemon process can automatically generate a summary or characterization (also called an abstract) for each document. If the registry key GenerateCharacterization is set to 1, the characterization is generated automatically. The maximum number of characters in the generated characterization is controlled by the registry key MaxCharacterization.

If the characterization is set to be generated automatically, Index Server takes by default the first 320 characters of a document and copies that block of text for the summary. You can override this automatic selection by inserting a meta tag in each document with your own customized summary. Put all meta tags within the header of an HTML file, as shown in the following example.


<head>
<META NAME="DESCRIPTION" CONTENT="This text will appear on the results page 
as the document's summary.">
</head>
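The selection rule for the summary can be sketched as below. The function name and parameters are hypothetical, and returning the meta description without truncating it is an assumption, since the text does not say whether MaxCharacterization also applies to it.

```python
def characterization(document_text, meta_description=None, max_characters=320):
    """Return the summary for a document.

    A DESCRIPTION meta tag, when present, overrides the automatic selection;
    otherwise the first max_characters characters of the text are used.
    """
    if meta_description:
        # Assumption: the custom summary is returned as written.
        return meta_description
    return document_text[:max_characters]
```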

Adding Filter DLLs

To add new filter DLLs, please refer to the documentation provided with the filter DLLs. You can register and unregister DLLs with the registry utility (Regsvr32.exe).

Removing Filter DLLs

To remove a filter DLL, the IFilter PersistentHandler entry associated with a document type and the filter DLL entry must be deleted. See Finding the Filter DLL for a Document. Once you have found the correct IFilter PersistentHandler entry, you can unregister it with the following syntax:


Regsvr32.exe /u

For an example, see Removing a Filter.

Finding the Filter DLL for a Document

The following example shows how to find out the filter DLL for a document. This example is for HTML files.

Step 1: Determine the CLSID

Find the CLSID associated with the document type under the registry key \HKEY_LOCAL_MACHINE\SOFTWARE\Classes. Let this be <Value1>.

\HKEY_LOCAL_MACHINE\SOFTWARE\Classes
    htmlfile
        = Class for WWW HTML files
        CLSID
            = {25336920-03F9-11CF-8FD0-00AA00686F13}

Step 2: Determine the Persistent Handler

Using <Value1> found out in Step 1, find the PersistentHandler value for the \HKEY_LOCAL_MACHINE\SOFTWARE\Classes\CLSID\<Value1> key. Let this be <Value2>.

\HKEY_LOCAL_MACHINE\SOFTWARE\Classes\CLSID
        {25336920-03F9-11CF-8FD0-00AA00686F13}
            = WWW HTML files
            PersistentHandler
                = {EEC97550-47A9-11CF-B952-00AA0051FE20}

Step 3: Determine the IFilter Persistent Handler GUID

Using <Value2> determined in Step 2, find the IFilter Persistent Handler GUID for the document type. The value under the key \HKEY_LOCAL_MACHINE\SOFTWARE\Classes\CLSID\<Value2>\PersistentAddinsRegistered\89BCB740-6119-101A-BCB7-00DD010655AF yields the IFilter Persistent Handler GUID for this document type. Let this be <Value3>. 89BCB740-6119-101A-BCB7-00DD010655AF is the IFilter interface GUID.

\Registry\Machine\Software\Classes\CLSID
      {EEC97550-47A9-11CF-B952-00AA0051FE20}
           = REG_SZ HTML File Persistent Handler
        PersistentAddinsRegistered
            {89BCB740-6119-101A-BCB7-00DD010655AF}
                = REG_SZ {E0CA5340-4534-11CF-B952-00AA0051FE20}

Step 4: Determine the Filter DLL

Using <Value3> determined in Step 3, the filter DLL can be found under the entry \HKEY_LOCAL_MACHINE\SOFTWARE\Classes\CLSID\<Value3>\InprocServer32.

\Registry\Machine\Software\Classes\CLSID
     {E0CA5340-4534-11CF-B952-00AA0051FE20}
        = REG_SZ HTML Filter
        InprocServer32
            = REG_SZ nlhtml.dll

In this example, the filter DLL for HTML documents is nlhtml.dll.
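The four steps can be followed mechanically. The sketch below walks them over an in-memory dictionary that holds only the values shown in the example, so it runs without touching a real registry; the names key, EXAMPLE_CLASSES, and find_filter_dll are hypothetical.

```python
IFILTER_IID = "{89BCB740-6119-101A-BCB7-00DD010655AF}"


def key(*parts):
    """Join registry key components with backslashes."""
    return "\\".join(parts)


# Values from the example, relative to HKEY_LOCAL_MACHINE\SOFTWARE\Classes.
EXAMPLE_CLASSES = {
    key("htmlfile", "CLSID"): "{25336920-03F9-11CF-8FD0-00AA00686F13}",
    key("CLSID", "{25336920-03F9-11CF-8FD0-00AA00686F13}", "PersistentHandler"):
        "{EEC97550-47A9-11CF-B952-00AA0051FE20}",
    key("CLSID", "{EEC97550-47A9-11CF-B952-00AA0051FE20}",
        "PersistentAddinsRegistered", IFILTER_IID):
        "{E0CA5340-4534-11CF-B952-00AA0051FE20}",
    key("CLSID", "{E0CA5340-4534-11CF-B952-00AA0051FE20}", "InprocServer32"):
        "nlhtml.dll",
}


def find_filter_dll(document_class, classes):
    """Follow Steps 1 to 4 and return the filter DLL name."""
    clsid = classes[key(document_class, "CLSID")]                      # Step 1
    handler = classes[key("CLSID", clsid, "PersistentHandler")]        # Step 2
    filter_clsid = classes[key("CLSID", handler,
                               "PersistentAddinsRegistered", IFILTER_IID)]  # Step 3
    return classes[key("CLSID", filter_clsid, "InprocServer32")]       # Step 4
```

Calling find_filter_dll("htmlfile", EXAMPLE_CLASSES) follows the same chain as the walkthrough. On a live Windows system the dictionary lookups would be replaced by registry reads, for example through Python's standard winreg module.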


  



     

     Microsoft Index Server Release Notes
    ... <meta name="ms.category" content="fiction">, The category property in the
    document summary information property set. Hit Highlighting. ...
    www.rialto.k12.ca.us/INDEXSRVR/srchadm/help/README.HTM - 101k - Supplemental Result -

    Microsoft Index Server
    Release Notes


    Thank you for downloading and installing Microsoft® Index Server version 1.1 for Windows NT® Server. This file lists the changes made to Index Server since its beta release. There are also several installation items to note. These notes are mainly for users who have a previous version of Index Server installed on their computers and are upgrading to the latest version. These changes and notes are summarized on this page.


    Visit the Index Server Home Page

    For more information about Index Server and related features, see the home page at the following address:

    http://www.microsoft.com/ntserver/search


    Sample Files

    The sample files (such as Query.htm) were replaced. If you modified any of the sample files and did not move or rename them, they were overwritten.


    Installing Index Server

    Installing Index Server will reset the registry settings to their defaults. If you have modified the registry settings for Index Server, you will have to reset the values to your preferences after installation.


    Support for Microsoft Internet News Service

    If Microsoft Internet News Server has been installed on a server along with Index Server version 1.1, then news articles can be indexed. You can find additional sample query forms written for a news server on the Index Server home page.

    The virtual paths produced by Internet Information Server (IIS) convert the dot between newsgroup components to a slash. For example:

    News Group                     Converted To
    comp.os.ms-windows.advocacy    /comp/os/ms-windows/advocacy

    Note   The path /comp/os/ms-windows/advocacy is not a valid virtual path in IIS.
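The conversion shown in the table is a plain character substitution. A minimal sketch follows, with a hypothetical function name; the optional prefix argument is an assumption added so that the same helper can build the /$CiNews paths used for hit highlighting.

```python
def newsgroup_to_virtual_path(newsgroup, prefix=""):
    """Convert a dotted newsgroup name to the slash-separated path form."""
    return prefix + "/" + newsgroup.replace(".", "/")
```

For example, newsgroup_to_virtual_path("rec.sports", prefix="/$CiNews") gives the /$CiNews/rec/sports form.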

    New Default Properties

    The following properties are always available for queries to newsgroups.

    Friendly Name   Datatype                     Property
    NewsGroup       DBTYPE_WSTR | DBTYPE_BYREF   Newsgroup to which article was posted.
    NewsGroups      DBTYPE_WSTR | DBTYPE_BYREF   Full set of newsgroups to which article was cross-posted.
    NewsSubject     DBTYPE_WSTR | DBTYPE_BYREF   Subject line of news article.
    NewsFrom        DBTYPE_WSTR | DBTYPE_BYREF   Author of news article.
    NewsMsgId       DBTYPE_WSTR | DBTYPE_BYREF   Globally unique message ID of article.

    Special Requirement for Hit Highlighting

    The hit highlighter (Webhits.exe) is a Common Gateway Interface (CGI) application that must be stored in a valid virtual path with Execute permission. If you want to highlight hits in news articles, add virtual roots, each beginning with /$CiNews and corresponding to every root in the news server. Make sure that virtual roots in IIS beginning with /$CiNews have both Read and Execute permissions turned off.

    For example, if rec.sports.* was being stored at C:\Sports and the default (home) news root was C:\Inetpub\Nntproot, two new virtual roots would be added: /$CiNews/rec/sports=C:\Sports and /$CiNews=C:\Inetpub\Nntproot. The Read and Execute permissions are not enabled for these virtual roots.

    When running Webhits.exe, be sure to put the virtual path /$CiNews/<%vpath%> into the .htx file in the call to Webhits.exe.

    Note   The hit highlighter does not check Read permissions for virtual roots beginning with /$CiNews/.

    NNTP Virtual Roots with UNC Shares

    If a virtual root on a news server points to a universal naming convention (UNC) share, administrators must add a virtual root in IIS. The Network News Transfer Protocol (NNTP) virtual root must be prepended with /$CiNews to highlight the news articles stored on that UNC share by using Webhits.

    Example

    Assume the following in the news server setup:

    • /rec.food points to \\Server1\Share1\Dir1
    • The user ID is Gourmet\Chef1 (in the form domain\username)
    • The password is Marinade

    In IIS, set up a virtual root with the following properties:

    • /$CiNews/rec.food pointing to \\Server1\Share1\Dir1
    • The user ID is Gourmet\Chef1
    • The password is Marinade
    • Both Read and Execute permissions are turned off

    Important   Be sure to turn off the Read and Execute permissions on virtual roots prepended with /$CiNews.


    Changes to the Documentation

    This section details changes and additions to the existing documentation.

    Basic Administration

    In the sections that discuss the variables PROOT_virtual and INDEX_virtual root (Enabling Indexing of a Virtual Root and Forcing a Scan of a Virtual Root), if the root is a news root, these variables are PROOT_NNTP_virtual and INDEX_NNTP_virtual root.

    List of Virtual Roots

    You can determine the type of a virtual root while making the VIRTUAL_ROOTS query. Look at the value of the special property StorageType, defined as StorageType (DBTYPE_UI4) = b725f130-47ef-101a-a5f1-02608c9eebac 4. The value 0 identifies a Web root. The value 1 identifies a news root.

    Error Messages

    This section lists additions and corrections to the Index Server error messages, contained on the Error Messages page.

    Event Log Messages

    Message: Account user-id does not have interactive logon privilege on this computer. You can give user-id interactive logon privilege on this computer using the user manager administrative tool.
    Explanation: The specified user does not have interactive logon permission on the computer running Index Server. Give the user-id interactive logon privilege through the User Manager for Domains.

    Results Page

    At the bottom of a results page, you may periodically see the following message:

    Message: The index is out of date.
    Explanation: Files have been modified since the last time the scope of your query was indexed. Whenever files in a scope are modified, Index Server re-indexes them automatically as soon as system resources are available. If you see this message at the bottom of a results page, wait a few minutes and retry your query.

    Webhits Errors

    Message: There are too many copies of hit highlighter running. Please try later.
    Explanation: There are more simultaneous instances of Webhits than the maximum number set in the MaxRunningWebhits registry key. Try executing your query later, when the server is less busy.

    Message: Hit highlighting took too long to execute and was timed out.
    Explanation: Webhits has taken longer than the allotted time to process a document, and the server has timed out. The document may be too big or it may be corrupted. Ask the administrator to check the document.

    Virtual Roots

    Message: Added virtual root <root> to index.
    Explanation: The message “Mapped to <path>” is added to the event log when a virtual root is indexed.

    Message: Removed virtual root <root> from index.
    Explanation: This message is written to the event log when a virtual root is deleted from the index.

    Message: Added scope <path> to index.
    Explanation: This message is added to the event log when a new physical scope is indexed.

    Message: Removed scope <path> from index.
    Explanation: This message is written to the event log when a physical scope is deleted from the index.

    Note   When virtual roots point to positions below each other, adding and removing virtual roots may have no effect on the physical scopes in the index. For example, some sites such as www.microsoft.com are branded with virtual roots in a marketing sense of the word. So if a user wants information on Windows NT Server, the user follows the path http://www.microsoft.com/NTServer, when http://www.microsoft.com/products/backoffice/ntserver is also a valid path. In this example, even if you remove the lower virtual root (/NTServer), the pages will still be indexed because they are included in another path, http://www.microsoft.com/products/backoffice/ntserver.

    Filtering

    HTML Filter

    The HTML filter will not index any of the contents or properties of an HTML file if the HTML file contains the following meta tag:

    <meta name="robots" content="noindex">

    A Webmaster can add this meta tag to selectively avoid indexing certain HTML files.

    If an HTML file contains the following meta tag, the content field specifies the language code:

    <meta name="ms.locale" content="EN">

    The file is filtered by the language resources for that particular language (if available).

    The content field in the tag can also specify the locale by a decimal number, such as 1033, which is the locale ID for U.S. English.

    Some meta tag properties are mapped onto the Microsoft® Office property sets so that users can mark HTML pages with the same properties found in the Office property sets. The following properties are mapped:

    Property                                           Mapped to
    <meta name="author" content="ruth">                The author property in the summary information property set.
    <meta name="subject" content="word processing">    The subject property in the summary information property set.
    <meta name="keywords" content="fonts, serif">      The keyword property in the summary information property set.
    <meta name="ms.category" content="fiction">        The category property in the document summary information property set.

    Hit Highlighting

    In the “Webhits Parameters” section, the paragraph under the CiQueryFile parameter should say virtual path instead of physical path. The paragraph should read as follows:

    Format: CiQueryFile=Virtual path

    This parameter is optional. If it is passed, CiQueryFile specifies the virtual path of the .idq file containing the [Names] section describing the custom properties. You must pass this parameter for all queries involving custom properties. If you try to hit-highlight a document with a query that has a custom property and you do not specify the appropriate .idq file, the error message “No such property” will be displayed.

    The following parameters have been added to the “Webhits Parameters” section:

    CiBeginHilite
    CiEndHilite
    Format: CiBeginHilite=BeginTags&CiEndHilite=EndTags
    These two parameters together customize highlighted words in the query results. If you specify these tags, Index Server ignores all other formatting parameters (CiBold, CiHiliteColor, CiItalic, and so on).

    Important   You must match the BeginTags and EndTags with correct HTML formatting. Failure to do so will produce unpredictable results. When you specify these parameters in the query template file (.htx file), you must properly escape the tags. For example:

    CiBeginHilite=<%escapeURL <font color="#FF0000"><em>%>&CiEndHilite=<%escapeURL </em></font> %>


    The two parameters together in the above example make the highlighted words in the search results appear in red italics.
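To see roughly what escaped tags look like in the resulting URL, the sketch below percent-encodes the two values with Python's standard urllib.parse.quote. The function name is hypothetical, and it is an assumption that escapeURL in an .htx file produces the same encoding.

```python
from urllib.parse import quote


def hilite_parameters(begin_tags, end_tags):
    """Build the CiBeginHilite/CiEndHilite pair with URL-escaped tag values."""
    # safe="" also escapes "/", which appears in closing tags.
    return "CiBeginHilite={}&CiEndHilite={}".format(
        quote(begin_tags, safe=""), quote(end_tags, safe=""))
```

Without escaping, characters such as <, >, ", and # inside the tags would be misread as part of the URL itself.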

     

    CiHiliteType
    Format: CiHiliteType=[Full|Summary]
    This parameter is optional. If not specified, Summary is the default.

    Summary   The summary feature can generate small excerpts of a document around the words that match the query specification.

    Full   When full highlighting is chosen as the option, the whole document is highlighted and returned. Note that this does not do full-fidelity highlighting. Only the text part of the document is extracted and highlighted. This option is mainly for documents that contain mostly text. It also tags the hits with bookmarks, allowing navigation between the hits. The first hit is bookmarked as #CiTag0 and the top of the generated document is tagged as #CiTag-1. To help in navigation, double-angle bracket tags (<< and >>) surround each hit. Click the << tag to go to the previous hit, and click the >> tag to go to the next hit.

     

    CiLocale
    Format: CiLocale=LocaleString
    This parameter is optional. If specified, the given locale will be used to interpret the CiRestriction string. Output will also be generated using this locale. Valid values for the CiLocale string are in the “Variables in .idq and .htx Files” page.

     

    CiMaxLineLength
    Format: CiMaxLineLength=Number
    This parameter is optional. When this parameter is specified, Webhits preformats the text with the <pre> and </pre> HTML tags. If a line length exceeds the specified number, it is broken at the next word boundary. This option works best when full hit-highlighting is chosen.

     

    CiTemplateFile
    Format: CiTemplateFile=Virtual path
    This parameter is optional, but highly recommended. It specifies the virtual path of the template file that generates Webhits output. The recommended extension for a Webhits template file is .htw. This template file lets you customize the output like the template files used for queries. It has a header section, a detail section, and a footer section. The template file format used by Webhits is the same as the template file for queries, with the following differences:

    The only replaceable parameters allowed are <%CiUrl%>, <%CiRestriction%>, <%CiUserParam1%>, <%CiUserParam2%>, and so on up to <%CiUserParam10%>.

    There is no support for if-then-else processing.

    The detail section is used only as a placeholder for hit-highlighting data. In the current release, Webhits ignores the text between <%BeginDetail%> and <%EndDetail%>. It is, however, important to specify <%BeginDetail%> and <%EndDetail%>.

    EscapeHTML, EscapeURL, and EscapeRAW are supported as in query template files.

    Sample template files for Webhits output formatting are included in the installed samples as:

    /Scripts/Samples/Search/Qfullhit.htw
    /Scripts/Samples/Search/Qsumrhit.htw

    CiUrl   The virtual path of the document being highlighted replaces this parameter.

    CiRestriction   The value specified for Webhits in the CiRestriction parameter replaces this parameter.

    CiUserParamNumber   Where Number is a number from 1 to 10. The corresponding value specified in the CiUserParamNumber parameter replaces this parameter.

     

    CiUserParamNumber
    Format: CiUserParamNumber=value, where value can be any non-null string.
    CiUserParamNumber is any parameter that can be specified for Webhits and that can be replaced in CiTemplateFile. In CiUserParamNumber, Number is any number from 1 to 10. For example, CiUserParam1, CiUserParam3, CiUserParam5, and so on.

    In the Files Used section, the text should read as follows:

    Webhits installs the following files:

    /Scripts/Samples/Search/Webhits.exe
    /Scripts/Samples/Search/Queryhit.htx
    /Scripts/Samples/Search/Queryhit.idq
    /Scripts/Samples/Search/QSumrhit.htw
    /Scripts/Samples/Search/QFullhit.htw
    /Samples/Search/Queryhit.htm

    All files above demonstrate summary and full-text hit-highlighting.

    Internet Data Query Files

    The following paragraphs have been added to the Names Section.

    The HTML filter emits scripting code embedded in an HTML page as a script property with the GUID 31F400A0-FD07-11CF-B9BD-00AA003DB18E. The property name of the script is specified by the language field of the script tag, for example:

    <script language="vbscript">

    In this example, the property name is vbscript. If no language field is specified, then the language field of an earlier script tag in the HTML page is used. If no earlier script tag is specified, then the property name defaults to javascript. The GUID for the script property is a registry parameter located at

    HKEY_LOCAL_MACHINE
     \System
      \CurrentControlSet
       \Control\HtmlFilter
        \ScriptTagClsid

    The following example shows you how to name a custom property for Microsoft Office by adding its globally unique identifier (GUID) to the Names section of the Internet Data Query (.idq) file:

    Custom_Text ( DBTYPE_STR|DBTYPE_BYREF ) = D5CDD505-2E9C-101B-9397-08002B2CF9AE "Custom_Text"

    In this example, Custom_Text can be any string. The value of Custom_Text does not have to be the same at the beginning and end of the line. The one at the beginning is the friendly name, and the one at the end (in quotation marks) is the Microsoft Office property name.

    Query Language

    In the “Boolean and Proximity Operators” section, the following note adds important information about the NEAR operator:

    Note   The NEAR operator can be applied only to words or phrases.

    Some documented properties are unavailable. The documentation incorrectly states that the following property names can be used:

    DocCategory
    DocCompany
    DocManager

    These properties are available only if you define them yourself. To use them in a restriction, sort specification, or as a retrieved column, add the following definitions to a [Names] section in the .idq file:

    [Names]
    #Office document properties which are not in the standard list
    DocCategory ( DBTYPE_STR ) = D5CDD502-2E9C-101B-9397-08002B2CF9AE 0x2
    DocManager ( DBTYPE_STR ) = D5CDD502-2E9C-101B-9397-08002B2CF9AE 0xE
    DocCompany ( DBTYPE_STR ) = D5CDD502-2E9C-101B-9397-08002B2CF9AE 0xF
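Each definition line has the shape friendly name, datatype in parentheses, an equals sign, a property set GUID, and a property ID or name. The sketch below splits such lines into their parts; it is an illustration based only on the examples in this text, not the grammar Index Server itself uses, and the function name is hypothetical.

```python
import re

# friendly name, (datatype), =, 36-character GUID, property ID or name
NAME_ENTRY = re.compile(
    r"^\s*(\w+)\s*\(\s*([^)]*?)\s*\)\s*=\s*([0-9A-Fa-f-]{36})\s+(.+?)\s*$")


def parse_names_section(text):
    """Map each friendly name to (datatype, GUID, property ID or name)."""
    entries = {}
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines, comments, and the section header.
        if not stripped or stripped.startswith("#") or stripped == "[Names]":
            continue
        match = NAME_ENTRY.match(line)
        if match:
            friendly, datatype, guid, prop = match.groups()
            entries[friendly] = (datatype, guid.upper(), prop)
    return entries
```

Applied to the three definitions above, the result has the keys DocCategory, DocManager, and DocCompany, all sharing the same property set GUID and differing only in the property ID (0x2, 0xE, 0xF).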

    Registry Parameters

    All keys are in the following path:

    HKEY_LOCAL_MACHINE
    \SYSTEM
     \CurrentControlSet
      \Control
       \contentindex

    The following parameters have been added:

    CiCatalogFlags REG_DWORD
    Default: 0
    Range: 0 - 2
    Controls Index Server behavior based on certain flags. Set the value to 1 to turn off notifications on all remote UNC paths; use this setting if Index Server is configured to index documents on a wide area network (WAN) over slow links. Set the value to 2 to turn off notifications on all local paths. When either of these flags is set, Index Server triggers periodic scans for the paths for which notifications have been disabled. The registry parameter ForcedNetPathScanInterval controls the frequency of these scans.

     

    MasterMergeCheckpointInterval REG_DWORD
    Units: Kilobytes
    Default: 256
    Range: 256 - 4096
    Specifies the interval after which a new index is flushed as a master merge proceeds.

     

    MaxRunningWebhits REG_DWORD
    Default: 20
    Range: 1 - 200
    Specifies the maximum number of concurrent instances of Webhits. When this value is exceeded, the error message “There are too many copies of hit highlighter running. Please try later.” is generated. Increase this value for computers with more memory or processors.

     

    MaxShadowFreeForceMerge REG_DWORD
    Units: Percentage of free disk space
    Default: 20
    Range: 5 - 4,000,000,000
    Specifies the percentage of free disk space occupied by shadow indexes on a catalog drive. If this percentage exceeds the value set for this parameter and if the total free disk space falls below the minimum set in the MinDiskFreeForceMerge, a master merge begins. For example, if this parameter is set to 500, the amount of free disk space is 10 megabytes and the amount of space occupied by shadow indexes is 40 megabytes, no master merge takes place (40*100/10 is less than 500). However, if the value of this parameter is set to 300, a master merge begins because 40*100/10 is greater than 300.
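The arithmetic in this example can be checked with a short sketch. The function names are hypothetical, and the MinDiskFreeForceMerge condition is passed in as a plain true/false value because this text does not give its units.

```python
def shadow_percentage(shadow_index_mb, free_disk_mb):
    """Shadow index size expressed as a percentage of free disk space."""
    return shadow_index_mb * 100 / free_disk_mb


def master_merge_starts(shadow_index_mb, free_disk_mb,
                        max_shadow_free_force_merge,
                        free_space_below_minimum=True):
    """Apply the two conditions described for MaxShadowFreeForceMerge."""
    exceeds = (shadow_percentage(shadow_index_mb, free_disk_mb)
               > max_shadow_free_force_merge)
    # Both conditions must hold: the percentage is exceeded and free disk
    # space is below the MinDiskFreeForceMerge minimum.
    return exceeds and free_space_below_minimum
```

With 40 megabytes of shadow indexes and 10 megabytes free, the percentage is 400, so a setting of 500 does not trigger a merge and a setting of 300 does.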

     

    MaxWebhitsCpuTime REG_DWORD
    Units: Seconds
    Default: 30
    Range: 5 - 7200
    Specifies the timeout value for Webhits in CPU seconds. If Webhits does not process a document in the stipulated amount of time, it will return an error message that the allowed time has been exceeded.

    Variables in .idq and .htx Files

    The following variables have been added as read-only variables for .htx files.

    Variable Name Meaning
    CiVersionMajor The major version of Index Server.
    CiVersionMinor The minor version of Index Server.

    For other variables, see Read-Only Variables Available in .htx Files on the “Variables in .idq and .htx Files” page.


    Removing Index Server

    This section tells you how to delete Index Server from your computer.

    To remove Index Server

    1. Stop Microsoft Internet Information Server or Microsoft Peer Web Services.

    2. Delete the following files from the %SystemRoot%\System32 directory:

      Cidaemon.exe
      Htmlfilt.dll
      Idq.dll
      Infosoft.dll
      Kppp.dll
      Kppp7.dll
      Kpw6.dll
      Kpword.dll
      Kpxl5.dll
      Qperf.dll
      Query.dll
      Sccfa.dll
      Sccfi.dll
      Sccifilt.dll
      Sccut.dll

      Noise.* (where * is one or more of dat, deu, eng, enu, esn, fra, ita, nld, sve)
      Wbcache.* (where * is one or more of deu, eng, enu, esn, fra, ita, nld, sve)
      Wbdbase.* (where * is one or more of deu, eng, enu, esn, fra, ita, nld, sve)

    3. In the registry, delete the following keys and/or values:

      HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\contentindex
      HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\ContentIndex
      HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\ContentFilter
      HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\ISAPISearch
      HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\W3SVC\Parameters\Script Map\.ida
      HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\W3SVC\Parameters\Script Map\.idq

    4. Delete all Catalog.wci directories (referenced from the CiCatalog parameter of an .idq file).

    5. Through the Windows NT Explorer, delete all files pointed to by the virtual roots /Samples/Search, /Srchadm, /Scripts/Srchadm, and /Scripts/Samples/Search. Then, through the Internet Service Manager, you can optionally remove these virtual roots if they exist.

    6. (optional) Delete all references under HKEY_CLASSES_ROOT to PersistentHandler, including all links to classes referenced from a PersistentHandler value.




    © 1996 by Microsoft Corporation. All rights reserved.



    Microsoft Index Server
    Release Notes


    Thank you for downloading and installing Microsoft® Index Server version 1.1 for Windows NT® Server. This file lists the changes made to Index Server since its beta release. There are also several installation items to note. These notes are mainly for users who have a previous version of Index Server installed on their computers and are upgrading to the latest version. These changes and notes are summarized on this page.


    Visit the Index Server Home Page

    For more information about Index Server and related features, see the home page at the following address:

    http://www.microsoft.com/ntserver/search


    Sample Files

    The sample files (such as Query.htm) were replaced. If you modified any of the sample files and did not move or rename them, they were overwritten.


    Installing Index Server

    Installing Index Server will reset the registry settings to their defaults. If you have modified the registry settings for Index Server, you will have to reset the values to your preferences after installation.


    Support for Microsoft Internet News Service

    If Microsoft Internet News Server has been installed on a server along with Index Server version 1.1, then news articles can be indexed. You can find additional sample query forms written for a news server on the Index Server home page.

    The virtual paths produced by Internet Information Server (IIS) convert the dot between newsgroup components to a slash. For example:

    News Group                     Converted To
    comp.os.ms-windows.advocacy    /comp/os/ms-windows/advocacy

    Note   The path /comp/os/ms-windows/advocacy is not a valid virtual path in IIS.
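
The dot-to-slash conversion described above can be sketched in a few lines of Python. The function name newsgroup_to_vpath is hypothetical and only illustrates the mapping. It is not part of IIS or Index Server.

```python
def newsgroup_to_vpath(newsgroup: str) -> str:
    """Replace the dots between newsgroup components with slashes.

    Hypothetical helper: it mirrors the conversion shown in the table,
    not any actual IIS or Index Server routine.
    """
    return "/" + newsgroup.replace(".", "/")


print(newsgroup_to_vpath("comp.os.ms-windows.advocacy"))
# prints /comp/os/ms-windows/advocacy
```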

    New Default Properties

    The following properties are always available for queries to newsgroups.

    Friendly Name    Datatype                      Property
    NewsGroup        DBTYPE_WSTR | DBTYPE_BYREF    Newsgroup to which article was posted.
    NewsGroups       DBTYPE_WSTR | DBTYPE_BYREF    Full set of newsgroups to which article was cross-posted.
    NewsSubject      DBTYPE_WSTR | DBTYPE_BYREF    Subject line of news article.
    NewsFrom         DBTYPE_WSTR | DBTYPE_BYREF    Author of news article.
    NewsMsgId        DBTYPE_WSTR | DBTYPE_BYREF    Globally unique message ID of article.

    Special Requirement for Hit Highlighting

    The hit highlighter (Webhits.exe) is a Common Gateway Interface (CGI) application that must be stored in a valid virtual path with Execute permission. If you want to highlight hits in news articles, add virtual roots, each beginning with /$CiNews and corresponding to every root in the news server. Make sure that virtual roots in IIS beginning with /$CiNews have both Read and Execute permissions turned off.

    For example, if rec.sports.* was being stored at C:\Sports and the default (home) news root was C:\Inetpub\Nntproot, two new virtual roots would be added: /$CiNews/rec/sports=C:\Sports and /$CiNews=C:\Inetpub\Nntproot. The Read and Execute permissions are not enabled for these virtual roots.

    When running Webhits.exe, be sure to put the virtual path /$CiNews/<%vpath%> into the .htx file in the call to Webhits.exe.

    Note   The hit highlighter does not check Read permissions for virtual roots beginning with /$CiNews/.

    NNTP Virtual Roots with UNC Shares

    If a virtual root on a news server points to a universal naming convention (UNC) share, administrators must add a virtual root in IIS. The Network News Transfer Protocol (NNTP) virtual root must be prepended with /$CiNews to highlight the news articles stored on that UNC share by using Webhits.

    Example

    Assume the following in the news server setup:

    • /rec.food points to \\Server1\Share1\Dir1
    • The user ID is Gourmet\Chef1 (in the form domain\username)
    • The password is Marinade

    In IIS, set up a virtual root with the following properties:

    • /$CiNews/rec.food pointing to \\Server1\Share1\Dir1
    • The user ID is Gourmet\Chef1
    • The password is Marinade
    • Both Read and Execute permissions are turned off

    Important   Be sure to turn off the Read and Execute permissions on virtual roots prepended with /$CiNews.


    Changes to the Documentation

    This section details changes and additions to the existing documentation.

    Basic Administration

    In the sections that discuss the variables PROOT_virtual and INDEX_virtual root (Enabling Indexing of a Virtual Root and Forcing a Scan of a Virtual Root), note that if the root is a news root, these variables are PROOT_NNTP_virtual and INDEX_NNTP_virtual root.

    List of Virtual Roots

    You can determine the type of a virtual root while making the VIRTUAL_ROOTS query. Look at the value of the special property StorageType—(DBTYPE_UI4) = b725f130-47ef-101a-a5f1-02608c9eebac 4. The value 0 identifies a Web root. The value 1 identifies a news root.

    Error Messages

    This section lists additions and corrections to the Index Server error messages, contained on the Error Messages page.

    Event Log Messages

    Message: Account user-id does not have interactive logon privilege on this computer. You can give user-id interactive logon privilege on this computer using the user manager administrative tool.
    Explanation: The specified user-id does not have interactive logon permission on the computer running Index Server. Give the user-id interactive logon privilege through the User Manager for Domains.

    Results Page

    At the bottom of a results page, you may periodically see the following message:

    Message: The index is out of date.
    Explanation: Files have been modified since the last time the scope of your query was indexed. Whenever files in a scope are modified, Index Server re-indexes them automatically as soon as system resources are available. If you see this message at the bottom of a results page, wait a few minutes and retry your query.

    Webhits Errors

    Message: There are too many copies of hit highlighter running. Please try later.
    Explanation: There are more simultaneous instances of Webhits than the maximum number set in the MaxRunningWebhits registry key. Try executing your query later, when the server is less busy.

    Message: Hit highlighting took too long to execute and was timed out.
    Explanation: Webhits has taken longer than the allotted time to process a document, and the server has timed out. The document may be too big or it may be corrupted. Ask the administrator to check the document.

    Virtual Roots

    Message: Added virtual root <root> to index.
    Explanation: The message “Mapped to <path>” is added to the event log when a virtual root is indexed.

    Message: Removed virtual root <root> from index.
    Explanation: This message is written to the event log when a virtual root is deleted from the index.

    Message: Added scope <path> to index.
    Explanation: This message is added to the event log when a new physical scope is indexed.

    Message: Removed scope <path> from index.
    Explanation: This message is written to the event log when a physical scope is deleted from the index.

    Note   When virtual roots point to positions below each other, adding and removing virtual roots may have no effect on the physical scopes in the index. For example, some sites, such as www.microsoft.com, are branded with virtual roots in a marketing sense of the word. A user who wants information on Windows NT Server follows the path http://www.microsoft.com/NTServer, while http://www.microsoft.com/products/backoffice/ntserver is also a valid path. In this example, even if you removed the lower virtual root (/NTServer), the pages would still be indexed because they are included in another path, http://www.microsoft.com/products/backoffice/ntserver.

    Filtering

    HTML Filter

    The HTML filter will not index any of the contents or properties of an HTML file if the HTML file contains the following meta tag:

    <meta name="robots" content="noindex">

    A Webmaster can add this meta tag to selectively avoid indexing certain HTML files.

    If an HTML file contains the following meta tag, the content field specifies the language code:

    <meta name="ms.locale" content="EN">

    The file is filtered by the language resources for that particular language (if available).

    The content field in the tag can also specify the locale by a decimal number, such as 1033, which is the locale ID for U.S. English.

    Some meta tag properties are mapped onto the Microsoft® Office property sets so that users can mark HTML pages with the same properties that are in the Office property set. The following properties are mapped:

    Property                                         Mapped to
    <meta name="author" content="ruth">              The author property in the summary information property set.
    <meta name="subject" content="word processing">  The subject property in the summary information property set.
    <meta name="keywords" content="fonts, serif">    The keyword property in the summary information property set.
    <meta name="ms.category" content="fiction">      The category property in the document summary information property set.
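
As a rough illustration of the mapping in the table above, the following Python sketch reads the meta tags from a page and reports which Office properties they would populate. It uses Python's standard html.parser module. The class name MetaPropertyParser and the dictionary META_TO_OFFICE_PROPERTY are hypothetical, and the sketch says nothing about how the real HTML filter (an IFilter DLL) is implemented.

```python
from html.parser import HTMLParser

# Meta tag names and the Office property each one maps to, copied from the
# table above. The dictionary itself is an illustrative assumption.
META_TO_OFFICE_PROPERTY = {
    "author": "author",
    "subject": "subject",
    "keywords": "keyword",
    "ms.category": "category",
}


class MetaPropertyParser(HTMLParser):
    """Collects the mapped properties from <meta name=... content=...> tags."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = (attr_map.get("name") or "").lower()
        if name in META_TO_OFFICE_PROPERTY:
            office_name = META_TO_OFFICE_PROPERTY[name]
            self.properties[office_name] = attr_map.get("content")


page = """
<html><head>
<meta name="author" content="ruth">
<meta name="ms.category" content="fiction">
</head><body>Sample</body></html>
"""

parser = MetaPropertyParser()
parser.feed(page)
print(parser.properties)
# prints {'author': 'ruth', 'category': 'fiction'}
```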

    Hit Highlighting

    In the “Webhits Parameters” section, the paragraph under the CiQueryFile parameter should say virtual path instead of physical path. The paragraph should read as follows:

    Format: CiQueryFile=Virtual path

    This parameter is optional. If it is passed, CiQueryFile specifies the virtual path of the .idq file containing the [Names] section describing the custom properties. You must pass this parameter for all queries involving custom properties. If you try to hit-highlight a document with a query that has a custom property and you do not specify the appropriate .idq file, the error message “No such property” will be displayed.

    The following parameters have been added to the “Webhits Parameters” section:

    CiBeginHilite
    CiEndHilite
    Format: CiBeginHilite=BeginTags&CiEndHilite=EndTags
    These two parameters together customize highlighted words in the query results. If you specify these tags, Index Server ignores all other formatting parameters (CiBold, CiHiliteColor, CiItalic, and so on).

    Important   You must match the BeginTags and EndTags with correct HTML formatting. Failure to do so will produce unpredictable results. When you specify these parameters in the query template file (.htx file), you must properly escape the tags. For example:

    CiBeginHilite=<%escapeURL <font color="#FF0000"><em>%>&CiEndHilite=<%escapeURL </em></font> %>


    The two parameters together in the above example make the highlighted words in the search results appear in red italics.
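
To see why the tags need escaping, the following Python sketch builds the same parameter pair outside a template. It uses urllib.parse.quote as a stand-in for escapeURL. Whether Index Server's escapeURL produces byte-for-byte the same encoding is an assumption, and the helper name build_hilite_params is hypothetical.

```python
from urllib.parse import quote


def build_hilite_params(begin_tags: str, end_tags: str) -> str:
    """Return the CiBeginHilite/CiEndHilite pair with both tag strings URL-escaped."""
    # safe="" makes quote() escape every reserved character, including "/" and "#",
    # so the HTML tags cannot be mistaken for URL syntax.
    return (
        "CiBeginHilite=" + quote(begin_tags, safe="")
        + "&CiEndHilite=" + quote(end_tags, safe="")
    )


params = build_hilite_params('<font color="#FF0000"><em>', "</em></font>")
print(params)
# prints CiBeginHilite=%3Cfont%20color%3D%22%23FF0000%22%3E%3Cem%3E&CiEndHilite=%3C%2Fem%3E%3C%2Ffont%3E
```

Without escaping, the "#" and "&" characters in the tags would be read as part of the URL itself rather than as parameter values.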

     

    CiHiliteType
    Format: CiHiliteType=[Full|Summary]
    This parameter is optional. If not specified, Summary is the default.

    Summary   The summary feature can generate small excerpts of a document around the words that match the query specification.

    Full   When full highlighting is chosen as the option, the whole document is highlighted and returned. Note that this does not do full-fidelity highlighting. Only the text part of the document is extracted and highlighted. This option is mainly for documents that contain mostly text. It also tags the hits with bookmarks, allowing navigation between the hits. The first hit is bookmarked as #CiTag0 and the top of the generated document is tagged as #CiTag-1. To help in navigation, double-angle bracket tags (<< and >>) surround each hit. Click the << tag to go to the previous hit, and click the >> tag to go to the next hit.

     

    CiLocale
    Format: CiLocale=LocaleString
    This parameter is optional. If specified, the given locale will be used to interpret the CiRestriction string. Output will also be generated using this locale. Valid values for the CiLocale string are in the “Variables in .idq and .htx Files” page.

     

    CiMaxLineLength
    Format: CiMaxLineLength=Number
    This parameter is optional. When this parameter is specified, Webhits preformats the text with the <pre> and </pre> HTML tags. If a line length exceeds the specified number, it is broken at the next word boundary. This option works best when full hit-highlighting is chosen.

     

    CiTemplateFile
    Format: CiTemplateFile=Virtual path
    This parameter is optional, but highly recommended. It specifies the virtual path of the template file that generates Webhits output. The recommended extension for a Webhits template file is .htw. This template file lets you customize the output like the template files used for queries. It has a header section, a detail section, and a footer section. The template file format used by Webhits is the same as the template file for queries, with the following differences:

    The only replaceable parameters allowed are <%CiUrl%>, <%CiRestriction%>, <%CiUserParam1%>, <%CiUserParam2%>, and so on up to <%CiUserParam10%>.

    There is no support for if-then-else processing.

    The detail section is used only as a placeholder for hit-highlighting data. In the current release, Webhits ignores the text between <%BeginDetail%> and <%EndDetail%>. It is, however, important to specify <%BeginDetail%> and <%EndDetail%>.

    EscapeHTML, EscapeURL, and EscapeRAW are supported as in query template files.

    Sample template files for Webhits output formatting are included in the installed samples as:

    /Scripts/Samples/Search/Qfullhit.htw
    /Scripts/Samples/Search/Qsumrhit.htw

    CiUrl   The virtual path of the document being highlighted replaces this parameter.

    CiRestriction   The value specified for Webhits in the CiRestriction parameter replaces this parameter.

    CiUserParamNumber   Where Number is a number from 1 to 10. The corresponding value specified in the CiUserParamNumber parameter replaces this parameter.

     

    CiUserParamNumber
    Format: CiUserParamNumber=value, where value can be any non-null string.
    CiUserParamNumber is any parameter that can be specified for Webhits and that can be replaced in CiTemplateFile. In CiUserParamNumber, Number is any number from 1 to 10. For example, CiUserParam1, CiUserParam3, CiUserParam5, and so on.

    In the Files Used section, the text should read as follows:

    Webhits installs the following files:

    /Scripts/Samples/Search/Webhits.exe
    /Scripts/Samples/Search/Queryhit.htx
    /Scripts/Samples/Search/Queryhit.idq
    /Scripts/Samples/Search/QSumrhit.htw
    /Scripts/Samples/Search/QFullhit.htw
    /Samples/Search/Queryhit.htm

    All files above demonstrate summary and full-text hit-highlighting.

    Internet Data Query Files

    The following paragraphs have been added to the Names Section.

    The HTML filter emits scripting code embedded in an HTML page as a script property with the GUID 31F400A0-FD07-11CF-B9BD-00AA003DB18E. The property name of the script is specified by the language field of the script tag, for example:

    <script language="vbscript">

    In this example, the property name is vbscript. If no language field is specified, then the language field of an earlier script tag in the HTML page is used. If no earlier script tag is specified, then the property name defaults to javascript. The GUID for the script property is a registry parameter located at

    HKEY_LOCAL_MACHINE
     \System
      \CurrentControlSet
       \Control\HtmlFilter
        \ScriptTagClsid
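
The naming rule for script properties described above (use the language field, else the language of an earlier script tag, else javascript) can be sketched in Python with the standard html.parser module. The class name ScriptPropertyNames is hypothetical, and this is only an illustration of the rule as stated here, not the HTML filter's actual code.

```python
from html.parser import HTMLParser


class ScriptPropertyNames(HTMLParser):
    """Records the property name that each <script> tag would be emitted under."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._last_language = None  # language field of the most recent tag that had one

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        language = dict(attrs).get("language")
        if language:
            self._last_language = language
        # Fall back to an earlier tag's language, then to the javascript default.
        self.names.append(self._last_language or "javascript")


page = (
    "<script>a()</script>"
    '<script language="vbscript">b</script>'
    "<script>c</script>"
)

finder = ScriptPropertyNames()
finder.feed(page)
print(finder.names)
# prints ['javascript', 'vbscript', 'vbscript']
```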

    The following example shows you how to name a custom property for Microsoft Office by adding a globally unique identifier (GUID) to the Names section of the Internet Data Query (.idq) file:

    Custom_Text ( DBTYPE_STR|DBTYPE_BYREF ) = D5CDD505-2E9C-101B-9397-08002B2CF9AE "Custom_Text"

    In this example, Custom_Text can be any string. The value of Custom_Text does not have to be the same at the beginning and end of the line. The one at the beginning is the friendly name, and the one at the end (in quotation marks) is the Microsoft Office property name.

    Query Language

    In the “Boolean and Proximity Operators” section, the following note adds important information about the NEAR operator:

    Note   The NEAR operator can be applied only to words or phrases.

    Some documented properties are unavailable. The documentation incorrectly states that the following property names can be used:

    DocCategory
    DocCompany
    DocManager

    These properties are not predefined. To use them in a restriction, in a sort specification, or as a retrieved column, add the following definitions to a [Names] section in the .idq file:

    [Names]
    #Office document properties which are not in the standard list
    DocCategory ( DBTYPE_STR ) = D5CDD502-2E9C-101B-9397-08002B2CF9AE 0x2
    DocManager ( DBTYPE_STR ) = D5CDD502-2E9C-101B-9397-08002B2CF9AE 0xE
    DocCompany ( DBTYPE_STR ) = D5CDD502-2E9C-101B-9397-08002B2CF9AE 0xF
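
The layout of a [Names] entry (friendly name, datatype in parentheses, GUID, then a property ID or quoted property name) can be made explicit with a small Python parser. This is a sketch under the assumption that every entry follows the pattern of the examples in these notes. The real .idq grammar may allow more, and the names NAMES_LINE and parse_names_section are hypothetical.

```python
import re

# Pattern inferred only from the examples in these notes (an assumption):
#   FriendlyName ( DBTYPE ) = GUID  propid-or-"property name"
NAMES_LINE = re.compile(
    r'^\s*(?P<friendly>\w+)\s*'
    r'\(\s*(?P<dbtype>[^)]+?)\s*\)\s*=\s*'
    r'(?P<guid>[0-9A-Fa-f-]{36})\s+'
    r'(?P<prop>"[^"]*"|\S+)\s*$'
)


def parse_names_section(text):
    """Map each friendly name to a (datatype, GUID, property ID or name) tuple."""
    definitions = {}
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines, "#" comments, and the [Names] header itself.
        if not stripped or stripped.startswith("#") or stripped.startswith("["):
            continue
        match = NAMES_LINE.match(stripped)
        if match is None:
            continue
        prop = match.group("prop")
        # A quoted value is a property name; anything else is a numeric property ID.
        prop_value = prop.strip('"') if prop.startswith('"') else int(prop, 0)
        definitions[match.group("friendly")] = (
            match.group("dbtype"),
            match.group("guid"),
            prop_value,
        )
    return definitions


sample = """
[Names]
#Office document properties which are not in the standard list
DocCategory ( DBTYPE_STR ) = D5CDD502-2E9C-101B-9397-08002B2CF9AE 0x2
DocManager ( DBTYPE_STR ) = D5CDD502-2E9C-101B-9397-08002B2CF9AE 0xE
DocCompany ( DBTYPE_STR ) = D5CDD502-2E9C-101B-9397-08002B2CF9AE 0xF
"""

for name, definition in parse_names_section(sample).items():
    print(name, definition)
```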

    Registry Parameters

    All keys are in the following path:

    HKEY_LOCAL_MACHINE
    \SYSTEM
     \CurrentControlSet
      \Control
       \contentindex

    The following parameters have been added:

    CiCatalogFlags REG_DWORD
    Default: 0
    Range: 0 - 2
    Controls Index Server behavior based on certain flags. Set the value to 1 to turn off notifications on all remote UNC paths. Set this flag if Index Server is configured to index documents on a wide area network (WAN) over slow links. Set the value to 2 to turn off notifications on all local paths. When either of these flags is set, Index Server triggers periodic scans for the paths for which notifications have been disabled. The registry parameter ForcedNetPathScanInterval controls the frequency of these scans.

     

    MasterMergeCheckpointInterval REG_DWORD
    Units: Kilobytes
    Default: 256
    Range: 256 - 4096
    Specifies the interval after which a new index is flushed as a master merge proceeds.

     

    MaxRunningWebhits REG_DWORD
    Default: 20
    Range: 1 - 200
    Specifies the maximum number of concurrent instances of Webhits. When this value is exceeded, the error message “There are too many copies of hit highlighter running. Please try later.” is generated. Increase this value for computers with more memory or processors.

     

    MaxShadowFreeForceMerge REG_DWORD
    Units: Percentage of free disk space
    Default: 20
    Range: 5 - 4,000,000,000
    Specifies the percentage of free disk space occupied by shadow indexes on a catalog drive. If this percentage exceeds the value set for this parameter and if the total free disk space falls below the minimum set in MinDiskFreeForceMerge, a master merge begins. For example, if this parameter is set to 500, the amount of free disk space is 10 megabytes, and the amount of space occupied by shadow indexes is 40 megabytes, no master merge takes place (40*100/10 = 400, which is less than 500). However, if the value of this parameter is set to 300, a master merge begins because 400 is greater than 300.
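
The arithmetic in the example above can be checked with a short Python sketch. The function names are hypothetical. The sketch covers only the shadow-index percentage test, because the units of MinDiskFreeForceMerge are not given in this section, so the second condition is left out.

```python
def shadow_percentage(shadow_mb: float, free_mb: float) -> float:
    """Shadow index size expressed as a percentage of free disk space."""
    return shadow_mb * 100 / free_mb


def exceeds_max_shadow(shadow_mb: float, free_mb: float, max_shadow_free_force_merge: int) -> bool:
    """True when the shadow percentage exceeds the MaxShadowFreeForceMerge value.

    This is only the first of the two conditions for a master merge; the
    MinDiskFreeForceMerge check is not modeled here.
    """
    return shadow_percentage(shadow_mb, free_mb) > max_shadow_free_force_merge


# 40 MB of shadow indexes with 10 MB free gives 40*100/10 = 400 percent.
print(shadow_percentage(40, 10))        # prints 400.0
print(exceeds_max_shadow(40, 10, 500))  # prints False
print(exceeds_max_shadow(40, 10, 300))  # prints True
```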

     

    MaxWebhitsCpuTime REG_DWORD
    Units: Seconds
    Default: 30
    Range: 5 - 7200
    Specifies the timeout value for Webhits in CPU seconds. If Webhits does not process a document in the stipulated amount of time, it will return an error message that the allowed time has been exceeded.

    Variables in .idq and .htx Files

    The following variables have been added as read-only variables for .htx files.

    Variable Name     Meaning
    CiVersionMajor    The major version of Index Server.
    CiVersionMinor    The minor version of Index Server.

    For other variables, see Read-Only Variables Available in .htx Files on the “Variables in .idq and .htx Files” page.


    Removing Index Server

    This section tells you how to delete Index Server from your computer.

    To remove Index Server

    1. Stop Microsoft Internet Information Server or Microsoft Peer Web Services.

    2. Delete the following files from the %SystemRoot%\System32 directory: