Example:
|
Recommendation:
|
Complete
Syntax:
|
| Length: |
Minimum
n/a |
Maximum
n/a |
Recommended
n/a |
Usage:
|
Description:
|
Comments:
|
Examples:
|
|
Google-Comments: |
|
Yahoo-Comments: |
|
MSN-Comments: |
|
AOL-Comments: |
| Ask Jeeves-Comments: |
|
AltaVista-Comments: |
|
Excite-Comments: |
| HotBot-Comments: |
| Inktomi-Comments: |
| InfoSeek-Comments: |
| Lycos-Comments: |
| NorthernLight-Comments: |
| |
| USA Usage/Comments: |
| UK Usage/Comments: |
| CDN Usage/Comments: |
| DCMI Usage/Comments: |
| Other International/Comments: |
| |
| Commercial Usage/Comments: |
| Governmental Usage/Comments: |
| Education Usage/Comments: |
| Non-profit Usage/Comments: |
| |
| HTML 1.0 |
| HTML 2.0 |
| HTML 3.2 |
| HTML 4.0 |
| XHTML |
| DHTML |
| eGMS |
| PICS |
| DCMI |
| W3C |
| |
| |
| |
| |
Date: 09/17/05 17:21:53
Subject:
Microsoft Index Server: Filter DLLs
massmind.org/techref/language/asp/ix/ixfildll.htm
A filter DLL “understands” one or more document formats and
is capable of extracting text and properties out of those
document types. A filter DLL implements the IFilter
ActiveX interface. The CiDaemon process uses the IFilter
interface to extract the text out of a document. To track down a
problem with a filter DLL, an administrator needs to know where
to look to determine the filter DLL for a particular document.
Editing the registry is also a good way to avoid filtering
documents with no useful content.
The list of document types for which filters are
pre-installed is given below:
- HTML version 3.0 or lower
- Microsoft® Word
- Microsoft® Excel
- Microsoft® PowerPoint®
- Plain Text
- Binary Files
The HTML filter will not index any of the contents or
properties of an HTML file if the HTML file contains the
following meta tag:
<meta name="robots" content="noindex">
A Webmaster can add this meta tag to selectively avoid
indexing certain HTML files.
If an HTML file contains the following meta tag, the content
field specifies the language code:
<meta name="ms.locale" content="EN">
The file is filtered by the language resources for that
particular language (if available).
The content field in the tag can also specify the locale by a
decimal number, such as 1033, which is the locale ID for U.S.
English.
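The two tags described above can be read with a few lines of Python. This is only a minimal sketch of the decision they imply, not the HTML filter's actual implementation; the class `MetaTagReader` and the function `read_filter_hints` are hypothetical names introduced for illustration.

```python
from html.parser import HTMLParser


class MetaTagReader(HTMLParser):
    """Collects <meta name=... content=...> pairs, keyed by lowercased name."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name is not None and content is not None:
            self.meta[name.lower()] = content


def read_filter_hints(html_text):
    """Return (skip_indexing, locale) as suggested by the robots and ms.locale tags."""
    reader = MetaTagReader()
    reader.feed(html_text)
    reader.close()
    # Assumption: only the exact value "noindex" (case-insensitive) is treated as a skip request.
    skip_indexing = reader.meta.get("robots", "").strip().lower() == "noindex"
    # The locale is returned as written, e.g. "EN" or "1033"; None when the tag is absent.
    locale = reader.meta.get("ms.locale")
    return skip_indexing, locale


if __name__ == "__main__":
    page = (
        '<html><head><meta name="robots" content="noindex">'
        '<meta name="ms.locale" content="1033"></head></html>'
    )
    print(read_filter_hints(page))
```

A caller would skip the file entirely when the first element is true, and otherwise pass the second element to whatever selects the language resources.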
Some meta tag properties are mapped onto the Microsoft®
Office property sets so that users can mark HTML pages with the
same properties found in the Office property sets. The mapped
properties are:
| Property example | Mapped to |
| <meta name="author" content="ruth"> | The author property in the summary information property set. |
| <meta name="subject" content="word processing"> | The subject property in the summary information property set. |
| <meta name="keywords" content="fonts, serif"> | The keyword property in the summary information property set. |
| <meta name="ms.category" content="fiction"> | The category property in the document summary information property set. |
The HTML filter extracts text from the content field of a
meta element. For example, if an HTML file has this line:
<META NAME="DESCRIPTION" CONTENT="Sample query form for Microsoft Index Server">
Then a user can query the information in the content field,
namely “Sample query form for Microsoft Index Server”, by using
the HTML meta property. The
GUID for the meta property is
D1B5D3F0-C0B3-11CF-9A92-00A0C908DBF1 and the property name is
specified by the name field, or the HTTP-EQUIV field. In the
above example, the property name is DESCRIPTION.
Thus a friendly name, for example MetaDescription, for the meta
property can be defined as
MetaDescription(DBTYPE_WSTR|DBTYPE_BYREF) = D1B5D3F0-C0B3-11CF-9A92-00A0C908DBF1 description
The GUID for meta property is a registry parameter located at
HKEY_LOCAL_MACHINE
\System
\CurrentControlSet
\Control\HtmlFilter
\MetaTagClsid
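A friendly-name definition such as the MetaDescription line above has four parts: the friendly name, the datatype flags, the property-set GUID, and the property name. The following Python sketch splits such a line into those parts. The grammar it accepts is inferred from the examples on this page and is an assumption, and `parse_names_line` is a hypothetical helper, not part of Index Server.

```python
import re

# Assumed shape: FriendlyName ( TYPE|TYPE ) = GUID property-name
NAMES_LINE = re.compile(
    r"""^\s*(?P<friendly>\w+)\s*
        \(\s*(?P<types>[^)]*?)\s*\)\s*=\s*
        (?P<guid>[0-9A-Fa-f]{8}(?:-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12})\s+
        (?P<prop>.+?)\s*$""",
    re.VERBOSE,
)


def parse_names_line(line):
    """Split one friendly-name definition into its four components."""
    match = NAMES_LINE.match(line)
    if match is None:
        raise ValueError("not a friendly-name definition: %r" % line)
    return {
        "friendly_name": match["friendly"],
        # Datatype flags are separated by the | character.
        "types": [flag.strip() for flag in match["types"].split("|")],
        "guid": match["guid"].upper(),
        # The property is either a name (optionally quoted) or a number such as 0x2.
        "property": match["prop"].strip('"'),
    }


if __name__ == "__main__":
    line = (
        "MetaDescription(DBTYPE_WSTR|DBTYPE_BYREF) = "
        "D1B5D3F0-C0B3-11CF-9A92-00A0C908DBF1 description"
    )
    print(parse_names_line(line))
```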
The HTML filter emits scripting code embedded in an HTML page
as a script property with the GUID
31F400A0-FD07-11CF-B9BD-00AA003DB18E. The property name of the
script is specified by the language field of the script tag, for
example:
<script language="vbscript">
In this example, the property name is vbscript. If
no language field is specified, then the language field of an
earlier script tag in the HTML page is used. If no earlier
script tag is specified, then the property name defaults to
javascript. The GUID for the script property is a registry
parameter located at
HKEY_LOCAL_MACHINE
\System
\CurrentControlSet
\Control\HtmlFilter
\ScriptTagClsid
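The naming rule for script properties can be sketched as follows. This is an illustration of the rule as described above, not the filter's own code. It assumes that "an earlier script tag" means the most recent earlier tag that carried a language field, and the names `ScriptPropertyNamer` and `script_property_names` are hypothetical.

```python
from html.parser import HTMLParser


class ScriptPropertyNamer(HTMLParser):
    """Records the property name that each <script> tag would be emitted under."""

    def __init__(self):
        super().__init__()
        self.last_language = None
        self.property_names = []

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        language = dict(attrs).get("language")
        if language:
            # Remember the language so later tags without one can reuse it.
            self.last_language = language
        # Fall back to javascript when no tag so far has named a language.
        self.property_names.append(self.last_language or "javascript")


def script_property_names(html_text):
    """Return one property name per <script> tag, in document order."""
    namer = ScriptPropertyNamer()
    namer.feed(html_text)
    namer.close()
    return namer.property_names


if __name__ == "__main__":
    page = '<script language="vbscript">x = 1</script><script>y = 2</script>'
    print(script_property_names(page))
```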
Document types and the associated filter DLL entries are
specified in the registry under the
\HKEY_LOCAL_MACHINE\Software\Classes tree. To find out the
filter DLL associated with a particular document type, navigate
through the registry entries in the
\HKEY_LOCAL_MACHINE\Software\Classes tree.
When a registered binary file is encountered, the
NULL filter is used. The NULL filter retrieves only the
system properties. The contents of a binary file are
not filtered. Examples of system properties are the
FileName, last Write time, file
Size, Attributes, and so on.
For more information about binary files, see
Registering File Types as Binary Files.
In Index Server, a default filter filters both the system
properties (such as file name) and the contents of a file. The
default filter does not “understand” any document formats; when
filtering the contents of a file, it treats the file as a
sequence of characters. Index Server uses the default filter
when the file-name extension of a file has no association in the
registry, and if the value of the registry setting
FilterFilesWithUnknownExtensions is 1.
Note The default filter filters plain text
and files of unknown origin. It assumes all text to be in the
default code page of the server.
If a file is corrupted, the filter may not be able to
properly interpret the contents of that file. To learn how to
get a list of files that could not be filtered, see
Unfiltered Files. An
event is also written to the event log. Sometimes a file
cannot be filtered because of a defective third-party filter.
After verifying the contents of a file, an administrator should
report the problems to the filter vendor. Files protected by
passwords are not filtered.
If a document cannot be filtered, it will be retried a
certain maximum number of times. If the document still cannot be
filtered, then it will be considered to be an unfiltered
file. The registry key
FilterRetries controls the maximum number of
retries for a document.
To get a list of all the files that could
not be filtered:
- Click Start, point to Programs,
point to Windows NT 4.0 Option Pack, point
to Microsoft Index Server, and click
Index Server Manager (HTML).
- In the View unfiltered documents field,
click Start.
A file with an extension that does not have an association in
the registry is treated as an Unknown Extension.
The behavior of Index Server depends upon the registry setting
FilterFilesWithUnknownExtensions. If this value is
set to 0, then the NULL Filter is used to filter those files.
Otherwise, the
default filter is used to filter the contents.
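The filter selection described in the preceding sections can be summarized as a small decision function. The sketch below only restates the documented rules in Python; the function name, its parameters, and the returned labels are hypothetical and do not correspond to any Index Server API.

```python
def choose_filter(has_registry_association,
                  registered_as_binary=False,
                  filter_files_with_unknown_extensions=1):
    """Name the kind of filter that would handle a file, per the rules in the text."""
    if not has_registry_association:
        # Unknown extension: behavior depends on FilterFilesWithUnknownExtensions.
        if filter_files_with_unknown_extensions == 0:
            return "NULL filter"      # system properties only
        return "default filter"       # system properties plus raw character contents
    if registered_as_binary:
        return "NULL filter"          # contents of binary files are not filtered
    return "registered filter DLL"    # the IFilter registered for the document type


if __name__ == "__main__":
    print(choose_filter(False, filter_files_with_unknown_extensions=0))
    print(choose_filter(False, filter_files_with_unknown_extensions=1))
    print(choose_filter(True, registered_as_binary=True))
    print(choose_filter(True))
```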
By default, directories are not filtered and will
not appear in query results. To filter directories, set the
registry key
FilterDirectories to 1. When directories are
filtered, their system properties are filtered.
The CiDaemon process is capable of automatically generating a
summary or characterization (also called an abstract)
for each document. If the registry key
GenerateCharacterization is set to 1, the
characterization will be automatically generated. The maximum
number of characters in the generated characterization is
controlled by the registry key
MaxCharacterization.
If the characterization is set to be generated automatically,
Index Server takes by default the first 320 characters of a
document and copies that block of text for the summary. You can
override this automatic selection by inserting a meta tag in
each document with your own customized summary. Put all meta
tags within the header of an HTML file, as shown in the
following example.
<head>
<META NAME="DESCRIPTION" CONTENT="This text will appear on the results page
as the document's summary.">
</head>
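The choice between the automatic summary and a custom one can be sketched in Python as follows. This is a simplified illustration: it assumes that a DESCRIPTION meta tag is used as written and is not truncated, which the text above does not state. The function name and parameters are hypothetical.

```python
def characterization(body_text, meta_description=None, max_characters=320):
    """Return the summary shown for a document on the results page.

    max_characters stands in for the MaxCharacterization registry value;
    320 is the default block size mentioned in the text.
    """
    if meta_description:
        # Assumption: a custom description overrides the automatic selection unchanged.
        return meta_description
    # Automatic selection: the first block of characters in the document.
    return body_text[:max_characters]


if __name__ == "__main__":
    body = "Index Server filters documents. " * 20
    print(len(characterization(body)))
    print(characterization(body, "This text will appear on the results page."))
```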
To add new filter DLLs, please refer to the documentation
provided with the filter DLLs. You can register and unregister
DLLs with the registry utility (Regsvr32.exe).
To remove a filter DLL, the IFilter PersistentHandler
entry associated with a document type and the filter DLL
entry must be deleted. See
Finding the Filter DLL for a Document. Once you have found
the correct IFilter PersistentHandler entry, you can
unregister it with the following syntax:
Regsvr32.exe /u
For an example, see
Removing a Filter.
The following example shows how to find out the filter DLL
for a document. This example is for HTML files.
Step 1: Determine the CLSID
Find the CLSID associated with the document type
under the registry key \HKEY_LOCAL_MACHINE\SOFTWARE\Classes. Let
this be <Value1>.
\HKEY_LOCAL_MACHINE\SOFTWARE\Classes
htmlfile
= Class for WWW HTML files
CLSID
= {25336920-03F9-11CF-8FD0-00AA00686F13}
Step 2: Determine the Persistent Handler
Using <Value1> found out in Step 1, find the
PersistentHandler value for the
\HKEY_LOCAL_MACHINE\SOFTWARE\Classes\CLSID\<Value1>
key. Let this be <Value2>.
\HKEY_LOCAL_MACHINE\SOFTWARE\Classes\CLSID
{25336920-03F9-11CF-8FD0-00AA00686F13}
= WWW HTML files
PersistentHandler
= {EEC97550-47A9-11CF-B952-00AA0051FE20}
Step 3: Determine the IFilter Persistent Handler GUID
Using <Value2> determined in Step 2, find the
IFilter Persistent Handler GUID for the document type. The
value under the key \HKEY_LOCAL_MACHINE\SOFTWARE\Classes\CLSID\<Value2>\PersistentAddinsRegistered\
89BCB740-6119-101A-BCB7-00DD010655AF yields the IFilter
Persistent Handler GUID for this document type.
Let this be <Value3>.
89BCB740-6119-101A-BCB7-00DD010655AF is the IFilter
interface GUID.
\Registry\Machine\Software\Classes\CLSID
{EEC97550-47A9-11CF-B952-00AA0051FE20}
= REG_SZ HTML File Persistent Handler
PersistentAddinsRegistered
{89BCB740-6119-101A-BCB7-00DD010655AF}
= REG_SZ {E0CA5340-4534-11CF-B952-00AA0051FE20}
Step 4: Determine the Filter DLL
Using <Value3> determined in Step 3, the filter DLL
can be found under the entry
\HKEY_LOCAL_MACHINE\SOFTWARE\Classes\CLSID\<Value3>\InprocServer32.
\Registry\Machine\Software\Classes\CLSID
{E0CA5340-4534-11CF-B952-00AA0051FE20}
= REG_SZ HTML Filter
InprocServer32
= REG_SZ nlhtml.dll
In this example, the filter DLL for HTML documents is
nlhtml.dll.
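The four lookup steps above can be traced in code. The Python sketch below walks an in-memory dictionary that stands in for the \HKEY_LOCAL_MACHINE\SOFTWARE\Classes tree and holds only the values from this example; it does not read the real registry. The dictionary layout and the function name `find_filter_dll` are assumptions made for illustration.

```python
# GUID of the IFilter interface, as given in Step 3.
IFILTER_IID = "{89BCB740-6119-101A-BCB7-00DD010655AF}"

# Stand-in for HKEY_LOCAL_MACHINE\SOFTWARE\Classes, populated from the example above.
CLASSES = {
    "htmlfile": {"CLSID": "{25336920-03F9-11CF-8FD0-00AA00686F13}"},
    "CLSID": {
        "{25336920-03F9-11CF-8FD0-00AA00686F13}": {
            "PersistentHandler": "{EEC97550-47A9-11CF-B952-00AA0051FE20}",
        },
        "{EEC97550-47A9-11CF-B952-00AA0051FE20}": {
            "PersistentAddinsRegistered": {
                IFILTER_IID: "{E0CA5340-4534-11CF-B952-00AA0051FE20}",
            },
        },
        "{E0CA5340-4534-11CF-B952-00AA0051FE20}": {
            "InprocServer32": "nlhtml.dll",
        },
    },
}


def find_filter_dll(classes, document_type):
    """Follow Steps 1 to 4 and return the filter DLL for a document type."""
    clsids = classes["CLSID"]
    value1 = classes[document_type]["CLSID"]                       # Step 1: CLSID
    value2 = clsids[value1]["PersistentHandler"]                   # Step 2: persistent handler
    value3 = clsids[value2]["PersistentAddinsRegistered"][IFILTER_IID]  # Step 3: IFilter handler
    return clsids[value3]["InprocServer32"]                        # Step 4: filter DLL


if __name__ == "__main__":
    print(find_filter_dll(CLASSES, "htmlfile"))
```

A missing key at any step raises KeyError, which corresponds to a document type that has no filter registered along this path.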
|
Microsoft
Index Server Release Notes
www.rialto.k12.ca.us/INDEXSRVR/srchadm/help/README.HTM
Microsoft Index Server
Release Notes
Thank you for downloading and installing
Microsoft® Index Server version 1.1 for Windows NT®
Server. This file lists the changes made to Index
Server since its beta release. There are also
several installation items to note. These notes are
mainly for users who have a previous version of
Index Server installed on their computers and are
upgrading to the latest version. These changes and
notes are summarized on this page.
For more information about Index Server and
related features, see the home page at the following
address:
http://www.microsoft.com/ntserver/search
The sample files (such as Query.htm) were
replaced. If you modified any of the sample files
and did not move or rename them, they were
overwritten.
Installing Index Server will reset the registry
settings to their defaults. If you have modified the
registry settings for Index Server, you will have to
reset the values to your preferences after
installation.
If Microsoft Internet News Server has been
installed on a server along with Index Server
version 1.1, then news articles can be indexed. You
can find additional sample query forms written for a
news server on the
Index Server home page.
The virtual paths produced by Internet
Information Server (IIS) convert the dot between
newsgroup components to a slash. For example:
| News Group | Converted To |
| comp.os.ms-windows.advocacy | /comp/os/ms-windows/advocacy |
Note The path
/comp/os/ms-windows/advocacy is not a valid virtual
path in IIS.
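The conversion shown in the table amounts to replacing each dot with a slash and adding a leading slash. A minimal Python sketch follows; the function name is hypothetical.

```python
def newsgroup_to_virtual_path(newsgroup):
    """Convert a newsgroup name to the slash-separated path form shown in the table."""
    return "/" + newsgroup.replace(".", "/")


if __name__ == "__main__":
    print(newsgroup_to_virtual_path("comp.os.ms-windows.advocacy"))
```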
New Default Properties
The following properties are always available for
queries to newsgroups.
| Friendly Name | Datatype | Property |
| NewsGroup | DBTYPE_WSTR|DBTYPE_BYREF | Newsgroup to which article was posted. |
| NewsGroups | DBTYPE_WSTR|DBTYPE_BYREF | Full set of newsgroups to which article was cross-posted. |
| NewsSubject | DBTYPE_WSTR|DBTYPE_BYREF | Subject line of news article. |
| NewsFrom | DBTYPE_WSTR|DBTYPE_BYREF | Author of news article. |
| NewsMsgId | DBTYPE_WSTR|DBTYPE_BYREF | Globally unique message ID of article. |
Special Requirement for Hit Highlighting
The hit highlighter (Webhits.exe) is a Common
Gateway Interface (CGI) application that must be
stored in a valid virtual path with Execute
permission. If you want to highlight hits in news
articles, add virtual roots, each beginning with /$CiNews
and corresponding to every root in the news server.
Make sure that virtual roots in IIS beginning with
/$CiNews have both Read and Execute permissions
turned off.
For example, if rec.sports.* was being stored at
C:\Sports and the default (home) news root was
C:\Inetpub\Nntproot, two new virtual roots
would be added: /$CiNews/rec/sports=C:\Sports and
/$CiNews=C:\Inetpub\Nntproot. The Read and
Execute permissions are not enabled for
these virtual roots.
When running Webhits.exe, be sure to put the
virtual path /$CiNews/<%vpath%>
into the .htx file in the call to Webhits.exe.
Note The hit highlighter does
not check Read permissions for virtual roots
beginning with /$CiNews/.
NNTP Virtual Roots with UNC Shares
If a virtual root on a news server points to a
universal naming convention (UNC) share,
administrators must add a virtual root in IIS. The
Network News Transfer Protocol (NNTP) virtual root
must be prepended with /$CiNews to highlight the
news articles stored on that UNC share by using
Webhits.
Example
Assume the following in the news server setup:
- /rec.food points to \\Server1\Share1\Dir1
- The user ID is Gourmet\Chef1 (in the form
domain\username)
- The password is Marinade
In IIS, set up a virtual root with the following
properties:
- /$CiNews/rec.food pointing to
\\Server1\Share1\Dir1
- The user ID is Gourmet\Chef1
- The password is Marinade
- Both Read and Execute permissions are turned
off
Important Be
sure to turn off the Read and Execute permissions on
virtual roots prepended with /$CiNews.
This section details changes and additions to the
existing documentation.
Basic Administration
In the sections that discuss the variables
PROOT_virtual and INDEX_virtual root,
(Enabling
Indexing of a Virtual Root and
Forcing a Scan of a Virtual Root), if the root
is a news root, these variables are
PROOT_NNTP_virtual and INDEX_NNTP_virtual
root.
List of Virtual Roots
You can determine the type of a virtual root
while making the VIRTUAL_ROOTS query. Look at the
value of the special property
StorageType—(DBTYPE_UI4) =
b725f130-47ef-101a-a5f1-02608c9eebac 4. The value 0
identifies a Web root. The value 1 identifies a news
root.
Error Messages
This section lists additions and corrections to
the Index Server error messages, contained on the
Error Messages page.
Event Log Messages
| Message | Explanation |
| Account user-id does not have interactive logon privilege on this computer. You can give user-id interactive logon privilege on this computer using the user manager administrative tool. | The specified user-id does not have interactive logon permission on the computer running Index Server. Give the user-id interactive logon privilege through the User Manager for Domains. |
Results Page
At the bottom of a results page, you may
periodically see the following message:
| Message | Explanation |
| The index is out of date. | Files have been modified since the last time the scope of your query was indexed. When files in a scope are modified, Index Server re-indexes them automatically as soon as system resources are available. If you see this message at the bottom of a results page, wait a few minutes and retry your query. |
Webhits Errors
| Message | Explanation |
| There are too many copies of hit highlighter running. Please try later. | There are more simultaneous instances of Webhits than the maximum number set in the MaxRunningWebhits registry key. Try executing your query later, when the server is less busy. |
| Hit highlighting took too long to execute and was timed out. | Webhits has taken longer than the allotted time to process a document, and the server has timed out. The document may be too big or it may be corrupted. Ask the administrator to check the document. |
Virtual Roots
| Message | Explanation |
| Added virtual root <root> to index. | The message “Mapped to <path>” is added to the event log when a virtual root is indexed. |
| Removed virtual root <root> from index. | This message is written to the event log when a virtual root is deleted from the index. |
| Added scope <path> to index. | This message is added to the event log when a new physical scope is indexed. |
| Removed scope <path> from index. | This message is written to the event log when a physical scope is deleted from the index. |
Note When virtual roots point
to positions below each other, adding and removing
virtual roots may have no effect on the physical
scopes in the index. For example, some sites such as
www.microsoft.com are branded with virtual roots in
a marketing sense of the word. If a user wants
information on Windows NT Server, the user follows
the path
http://www.microsoft.com/NTServer, when
http://www.microsoft.com/products/backoffice/ntserver
is also a valid path. In this example, even if you
removed the lower virtual root (/NTServer), the
pages would still be indexed because they are included
in another path,
http://www.microsoft.com/products/backoffice/ntserver.
Filtering
HTML Filter
The HTML filter will not index any of the
contents or properties of an HTML file if the HTML
file contains the following meta tag:
<meta name="robots" content="noindex">
A Webmaster can add this meta tag to selectively
avoid indexing certain HTML files.
If an HTML file contains the following meta tag,
the content field specifies the language code:
<meta name="ms.locale" content="EN">
The file is filtered by the language resources
for that particular language (if available).
The content field in the tag can also specify the
locale by a decimal number, such as 1033, which is
the locale ID for U.S. English.
Some meta tag properties are mapped onto the
Microsoft® Office property sets so that users can
mark HTML pages with the same properties found in the
Office property sets. The mapped properties are:
| Property | Mapped to |
| <meta name="author" content="ruth"> | The author property in the summary information property set. |
| <meta name="subject" content="word processing"> | The subject property in the summary information property set. |
| <meta name="keywords" content="fonts, serif"> | The keyword property in the summary information property set. |
| <meta name="ms.category" content="fiction"> | The category property in the document summary information property set. |
Hit Highlighting
In the “Webhits Parameters” section, the
paragraph under the CiQueryFile
parameter should say virtual path instead
of physical path. The paragraph should read
as follows:
Format: CiQueryFile=Virtual
path
This parameter is optional. If it is passed,
CiQueryFile specifies the
virtual path of the .idq file containing the
[Names] section describing the custom properties. You
must pass this parameter for all queries involving
custom properties. If you try to hit-highlight a
document with a query that has a custom property and
you do not specify the appropriate .idq file, the
error message “No such property” will be displayed.
The following parameters have been added to the
“Webhits Parameters” section:
CiBeginHilite, CiEndHilite |
| Format: CiBeginHilite=BeginTags&CiEndHilite=EndTags |
| These two parameters together customize highlighted words in the
query results. If you specify these tags, Index Server ignores all
other formatting parameters (CiBold, CiHiliteColor, CiItalic,
and so on).
Important You must match the BeginTags and EndTags with correct
HTML formatting. Failure to do so will produce unpredictable
results. When you specify these parameters in the query template
file (.htx file), you must properly escape the tags. For example:
CiBeginHilite=<%escapeURL <font color="#FF0000"><em>%>&CiEndHilite=<%escapeURL </em></font> %>
The two parameters together in the above example make the
highlighted words in the search results appear in red italics. |
| CiHiliteType |
| Format:
CiHiliteType=[Full|Summary] |
| This parameter is optional.
If not specified, Summary is the default.
Summary The summary
feature can generate small excerpts of a
document around the words that match the
query specification.
Full When
full highlighting is chosen as the option,
the whole document is highlighted and
returned. Note that this does not do
full-fidelity highlighting. Only the text
part of the document is extracted and
highlighted. This option is mainly for
documents that contain mostly text. It also
tags the hits with bookmarks, allowing
navigation between the hits. The first hit
is bookmarked as #CiTag0 and the top of the
generated document is tagged as #CiTag-1. To
help in navigation, double-angle bracket
tags (<< and >>) surround each hit. Click
the << tag to go to the previous hit, and
click the >> tag to go to the next hit. |
| CiLocale |
| Format: CiLocale
=LocaleString |
| This parameter is optional.
If specified, the given locale will be used
to interpret the CiRestriction
string. Output will also be generated using
this locale. Valid values for the
CiLocale string are in the
“Variables in .idq and .htx Files” page. |
| CiMaxLineLength |
| Format:
CiMaxLineLength=Number |
| This parameter is optional.
When this parameter is specified, Webhits
preformats the text with the <pre> and
</pre> HTML tags. If a line length exceeds
the specified number, it is broken at the
next word boundary. This option works best
when
full hit-highlighting is chosen. |
| CiTemplateFile |
| Format:
CiTemplateFile=Virtual path |
This parameter is optional,
but highly recommended. It specifies the
virtual path of the template file that
generates Webhits output. The recommended
extension for a Webhits template file is
.htw. This template file lets you customize
the output like the template files used for
queries. It has a header section, a detail
section, and a footer section. The template
file format used by Webhits is the same as the
template file for queries, with the
following differences:
The only replaceable
parameters allowed are
<%CiUrl%>,
<%CiRestriction%>,
<%CiUserParam1%>,
<%CiUserParam2%>,
and so on up to <%CiUserParam10%>.
There is no support for
if-then-else processing.
The detail section is
used only as a placeholder for
hit-highlighting data. In the current
release, Webhits ignores the text
between <%BeginDetail%>
and <%EndDetail%>.
It is, however, important to specify
<%BeginDetail%>
and <%EndDetail%>.
EscapeHTML, EscapeURL,
and EscapeRAW are supported as in query
template files.
Sample template files for Webhits output
formatting are included in the installed
samples as:
/Scripts/Samples/Search/Qfullhit.htw
/Scripts/Samples/Search/Qsumrhit.htw
CiUrl The
virtual path of the document being
highlighted replaces this parameter.
CiRestriction The
value specified for Webhits in the
CiRestriction parameter replaces
this parameter.
CiUserParamNumber Where
Number is a number from 1 to 10.
The corresponding value specified in the
CiUserParamNumber parameter
replaces this parameter. |
|
CiUserParamNumber |
| Format: CiUserParamNumber=value,
where value can be any non-null
string. |
| CiUserParamNumber
is any parameter that can be specified for
Webhits and that can be replaced in
CiTemplateFile. In
CiUserParamNumber,
Number is any number from 1 to 10. For
example, CiUserParam1,
CiUserParam3,
CiUserParam5, and so on. |
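The Webhits parameters above are passed as URL-encoded name and value pairs. The Python sketch below assembles such a query string from a handful of the documented parameters, with the highlight tags escaped. It is an illustration only: how the query string is attached to the Webhits.exe URL is outside its scope, requiring CiBeginHilite and CiEndHilite to appear together is an assumption drawn from the word "together" above, and `webhits_query_string` is a hypothetical helper.

```python
from urllib.parse import quote, urlencode


def webhits_query_string(restriction, hilite_type="Summary",
                         begin_tags=None, end_tags=None):
    """Build a URL-encoded parameter string for a Webhits request."""
    if hilite_type not in ("Full", "Summary"):
        raise ValueError("CiHiliteType must be Full or Summary")
    if (begin_tags is None) != (end_tags is None):
        # Assumption: the two highlight-tag parameters are only useful as a pair.
        raise ValueError("CiBeginHilite and CiEndHilite must be given together")
    params = [("CiRestriction", restriction), ("CiHiliteType", hilite_type)]
    if begin_tags is not None:
        params.append(("CiBeginHilite", begin_tags))
        params.append(("CiEndHilite", end_tags))
    # quote (rather than the default quote_plus) encodes spaces as %20.
    return urlencode(params, quote_via=quote)


if __name__ == "__main__":
    print(webhits_query_string(
        "fonts",
        hilite_type="Full",
        begin_tags='<font color="#FF0000"><em>',
        end_tags="</em></font>",
    ))
```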
In the
Files Used section, the text should read as
follows:
Webhits installs the following files:
/Scripts/Samples/Search/Webhits.exe
/Scripts/Samples/Search/Queryhit.htx
/Scripts/Samples/Search/Queryhit.idq
/Scripts/Samples/Search/QSumrhit.htw
/Scripts/Samples/Search/QFullhit.htw
/Samples/Search/Queryhit.htm
All files above demonstrate summary and full-text
hit-highlighting.
Internet Data Query Files
The following paragraphs have been added to the
Names Section.
The HTML filter emits scripting code embedded in
an HTML page as a script property with the GUID
31F400A0-FD07-11CF-B9BD-00AA003DB18E. The property
name of the script is specified by the language
field of the script tag, for example:
<script language="vbscript">
In this example, the property name is
vbscript. If no language field is specified,
then the language field of an earlier script tag in
the HTML page is used. If no earlier script tag is
specified, then the property name defaults to
javascript. The GUID for the script property is
a registry parameter located at
HKEY_LOCAL_MACHINE
\System
\CurrentControlSet
\Control\HtmlFilter
\ScriptTagClsid
The following example shows you how to name a
custom property for Microsoft Office by adding a
globally unique identifier (GUID) to the Names
section of the Internet Data Query (.idq) file:
Custom_Text ( DBTYPE_STR|DBTYPE_BYREF ) = D5CDD505-2E9C-101B-9397-08002B2CF9AE "Custom_Text"
In this example, Custom_Text can be any
string. The value of Custom_Text does not have to be
the same at the beginning and end of the line. The
one at the beginning is the friendly name, and the
one at the end (in quotation marks) is the Microsoft
Office property name.
Query Language
In the “Boolean and Proximity Operators” section,
the following note adds important information about
the NEAR operator:
Note The NEAR operator can be
applied only to words or phrases.
Some documented properties are unavailable. The
documentation incorrectly states that the following
property names can be used:
DocCategory
DocCompany
DocManager
To use these properties, you must list them in a
[Names] section in the .idq file. To use these
properties in a restriction, sort specification, or
as a retrieved column, you have to add the following
definitions to the .idq file:
[Names]
#Office document properties which are not in the standard list
DocCategory ( DBTYPE_STR ) = D5CDD502-2E9C-101B-9397-08002B2CF9AE 0x2
DocManager ( DBTYPE_STR ) = D5CDD502-2E9C-101B-9397-08002B2CF9AE 0xE
DocCompany ( DBTYPE_STR ) = D5CDD502-2E9C-101B-9397-08002B2CF9AE 0xF
Registry Parameters
All keys are in the following path:
HKEY_LOCAL_MACHINE
\SYSTEM
\CurrentControlSet
\Control
\contentindex
The following parameters have been added:
| CiCatalogFlags REG_DWORD | Default: 0, Range: 0 - 2 |
| Controls Index Server behavior based on certain flags. Set the
value to 1 to turn off notifications on all remote UNC paths. Set
this flag if Index Server is configured to index documents on a
wide area network (WAN) over slow links. Set the value to 2 to
turn off notifications on all local paths. When either of these
flags is set, Index Server triggers periodic scans of the paths
for which notifications have been disabled. The registry
parameter ForcedNetPathScanInterval controls the frequency of
these scans. |
|
MasterMergeCheckpointInterval REG_DWORD |
Units: Kilobytes
Default: 256
Range: 256 - 4096 |
|
Specifies the
interval after which a new index is flushed
as a master merge proceeds. |
| MaxRunningWebhits REG_DWORD | Default: 20, Range: 1 - 200 |
| Specifies the maximum number of concurrent instances of
Webhits. When this value is exceeded, the error message “There
are too many copies of hit highlighter running. Please try
later.” is generated, and the user is asked to try again later.
Increase this value for computers with more memory or
processors. |
|
MaxShadowFreeForceMerge REG_DWORD |
Units:
Percentage of free disk space
Default: 20
Range: 5 - 4,000,000,000 |
|
Specifies the
percentage of free disk space occupied by
shadow indexes on a catalog drive. If this
percentage exceeds the value set for this
parameter and if the total free disk space
falls below the minimum set in the
MinDiskFreeForceMerge, a master
merge begins. For example, if this parameter
is set to 500, the amount of free disk space
is 10 megabytes and the amount of space
occupied by shadow indexes is 40 megabytes,
no master merge takes place (40*100/10 is
less than 500). However, if the value of
this parameter is set to 300, a master merge
begins because 40*100/10 is greater than
300. |
|
MaxWebhitsCpuTime REG_DWORD |
Units: Seconds
Default: 30
Range: 5 - 7200 |
|
Specifies the
timeout value for Webhits in CPU seconds. If
Webhits does not process a document in the
stipulated amount of time, it will return an
error message that the allowed time has been
exceeded. |
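The arithmetic in the MaxShadowFreeForceMerge example above can be checked with a short Python sketch. It covers only the percentage comparison; the additional condition on MinDiskFreeForceMerge is left out, and the function names are hypothetical.

```python
def shadow_index_percentage(shadow_index_mb, free_disk_mb):
    """Space used by shadow indexes, expressed as a percentage of free disk space."""
    return shadow_index_mb * 100 / free_disk_mb


def exceeds_max_shadow_free_force_merge(shadow_index_mb, free_disk_mb,
                                        max_shadow_free_force_merge):
    """True when the shadow-index percentage is above the configured value."""
    percentage = shadow_index_percentage(shadow_index_mb, free_disk_mb)
    return percentage > max_shadow_free_force_merge


if __name__ == "__main__":
    # Values from the example: 40 MB of shadow indexes, 10 MB of free disk space.
    print(shadow_index_percentage(40, 10))
    print(exceeds_max_shadow_free_force_merge(40, 10, 500))
    print(exceeds_max_shadow_free_force_merge(40, 10, 300))
```

With the example's figures the percentage is 400, which is below a setting of 500 and above a setting of 300, matching the two outcomes described in the table.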
Variables in .idq and .htx Files
The following variables have been added as
read-only variables for .htx files.
|
Variable Name |
Meaning |
|
CiVersionMajor |
The major version of Index Server. |
|
CiVersionMinor |
The minor version of Index Server. |
For other variables, see
Read-Only Variables Available in .htx Files on
the “Variables in .idq and .htx Files” page.
This section tells you how to delete Index Server
from your computer.
To remove Index Server
-
Stop Microsoft Internet
Information Server or Microsoft Peer Web
Services.
-
Delete the following files from
the %SystemRoot%\System32 directory:
Cidaemon.exe
Htmlfilt.dll
Idq.dll
Infosoft.dll
Kppp.dll
Kppp7.dll
Kpw6.dll
Kpword.dll
Kpxl5.dll
Qperf.dll
Query.dll
Sccfa.dll
Sccfi.dll
Sccifilt.dll
Sccut.dll
Noise.* (where * is one or more
of dat, deu, eng, enu, esn, fra, ita, nld, sve)
Wbcache.* (where * is one or more of deu, eng,
enu, esn, fra, ita, nld, sve)
Wbdbase.* (where * is one or more of deu, eng,
enu, esn, fra, ita, nld, sve)
-
In the registry, delete the
following keys and/or values:
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\contentindex
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\ContentIndex
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\ContentFilter
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\ISAPISearch
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\W3SVC\Parameters\Script Map\.ida
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\W3SVC\Parameters\Script Map\.idq
-
Delete all Catalog.wci
directories (referenced from the CiCatalog
parameter of an .idq file).
-
Through the Windows NT Explorer,
delete all files pointed to by the virtual roots
/Samples/Search, /Srchadm, /Scripts/Srchadm, and
/Scripts/Samples/Search. Then, through the
Internet Service Manager, you can optionally
remove these virtual roots if they exist.
-
(optional) Delete all references
under HKEY_CLASSES_ROOT to
PersistentHandler, including all links
to classes referenced from a PersistentHandler
value.
© 1996 by Microsoft
Corporation. All rights reserved. |
|
Microsoft Index Server
Release Notes
Thank you for downloading and installing
Microsoft® Index Server version 1.1 for Windows NT®
Server. This file lists the changes made to Index
Server since its beta release. There are also
several installation items to note. These notes are
mainly for users who have a previous version of
Index Server installed on their computers and are
upgrading to the latest version. These changes and
notes are summarized on this page.
For more information about Index Server and
related features, see the home page at the following
address:
http://www.microsoft.com/ntserver/search
The sample files (such as Query.htm) were
replaced. If you modified any of the sample files
and did not move or rename them, they were
overwritten.
Installing Index Server will reset the registry
settings to their defaults. If you have modified the
registry settings for Index Server, you will have to
reset the values to your preferences after
installation.
If Microsoft Internet News Server has been
installed on a server along with Index Server
version 1.1, then news articles can be indexed. You
can find additional sample query forms written for a
news server on the
Index Server home page.
The virtual paths produced by Internet
Information Server (IIS) convert the dot between
newsgroup components to a slash. For example:
News Group | Converted To
comp.os.ms-windows.advocacy | /comp/os/ms-windows/advocacy
Note The path
/comp/os/ms-windows/advocacy is not a valid virtual
path in IIS.
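The dot-to-slash conversion described above can be sketched in a few lines of Python. The function name is hypothetical and only illustrates the string mapping; it is not part of IIS or Index Server.

```python
def newsgroup_to_vpath(newsgroup):
    """Map a newsgroup name to the slash-separated path form shown above."""
    # Each dot between newsgroup components becomes a slash,
    # and the result is rooted with a leading slash.
    return "/" + newsgroup.replace(".", "/")


print(newsgroup_to_vpath("comp.os.ms-windows.advocacy"))
```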
New Default Properties
The following properties are always available for
queries to newsgroups.
Friendly Name | Datatype | Property
NewsGroup | DBTYPE_WSTR|DBTYPE_BYREF | Newsgroup to which article was posted.
NewsGroups | DBTYPE_WSTR|DBTYPE_BYREF | Full set of newsgroups to which article was cross-posted.
NewsSubject | DBTYPE_WSTR|DBTYPE_BYREF | Subject line of news article.
NewsFrom | DBTYPE_WSTR|DBTYPE_BYREF | Author of news article.
NewsMsgId | DBTYPE_WSTR|DBTYPE_BYREF | Globally unique message ID of article.
Special Requirement for Hit Highlighting
The hit highlighter (Webhits.exe) is a Common
Gateway Interface (CGI) application that must be
stored in a valid virtual path with Execute
permission. If you want to highlight hits in news
articles, add virtual roots, each beginning with
/$CiNews and corresponding to every root in the news
server. Make sure that virtual roots in IIS
beginning with /$CiNews have both Read and Execute
permissions turned off.
For example, if rec.sports.* was being stored at
D:\Sports and the default (home) news root was
C:\Inetpub\Nntproot, two new virtual roots
would be added: /$CiNews/rec/sports=D:\Sports and
/$CiNews=C:\Inetpub\Nntproot. The Read and
Execute permissions are not enabled for
these virtual roots.
When running Webhits.exe, be sure to put the
virtual path /$CiNews/<%vpath%>
into the .htx file in the call to Webhits.exe.
Note The hit highlighter does
not check Read permissions for virtual roots
beginning with /$CiNews/.
NNTP Virtual Roots with UNC Shares
If a virtual root on a news server points to a
universal naming convention (UNC) share,
administrators must add a virtual root in IIS. The
Network News Transfer Protocol (NNTP) virtual root
must be prepended with /$CiNews to highlight the
news articles stored on that UNC share by using
Webhits.
Example
Assume the following in the news server setup:
- /rec.food points to \\Server1\Share1\Dir1
- The user ID is Gourmet\Chef1 (in the form
domain\username)
- The password is Marinade
In IIS, set up a virtual root with the following
properties:
- /$CiNews/rec.food pointing to
\\Server1\Share1\Dir1
- The user ID is Gourmet\Chef1
- The password is Marinade
- Both Read and Execute permissions are turned
off
Important Be
sure to turn off the Read and Execute permissions on
virtual roots prepended with /$CiNews.
This section details changes and additions to the
existing documentation.
Basic Administration
In the sections that discuss the variables
PROOT_virtual root and INDEX_virtual root
(Enabling Indexing of a Virtual Root and
Forcing a Scan of a Virtual Root), if the root
is a news root, these variables are
PROOT_NNTP_virtual root and INDEX_NNTP_virtual
root.
List of Virtual Roots
You can determine the type of a virtual root
while making the VIRTUAL_ROOTS query. Look at the
value of the special property
StorageType (DBTYPE_UI4) =
b725f130-47ef-101a-a5f1-02608c9eebac 4. The value 0
identifies a Web root. The value 1 identifies a news
root.
Error Messages
This section lists additions and corrections to
the Index Server error messages, contained on the
Error Messages page.
Event Log Messages
Message | Explanation
Account user-id does not have interactive logon privilege on this computer. You can give user-id interactive logon privilege on this computer using the user manager administrative tool. | The specified user-id does not have interactive logon permission on the computer running Index Server. Give the user-id interactive logon privilege through the User Manager for Domains.
Results Page
At the bottom of a results page, you may
periodically see the following message:
Message | Explanation
The index is out of date. | Files have been modified since the last time the scope of your query was indexed. When files in a scope are modified, Index Server re-indexes them automatically as soon as system resources are available. If you see this message at the bottom of a results page, wait a few minutes and retry your query.
Webhits Errors
Message | Explanation
There are too many copies of hit highlighter running. Please try later. | There are more simultaneous instances of Webhits than the maximum number set in the MaxRunningWebhits registry key. Try executing your query later, when the server is less busy.
Hit highlighting took too long to execute and was timed out. | Webhits has taken longer than the allotted time to process a document, and the server has timed out. The document may be too big or it may be corrupted. Ask the administrator to check the document.
Virtual Roots
Message | Explanation
Added virtual root <root> to index. | The message “Mapped to <path>” is added to the event log when a virtual root is indexed.
Removed virtual root <root> from index. | This message is written to the event log when a virtual root is deleted from the index.
Added scope <path> to index. | This message is added to the event log when a new physical scope is indexed.
Removed scope <path> from index. | This message is written to the event log when a physical scope is deleted from the index.
Note When virtual roots point
to positions below each other, adding and removing
virtual roots may have no effect on the physical
scopes in the index. For example, some sites such as
www.microsoft.com are branded with virtual roots in
a marketing sense of the word. So if a user wants
information on Windows NT Server, the user follows
the path
http://www.microsoft.com/NTServer, when
http://www.microsoft.com/products/backoffice/ntserver
is also a valid path. In this example, even if you
removed the lower virtual root (/NTServer), the
pages will still be indexed because they are included
in another path,
http://www.microsoft.com/products/backoffice/ntserver.
Filtering
HTML Filter
The HTML filter will not index any of the
contents or properties of an HTML file if the HTML
file contains the following meta tag:
<meta name="robots" content="noindex">
A Webmaster can add this meta tag to selectively
avoid indexing certain HTML files.
If an HTML file contains the following meta tag,
the content field specifies the language code:
<meta name="ms.locale" content="EN">
The file is filtered by the language resources
for that particular language (if available).
The content field in the tag can also specify the
locale by a decimal number, such as 1033, which is
the locale ID for U.S. English.
Some meta tag properties are mapped onto the
Microsoft® Office property sets to allow users to
mark HTML pages with the same properties in the
Office property set. The mapped properties are:
Property | Mapped to
<meta name="author" content="ruth"> | The author property in the summary information property set.
<meta name="subject" content="word processing"> | The subject property in the summary information property set.
<meta name="keywords" content="fonts, serif"> | The keyword property in the summary information property set.
<meta name="ms.category" content="fiction"> | The category property in the document summary information property set.
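As an illustration only, the meta tags discussed in this section can be read with Python's standard html.parser module. This is a minimal sketch and not the HTML filter's implementation: the names MetaScanner and scan_meta are hypothetical, and the case-insensitive comparison of names and of the noindex value is an assumption.

```python
from html.parser import HTMLParser

# Meta names that the text above says are mapped onto Office properties.
MAPPED_NAMES = ("author", "subject", "keywords", "ms.category")


class MetaScanner(HTMLParser):
    """Collect name/content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name is not None and content is not None:
            # Assumption: meta names are compared case-insensitively.
            self.meta[name.lower()] = content


def scan_meta(html):
    """Summarize the meta tags discussed above for one HTML page."""
    scanner = MetaScanner()
    scanner.feed(html)
    scanner.close()
    meta = scanner.meta
    return {
        # True when the page asks not to be indexed.
        "noindex": meta.get("robots", "").strip().lower() == "noindex",
        # Language code such as "EN", or a decimal locale ID such as "1033".
        "locale": meta.get("ms.locale"),
        # Only the names listed in the mapping table above.
        "properties": {n: meta[n] for n in MAPPED_NAMES if n in meta},
    }
```

Calling scan_meta on a page that contains the robots tag returns a dictionary whose noindex entry is True, which a caller could use to skip the file before looking at any other property.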
Hit Highlighting
In the “Webhits Parameters” section, the
paragraph under the CiQueryFile
parameter should say virtual path instead
of physical path. The paragraph should read
as follows:
Format: CiQueryFile=Virtual path
This parameter is optional. If it is passed,
CiQueryFile specifies the
virtual path of the .idq file containing the
[Names] section describing the custom properties. You
must pass this parameter for all queries involving
custom properties. If you try to hit-highlight a
document with a query that has a custom property and
you do not specify the appropriate .idq file, the
error message “No such property” will be displayed.
The following parameters have been added to the
“Webhits Parameters” section:
CiBeginHilite, CiEndHilite
Format: CiBeginHilite=BeginTags&CiEndHilite=EndTags
These two parameters together customize highlighted words in
the query results. If you specify these tags, Index Server
ignores all other formatting parameters, such as CiBold,
CiHiliteColor, CiItalic, and so on.
Important You must match the BeginTags and EndTags with
correct HTML formatting. Failure to do so will produce
unpredictable results. When you specify these parameters in
the query template file (.htx file), you must properly escape
the tags. For example:
CiBeginHilite=<%escapeURL <font color="#FF0000"><em>%>&CiEndHilite=<%escapeURL </em></font> %>
The two parameters together in the above example make the
highlighted words in the search results appear in red italics.
CiHiliteType
Format: CiHiliteType=[Full|Summary]
This parameter is optional. If not specified, Summary is the
default.
Summary The summary feature can generate small excerpts of a
document around the words that match the query specification.
Full When full highlighting is chosen as the option, the whole
document is highlighted and returned. Note that this does not
do full-fidelity highlighting. Only the text part of the
document is extracted and highlighted. This option is mainly
for documents that contain mostly text. It also tags the hits
with bookmarks, allowing navigation between the hits. The
first hit is bookmarked as #CiTag0 and the top of the
generated document is tagged as #CiTag-1. To help in
navigation, double-angle bracket tags (<< and >>) surround
each hit. Click the << tag to go to the previous hit, and
click the >> tag to go to the next hit.
CiLocale
Format: CiLocale=LocaleString
This parameter is optional. If specified, the given locale
will be used to interpret the CiRestriction string. Output
will also be generated using this locale. Valid values for
the CiLocale string are in the “Variables in .idq and .htx
Files” page.
CiMaxLineLength
Format: CiMaxLineLength=Number
This parameter is optional. When this parameter is specified,
Webhits preformats the text with the <pre> and </pre> HTML
tags. If a line length exceeds the specified number, it is
broken at the next word boundary. This option works best when
full hit-highlighting is chosen.
CiTemplateFile
Format: CiTemplateFile=Virtual path
This parameter is optional, but highly recommended. It
specifies the virtual path of the template file that
generates Webhits output. The recommended extension for a
Webhits template file is .htw. This template file lets you
customize the output like the template files used for
queries. It has a header section, a detail section, and a
footer section. The template file format used by Webhits is
the same as the template file for queries, with the
following differences:
- The only replaceable parameters allowed are <%CiUrl%>,
<%CiRestriction%>, <%CiUserParam1%>, <%CiUserParam2%>, and
so on up to <%CiUserParam10%>.
- There is no support for if-then-else processing.
- The detail section is used only as a placeholder for
hit-highlighting data. In the current release, Webhits
ignores the text between <%BeginDetail%> and <%EndDetail%>.
It is, however, important to specify <%BeginDetail%> and
<%EndDetail%>.
- EscapeHTML, EscapeURL, and EscapeRAW are supported as in
query template files.
Sample template files for Webhits output formatting are
included in the installed samples as:
/Scripts/Samples/Search/Qfullhit.htw
/Scripts/Samples/Search/Qsumrhit.htw
CiUrl The virtual path of the document being highlighted
replaces this parameter.
CiRestriction The value specified for Webhits in the
CiRestriction parameter replaces this parameter.
CiUserParamNumber Where Number is a number from 1 to 10.
The corresponding value specified in the CiUserParamNumber
parameter replaces this parameter.
CiUserParamNumber
Format: CiUserParamNumber=value, where value can be any
non-null string.
CiUserParamNumber is any parameter that can be specified for
Webhits and that can be replaced in CiTemplateFile. In
CiUserParamNumber, Number is any number from 1 to 10. For
example, CiUserParam1, CiUserParam3, CiUserParam5, and so on.
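To make the escaping requirement for CiBeginHilite and CiEndHilite concrete, the following Python sketch builds a URL-escaped parameter string from the parameter names listed above. It is an illustration under assumptions, not Webhits code: the function name webhits_query is hypothetical, and percent-encoding with urllib.parse.quote stands in for what EscapeURL does in an .htx file.

```python
from urllib.parse import quote

# The two values the CiHiliteType description allows.
HILITE_TYPES = ("Full", "Summary")


def webhits_query(restriction, begin_tags, end_tags, hilite_type="Summary"):
    """Build a URL-escaped Webhits parameter string."""
    if hilite_type not in HILITE_TYPES:
        raise ValueError("CiHiliteType must be Full or Summary")
    params = [
        ("CiRestriction", restriction),
        ("CiBeginHilite", begin_tags),
        ("CiEndHilite", end_tags),
        ("CiHiliteType", hilite_type),
    ]
    # safe="" escapes every reserved character, including "/" in end tags.
    return "&".join(name + "=" + quote(value, safe="") for name, value in params)


print(webhits_query("fonts", '<font color="#FF0000"><em>', "</em></font>", "Full"))
```

Keeping the begin and end tags as a matched pair in one call mirrors the requirement above that BeginTags and EndTags form correct HTML together.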
In the
Files Used section, the text should read as
follows:
Webhits installs the following files:
/Scripts/Samples/Search/Webhits.exe
/Scripts/Samples/Search/Queryhit.htx
/Scripts/Samples/Search/Queryhit.idq
/Scripts/Samples/Search/QSumrhit.htw
/Scripts/Samples/Search/QFullhit.htw
/Samples/Search/Queryhit.htm
All files above demonstrate summary and full-text
hit-highlighting.
Internet Data Query Files
The following paragraphs have been added to the
Names Section.
The HTML filter emits scripting code embedded in
an HTML page as a script property with the GUID
31F400A0-FD07-11CF-B9BD-00AA003DB18E. The property
name of the script is specified by the language
field of the script tag, for example:
<script language="vbscript">
In this example, the property name is
vbscript. If no language field is specified,
then the language field of an earlier script tag in
the HTML page is used. If no earlier script tag is
specified, then the property name defaults to
javascript. The GUID for the script property is
a registry parameter located at
HKEY_LOCAL_MACHINE
\System
\CurrentControlSet
\Control\HtmlFilter
\ScriptTagClsid
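The naming rule for script properties described above can be sketched in Python as follows. This is an illustrative reading of the rule, not the HTML filter's code: the class and function names are hypothetical, and two details are assumptions, namely that "an earlier script tag" means the most recent earlier tag that carried a language field, and that names are lowercased.

```python
from html.parser import HTMLParser

# Name used when no script tag so far has specified a language.
DEFAULT_LANGUAGE = "javascript"


class ScriptNameResolver(HTMLParser):
    """Record the property name chosen for each <script> tag in order."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._last_language = None

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        language = dict(attrs).get("language")
        if language:
            # An explicit language field names the property directly.
            self._last_language = language.lower()
        # Otherwise fall back to the earlier tag's language, then the default.
        self.names.append(self._last_language or DEFAULT_LANGUAGE)


def script_property_names(html):
    resolver = ScriptNameResolver()
    resolver.feed(html)
    resolver.close()
    return resolver.names
```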
The following example shows you how to name a
custom property for Microsoft Office by adding a
globally unique identifier (GUID) to the Names
section of the Internet Data Query (.idq) file:
Custom_Text ( DBTYPE_STR|DBTYPE_BYREF ) = D5CDD505-2E9C-101B-9397-08002B2CF9AE "Custom_Text"
In this example, Custom_Text can be any
string. The value of Custom_Text does not have to be
the same at the beginning and end of the line. The
one at the beginning is the friendly name, and the
one at the end (in quotation marks) is the Microsoft
Office property name.
Query Language
In the “Boolean and Proximity Operators” section,
the following note adds important information about
the NEAR operator:
Note The NEAR operator can be
applied only to words or phrases.
Some documented properties are unavailable. The
documentation incorrectly states that the following
property names can be used:
DocCategory
DocCompany
DocManager
To use these properties in a restriction, sort
specification, or as a retrieved column, you must
define them in a [Names] section by adding the
following definitions to the .idq file:
[Names]
#Office document properties which are not in the standard list
DocCategory ( DBTYPE_STR ) = D5CDD502-2E9C-101B-9397-08002B2CF9AE 0x2
DocManager ( DBTYPE_STR ) = D5CDD502-2E9C-101B-9397-08002B2CF9AE 0xE
DocCompany ( DBTYPE_STR ) = D5CDD502-2E9C-101B-9397-08002B2CF9AE 0xF
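The shape of a [Names] entry (friendly name, datatype in parentheses, GUID, then a property ID or a quoted property name) can be illustrated with a small Python parser. This is a sketch based only on the entries shown in these notes; the regular expression and the function name parse_names_line are assumptions, and the real .idq grammar may accept forms this sketch does not.

```python
import re

# Pattern inferred from the sample entries above; not an official grammar.
NAMES_LINE = re.compile(
    r"^\s*(?P<name>\w+)\s*"
    r"\(\s*(?P<type>[\w|\s]+?)\s*\)\s*=\s*"
    r"(?P<guid>[0-9A-Fa-f]{8}(?:-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12})\s+"
    r"(?P<prop>\S.*?)\s*$"
)


def parse_names_line(line):
    """Return the parts of one [Names] entry, or None for other lines."""
    match = NAMES_LINE.match(line)
    if match is None:
        return None  # comments, section headers, and blank lines
    prop = match.group("prop")
    if len(prop) >= 2 and prop.startswith('"') and prop.endswith('"'):
        prop_id = prop[1:-1]    # quoted property name
    else:
        prop_id = int(prop, 0)  # numeric property ID such as 0x2 or 4
    return {
        "name": match.group("name"),
        "types": [t.strip() for t in match.group("type").split("|")],
        "guid": match.group("guid"),
        "property": prop_id,
    }
```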
Registry Parameters
All keys are in the following path:
HKEY_LOCAL_MACHINE
\SYSTEM
\CurrentControlSet
\Control
\contentindex
The following parameters have been added:
CiCatalogFlags REG_DWORD
Default: 0
Range: 0 - 2
Controls Index Server behavior based on certain flags. Set
the value to 1 to turn off notifications on all remote UNC
paths. Set this flag if Index Server is configured to index
documents on a wide area network (WAN) over slow links. Set
the value to 2 to turn off notifications on all local paths.
When either of these flags is set, Index Server triggers
periodic scans for the paths for which notifications have
been disabled. The registry parameter
ForcedNetPathScanInterval controls the frequency of these
scans.
MasterMergeCheckpointInterval REG_DWORD
Units: Kilobytes
Default: 256
Range: 256 - 4096
Specifies the interval after which a new index is flushed as
a master merge proceeds.
MaxRunningWebhits REG_DWORD
Default: 20
Range: 1 - 200
Specifies the maximum number of concurrent instances of
Webhits. When this value is exceeded, Webhits generates the
error message “There are too many copies of hit highlighter
running. Please try later.” Increase this value for
computers with more memory or processors.
MaxShadowFreeForceMerge REG_DWORD
Units: Percentage of free disk space
Default: 20
Range: 5 - 4,000,000,000
Specifies the space occupied by shadow indexes on a catalog
drive, expressed as a percentage of free disk space. If this
percentage exceeds the value set for this parameter and if
the total free disk space falls below the minimum set in
MinDiskFreeForceMerge, a master merge begins. For example,
if this parameter is set to 500, the amount of free disk
space is 10 megabytes and the amount of space occupied by
shadow indexes is 40 megabytes, no master merge takes place
(40*100/10 is less than 500). However, if the value of this
parameter is set to 300, a master merge begins because
40*100/10 is greater than 300.
MaxWebhitsCpuTime REG_DWORD
Units: Seconds
Default: 30
Range: 5 - 7200
Specifies the timeout value for Webhits in CPU seconds. If
Webhits does not process a document in the stipulated amount
of time, it will return an error message that the allowed
time has been exceeded.
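The arithmetic in the MaxShadowFreeForceMerge example above can be reproduced with a short Python sketch. The function names are hypothetical. Because the worked example does not state the MinDiskFreeForceMerge condition, the sketch takes it as a flag that defaults to True, which is an assumption made to match the example.

```python
def shadow_index_percentage(shadow_mb, free_mb):
    """Shadow index size as a percentage of free disk space."""
    if free_mb <= 0:
        raise ValueError("free disk space must be positive")
    return shadow_mb * 100 / free_mb


def master_merge_begins(shadow_mb, free_mb, max_shadow_free_force_merge,
                        free_space_below_minimum=True):
    """Apply the two conditions described for MaxShadowFreeForceMerge."""
    # Assumption: the caller has already compared free disk space against
    # MinDiskFreeForceMerge and passes the result as a boolean.
    percentage = shadow_index_percentage(shadow_mb, free_mb)
    return free_space_below_minimum and percentage > max_shadow_free_force_merge


# The two cases from the text: 40 MB of shadow indexes, 10 MB free.
print(master_merge_begins(40, 10, 500))
print(master_merge_begins(40, 10, 300))
```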
Variables in .idq and .htx Files
The following variables have been added as
read-only variables for .htx files.
Variable Name | Meaning
CiVersionMajor | The major version of Index Server.
CiVersionMinor | The minor version of Index Server.
For other variables, see
Read-Only Variables Available in .htx Files on
the “Variables in .idq and .htx Files” page.
This section tells you how to delete Index Server
from your computer.
To remove Index Server
- Stop Microsoft Internet Information Server or Microsoft
Peer Web Services.
- Delete the following files from the %SystemRoot%\System32
directory: