Index Set Search Options

This Help topic refers to the following editions:

Enterprise, Professional, Personal, Small Business

 

Set global default Index Set Search Options in the Home | Options | Index Set Search item of the Desktop menu bar.

 

Index Set Search is the program's ability to search the index data associated with a document. This data, most often entered by a user, is the most effective means of searching for documents, so this feature should always be enabled for searches to work properly. The purpose of the Application Options dialog is to refine how the Index Set Search function operates.

 

Setting application defaults is an Administrator function and can only be changed by users with Administrator permissions.

 

These defaults affect only new DocuXplorer Cabinets. Any Cabinet created before a change to these defaults is not affected.

 

Index Set Search options can also be changed for an individual Cabinet, as a property of that Cabinet object.

 

Set default options for Index Set Search

 

Default settings are as follows:

 

Conditional Drop Characters - Conditional drop characters are a special type of drop character. These characters are dropped only if they appear at the beginning or the end of a word, which allows them to be kept when they appear inside words. The default conditional drop characters are: , . ? ! ; : @ # $ % ^ & ( ) - _

For example:

One of the default conditional characters is the period. This means that periods on the ends of words will be stripped from the text, but if they are in the middle of a word (e.g., in a number such as 48.5), then they will be maintained.
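The edge-only rule can be illustrated with a minimal Python sketch. It assumes the conditional drop characters are simply stripped from both ends of each word after the text has been split into words; the function and constant names are hypothetical and this is not DocuXplorer's actual implementation.

```python
# Hypothetical sketch, not DocuXplorer's implementation.
# Default conditional drop characters as listed in this topic.
CONDITIONAL_DROP_CHARS = ",.?!;:@#$%^&()-_"


def apply_conditional_drops(word: str, chars: str = CONDITIONAL_DROP_CHARS) -> str:
    """Strip conditional drop characters from the start and end of a word only.

    str.strip removes any run of the listed characters from both ends,
    while characters in the middle of the word are left untouched.
    """
    return word.strip(chars)


print(apply_conditional_drops("done."))   # → done
print(apply_conditional_drops("48.5"))    # → 48.5
print(apply_conditional_drops("(e.g.,"))  # → e.g
```

The period inside "48.5" survives because it is neither the first nor the last character of the word.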

 

Drop Characters - Drop characters are a set of characters that are simply ignored by the FTS (Full Text Search) engine. Drop characters are ignored both in the text and in search strings. The default drop characters are the double quote, the apostrophe (single quote), and the back quote (also known as the grave accent). Administrators can specify more drop characters in the Drop Characters/Additional field.

For example:

If the defaults are used and a name such as O'Malley is in the text, the FTS engine will store OMalley (without the apostrophe) as the key value. A search word of either OMalley or O'Malley will find it because the apostrophe would be stripped out of the search word as well.
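The O'Malley example can be reproduced with a short Python sketch. It assumes drop characters are deleted wherever they occur, and that the same removal is applied to indexed text and to search words. The `additional` parameter is a hypothetical stand-in for the Drop Characters/Additional field; names are illustrative only.

```python
# Hypothetical sketch, not DocuXplorer's implementation.
# Default drop characters: double quote, apostrophe, back quote (grave accent).
DEFAULT_DROP_CHARS = "\"'`"


def remove_drop_chars(text: str, additional: str = "") -> str:
    """Delete every drop character wherever it occurs in the text.

    `additional` stands in for the Drop Characters/Additional field.
    """
    # A three-argument str.maketrans maps each listed character to None (deletion).
    table = str.maketrans("", "", DEFAULT_DROP_CHARS + additional)
    return text.translate(table)


stored_key = remove_drop_chars("O'Malley")           # value placed in the index
print(stored_key)                                     # → OMalley
print(remove_drop_chars("OMalley") == stored_key)     # → True
print(remove_drop_chars("O'Malley") == stored_key)    # → True
```

Because the search word goes through the same removal as the indexed text, both spellings reduce to the same key.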

 

Noise Words - The noise words are words that are ignored by the FTS engine. Once a word is recognized according to the other rules (after obeying delimiters, drop characters, minimum word length, etc.), the word is checked against the noise word list and is ignored if it is found in that list.

The default noise word list includes the following words:

about after all also and another any are because been before being between both but came can come could did does each else for from get got had has have her here him himself his how into its just like make many might more most much must never now only other our out over said same see should since some still such take than that the their them then there these they this those through too under use very want was way well were what when where which while who will with would you your
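Noise word filtering is a simple set lookup applied after the other rules have produced a word. The Python sketch below uses the default list above. Lower-casing each word before the lookup is an assumption, because this topic does not say how case is handled for noise words. The names are hypothetical.

```python
# Hypothetical sketch, not DocuXplorer's implementation.
# Default noise word list copied from this topic.
NOISE_WORDS = frozenset("""
about after all also and another any are because been before being between
both but came can come could did does each else for from get got had has have
her here him himself his how into its just like make many might more most much
must never now only other our out over said same see should since some still
such take than that the their them then there these they this those through
too under use very want was way well were what when where which while who will
with would you your
""".split())


def drop_noise_words(words: list[str]) -> list[str]:
    """Remove words found in the noise word list.

    Lower-casing before the lookup is an assumed case-insensitive match.
    """
    return [w for w in words if w.lower() not in NOISE_WORDS]


print(drop_noise_words(["the", "invoice", "from", "Acme"]))  # → ['invoice', 'Acme']
```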

Delimiters - Delimiters are the set of characters that define word boundaries. The default set of delimiters includes the following white space and control characters: space (ASCII decimal 32), backspace (ASCII decimal 8), tab (ASCII decimal 9), newline (ASCII decimal 10), vertical tab (ASCII decimal 11), form feed (ASCII decimal 12), and carriage return (ASCII decimal 13). These defaults work for most standard text documents. Some formats, such as WordPerfect documents, use a special character to mark word boundaries. DocuXplorer has been programmed with the high-bit character (€) that WordPerfect uses for this purpose.

Delimiters are always case sensitive. For example, if you want to use "x" (ASCII decimal 120) and "X" (ASCII decimal 88) as delimiters, you must specify both characters as delimiters, regardless of the case sensitivity option.
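Splitting on delimiters can be pictured with the Python sketch below. It uses the default delimiter codes listed above and checks each character by exact match, which is why "x" and "X" must be listed separately. The function and constant names are hypothetical, and this is not DocuXplorer's actual tokenizer.

```python
# Hypothetical sketch, not DocuXplorer's implementation.
# Default delimiters listed above, by ASCII decimal code.
DEFAULT_DELIMITERS = frozenset(chr(code) for code in (8, 9, 10, 11, 12, 13, 32))


def split_words(text: str, delimiters: frozenset[str] = DEFAULT_DELIMITERS) -> list[str]:
    """Split text into words wherever a delimiter character occurs.

    Membership is an exact character match, so delimiters are case sensitive.
    """
    words, current = [], []
    for ch in text:
        if ch in delimiters:
            if current:                # ignore runs of consecutive delimiters
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return words


print(split_words("invoice\tdue\r\n2024"))               # → ['invoice', 'due', '2024']
print(split_words("AxBXC", DEFAULT_DELIMITERS | {"x"}))  # → ['A', 'BXC']
```

Adding only "x" leaves "X" inside the second word; both characters must be added to split on either.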

 

Index Options

Minimum Word Length - This option specifies a minimum cut-off point for word recognition. Any word that is shorter than the specified minimum length is simply ignored; these words will not be in the FTS index nor will they be used if they are given in a search condition. When creating an FTS index via SQL, the default minimum word length is 3.
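The minimum word length rule amounts to a length filter applied to indexed words and to search terms alike. A minimal Python sketch, assuming the SQL default of 3; the function name is hypothetical.

```python
# Hypothetical sketch, not DocuXplorer's implementation.
def drop_short_words(words: list[str], min_length: int = 3) -> list[str]:
    """Ignore words shorter than min_length.

    The same filter applies to indexed text and to search terms, so a
    too-short search word contributes nothing to the query.
    """
    return [w for w in words if len(w) >= min_length]


print(drop_short_words(["an", "tax", "invoice"]))  # → ['tax', 'invoice']
```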

 

Maximum Word Length - The maximum word length specifies the maximum word size that can be stored in the FTS index; this is effectively the key length of the index. In general, choose a length that is longer than most or all of the words in the information being indexed. The default maximum word length is 30. If a user enters a word longer than the maximum, the word is truncated to the maximum length.
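Truncation can be sketched in Python as a simple slice to the key length, assuming the default of 30. The function name is hypothetical.

```python
# Hypothetical sketch, not DocuXplorer's implementation.
def truncate_word(word: str, max_length: int = 30) -> str:
    """Cut a word down to the index key length; shorter words are unchanged."""
    return word[:max_length]


long_word = "pneumonoultramicroscopicsilicovolcanoconiosis"  # 45 characters
print(truncate_word(long_word))        # → pneumonoultramicroscopicsilico
print(len(truncate_word(long_word)))   # → 30
```

A consequence of truncation is that two long words sharing the same first 30 characters produce the same key. This is one reason to choose a maximum longer than most words in the indexed material.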

 

Protect Numbers - This option covers a very specific situation. If it is enabled and the comma and/or period is defined as a delimiter character, numbers that contain commas and/or periods are not broken into multiple words on those delimiters. With "normal" text, the default delimiters and conditional drop characters suffice: with all default settings, the comma and period are not delimiter characters (they are conditional drop characters), so text such as "1,423.99" is treated as a single word. If you create an FTS index with the period and comma as delimiter characters, that text is broken into three words: "1", "423", and "99". Enabling Protect Numbers prevents this. Treating the comma as a delimiter may be desirable, for example, if the text contains words separated only by commas (with no other delimiters); Protect Numbers then keeps numbers intact in that configuration.
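The effect of Protect Numbers can be illustrated with the Python sketch below. The rule it uses, that a comma or period with a digit on both sides belongs to a number, is an assumption made for illustration; this topic does not state exactly how DocuXplorer decides what counts as a number. The function name is hypothetical.

```python
# Hypothetical sketch, not DocuXplorer's implementation.
def tokenize(text: str, delimiters: set[str], protect_numbers: bool = False) -> list[str]:
    words, current = [], []
    for i, ch in enumerate(text):
        is_delimiter = ch in delimiters
        if is_delimiter and protect_numbers and ch in ",.":
            # Assumed rule: a comma or period with a digit on both sides is part of a number.
            before = i > 0 and text[i - 1].isdigit()
            after = i + 1 < len(text) and text[i + 1].isdigit()
            if before and after:
                is_delimiter = False
        if is_delimiter:
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return words


delims = {" ", ",", "."}
print(tokenize("1,423.99", delims))                              # → ['1', '423', '99']
print(tokenize("1,423.99", delims, protect_numbers=True))        # → ['1,423.99']
print(tokenize("red,green,blue", delims, protect_numbers=True))  # → ['red', 'green', 'blue']
```

The last call shows the intended combination: commas still separate words in a comma-only list, while the number stays whole.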

 

Maintain Automatically - DocuXplorer can automatically maintain the Index Set data.

In the Tools | Options | Index Set Search dialog, this option's default is "Checked", so new Cabinets automatically inherit the "Checked" setting. The Drawer object of any new Cabinet, however, is automatically set to "Unchecked" to speed up adding documents to Drawers where content indexing of electronic documents is not required. To allow document content to be indexed, the administrator must set the Drawer's Document Content Search property to Enable and set Maintain Automatically to "Checked". To index the full text of an Image document, the document must first be processed through OCR.