19.10.3 Supporting search for non-ANSI text

When you type a search string that contains non-ANSI characters, Windows does not supply it as UTF-8; it supplies each character in the code page of the current system locale. This is rarely a problem for Western European languages, but it means that you need a separate search file for each non-Western locale you want to support. The alternative would be for the OmniHelp viewer to include code-page conversion, which would require a huge library on each Help user's system.
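To see why one search file cannot serve every locale, consider how a single typed character is represented. The following is a minimal Python sketch, not part of DITA2Go or OmniHelp; the function name `encode_search_term` is hypothetical, and the code pages chosen (Windows-1251 for a Cyrillic locale, Windows-1252 for a Western European one) are only examples.

```python
# Illustration only: one typed character has different bytes in a
# locale code page than in UTF-8, and the same byte means a different
# letter under another code page.

def encode_search_term(term: str, code_page: str) -> bytes:
    """Return the bytes a search term has in the given encoding."""
    return term.encode(code_page)

zhe = "\u0416"  # CYRILLIC CAPITAL LETTER ZHE

cyrillic_bytes = encode_search_term(zhe, "cp1251")  # Cyrillic system locale
utf8_bytes = encode_search_term(zhe, "utf-8")       # locale-independent form

# Reading the Cyrillic byte under the Western European code page
# yields an unrelated letter (LATIN CAPITAL LETTER AE).
misread = cyrillic_bytes.decode("cp1252")

print(cyrillic_bytes.hex(), utf8_bytes.hex(), ascii(misread))
```

Because the byte value depends on the locale, search data built for one code page cannot match terms typed under another without conversion.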

To make sure OmniHelp full-text search finds terms that include non-ANSI characters, set the following options:

[OmniHelpOptions]
; UnicodeFTS = No (default, use normal word-break rules for ANSI text),
;  or Yes (use the ICU rules for any language including CJK)
UnicodeFTS = Yes
; UnicodeLocale = formal identifier of language, default en-US
UnicodeLocale = en-US
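The reason CJK text needs different word-break rules can be shown with a small Python sketch. It assumes, purely for illustration, that "normal" word breaking means splitting on whitespace; the actual ANSI rules DITA2Go applies are not described here, and `whitespace_tokens` is a hypothetical helper, not a DITA2Go function.

```python
# Illustration only: whitespace-based word breaking works for English
# but returns a whole CJK sentence as one unsearchable "word".

def whitespace_tokens(text: str) -> list[str]:
    """Split text into search terms at whitespace."""
    return text.split()

english = "full text search"
japanese = "全文検索を使う"  # written without spaces between words

print(whitespace_tokens(english))        # three separate terms
print(len(whitespace_tokens(japanese)))  # the whole sentence is one term
```

A dictionary-based or rule-based segmenter such as the one ICU provides is what lets individual words inside such a sentence become separate search terms.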

You will also need two ICU DLL files: icudt40.dll (13 MB) and icuuc40.dll (1 MB). These DLLs are available in archive icu401.zip (6 MB), which you can download from the Omni Systems Web site.

To install the ICU code pages, extract the DLLs from icu401.zip, and copy them to the following locations:

When UnicodeFTS=Yes, DITA2Go uses these DLLs to prepare your OmniHelp output, according to the value you specify for UnicodeLocale.
