18.2 Understanding why Unicode is not the answer

Microsoft HTML Help does not use Unicode; instead it uses Windows code pages. This means that characters whose glyphs are not present in the default code page (for Western languages, ANSI code page 1252) might not display correctly, and can interfere with use of the TOC, index, and search functions.

People often think they can get away with using Unicode encoding instead of code-page encoding, because the HTML Help viewer uses Internet Explorer to display the topic pane, and Internet Explorer does understand Unicode. However, if you use any non-ASCII characters (those above U+007F), search will not work right, and if any of those characters appear in titles or in index terms, the TOC and index will not work right, either. If you are processing a language with accented characters, such as German, you cannot get away with Unicode in the topic pane. For example, UTF-8 encodes the code points from U+00A0 to U+00FF as two-byte sequences, whereas code page 1252 represents each of them as a single byte. So even though the code points are the same, and the display looks fine, search fails because the single byte in the search string does not match the two bytes in the UTF-8 encoding.
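The byte mismatch described above can be reproduced in a few lines. The following is a minimal Python sketch, not a model of the HTML Help search engine itself: it assumes only that search compares raw bytes, and the sample German text and the function names (`encoded_lengths`, `byte_search`) are hypothetical.

```python
# Illustration only: assumes search is a plain byte comparison.
# The sample text and function names are made up for this sketch.

def encoded_lengths(ch):
    """Return (bytes in code page 1252, bytes in UTF-8) for one character."""
    return len(ch.encode("cp1252")), len(ch.encode("utf-8"))


def byte_search(term, text, term_encoding, text_encoding):
    """Report whether the encoded search term occurs in the encoded text."""
    return term.encode(term_encoding) in text.encode(text_encoding)


topic_text = "Die Größe des Fensters"   # hypothetical topic content
search_term = "Größe"                   # what the user types in the search box

# One byte per accented character in code page 1252, two bytes in UTF-8.
print(encoded_lengths("ö"))

# Code-page search term against a UTF-8 topic: the bytes do not match.
print(byte_search(search_term, topic_text, "cp1252", "utf-8"))

# Code-page search term against a code-page topic: the bytes match.
print(byte_search(search_term, topic_text, "cp1252", "cp1252"))
```

The topic displays identically in both encodings, because the browser decodes each one correctly; only the byte-level comparison differs.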

With a few isolated symbols, you might get away with Unicode content, but it is not good practice. DITA2Go goes to considerable lengths to convert from Unicode to code-page encoding for HTML Help. The conversion is not trivial; for Asian languages, DITA2Go uses enormous look-up tables and dozens of lines of C++ code. It is a Bad Idea to skip that conversion and use Unicode in any form (including numeric character references) instead.
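To give a rough sense of what a Unicode-to-code-page conversion involves, here is a minimal Python sketch for the Western case only. It is not DITA2Go's implementation, which is written in C++ and relies on its own look-up tables; the function name `to_code_page` is hypothetical, and the sketch simply leans on the codecs in Python's standard library. It resolves numeric character references first, then encodes to the target code page and reports any characters that page cannot represent.

```python
import html

# Sketch only: not DITA2Go's method. to_code_page is a hypothetical helper
# built on Python's standard codecs.

def to_code_page(text, code_page="cp1252"):
    """Encode text for a code page; return (encoded bytes, unmappable chars)."""
    # Resolve references such as &#246; into the characters they stand for,
    # so they can be stored as code-page bytes rather than left as Unicode.
    resolved = html.unescape(text)

    # Collect each character the target code page has no slot for.
    missing = []
    for ch in resolved:
        try:
            ch.encode(code_page)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)

    # Unmappable characters become "?" here; a real converter would need
    # a mapping or a different code page instead.
    return resolved.encode(code_page, errors="replace"), missing


encoded, missing = to_code_page("Gr&#246;&#223;e")
print(encoded, missing)

encoded, missing = to_code_page("café 中")
print(encoded, missing)
```

The second call shows why a single Western code page is not enough for mixed content: the Chinese character has no representation in code page 1252, so it has to be reported and handled separately.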

It might be easy to dismiss all this when your language is English, but the rest of the world feels differently.

See also:

§18.12 Generating HTML Help in non-Western languages

§30.4 Mapping special characters
