22.4.3 Specifying character encoding for HTML

HTML is based on Unicode. DITA2Go does not directly support non-Unicode double-byte languages (except for Asian and Cyrillic code pages for HTML Help) or right-to-left languages such as Hebrew and Arabic.

Character encoding determines how non-ASCII characters (those with values greater than 127) are represented in the <body> section of HTML output. To specify an encoding or, alternatively, numeric character references:

[HTMLOptions]
; Encoding = ISO-8859-1 (HTML default, numeric refs),
;  or None (write 0x80-0xFF as single characters)
Encoding=ISO-8859-1
; QuotedEncoding = No (default, W3C usage, required for JavaHelp),
;  or Yes (put encoding in meta tag in single quotes, needed by some
;  older browsers)
QuotedEncoding=No
; NumericCharRefs = Yes (default, always use &#nnn;)
;  or No (use UTF-8 for XML)
NumericCharRefs=Yes

For XHTML, the DITA2Go default is to declare UTF-8 as the encoding, but to use numeric character references of the form &#nnn; for all characters that would otherwise have to be encoded; this satisfies all browsers. That is, DITA2Go does not actually write any character with a value greater than 127 as UTF-8. Instead, it writes a numeric reference for each such character, so the output is readable under any eight-bit encoding scheme.
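
The effect of numeric character references can be sketched in a few lines of Python. This is only an illustration of the output form, not DITA2Go's implementation; the function name to_numeric_refs and the sample string are hypothetical.

```python
def to_numeric_refs(text: str) -> str:
    """Replace every character above 127 with a decimal reference (&#nnn;)."""
    # "xmlcharrefreplace" is a standard Python codec error handler; it is
    # used here only to imitate the form of output described above.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


sample = "caf\u00e9 \u20ac5"          # "café €5"
converted = to_numeric_refs(sample)
print(converted)                      # caf&#233; &#8364;5
```

Because the result contains only ASCII characters, it reads the same whether a browser assumes UTF-8, ISO-8859-1, or another eight-bit encoding.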

For XHTML, you can specify a value for XMLEncoding (see §23.2.3 Specifying character encoding for generic XML) other than the default UTF-8. If you set Encoding=UTF-8, you get real UTF-8 encoding (two or more bytes for each non-ASCII character) in place of the numeric character references. However, you can still force the use of numeric references by also setting NumericCharRefs=Yes.
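
As a rough illustration of what real UTF-8 output costs per character, the following Python sketch counts the bytes UTF-8 uses for a few sample characters. The helper name utf8_length and the choice of samples are assumptions made for this example only.

```python
def utf8_length(ch: str) -> int:
    """Return the number of bytes UTF-8 needs for one character."""
    return len(ch.encode("utf-8"))


samples = {
    "A": "A",                    # ASCII letter
    "e-acute": "\u00e9",         # Latin-1 range
    "Cyrillic ya": "\u044f",     # Cyrillic block
    "euro sign": "\u20ac",       # outside the two-byte range
    "G clef": "\U0001d11e",      # outside the Basic Multilingual Plane
}
lengths = {name: utf8_length(ch) for name, ch in samples.items()}
print(lengths)
```

ASCII characters stay at one byte, Western European and Cyrillic letters take two, and characters further up the Unicode range take three or four.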

While Encoding=None is not strictly compliant, this setting can be useful for languages such as Russian, where almost the entire text would otherwise consist of numeric character references. Each reference of the form &#nnn; takes six characters, whereas Encoding=None writes a single character, so the setting provides a 6:1 reduction for such text.
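
The 6:1 figure can be checked with a little arithmetic. The sketch below assumes, purely for illustration, that the text is held in the Cyrillic code page windows-1251 and that each byte in the range 0x80-0xFF would otherwise be written as a three-digit reference; the helper as_byte_refs is hypothetical and does not reproduce DITA2Go's code.

```python
def as_byte_refs(raw: bytes) -> str:
    """Write each byte in 0x80-0xFF as &#nnn; and leave ASCII bytes alone."""
    return "".join(f"&#{b};" if b >= 0x80 else chr(b) for b in raw)


word = "\u041f\u0440\u0438\u0432\u0435\u0442"   # Russian for "hello"
raw = word.encode("cp1251")                      # one byte per letter (assumed code page)
refs = as_byte_refs(raw)
ratio = len(refs) // len(raw)
print(len(raw), len(refs), ratio)                # 6 36 6
```

Every value from 128 through 255 has three digits, so each reference is exactly six characters long.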

To direct DITA2Go to supply single quotes around the charset value in the <meta> tag, specify QuotedEncoding=Yes:

<meta http-equiv="Content-type" content="text/html; charset='ISO-8859-1'">

The default is not to enclose the value in quotes.
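
The difference between the two settings can be sketched as follows. The function content_type_meta and its quoted parameter are hypothetical stand-ins for QuotedEncoding; they only mimic the two output forms described above.

```python
def content_type_meta(charset: str, quoted: bool = False) -> str:
    """Build a Content-type meta tag, optionally single-quoting the charset."""
    value = f"'{charset}'" if quoted else charset
    return f'<meta http-equiv="Content-type" content="text/html; charset={value}">'


unquoted_tag = content_type_meta("ISO-8859-1")              # like QuotedEncoding=No
quoted_tag = content_type_meta("ISO-8859-1", quoted=True)   # like QuotedEncoding=Yes
print(unquoted_tag)
print(quoted_tag)
```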

See also:

§23.2.3 Specifying character encoding for generic XML
