23.2.3 Specifying character encoding for generic XML

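Character encoding determines how character values greater than 0x7F (decimal 127) are represented. Characters in the range 0x80 through 0xFF are often called the “high ASCII” set; each occupies a single byte in an eight-bit encoding such as ISO-8859-1, but two bytes in UTF-8. The default for XML output is UTF-8: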

[HTMLOptions]
; Encoding = UTF-8 (XML default), ISO-8859-1 (HTML default, numeric
;  refs), or None (write 0x80-0xFF as single characters)
Encoding=UTF-8
; XMLEncoding default is "UTF-8", entities are used for ANSI chars
XMLEncoding=UTF-8
; NumericCharRefs = Yes (default, always use &#nnn;)
;  or No (use UTF-8 for XML)
NumericCharRefs=No

Entity references for browsers

If your XML output is to be rendered by Web browsers, be aware that even though UTF-8 is the XML standard encoding, some browsers, particularly older ones, do not support it. By default, DITA2Go declares UTF-8 as the encoding, but writes numeric character references of the form &#nnn; for every character that would otherwise have to be encoded; this satisfies all browsers. That is, with default settings, DITA2Go never writes a character with a value greater than 127 as UTF-8 bytes; it uses numeric character references instead, so the output is readable under any eight-bit encoding scheme.
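To make the two representations concrete, here is a minimal Python sketch. It is an illustration only, not DITA2Go code, and the helper name to_numeric_refs is hypothetical. It writes the same text once with decimal numeric character references and once as UTF-8 bytes:

```python
def to_numeric_refs(text: str) -> str:
    """Replace every character above 0x7F with a decimal numeric
    character reference of the form &#nnn; (hypothetical helper)."""
    return "".join(ch if ord(ch) <= 0x7F else f"&#{ord(ch)};" for ch in text)


sample = "café"

# Reference form: the result contains ASCII characters only.
as_refs = to_numeric_refs(sample)

# Real UTF-8: the single character é becomes the two bytes 0xC3 0xA9.
as_utf8 = sample.encode("utf-8")

print(as_refs)   # caf&#233;
print(as_utf8)   # b'caf\xc3\xa9'
```

Both forms describe the same character. The reference form stays within seven-bit ASCII, which is why it survives regardless of the encoding a browser assumes.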

The XMLEncoding setting controls the value of the encoding attribute in the XML declaration. If you set Encoding=UTF-8, you get real UTF-8 encoding (two or more bytes per character) in place of the numeric character references. However, you can still force the use of numeric references by also setting NumericCharRefs=Yes.

Although Encoding=None is not strictly compliant, this setting can be useful for languages such as Russian, in which almost the entire text would otherwise consist of numeric character references. With Encoding=None, each six-character reference of the form &#nnn; is written as a single character instead, a 6:1 reduction in size.
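The 6:1 figure can be checked with a short Python sketch. This is an illustration only, not DITA2Go code. It assumes the source text is held in an eight-bit Cyrillic code page (windows-1251 is used here as an example) and that each reference is written from the eight-bit value as a three-digit decimal:

```python
# Assumption: the Russian text is stored in an eight-bit code page,
# so every Cyrillic letter has a value in the range 0x80-0xFF.
raw = "привет".encode("cp1251")

# Every byte above 0x7F becomes a reference of the form &#nnn;
# (six characters); ASCII bytes are passed through unchanged.
refs = "".join(f"&#{b};" if b > 0x7F else chr(b) for b in raw)

print(len(raw))               # 6 bytes when written as single characters
print(len(refs))              # 36 characters when written as references
print(len(refs) // len(raw))  # 6
```

Text that mixes in ASCII characters, such as spaces and punctuation, would show a smaller overall saving, since those characters are never written as references.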

See also:

§22.3 Including starting code and entity references

§22.4.3 Specifying character encoding for HTML
