DBCS Fix for CHM Builds Plug-In
DBCS stands for Double Byte Character Set. Several languages, most notably East Asian languages and
Russian, cause problems for HTML Help 1 (.chm) builds because the help compiler is not Unicode-aware. This can
cause the table of contents, index, and in some cases the help topics to display incorrect characters. This
plug-in is designed to correct these issues.
The plug-in modifies the topic files and the build in the following ways:
It converts unsupported high-order characters to 7-bit ASCII equivalents.
It converts several unsupported high-order characters to named entities.
It replaces the UTF-8 reference in the CHARSET meta tag in each HTML
page with a character set reference appropriate to the selected language.
It saves each HTML page using the encoding appropriate to the selected language.
It uses the SBAppLocale tool from Steel Bytes to run the HTML Help 1 compiler under the appropriate locale for the selected language. This
saves you from having to manually change the language in your Regional Settings, reboot, and perform the help
compile step.
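The meta tag replacement and re-encoding steps above can be pictured with a minimal Python sketch. It is not the plug-in's actual implementation: the function name `localize_html` and the regular expression are hypothetical. The sketch also takes a shortcut by turning unsupported characters into numeric character references, whereas the plug-in uses ASCII equivalents and named entities.

```python
import re

# Assumption: a case-insensitive match on "charset=utf-8" is sufficient
# for illustration; real pages may need a stricter pattern.
CHARSET_RE = re.compile(r"charset\s*=\s*utf-8", re.IGNORECASE)


def localize_html(utf8_bytes: bytes, charset: str, codec: str) -> bytes:
    """Re-encode one UTF-8 HTML page for a non-Unicode consumer.

    charset is the name written into the meta tag (e.g. "windows-1251");
    codec is the matching Python codec name (e.g. "cp1251").
    """
    text = utf8_bytes.decode("utf-8")
    # Replace the UTF-8 reference with the language-specific character set.
    text = CHARSET_RE.sub("charset=" + charset, text)
    # Characters the target code page cannot represent are written as
    # numeric character references instead of raising an error.
    return text.encode(codec, errors="xmlcharrefreplace")


if __name__ == "__main__":
    sample = '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />'
    print(localize_html(sample.encode("utf-8"), "windows-1251", "cp1251"))
```

The language-specific character set name and code page would come from the settings for the selected language, not from hard-coded values as in this example.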
To use the plug-in, add it to the project and configure it. The only option needed is the location
of the SBAppLocale.exe application. If left blank, only the changes made by the Sandcastle
HTML Extract tool will be applied to the topics.
If the SBAppLocale.exe tool is not used, the text in the table of contents,
index, and search pages may not appear correctly if the language of the help file does not match the locale of
the system on which it is built (e.g. building a Russian help file on a system with an English locale). If the
locales match, there typically will not be any issues.
Note that if your code uses Unicode characters in the member names, it may not be possible to use
the MemberName option for the NamingMethod property. If you
receive build errors indicating that certain files cannot be found and the file names do not look correct, change
the naming method to use either the GUID or HashedMemberName
option.
If you add HTML pages as additional content or use a topic transformation file to create HTML
additional content pages, you should ensure that a meta tag specifying UTF-8 encoding
appears in each file. This will ensure that all additional content pages are also properly encoded if necessary.
For example:
<head>
<title>My Additional Content</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
</head>
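If a project has many additional content pages, a small script can flag the ones that lack the meta tag. The following is a rough sketch only. The helper names `has_utf8_meta` and `pages_missing_utf8_meta` are hypothetical, and the pattern assumes the tag is written in the same general form as the example above.

```python
import re
from pathlib import Path

# Matches a meta tag that mentions charset=utf-8 anywhere in its attributes.
META_UTF8_RE = re.compile(
    r"<meta\b[^>]*charset\s*=\s*[\"']?utf-8", re.IGNORECASE)


def has_utf8_meta(html: str) -> bool:
    """Return True if the page declares UTF-8 in a meta tag."""
    return META_UTF8_RE.search(html) is not None


def pages_missing_utf8_meta(folder: str) -> list:
    """List the .htm/.html files under folder that lack the UTF-8 meta tag."""
    missing = []
    for path in sorted(Path(folder).rglob("*.htm*")):
        # Undecodable bytes are replaced so one bad file does not stop the scan.
        text = path.read_text(encoding="utf-8", errors="replace")
        if not has_utf8_meta(text):
            missing.append(path)
    return missing
```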
The tool that performs the encoding translation relies on the SandcastleHtmlExtract.config
file to provide information about the language locale ID, code page, and character set encoding name. The
configuration file is located in the root help file builder installation folder. If the project's locale ID
cannot be found in the file, a warning is issued in the log file and a default code page and character set
encoding will be used based on the language selected in the project. If the defaults are not appropriate, you
can add new entries to the configuration file.
The settings consist of a set of language elements, one for each language,
identified by locale ID. The id attribute is the
locale ID (LCID) for the language. The codepage attribute is the code page used to
determine the encoding for the files based on the given locale ID. The charset
attribute value is written to the HTML files in place of the UTF-8 value when localizing the files for use
with the HTML Help 1 compiler.
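To illustrate how such entries might be looked up, here is a short Python sketch. Only the language element and its id, codepage, and charset attributes come from the description above. The root element name, the sample values, and the `find_language` helper are assumptions made for the example and may not match the actual configuration file.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the <languages> root element is assumed.
# 1049 is the LCID for Russian and 1041 the LCID for Japanese.
SAMPLE_CONFIG = """
<languages>
  <language id="1049" codepage="1251" charset="windows-1251" />
  <language id="1041" codepage="932" charset="shift_jis" />
</languages>
"""


def find_language(config_xml: str, lcid: int):
    """Return (codepage, charset) for the LCID, or None if it is not listed."""
    root = ET.fromstring(config_xml)
    for lang in root.iter("language"):
        if int(lang.get("id")) == lcid:
            return int(lang.get("codepage")), lang.get("charset")
    # A caller would fall back to defaults and log a warning here.
    return None
```

A missing entry returns `None`, which corresponds to the case where the build logs a warning and falls back to a default code page and character set.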
If you do add entries to this file, please report them so that they can be added to a future
release of the help file builder.