Open XML Document Help File Format
The Open XML file format is used to produce word processing documents that can be opened by Microsoft
Word and Open Office. Files in this format are convertible to other file formats such as PDF using other
third-party tools and applications.
The Open XML format works best for projects with roughly 1,000 topics or fewer. Bear in mind
that topic count is no indication of actual page count as many topics can span several pages. For example, the
format works quite well for the XML Comments Guide (180 topics resulting in 277 pages) and the MAML Guide (90
topics resulting in 136 pages). A test case project with 508 topics generated 722 pages and was still quite
manageable. However, the Sandcastle Help File Builder help project contained over 2,800 topics, which produced
over 6,100 pages. While the file could be loaded, it was slow to open and rather unwieldy. Saving to other
formats was not possible as Word ran out of memory before it could finish saving the converted file. If you have
a large project with more than 1,000 topics, this is probably not a good option and you should consider one of the
other file formats instead.
Unlike HTML, the Open XML format is completely unforgiving with regard to ill-formed or illegal
content. The build process tries to fix up a number of common issues, but it may miss some cases, resulting in a
document that the consuming application reports as corrupted or as having issues when opened. You will need to
track down the invalid markup and fix it (e.g. wrap text in paragraphs or remove unsupported HTML markup). See the following section for
tips on troubleshooting corrupted or invalid documents.
The physical page layout and page numbering rely on several factors, all of which are
controlled entirely by the application that consumes the resulting document, not the one
that produces it. As such, the help file builder cannot determine page layout or page
numbers at build time.
Some of the side-effects of this limitation are as follows:
There may be blank pages between topics. These will need to be manually removed after
generating the document.
Syntax sections and code examples may wrap lines due to the page margins. If necessary,
you may need to reformat the code examples to fix up any unwanted line wraps.
Since there is no way to determine valid page numbers, a table of contents is not added to
the document. For similar reasons and also because tagging index words is rather problematic while generating
Open XML, an index is not added to the document either. These can be added and generated after the document has
been produced using the word processing application of your choice.
There are many different bibliography formats available in applications such as Microsoft Word.
The bibliography XML comments and MAML elements are supported but produce a bibliography section within the topic
in which they appear. If desired, use the word processing application of your choice to add a bibliography in
the preferred format after the document is produced.
Code colorization is supported. However, line numbering and collapsible regions are not
supported and those options are ignored.
Obviously, the language filter from the HTML help formats is not supported. As such,
language-specific text is shown using the generic, multi-language style. Likewise, syntax sections and code
blocks are shown in a sequential fashion similar to the topic previewer.
The MAML markup element is supported for passing through Open XML
markup directly into the document. HTML markup is not supported in Open XML documents.
Any HTML in MAML markup elements will likely corrupt the document.
A few HTML elements are recognized and used in the transformations as a requirement of
the presentation style to handle certain cases that cannot be taken care of in the transformations and to
support certain build component output and localized resource items. The Open XML file builder task translates
these elements into Open XML elements when the topics are merged into a single document body.
Those elements are: a (anchor), br (line
break), img (image), span (used for a limited set of named
styles), and ul/li (used to define lists). If these elements
appear in a markup element, they will be passed through and processed as if the
transformations or a build component had added them. Note that they will only be processed based on the
conditions expected by the Open XML file builder task. Additional attributes on the elements other than those
expected will be ignored. As always, it is best to avoid use of the markup element
whenever possible.
Unlike in MAML topics, HTML elements are prevalent in XML comments, and the presentation style will make
its best effort to translate as many HTML elements in the XML comments as possible to their Open XML equivalents.
However, the end result may need fixing up in the generated document. Better results are obtained with
well-formed HTML content. Note that styling attributes are ignored as it is not possible to translate those to an
Open XML equivalent. As with MAML, staying with the standard XML comments elements whenever possible will
produce the best results.
It is common in XML comments to omit the containing paragraph element in simple summary and
remarks content. Normally this is not an issue and the content will be converted to a paragraph with the
expected formatting. In some cases, such as when nested elements appear within the content, it may not wrap as
expected and unintended line breaks may appear in the generated document. In most cases, the solution to this
problem is to wrap the content in a paragraph element in the XML comments so that the help file builder does not
have to guess the intended layout.
Unlike in HTML, self-closing and empty paragraphs are rendered in the document and
consume space, in both MAML topics and XML comments. The build cannot remove them because doing so more often
than not combines text that was not intended to be in a single paragraph. The fix is to wrap the text in
paragraph elements rather than use self-closing paragraphs, which gives the expected results regardless of file format.
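As a rough aid for finding such paragraphs before building, the following Python sketch lists the parent element of every empty or self-closing para element in a piece of XML. It is not part of the help file builder; the function name and the sample member are made up for illustration, and it assumes the input is well-formed XML.

```python
import xml.etree.ElementTree as ET


def find_empty_paragraphs(xml_text, tag="para"):
    """Return the parent element name for each empty or self-closing paragraph."""
    root = ET.fromstring(xml_text)
    hits = []
    for parent in root.iter():
        for child in parent:
            # Compare on the local name so namespace-qualified MAML elements match too.
            local = child.tag.rsplit("}", 1)[-1]
            is_empty = len(child) == 0 and not (child.text or "").strip()
            if local == tag and is_empty:
                hits.append(parent.tag.rsplit("}", 1)[-1])
    return hits


# Hypothetical XML comments member used only to exercise the function.
sample = """<member name="M:Demo.Run">
  <summary>First paragraph.<para/>Second paragraph.</summary>
  <remarks><para>Properly wrapped text.</para><para></para></remarks>
</member>"""

print(find_empty_paragraphs(sample))  # ['summary', 'remarks']
```

Each reported parent is a place where the text should probably be wrapped in paragraph elements instead of being separated by an empty one.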
Headers and footers are separate document parts in Open XML content and localized resource
items cannot be used within them. A basic header containing the help title and a basic footer with a page number
are included by default. The following project properties are ignored:
Additional header content
Additional footer content
Copyright notice URL
Copyright notice text
Feedback e-mail address
Feedback e-mail link text
If you want this information to appear in the header and footer, it must be added to the
generated document manually. Since adding all of the information may unnecessarily clutter the header and footer,
it may be better to add it in a single topic somewhere near the start of the document.
As in other help file formats, stock content file overrides are supported. Including a
word\header.xml and/or word\footer.xml file containing appropriate,
valid content in your project with the Build Action set to Content will override the
default header and/or footer in the generated document. Similarly, including a word\styles.xml
file containing valid content with the Build Action set to Content will override the
default style sheet used in the document.
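Because an invalid override file is likely to corrupt the generated document, it can be worth checking the files before building. The Python sketch below only confirms that any override files present are well-formed XML; it does not validate them against the Open XML schema. The function name and the assumption that the word folder sits directly under the project folder are illustrative.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# File names taken from the text above; the folder layout is an assumption.
OVERRIDE_FILES = ("header.xml", "footer.xml", "styles.xml")


def check_overrides(project_dir):
    """Map each override file found under <project_dir>/word to None or a parse error."""
    results = {}
    for name in OVERRIDE_FILES:
        path = Path(project_dir) / "word" / name
        if not path.is_file():
            continue  # No override present, so the default part would be used.
        try:
            ET.parse(str(path))
            results[name] = None
        except ET.ParseError as exc:
            results[name] = str(exc)  # Includes the line and column of the failure.
    return results
```

Calling check_overrides on the project folder returns a dictionary in which a value of None means the file parsed cleanly and any other value is the parser's error message.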
The SDK Link Target project property is ignored. Links to external
content are always opened in a new browser window or the related application that handles the given link type.
As mentioned earlier, the Open XML file format is extremely strict and requires that all content
conform to the Open XML schema. Deviation from the expected format usually results in the consuming application
reporting that the file is corrupted in some way. This section covers how to diagnose the issue.
The consuming application, such as Microsoft Word, will typically display a dialog box stating that
the content is corrupt. In Microsoft Word, clicking the Details button may provide more information
about the problem and give a line number within the XML content. However, most recent versions of Word simply
state that the content is invalid and give the line number as zero.
To diagnose issues with Open XML files, it is useful to install the Open XML Productivity Tool, which allows you
to open, validate, and inspect the structure of an Open XML document. Install
both the Open XML SDK and the productivity tool. To use it:
Open the invalid document in the productivity tool.
In the Settings menu, select the Validate Options item, and make sure that
it is using the Against Microsoft Office 2010 Formats option.
In the Document Explorer pane, expand the "/word/document.xml" node, then the
"w:document (Document)" node, and finally the "w:body (Body)" node. You will then see the elements that make
up the content of the document body. Note that for a large document, it can take a while to expand the body
node.
Click the Validate button in the toolbar. A Validation Results tab will
appear in the right-hand pane. Click the icon in the tab to enable syncing the pane with the selected document
node. The tooltip on the tab will state that it will sync the content. The icon on the tab will show green
angle brackets if it is syncing to the selected node. If a small red "X" appears over the brackets in the tab
icon, it will not sync to the selected node and you will have to manually click the Validate button to update the
state for the selected node.
Select a document node in the left-hand pane to view any validation warnings for the selected
node. This part boils down to searching for nodes with issues. Clicking on a higher-level node to see its
children with issues can sometimes give you an idea of where to look. There is usually a node number although it
may not be of much use when there are many nodes of the same type.
If your project is quite large, using the API filter to limit the number of members included
in the output and disabling topics in the content layout file can reduce the size of the resulting document XML
and may help narrow down the location of the failure.
Once you find a node with an issue, you can right-click it and select the Reflect code
option. This will show another tab with the XML and related code that produces it. You can ignore the code as
it will not be relevant to how the file is produced. What is important is the XML that is shown. Using it along
with the validation error message, you should be able to determine the cause of the problem. By selecting the
nodes around the one with the issue and viewing their content you should be able to locate text that will help
you identify the member or topic that contains the text causing the problem. With that information you can
usually track down and fix the problem in the XML comments or the MAML topic.
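When the productivity tool is slow on a large document, a short script can give a similar overview of the body. The Python sketch below relies only on the fact that an Open XML document is a ZIP package containing word/document.xml. It lists each child of the w:body element with a short text preview so that nearby text can be matched to a member or topic. The function name is made up for illustration, the index is a simple zero-based position that may not match the node number shown by the tool, and the sketch assumes the document XML is well-formed even if it is not schema-valid.

```python
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def body_outline(docx_path, preview=60):
    """Return (index, element name, text preview) for each child of w:body."""
    with zipfile.ZipFile(docx_path) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    body = root.find(f"{{{W_NS}}}body")
    if body is None:
        return []
    outline = []
    for index, child in enumerate(body):
        name = child.tag.rsplit("}", 1)[-1]
        # Concatenate the text runs (w:t) found anywhere below this node.
        text = "".join(t.text or "" for t in child.iter(f"{{{W_NS}}}t"))
        outline.append((index, name, text[:preview]))
    return outline
```

Printing the tuples returned by body_outline for the generated file and searching the previews for text near the failing node is one way to identify the topic involved.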
Another common issue is missing images. While this may not necessarily result in a corrupted
document, it does result in invalid content since there is only a placeholder where the image should appear.
Check the build log as it will contain warning messages about image files that were referenced in the content
but could not be found. The causes of missing images are usually an invalid path or the image not being marked
as content so that it is included in the build output.
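The build log remains the primary source for missing image warnings. As a supplementary check on the generated file, the Python sketch below lists image relationships in the main document part whose target is absent from the package. Whether the help file builder emits such a relationship for a missing image is an assumption, so an empty result does not prove that all images are present. The function name is made up for illustration.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
IMAGE_TYPE = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"
RELS_PART = "word/_rels/document.xml.rels"


def missing_image_parts(docx_path):
    """Return (relationship id, target) for image relationships with no matching part."""
    with zipfile.ZipFile(docx_path) as package:
        names = set(package.namelist())
        if RELS_PART not in names:
            return []
        rels = ET.fromstring(package.read(RELS_PART))
    missing = []
    for rel in rels.iter(f"{{{REL_NS}}}Relationship"):
        # Skip non-image relationships and images linked from outside the package.
        if rel.get("Type") != IMAGE_TYPE or rel.get("TargetMode") == "External":
            continue
        target = rel.get("Target", "")
        if target.startswith("/"):
            part = target.lstrip("/")  # Absolute part name within the package.
        else:
            part = posixpath.normpath(posixpath.join("word", target))
        if part not in names:
            missing.append((rel.get("Id"), target))
    return missing
```

Any entries returned can be compared with the image warnings in the build log to confirm which files were not found.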