Open XML Document Help File Format

The Open XML file format is used to produce word processing documents that can be opened by Microsoft Word and Open Office. Files in this format are convertible to other file formats such as PDF using other third-party tools and applications.

Knowns Issues and Limitations

  • The Open XML format works best for projects with roughly 1,000 topics or less. Bear in mind that topic count is no indication of actual page count as many topics can span several pages. For example, the format works quite well for the XML Comments Guide (180 topics resulting in 277 pages) and the MAML Guide (90 topics resulting in 136 pages). A test case project with 508 topics generated 722 pages and was still quite manageable. However, the Sandcastle Help File Builder help project generated over 2,800 topics which generated over 6,100 pages. While the file could be loaded, it took a while and was rather unwieldy. Saving to other formats was not possible as Word ran out of memory before it could finish saving the converted file. If you have a large project with 1,000 or more topics, this is probably not a good option and you should consider one of the other file formats instead.

  • Unlike HTML, the Open XML format is completely unforgiving with regard to ill-formed or illegal content. The build process tries to fix up a number of common issues but it may miss some cases resulting in a document that states it is corrupted or has issues when opened. You will need to track down the invalid markup and fix it (i.e. wrap text in paragraphs, remove unsupported HTML markup, etc.). See the following section for tips on troubleshooting corrupted or invalid documents.

  •   Important

    The physical page layout and page numbering relies on several factors all of which are controlled entirely by the application that consumes the resulting document not the one that produces it. As such, the help file builder cannot determine page layout nor page numbers at build time.

    Some of the side-effects of this limitation are as follows:

    • There may be blank pages between topics. These will need to be manually removed after generating the document.

    • Syntax sections and code examples may wrap lines due to the page margins. If necessary, you may need to reformat the code examples to fix up any unwanted line wraps.

    • Since there is no way to determine valid page numbers, a table of contents is not added to the document. For similar reasons and also because tagging index words is rather problematic while generating Open XML, an index is not added to the document either. These can be added and generated after the document has been produced using the word processing application of your choice.

  • There are many different bibliography formats available in applications such as Microsoft Word. The bibliography XML comments and MAML elements are supported but produce a bibliography section within the topic in which they appear. If desired, add a bibliography in the desired format to the document after it is produced with the word processing application of your choice.

  • Code colorization is supported. However, line numbering and collapsible regions are not supported and those options are ignored.

  • Obviously, the language filter from the HTML help formats is not supported. As such, language-specific text is shown using the generic, multi-language style. Likewise, syntax sections and code blocks are shown in a sequential fashion similar to the topic previewer.

  • The MAML markup element is supported for passing through Open XML markup directly into the document. HTML markup is not supported in Open XML documents. Any HTML in MAML markup elements will likely corrupt the document.

      Tip

    A few HTML elements are recognized and used in the transformations as a requirement of the presentation style to handle certain cases that cannot be taken care of in the transformations and to support certain build component output and localized resource items. The Open XML file builder task translates these elements into Open XML elements when the topics are merged into a single document body.

    Those elements are: a (anchor), br (line break), img (image), span (used for a limited set of named styles), ul/li (used to define lists). If these elements appear in a markup element, they will be passed through and processed as if the transformations or a build component had added them. Note that they will only be processed based on the conditions expected by the Open XML file builder task. Additional attributes on the elements other than those expected will be ignored. As always, it is best to avoid use of the markup element whenever possible.

  • Unlike MAML, HTML elements are prevalent in XML comments and the presentation style will make its best effort to translate as many HTML elements in the XML comments as possible to their Open XML equivalents. However, the end result may need fixing up in the generated document. Better results are obtained with well formed HTML content. Note that styling attributes are ignored as it is not possible to translate those to an Open XML equivalent. As with MAML, staying with the standard XML comments elements whenever possible will produce the best results.

  • It is common in XML comments to omit the containing paragraph element in simple summary and remarks content. Normally this is not an issue and the content will be converted to a paragraph with the expected formatting. In some cases, such as when nested elements appear within the content, it may not wrap as expected and unintended line breaks may appear in the generated document. In most cases, the solution to this problem is to wrap the content in a paragraph element in the XML comments so that the help file builder does not have to guess the intended layout.

  • Unlike HTML, self-closing and empty paragraphs will be rendered in the document and will consume space in both MAML topics and XML comments. They cannot be removed as more often than not it ends up combining text into a single paragraph that is not intended to be combined. The fix is to wrap the text in the paragraph elements and not use self-closing paragraphs which gets the expected results regardless of file format.

  • Headers and footers are separate document parts in Open XML content and localized resource items cannot be used within them. A basic header containing the help title and a basic footer with a page number are included by default. The following project properties are ignored:

    • Additional header content

    • Additional footer content

    • Copyright notice URL

    • Copyright notice text

    • Feedback e-mail address

    • Feedback e-mail link text

    If you want this information to appear in the header and footer, it must be added to the generated document manually. Since adding all of the information may unnecessarily clutter the header and footer it may be better to add it in a single topic somewhere near the start of the document.

      Tip

    As in other help file formats, stock content file overrides are supported. Including a word\header.xml and/or word\footer.xml file containing appropriate, valid content in your project with the Build Action set to Content will override the default header and/or footer in the generated document. Similarly, including a word\styles.xml file containing valid content with the Build Action set to Content will override the default style sheet used in the document.

  • The SDK Link Target project property is ignored. Links to external content are always opened in a new browser window or the related application that handles the given link type.

Troubleshooting Corrupted Output

As mentioned earlier, the Open XML file format is extremely strict and requires that all content conform to the Open XML schema. Deviation from the expected format usually results in the consuming application reporting that the file is corrupted in some way. This section covers how to diagnose the issue.

The consuming application, such as Microsoft Word, will typically display a dialog box stating that the content is corrupt. In Microsoft Word, clicking the Details button may provide more information about the problem and gives a line number within the XML content. However, most recent versions of Word simply state that the content is invalid and give the line number as zero.

To diagnose issues with Open XML files, it is useful to install the Open XML Productivity Tool which allows you to open, validate, and inspect the structure of an Open XML document. Install both the SDK and the productivity tool. To use it:

  • Open the invalid document in the productivity tool.

  • In the Settings menu, select the Validate Options item, and make sure that it is using the Against Microsoft Office 2010 Formats option.

  • In the Document Explorer pane, expand the "/word/document.xml" node, then the "w:document (Document)" node, and finally the "w:body (Body)" node. You will then see the elements that make up the content of the document body. Note that for a large document, it can take a while to expand the body node.

  • Click the Validate button in the tool bar. A Validation Results tab will appear in the right-hand pane. Click the icon in the tab to enable syncing the pane with the selected document node. The tooltip on the tab will state that it will sync the content. The icon on the tab will show green angle brackets if it is syncing to the selected node. If a small red "X" appears over the brackets in the tab icon, it will not sync to the selected node and you will have to manually click the Validate button to update the state for the selected node.

  • Select a document node in the left-hand pane to view any validation warnings for the selected node. This part boils down to searching nodes for with issues. Clicking on a higher level node to see its children with issues can sometimes give you an idea of where to look. There is usually a node number although it may not be of much use when there are many nodes of the same type.

      Tip

    If your project is quite large, using the API filter to limit the number of members included in the output and disabling topics in the content layout file can reduce the size of the resulting document XML and may help narrow down the location of the failure.

  • Once you find a node with an issue, you can right click on it and select the Reflect code option. This will show another tab with the XML and related code that produces it. You can ignore the code as it will not be relevant to how the file is produced. What is important is the XML that is shown. Using it along with the validation error message, you should be able to determine the cause of the problem. By selecting the nodes around the one with the issue and viewing their content you should be able to locate text that will help you identify the member or topic that contains the text causing the problem. With that information you can usually track down and fix the problem in the XML comments or the MAML topic.

Another common issue is missing images. While this may not necessarily result in a corrupted document, it does result in invalid content since there is only a placeholder where the image should appear. Check the build log as it will contain warning messages about image files that were referenced in the content but could not be found. The causes of missing images are usually an invalid path or the image not being marked as content so that it is included in the build output.

See Also