Note
Because the expression resides in an XML file, any special characters in it such as <, >, &, ", and ' must be encoded (as &lt;, &gt;, &amp;, &quot;, and &apos; respectively), as shown in the example below. The regular expression is matched case-insensitively.
This element defines the regular expression used to extract the body of the HTML document.
By default it extracts the <body> (.html files) or <bodyText> (.topic files) element content. The "body" part of the regular expression must be a named group called Body.
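As a rough illustration of how a named group isolates the body, the following Python sketch applies a hypothetical pattern to a small HTML string. The pattern is an assumption made for illustration and is not the tool's actual default expression. Python spells the named group as (?P<Body>...), whereas the syntax used in this configuration file is (?<Body>...). The DOTALL flag is also an assumption, added so that the body can span several lines.

```python
import re

# Hypothetical pattern for illustration only; not the tool's built-in default.
# Python writes the named group as (?P<Body>...) instead of (?<Body>...).
BODY_PATTERN = re.compile(
    r"<\s*body[^>]*?>(?P<Body>.*?)<\s*/\s*body\s*>",
    re.IGNORECASE | re.DOTALL,  # case-insensitive per the note; DOTALL is an assumption
)


def extract_body(html):
    """Return the text captured by the Body group, or None if nothing matches."""
    match = BODY_PATTERN.search(html)
    return match.group("Body") if match else None


sample = "<html><head><title>T</title></head><BODY class='x'>\n<p>Hello</p>\n</BODY></html>"
body = extract_body(sample)
print(body)
```

Only the text captured by the group named Body is returned. The surrounding tags take part in the match but are discarded.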
For example, if your document bodies contain several sections wrapped in div elements, you can alter the expression to extract only the div that holds the body text, thus excluding the unwanted parts of the document.
<!-- Note: Lines wrapped for display purposes -->
<BodyExtract expression="&lt;\s*div\s*class=&quot;Main&quot;[^&gt;]*?&gt;
(?&lt;Body&gt;.*?)&lt;\s*/\s*div?\s*&gt;" />
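The encoding that the note calls for can be produced mechanically rather than by hand. The sketch below is one possible approach using Python's standard library and is not part of the tool itself. It takes the raw expression from the example (with the display line wrap removed), encodes the five special characters, and then parses the resulting element to confirm that an XML parser hands back the original expression.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

# Raw expression as it would be written outside of XML (taken from the example,
# joined into a single line).
raw = r'<\s*div\s*class="Main"[^>]*?>(?<Body>.*?)<\s*/\s*div?\s*>'

# escape() handles &, < and > by default; the quotes are added via the entity map.
encoded = escape(raw, {'"': "&quot;", "'": "&apos;"})
element = '<BodyExtract expression="%s" />' % encoded
print(element)

# Round-trip check: parsing the element should yield the original expression.
parsed = ET.fromstring(element)
assert parsed.get("expression") == raw
```

If the round-trip check fails, or the parser rejects the element, a special character has most likely been left unencoded.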