The BodyExtract Element
This element defines the regular expression used to extract the body of the HTML document.
Rule Processing
By default, it extracts the content of the <body> element (.html files) or the <bodyText> element (.topic files). The part of the regular expression that matches the body content must be a named group called Body.
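As a rough illustration of the named-group requirement, the TypeScript sketch below applies a body-extracting pattern to a small document and reads the result from the group named Body. The pattern, the function name extractBody, the sample document, and the i and s flags are all assumptions made for this example; the converter's actual default expression and regex options are not reproduced here.

```typescript
// Hypothetical pattern for illustration only; not the converter's real default.
// "i" ignores case, "s" lets "." match newlines so multi-line bodies are captured.
const bodyPattern = new RegExp(
  "<\\s*body[^>]*>(?<Body>.*?)<\\s*/\\s*body\\s*>",
  "is"
);

// Returns the text captured by the named group Body, or null when nothing matches.
function extractBody(html: string, pattern: RegExp): string | null {
  const match = pattern.exec(html);
  return match?.groups?.Body ?? null;
}

const sample =
  "<html><head><title>T</title></head><body>\n<p>Hello</p>\n</body></html>";

console.log(extractBody(sample, bodyPattern)?.trim()); // prints "<p>Hello</p>"
```

Only the text inside the Body group is returned; the surrounding tags take part in the match but are discarded.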
You might want to modify the expression if your document bodies are split into several sections, each wrapped in its own div element. In that case, you can alter the expression to extract only the div that contains the body text, which excludes the unwanted parts of the document.
Example div Extract
<!-- Note: Lines wrapped for display purposes -->
<BodyExtract expression="&lt;\s*div\s*class=&quot;Main&quot;[^&gt;]*?&gt;
(?&lt;Body&gt;.*?)&lt;\s*/\s*div\s*&gt;" />
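To see what an expression of this kind captures, the TypeScript sketch below runs a pattern modeled on the example above, joined onto one line, against two made-up pages. The function name extractMainDiv, the sample markup, and the i and s flags are assumptions for illustration; the converter's own regex options may differ.

```typescript
// Pattern modeled on the div example, written on a single line.
// Flags are an assumption: "i" ignores case, "s" lets "." span line breaks.
const mainDivPattern = new RegExp(
  '<\\s*div\\s*class="Main"[^>]*?>(?<Body>.*?)<\\s*/\\s*div\\s*>',
  "is"
);

// Returns the content of the first div with class "Main", or null if none is found.
function extractMainDiv(html: string): string | null {
  const match = mainDivPattern.exec(html);
  return match?.groups?.Body ?? null;
}

// A page with navigation and footer sections around the main content.
const page =
  '<body><div class="Nav">menu</div>' +
  '<div class="Main"><p>Topic text</p></div>' +
  '<div class="Footer">footer</div></body>';

console.log(extractMainDiv(page)); // prints "<p>Topic text</p>"

// The lazy ".*?" stops at the first closing div tag, so a nested div cuts the capture short.
const nested = '<div class="Main"><div>inner</div>tail</div>';

console.log(extractMainDiv(nested)); // prints "<div>inner"
```

The second call shows a limit of this approach: because the group ends at the first closing div tag, an expression like this suits a main div that contains no nested div elements.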