Class HTMLWriter
java.lang.Object
org.xml.sax.helpers.XMLFilterImpl
org.dom4j.io.XMLWriter
org.dom4j.io.HTMLWriter
- All Implemented Interfaces:
ContentHandler, DTDHandler, EntityResolver, ErrorHandler, LexicalHandler, XMLFilter, XMLReader
HTMLWriter takes a DOM4J tree and formats it to a stream as HTML. This
formatter is similar to XMLWriter, but it differs in four ways:
- It outputs the text of CDATA and Entity sections rather than their serialised XML form.
- It has an XHTML mode.
- It retains whitespace in certain elements, such as <PRE>.
- It supports elements that have no corresponding close tag, such as <BR> and <P>.
The OutputFormat passed to the constructor is checked for isXHTML() and
isExpandEmptyElements(). See OutputFormat for details.
Here are the rules this class applies, given an OutputFormat "format"
passed to the constructor:
- If an element is in getOmitElementCloseSet(), it is treated specially:
  - It is never expanded, because some browsers treat <HR></HR> as two separate horizontal rules.
  - If format.isXHTML(), it gets a space before the closing single-tag slash. Netscape 4.x and earlier treat <HR /> as an element named "HR" with an attribute named "/". That is still better than <hr/>, which those versions refuse to recognize and read as an element named "HR/".
- If format.isXHTML(), every element must either have a close tag or be a closed single tag.
- If format.isExpandEmptyElements() is true, all elements are expanded, except as described above.
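The empty-element rules above can be modelled in a few lines. This is a minimal sketch, not dom4j's implementation: the class `EmptyTagRules` and its method are hypothetical. The omit set is assumed to hold upper-case names such as BR, HR and P. The self-closed `<name/>` form for unexpanded elements outside the omit set is also an assumption, because the rules above do not spell it out.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical model of the documented empty-element rules; not dom4j code.
final class EmptyTagRules {
    private EmptyTagRules() {}

    // omitCloseSet is assumed to contain upper-case names such as "BR", "HR", "P".
    static String renderEmpty(String name, Set<String> omitCloseSet,
                              boolean xhtml, boolean expandEmpty) {
        boolean omit = omitCloseSet.contains(name.toUpperCase(Locale.ROOT));
        if (omit) {
            // Never expanded. In XHTML mode a space precedes the slash so
            // that old Netscape versions still see the right tag name.
            return xhtml ? "<" + name + " />" : "<" + name + ">";
        }
        // Any other empty element follows the expand-empty flag.
        // The self-closed "<name/>" form is an assumption of this sketch.
        return expandEmpty
                ? "<" + name + "></" + name + ">"
                : "<" + name + "/>";
    }
}
```

For example, `renderEmpty("hr", Set.of("BR", "HR", "P"), true, true)` yields `<hr />`, because the omit rule takes precedence over the expand flag.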
If format.isXHTML() is true, CDATA sections look like this:
<myelement><![CDATA[My data]]></myelement>
Otherwise, they look like this:
<myelement>My data</myelement>
In short, OutputFormat.isXHTML() == true produces valid XML. OutputFormat.isExpandEmptyElements()
determines whether empty elements are expanded when isXHTML() is true, except for the special HTML single tags.
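The two CDATA outputs shown above can be reproduced with a tiny model. This is a sketch under stated assumptions, not dom4j code. The hypothetical class `CdataRules` keeps the CDATA wrapper only in XHTML mode and writes the section's text as-is otherwise, as the class description says. Like that description, it does not escape the text.

```java
// Hypothetical model of CDATA output in HTML vs. XHTML mode; not dom4j code.
final class CdataRules {
    private CdataRules() {}

    static String renderCdata(String text, boolean xhtml) {
        // XHTML keeps the CDATA wrapper so the output stays valid XML;
        // plain HTML mode writes the raw text of the section.
        return xhtml ? "<![CDATA[" + text + "]]>" : text;
    }

    static String element(String name, String cdataText, boolean xhtml) {
        return "<" + name + ">" + renderCdata(cdataText, xhtml) + "</" + name + ">";
    }
}
```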
HTMLWriter also handles tags whose contents should be preformatted, that is,
whitespace-preserved. By default, this set contains the tags <PRE>, <SCRIPT>,
<STYLE>, and <TEXTAREA>, matched case-insensitively. It does not include <IFRAME>.
Other tags, such as <CODE>, <KBD>, <TT>, and <VAR>, are rendered in a different
font by most browsers but don't preserve whitespace, so they are not in the
default list either. HTML comments are always whitespace-preserved.
However, the parser you use may store comments with linefeed-only text nodes
(\n), even if your platform uses a different line.separator. HTMLWriter
outputs Comment nodes exactly as the parser set up the DOM.
See examples and discussion at setPreformattedTags(java.util.Set).
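HTMLWriter writes comment text exactly as parsed, so any line-ending normalization has to happen before writing. One option is to rewrite the text of each comment node through dom4j's `Node.setText`. The helper below is hypothetical and not part of dom4j. It shows only the string transformation, assuming you want every line break converted to one chosen separator such as `System.lineSeparator()`.

```java
// Hypothetical helper, not part of dom4j: rewrites the line breaks in a
// comment's text to a single chosen separator before the tree is written.
final class CommentNewlines {
    private CommentNewlines() {}

    static String toSeparator(String commentText, String separator) {
        // Fold CRLF and lone CR into "\n" first so nothing gets doubled,
        // then expand every "\n" to the requested separator.
        String unix = commentText.replace("\r\n", "\n").replace('\r', '\n');
        return unix.replace("\n", separator);
    }
}
```

A typical call is `CommentNewlines.toSeparator(text, System.lineSeparator())`.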
Examples
Pretty Printing
This example shows how to pretty print a string containing an HTML document
to a string. The input is parsed with DocumentHelper.parseText, so it must be
well-formed XML. You can also just call the static methods of this class:
prettyPrintHTML(String), prettyPrintHTML(String,boolean,boolean,boolean,boolean),
or prettyPrintXHTML(String) for XHTML (note the X).
    import java.io.IOException;
    import java.io.StringWriter;
    import org.dom4j.Document;
    import org.dom4j.DocumentException;
    import org.dom4j.DocumentHelper;
    import org.dom4j.io.HTMLWriter;
    import org.dom4j.io.OutputFormat;

    String testPrettyPrint(String html) throws IOException, DocumentException {
        StringWriter sw = new StringWriter();
        OutputFormat format = OutputFormat.createPrettyPrint();
        // These are the default values for createPrettyPrint,
        // so you needn't set them:
        // format.setNewlines(true);
        // format.setTrimText(true);
        format.setXHTML(true);
        HTMLWriter writer = new HTMLWriter(sw, format);
        Document document = DocumentHelper.parseText(html);
        writer.write(document);
        writer.flush();
        return sw.toString();
    }
This example shows how to create a "squeezed" document that still works in
browsers with a limited line length. Apart from the line separator written after
every 20 tags (setNewLineAfterNTags), no newlines and no extra whitespace are
written, except where setPreformattedTags requires them.
    // Uses the same imports as testPrettyPrint.
    String testCrunch(String html) throws IOException, DocumentException {
        StringWriter sw = new StringWriter();
        OutputFormat format = OutputFormat.createPrettyPrint();
        format.setNewlines(false);
        format.setTrimText(true);
        format.setIndent("");
        format.setXHTML(true);
        format.setExpandEmptyElements(false);
        // Write a line separator every 20 tags to keep lines short.
        format.setNewLineAfterNTags(20);
        HTMLWriter writer = new HTMLWriter(sw, format);
        Document document = DocumentHelper.parseText(html);
        writer.write(document);
        writer.flush();
        return sw.toString();
    }
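The effect of setNewLineAfterNTags(20) in the squeezed example can be illustrated with a small model. This is a hypothetical sketch, not HTMLWriter's code. It assumes each list item stands for one written element and that a separator goes in front of every item whose zero-based index is a positive multiple of n, so that no output line holds more than n items. How dom4j counts tags exactly is not stated here. A non-positive n disables the breaks.

```java
import java.util.List;

// Hypothetical model of "a line separator every n tags"; not dom4j code.
final class CrunchModel {
    private CrunchModel() {}

    static String join(List<String> items, int n, String separator) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < items.size(); i++) {
            // Break before items n, 2n, 3n, ... (zero-based), never before the first.
            if (n > 0 && i > 0 && i % n == 0) {
                out.append(separator);
            }
            out.append(items.get(i));
        }
        return out.toString();
    }
}
```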
- Version:
- $Revision: 1.21 $
Nested Class Summary
Nested Classes: HTMLWriter.FormatState
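Field Summary
Fields (Modifier and Type, Field, Description):
- protected static final OutputFormat DEFAULT_HTML_FORMAT
- DEFAULT_PREFORMATTED_TAGS
- private Stack<HTMLWriter.FormatState> formatStack
- private String lastText
- private static String lineSeparator
- private int newLineAfterNTags
- omitElementCloseSet: Used to store the qualified element names which should have no close element tag.
- preformattedTags
- private int tagsOuput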
Fields inherited from class XMLWriter
DEFAULT_FORMAT, lastOutputNodeType, LEXICAL_HANDLER_NAMES, preserve, writer
Constructor Summary
Constructors:
- HTMLWriter(OutputStream out)
- HTMLWriter(OutputStream out, OutputFormat format)
- HTMLWriter(Writer writer)
- HTMLWriter(Writer writer, OutputFormat format)
- HTMLWriter(OutputFormat format)
Method Summary
Methods (Modifier and Type, Method, Description):
- void endCDATA()
- Set<String> getOmitElementCloseSet(): A clone of the Set of elements that can have their close tags omitted.
- Set<String> getPreformattedTags()
- boolean isPreformattedTag(String qualifiedName): Returns true if qualifiedName matches, case-insensitively, a tag in the preformatted tags set.
- private String justSpaces(String text)
- private void lazyInitNewLinesAfterNTags()
- protected void loadOmitElementCloseSet(Set<String> set)
- protected boolean omitElementClose(String qualifiedName)
- static String prettyPrintHTML(String html): Convenience method to just get a String result.
- static String prettyPrintHTML(String html, boolean newlines, boolean trim, boolean isXHTML, boolean expandEmpty): Convenience method to get a String result using the given formatter options.
- static String prettyPrintXHTML(String html): Convenience method to just get a String result, but as XHTML.
- void setOmitElementCloseSet(Set<String> newSet): To use the empty set, pass an empty Set or null.
- void setPreformattedTags(Set<String> newSet): Override the default set, which includes PRE, SCRIPT, STYLE, and TEXTAREA, case insensitively.
- void startCDATA()
- protected void writeCDATA(String text)
- protected void writeClose(String qualifiedName): Overridden method that omits the close tag for certain element names, to avoid weird behaviour in browsers up to version 5.x.
- protected void writeDeclaration(): This will write the declaration to the given Writer.
- protected void writeElement(Element element): This override handles any elements that should not remove whitespace, such as <PRE>, <SCRIPT>, <STYLE>, and <TEXTAREA>.
- protected void writeEmptyElementClose(String qualifiedName)
- protected void writeEntity(Entity entity)
- protected void writeString(String text)
Methods inherited from class XMLWriter
characters, close, comment, createWriter, defaultMaximumAllowedCharacter, endDocument, endDTD, endElement, endEntity, endPrefixMapping, escapeAttributeEntities, escapeElementEntities, flush, getLexicalHandler, getMaximumAllowedCharacter, getOutputFormat, getProperty, handleException, ignorableWhitespace, indent, installLexicalHandler, isElementSpacePreserved, isEscapeText, isExpandEmptyElements, isNamespaceDeclaration, notationDecl, parse, println, processingInstruction, resolveEntityRefs, setDocumentLocator, setEscapeText, setIndentLevel, setLexicalHandler, setMaximumAllowedCharacter, setOutputStream, setProperty, setResolveEntityRefs, setWriter, shouldEncodeChar, startDocument, startDTD, startElement, startEntity, startPrefixMapping, unparsedEntityDecl, write, write, write, write, write, write, write, write, write, write, write, write, write, writeAttribute, writeAttribute, writeAttribute, writeAttributes, writeAttributes, writeClose, writeComment, writeDocType, writeDocType, writeElementContent, writeEntityRef, writeEscapeAttributeEntities, writeNamespace, writeNamespace, writeNamespaces, writeNamespaces, writeNode, writeNodeText, writeOpen, writePrintln, writeProcessingInstruction
Methods inherited from class XMLFilterImpl
error, fatalError, getContentHandler, getDTDHandler, getEntityResolver, getErrorHandler, getFeature, getParent, parse, resolveEntity, setContentHandler, setDTDHandler, setEntityResolver, setErrorHandler, setFeature, setParent, skippedEntity, warning
Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface ContentHandler
declaration
Field Details
- lineSeparator
- DEFAULT_PREFORMATTED_TAGS
- DEFAULT_HTML_FORMAT
- formatStack
- lastText
- tagsOuput
  private int tagsOuput
- newLineAfterNTags
  private int newLineAfterNTags
- preformattedTags
- omitElementCloseSet
Constructor Details
- HTMLWriter
- HTMLWriter
- HTMLWriter
  Throws: UnsupportedEncodingException
- HTMLWriter
  Throws: UnsupportedEncodingException
- HTMLWriter
  Throws: UnsupportedEncodingException
- HTMLWriter
  Throws: UnsupportedEncodingException
Method Details
startCDATA
- Specified by: startCDATA in interface LexicalHandler
- Overrides: startCDATA in class XMLWriter
- Throws: SAXException
endCDATA
- Specified by: endCDATA in interface LexicalHandler
- Overrides: endCDATA in class XMLWriter
- Throws: SAXException
writeCDATA
- Overrides: writeCDATA in class XMLWriter
- Throws: IOException
writeEntity
- Overrides: writeEntity in class XMLWriter
- Throws: IOException
writeDeclaration
Description copied from class: XMLWriter
This will write the declaration to the given Writer. Assumes XML version 1.0 since we don't directly know.
- Overrides: writeDeclaration in class XMLWriter
- Throws: IOException - when the stream could not be written to.
writeString
- Overrides: writeString in class XMLWriter
- Throws: IOException
writeClose
Overridden method that omits the close tag for certain element names, to avoid weird behaviour in browsers up to version 5.x.
- Overrides: writeClose in class XMLWriter
- Parameters: qualifiedName - the qualified name of the element being closed
- Throws: IOException - when the stream could not be written to.
writeEmptyElementClose
- Overrides: writeEmptyElementClose in class XMLWriter
- Throws: IOException
omitElementClose
internalGetOmitElementCloseSet
loadOmitElementCloseSet
getOmitElementCloseSet
setOmitElementCloseSet
getPreformattedTags
setPreformattedTags
Override the default set, which includes PRE, SCRIPT, STYLE, and TEXTAREA, case insensitively.
Setting Preformatted Tags
Pass in a Set of Strings, one for each tag name that should be treated like a PRE tag. You may pass in null or an empty Set to assign the empty set. In that case, no tags are treated as preformatted, but HTML comments continue to be preformatted.
If a tag is in the set of preformatted tags, all whitespace within the tag is preserved, including whitespace on the same line before the close tag. This generally makes the close tag not line up with the start tag, but it preserves the intention of the whitespace within the tag.
The browser considers leading whitespace before the close tag to be significant, but leading whitespace before the open tag to be insignificant. For example, if the HTML author doesn't put the close TEXTAREA tag flush against the left margin, the TEXTAREA control in the browser will have spaces on the last line inside the control. This may be the HTML author's intent. Similarly, in a PRE, the browser treats a flush-left close PRE tag differently from a close tag with leading whitespace. Again, this must be left up to the HTML author.
Examples
Here is how you can use setPreformattedTags to add IFRAME to the default set, given an instance of this class named myHTMLWriter:
    Set<String> current = myHTMLWriter.getPreformattedTags();
    current.add("IFRAME");
    myHTMLWriter.setPreformattedTags(current);
    // The set is now {PRE, SCRIPT, STYLE, TEXTAREA, IFRAME}
Similarly, you can simply replace it with your own:
    Set<String> newSet = new HashSet<>();
    newSet.add("PRE");
    newSet.add("TEXTAREA");
    myHTMLWriter.setPreformattedTags(newSet);
    // The set is now {PRE, TEXTAREA}
You can remove all tags from the preformatted tags list with an empty set, like this:
    myHTMLWriter.setPreformattedTags(new HashSet<>());
    // The set is now {}
or with null, like this:
    myHTMLWriter.setPreformattedTags(null);
    // The set is now {}
- Parameters: newSet - the tag names to treat as preformatted, or null for none
isPreformattedTag
Checks whether a tag name is in the preformatted tags set.
- Parameters: qualifiedName - the tag name to look up
- Returns: true if the qualifiedName passed in matched (case-insensitively) a tag in the preformattedTags set, or false if not found or if the set is empty or null.
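The contract described for setPreformattedTags and isPreformattedTag can be modelled as follows. This is a hypothetical sketch, not dom4j's code, and the class `PreformattedTags` is invented for illustration. It makes three assumptions. Names are stored upper-cased to get case-insensitive matching. A null or empty set means no tags. The getter returns a copy, which is why the get, add, set pattern in the examples calls setPreformattedTags again.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical model of the documented preformatted-tags contract; not dom4j code.
final class PreformattedTags {
    private Set<String> tags = new HashSet<>(Set.of("PRE", "SCRIPT", "STYLE", "TEXTAREA"));

    // null or an empty set both mean "no preformatted tags".
    void set(Set<String> newSet) {
        Set<String> copy = new HashSet<>();
        if (newSet != null) {
            for (String name : newSet) {
                if (name != null) {
                    copy.add(name.toUpperCase(Locale.ROOT));
                }
            }
        }
        tags = copy;
    }

    // Assumed to return a copy, so changes must be passed back through set().
    Set<String> get() {
        return new HashSet<>(tags);
    }

    // Case-insensitive match; false when the set is empty.
    boolean isPreformatted(String qualifiedName) {
        return !tags.isEmpty() && tags.contains(qualifiedName.toUpperCase(Locale.ROOT));
    }
}
```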
writeElement
This override handles any elements that should not remove whitespace, such as <PRE>, <SCRIPT>, <STYLE>, and <TEXTAREA>. Note: the close tags won't line up with the open tag, but we can't alter that. See the Javadoc note at setPreformattedTags.
- Overrides: writeElement in class XMLWriter
- Parameters: element - the element to write
- Throws: IOException - when the stream could not be written to.
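The difference between trimmed and preformatted element text can be sketched with a simplified model. This is hypothetical code, not HTMLWriter.writeElement. It assumes that text trimming removes leading and trailing whitespace and collapses internal whitespace runs to single spaces. Preformatted content is kept character for character, including the spaces before the close tag.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical, simplified model of trimmed vs. preformatted text; not dom4j code.
final class WhitespaceModel {
    private WhitespaceModel() {}

    static String render(String tag, String text, Set<String> preformatted) {
        String body;
        if (preformatted.contains(tag.toUpperCase(Locale.ROOT))) {
            // Preformatted: keep every character, including whitespace
            // before the close tag, so the close tag may not line up.
            body = text;
        } else {
            // Otherwise trim the ends and collapse internal whitespace
            // runs to single spaces (assumed trim-text behaviour).
            body = text.trim().replaceAll("\\s+", " ");
        }
        return "<" + tag + ">" + body + "</" + tag + ">";
    }
}
```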
justSpaces
lazyInitNewLinesAfterNTags
private void lazyInitNewLinesAfterNTags()
prettyPrintHTML
public static String prettyPrintHTML(String html) throws IOException, UnsupportedEncodingException, DocumentException
Convenience method to just get a String result.
- Parameters: html - the HTML source to pretty print
- Returns: a pretty printed String from the source string, preserving whitespace in the default preformatted tags set and leaving the close tags off the elements in the default omitElementCloseSet. Use one of the write methods if you want stream output.
- Throws: IOException - when the output could not be written. UnsupportedEncodingException - if the output encoding is not supported. DocumentException - if html cannot be parsed into a Document.
prettyPrintXHTML
public static String prettyPrintXHTML(String html) throws IOException, UnsupportedEncodingException, DocumentException
Convenience method to just get a String result, but as XHTML.
- Parameters: html - the HTML source to pretty print
- Returns: a pretty printed String from the source string, preserving whitespace in the default preformatted tags set, but conforming to XHTML: no close tags are omitted (though empty elements are converted to XHTML empty tags, such as <HR/>). Use one of the write methods if you want stream output.
- Throws: IOException - when the output could not be written. UnsupportedEncodingException - if the output encoding is not supported. DocumentException - if html cannot be parsed into a Document.
prettyPrintHTML
public static String prettyPrintHTML(String html, boolean newlines, boolean trim, boolean isXHTML, boolean expandEmpty) throws IOException, UnsupportedEncodingException, DocumentException
Convenience method to get a String result using the given formatter options.
- Parameters:
  html - the HTML source to pretty print
  newlines - whether to write newlines (see OutputFormat.setNewlines)
  trim - whether to trim text (see OutputFormat.setTrimText)
  isXHTML - whether to produce XHTML (see OutputFormat.setXHTML)
  expandEmpty - whether to expand empty elements (see OutputFormat.setExpandEmptyElements)
- Returns: a pretty printed String from the source string, preserving whitespace in the default preformatted tags set and leaving the close tags off the elements in the default omitElementCloseSet. This override allows you to specify various formatter options. Use one of the write methods if you want stream output.
- Throws: IOException - when the output could not be written. UnsupportedEncodingException - if the output encoding is not supported. DocumentException - if html cannot be parsed into a Document.