org.cyberneko.html

Class HTMLTagBalancer

Implemented Interfaces:
XMLComponent, XMLDocumentFilter, HTMLComponent

public class HTMLTagBalancer
extends java.lang.Object
implements XMLDocumentFilter, HTMLComponent

Balances tags in an HTML document. This component receives document events and tries to correct many common mistakes that human (and computer) HTML document authors make. This tag balancer can:

This component recognizes the following features:

This component recognizes the following properties:

Version:
$Id: HTMLTagBalancer.java,v 1.20 2005/02/14 04:06:22 andyc Exp $
Author:
Andy Clark
See Also:
HTMLElements
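
The general idea of balancing a stream of start and end events can be pictured with a small stack-based sketch. This is only an illustration under simplifying assumptions, not the algorithm used by this class: the class name TagBalancerSketch, its String-based event methods, and the demo() helper are all hypothetical, and the HTML-specific rules (optional end tags, required parents, inline elements) are left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical stand-in for a tag balancer; not the NekoHTML implementation. */
class TagBalancerSketch {
    private final Deque<String> open = new ArrayDeque<>(); // innermost element first
    private final List<String> out = new ArrayList<>();    // balanced event stream

    void startElement(String name) {
        open.push(name);
        out.add("<" + name + ">");
    }

    void endElement(String name) {
        if (!open.contains(name)) {
            return; // stray end tag with no matching start tag: drop it
        }
        // Close every element opened after the matching start tag, then the match itself.
        String top;
        do {
            top = open.pop();
            out.add("</" + top + ">");
        } while (!top.equals(name));
    }

    void characters(String text) {
        out.add(text);
    }

    /** Closes whatever is still open and returns the balanced events. */
    List<String> endDocument() {
        while (!open.isEmpty()) {
            out.add("</" + open.pop() + ">");
        }
        return out;
    }

    /** Feeds a small unbalanced sequence through the sketch. */
    static List<String> demo() {
        TagBalancerSketch b = new TagBalancerSketch();
        b.startElement("p");
        b.startElement("b");
        b.characters("text");
        b.endElement("p"); // <b> was never closed
        b.endElement("i"); // <i> was never opened
        return b.endDocument();
    }
}
```

In this sketch, demo() yields the events <p>, <b>, text, </b>, </p>: the missing </b> is synthesized and the stray </i> is discarded.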

Nested Class Summary

static class
HTMLTagBalancer.Info
Element info for each start element.
static class
HTMLTagBalancer.InfoStack
Unsynchronized stack of element information.
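
For readers unfamiliar with the term, an unsynchronized stack is one that does no locking of its own and therefore must be confined to a single thread. The following is a minimal sketch of such a structure, assuming a growable array and String entries in place of HTMLTagBalancer.Info objects; the class name ElementStackSketch and all of its members are hypothetical and are not taken from HTMLTagBalancer.InfoStack.

```java
import java.util.Arrays;

/** Hypothetical array-backed stack; no locking, so confine it to one thread. */
class ElementStackSketch {
    private String[] data = new String[10];
    private int top; // number of entries currently on the stack

    void push(String name) {
        if (top == data.length) {
            data = Arrays.copyOf(data, top * 2); // grow when full
        }
        data[top++] = name;
    }

    /** Returns the innermost entry; throws if the stack is empty. */
    String peek() {
        return data[top - 1];
    }

    /** Removes and returns the innermost entry; throws if the stack is empty. */
    String pop() {
        String name = data[--top];
        data[top] = null; // release the reference
        return name;
    }

    int size() {
        return top;
    }
}
```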

Field Summary

protected static String
AUGMENTATIONS
Include infoset augmentations.
protected static String
DOCUMENT_FRAGMENT
Document fragment balancing only.
protected static String
DOCUMENT_FRAGMENT_DEPRECATED
Document fragment balancing only (deprecated).
protected static String
ERROR_REPORTER
Error reporter.
protected static String
IGNORE_OUTSIDE_CONTENT
Ignore outside content.
protected static String
NAMESPACES
Namespaces.
protected static String
NAMES_ATTRS
Modify HTML attribute names: { "upper", "lower", "default" }.
protected static String
NAMES_ELEMS
Modify HTML element names: { "upper", "lower", "default" }.
protected static short
NAMES_LOWERCASE
Lowercase HTML names.
protected static short
NAMES_MATCH
Match HTML element names.
protected static short
NAMES_NO_CHANGE
Don't modify HTML names.
protected static short
NAMES_UPPERCASE
Uppercase HTML names.
protected static String
REPORT_ERRORS
Report errors.
protected static HTMLEventInfo
SYNTHESIZED_ITEM
Synthesized event info item.
protected boolean
fAugmentations
Include infoset augmentations.
protected boolean
fDocumentFragment
Document fragment balancing only.
protected XMLDocumentHandler
fDocumentHandler
The document handler.
protected XMLDocumentSource
fDocumentSource
The document source.
protected HTMLTagBalancer.InfoStack
fElementStack
The element stack.
protected HTMLErrorReporter
fErrorReporter
Error reporter.
protected boolean
fIgnoreOutsideContent
Ignore outside content.
protected HTMLTagBalancer.InfoStack
fInlineStack
The inline stack.
protected short
fNamesAttrs
Modify HTML attribute names.
protected short
fNamesElems
Modify HTML element names.
protected boolean
fNamespaces
Namespaces.
protected boolean
fReportErrors
Report errors.
protected boolean
fSeenAnything
True if anything has been seen.
protected boolean
fSeenBodyElement
True if the <body> element has been seen.
protected boolean
fSeenDoctype
True if a doctype declaration has been seen.
protected boolean
fSeenHeadElement
True if the <head> element has been seen.
protected boolean
fSeenRootElement
True if root element has been seen.
protected boolean
fSeenRootElementEnd
True if seen the end of the document element.

Method Summary

protected void
callEndElement(QName element, Augmentations augs)
Call document handler end element.
protected void
callStartElement(QName element, XMLAttributes attrs, Augmentations augs)
Call document handler start element.
void
characters(XMLString text, Augmentations augs)
Characters.
void
comment(XMLString text, Augmentations augs)
Comment.
void
doctypeDecl(String rootElementName, String publicId, String systemId, Augmentations augs)
Doctype declaration.
protected XMLAttributes
emptyAttributes()
Returns a set of empty attributes.
void
emptyElement(QName elem, XMLAttributes attrs, Augmentations augs)
Empty element.
void
endCDATA(Augmentations augs)
End CDATA section.
void
endDocument(Augmentations augs)
End document.
void
endElement(QName element, Augmentations augs)
End element.
void
endGeneralEntity(String name, Augmentations augs)
End entity.
void
endPrefixMapping(String prefix, Augmentations augs)
End prefix mapping.
XMLDocumentHandler
getDocumentHandler()
Returns the document handler.
XMLDocumentSource
getDocumentSource()
Returns the document source.
protected HTMLElements.Element
getElement(String name)
Returns an HTML element.
protected int
getElementDepth(HTMLElements.Element element)
Returns the depth of the open tag associated with the specified element, or -1 if no matching element is found.
Boolean
getFeatureDefault(String featureId)
Returns the default state for a feature.
protected static short
getNamesValue(String value)
Converts HTML names string value to constant value.
protected int
getParentDepth(HTMLElements.Element[] parents, short bounds)
Returns the depth of the open tag associated with the specified parent elements, or -1 if no matching element is found.
Object
getPropertyDefault(String propertyId)
Returns the default state for a property.
String[]
getRecognizedFeatures()
Returns recognized features.
String[]
getRecognizedProperties()
Returns recognized properties.
void
ignorableWhitespace(XMLString text, Augmentations augs)
Ignorable whitespace.
protected static String
modifyName(String name, short mode)
Modifies the given name based on the specified mode.
void
processingInstruction(String target, XMLString data, Augmentations augs)
Processing instruction.
void
reset(XMLComponentManager manager)
Resets the component.
void
setDocumentHandler(XMLDocumentHandler handler)
Sets the document handler.
void
setDocumentSource(XMLDocumentSource source)
Sets the document source.
void
setFeature(String featureId, boolean state)
Sets a feature.
void
setProperty(String propertyId, Object value)
Sets a property.
void
startCDATA(Augmentations augs)
Start CDATA section.
void
startDocument(XMLLocator locator, String encoding, Augmentations augs)
Start document.
void
startDocument(XMLLocator locator, String encoding, NamespaceContext nscontext, Augmentations augs)
Start document.
void
startElement(QName elem, XMLAttributes attrs, Augmentations augs)
Start element.
void
startGeneralEntity(String name, XMLResourceIdentifier id, String encoding, Augmentations augs)
Start entity.
void
startPrefixMapping(String prefix, String uri, Augmentations augs)
Start prefix mapping.
protected Augmentations
synthesizedAugs()
Returns an augmentations object with a synthesized item added.
void
textDecl(String version, String encoding, Augmentations augs)
Text declaration.
void
xmlDecl(String version, String encoding, String standalone, Augmentations augs)
XML declaration.

Field Details

AUGMENTATIONS

protected static final String AUGMENTATIONS
Include infoset augmentations.

DOCUMENT_FRAGMENT

protected static final String DOCUMENT_FRAGMENT
Document fragment balancing only.

DOCUMENT_FRAGMENT_DEPRECATED

protected static final String DOCUMENT_FRAGMENT_DEPRECATED
Document fragment balancing only (deprecated).

ERROR_REPORTER

protected static final String ERROR_REPORTER
Error reporter.

IGNORE_OUTSIDE_CONTENT

protected static final String IGNORE_OUTSIDE_CONTENT
Ignore outside content.

NAMESPACES

protected static final String NAMESPACES
Namespaces.

NAMES_ATTRS

protected static final String NAMES_ATTRS
Modify HTML attribute names: { "upper", "lower", "default" }.

NAMES_ELEMS

protected static final String NAMES_ELEMS
Modify HTML element names: { "upper", "lower", "default" }.

NAMES_LOWERCASE

protected static final short NAMES_LOWERCASE
Lowercase HTML names.
Field Value:
2

NAMES_MATCH

protected static final short NAMES_MATCH
Match HTML element names.
Field Value:
0

NAMES_NO_CHANGE

protected static final short NAMES_NO_CHANGE
Don't modify HTML names.
Field Value:
0

NAMES_UPPERCASE

protected static final short NAMES_UPPERCASE
Uppercase HTML names.
Field Value:
1

REPORT_ERRORS

protected static final String REPORT_ERRORS
Report errors.

SYNTHESIZED_ITEM

protected static final HTMLEventInfo SYNTHESIZED_ITEM
Synthesized event info item.

fAugmentations

protected boolean fAugmentations
Include infoset augmentations.

fDocumentFragment

protected boolean fDocumentFragment
Document fragment balancing only.

fDocumentHandler

protected XMLDocumentHandler fDocumentHandler
The document handler.

fDocumentSource

protected XMLDocumentSource fDocumentSource
The document source.

fElementStack

protected final HTMLTagBalancer.InfoStack fElementStack
The element stack.

fErrorReporter

protected HTMLErrorReporter fErrorReporter
Error reporter.

fIgnoreOutsideContent

protected boolean fIgnoreOutsideContent
Ignore outside content.

fInlineStack

protected final HTMLTagBalancer.InfoStack fInlineStack
The inline stack.

fNamesAttrs

protected short fNamesAttrs
Modify HTML attribute names.

fNamesElems

protected short fNamesElems
Modify HTML element names.

fNamespaces

protected boolean fNamespaces
Namespaces.

fReportErrors

protected boolean fReportErrors
Report errors.

fSeenAnything

protected boolean fSeenAnything
True if anything has been seen. Important for handling the XML declaration.

fSeenBodyElement

protected boolean fSeenBodyElement
True if the <body> element has been seen.

fSeenDoctype

protected boolean fSeenDoctype
True if a doctype declaration has been seen.

fSeenHeadElement

protected boolean fSeenHeadElement
True if the <head> element has been seen.

fSeenRootElement

protected boolean fSeenRootElement
True if root element has been seen.

fSeenRootElementEnd

protected boolean fSeenRootElementEnd
True if the end of the document element has been seen. In other words, this variable remains false until the </HTML> end tag is seen (or synthesized). It is used to ensure that extraneous events arriving after the end of the document element do not make the document stream ill-formed.
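
The role of such a flag can be sketched as a guard that stops forwarding events once the root element has been closed. This is an illustrative assumption about how a flag of this kind might be used, not the logic of this class; the class name RootEndGuardSketch, its String-based events, and the depth counter are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical filter that drops events arriving after the root end tag. */
class RootEndGuardSketch {
    private boolean seenRootElementEnd; // false until the root end tag is forwarded
    private int depth;                  // number of currently open elements
    private final List<String> forwarded = new ArrayList<>();

    void startElement(String name) {
        if (seenRootElementEnd) {
            return; // extraneous: forwarding it would create a second root
        }
        depth++;
        forwarded.add("<" + name + ">");
    }

    void endElement(String name) {
        if (seenRootElementEnd || depth == 0) {
            return; // nothing left to close
        }
        forwarded.add("</" + name + ">");
        if (--depth == 0) {
            seenRootElementEnd = true; // the root element has just been closed
        }
    }

    void characters(String text) {
        if (seenRootElementEnd) {
            return; // text after the root element is dropped
        }
        forwarded.add(text);
    }

    List<String> forwarded() {
        return forwarded;
    }
}
```

Feeding this sketch the events <html>, a, </html>, b, <p>, </p> forwards only the first three; everything after the root end tag is discarded, so the output keeps a single root element.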

Method Details

callEndElement

protected final void callEndElement(QName element,
                                    Augmentations augs)
            throws XNIException
Call document handler end element.

callStartElement

protected final void callStartElement(QName element,
                                      XMLAttributes attrs,
                                      Augmentations augs)
            throws XNIException
Call document handler start element.

characters

public void characters(XMLString text,
                       Augmentations augs)
            throws XNIException
Characters.

comment

public void comment(XMLString text,
                    Augmentations augs)
            throws XNIException
Comment.

doctypeDecl

public void doctypeDecl(String rootElementName,
                        String publicId,
                        String systemId,
                        Augmentations augs)
            throws XNIException
Doctype declaration.

emptyAttributes

protected final XMLAttributes emptyAttributes()
Returns a set of empty attributes.

emptyElement

public void emptyElement(QName elem,
                         XMLAttributes attrs,
                         Augmentations augs)
            throws XNIException
Empty element.

endCDATA

public void endCDATA(Augmentations augs)
            throws XNIException
End CDATA section.

endDocument

public void endDocument(Augmentations augs)
            throws XNIException
End document.

endElement

public void endElement(QName element,
                       Augmentations augs)
            throws XNIException
End element.

endGeneralEntity

public void endGeneralEntity(String name,
                             Augmentations augs)
            throws XNIException
End entity.

endPrefixMapping

public void endPrefixMapping(String prefix,
                             Augmentations augs)
            throws XNIException
End prefix mapping.

getDocumentHandler

public XMLDocumentHandler getDocumentHandler()
Returns the document handler.

getDocumentSource

public XMLDocumentSource getDocumentSource()
Returns the document source.

getElement

protected HTMLElements.Element getElement(String name)
Returns an HTML element.

getElementDepth

protected final int getElementDepth(HTMLElements.Element element)
Returns the depth of the open tag associated with the specified element, or -1 if no matching element is found.
Parameters:
element - The element.
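
A depth lookup of this kind can be sketched as a search from the innermost open element outward. The sketch below makes two assumptions that this page does not confirm: that depth is counted from the top of the stack, with 1 meaning the innermost open element, and that names are compared case-insensitively. The class name ElementDepthSketch and the List<String> parameter are hypothetical stand-ins for the element stack and HTMLElements.Element.

```java
import java.util.List;

/** Hypothetical depth lookup over a list of open element names. */
class ElementDepthSketch {
    /**
     * @param openElements open elements, ordered from outermost to innermost
     * @param name         the element name to look for
     * @return 1 for the innermost open element, 2 for its parent, and so on;
     *         -1 if no open element has that name (assumed convention)
     */
    static int getElementDepth(List<String> openElements, String name) {
        for (int i = openElements.size() - 1; i >= 0; i--) {
            if (openElements.get(i).equalsIgnoreCase(name)) {
                return openElements.size() - i;
            }
        }
        return -1;
    }
}
```

With the open elements html, body, p, b (outermost first), this sketch returns 1 for "b", 3 for "body", and -1 for "table".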

getFeatureDefault

public Boolean getFeatureDefault(String featureId)
Returns the default state for a feature.
Specified by:
getFeatureDefault in interface HTMLComponent

getNamesValue

protected static final short getNamesValue(String value)
Converts HTML names string value to constant value.
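
The conversion can be pictured as a mapping from the property strings "upper", "lower", and "default" to the NAMES_* constants. The sketch below reuses the field values documented on this page (0, 1, 2); the class name NamesValueSketch is hypothetical, and treating unrecognized strings the same as "default" is an assumption.

```java
/** Hypothetical re-creation of the string-to-constant mapping. */
class NamesValueSketch {
    static final short NAMES_NO_CHANGE = 0;
    static final short NAMES_UPPERCASE = 1;
    static final short NAMES_LOWERCASE = 2;

    static short getNamesValue(String value) {
        if ("upper".equals(value)) {
            return NAMES_UPPERCASE;
        }
        if ("lower".equals(value)) {
            return NAMES_LOWERCASE;
        }
        // "default" and, by assumption, any unrecognized value
        return NAMES_NO_CHANGE;
    }
}
```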

getParentDepth

protected int getParentDepth(HTMLElements.Element[] parents,
                             short bounds)
Returns the depth of the open tag associated with the specified parent elements, or -1 if no matching element is found.
Parameters:
parents - The parent elements.

getPropertyDefault

public Object getPropertyDefault(String propertyId)
Returns the default state for a property.
Specified by:
getPropertyDefault in interface HTMLComponent

getRecognizedFeatures

public String[] getRecognizedFeatures()
Returns recognized features.

getRecognizedProperties

public String[] getRecognizedProperties()
Returns recognized properties.

ignorableWhitespace

public void ignorableWhitespace(XMLString text,
                                Augmentations augs)
            throws XNIException
Ignorable whitespace.

modifyName

protected static final String modifyName(String name,
                                         short mode)
Modifies the given name based on the specified mode.
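
A minimal sketch of such a name modification, assuming the mode argument is one of the NAMES_* constants documented above. The class name ModifyNameSketch is hypothetical, and the use of Locale.ROOT for case conversion is a choice made for this sketch rather than something this page specifies.

```java
import java.util.Locale;

/** Hypothetical re-creation of name modification by mode. */
class ModifyNameSketch {
    static final short NAMES_NO_CHANGE = 0;
    static final short NAMES_UPPERCASE = 1;
    static final short NAMES_LOWERCASE = 2;

    static String modifyName(String name, short mode) {
        switch (mode) {
            case NAMES_UPPERCASE:
                return name.toUpperCase(Locale.ROOT);
            case NAMES_LOWERCASE:
                return name.toLowerCase(Locale.ROOT);
            default:
                return name; // NAMES_NO_CHANGE: leave the name as written
        }
    }
}
```

For example, modifyName("Body", NAMES_UPPERCASE) gives "BODY" in this sketch, while NAMES_NO_CHANGE returns "Body" untouched.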

processingInstruction

public void processingInstruction(String target,
                                  XMLString data,
                                  Augmentations augs)
            throws XNIException
Processing instruction.

reset

public void reset(XMLComponentManager manager)
            throws XMLConfigurationException
Resets the component.

setDocumentHandler

public void setDocumentHandler(XMLDocumentHandler handler)
Sets the document handler.

setDocumentSource

public void setDocumentSource(XMLDocumentSource source)
Sets the document source.

setFeature

public void setFeature(String featureId,
                       boolean state)
            throws XMLConfigurationException
Sets a feature.

setProperty

public void setProperty(String propertyId,
                        Object value)
            throws XMLConfigurationException
Sets a property.

startCDATA

public void startCDATA(Augmentations augs)
            throws XNIException
Start CDATA section.

startDocument

public void startDocument(XMLLocator locator,
                          String encoding,
                          Augmentations augs)
            throws XNIException
Start document.

startDocument

public void startDocument(XMLLocator locator,
                          String encoding,
                          NamespaceContext nscontext,
                          Augmentations augs)
            throws XNIException
Start document.

startElement

public void startElement(QName elem,
                         XMLAttributes attrs,
                         Augmentations augs)
            throws XNIException
Start element.

startGeneralEntity

public void startGeneralEntity(String name,
                               XMLResourceIdentifier id,
                               String encoding,
                               Augmentations augs)
            throws XNIException
Start entity.

startPrefixMapping

public void startPrefixMapping(String prefix,
                               String uri,
                               Augmentations augs)
            throws XNIException
Start prefix mapping.

synthesizedAugs

protected final Augmentations synthesizedAugs()
Returns an augmentations object with a synthesized item added.

textDecl

public void textDecl(String version,
                     String encoding,
                     Augmentations augs)
            throws XNIException
Text declaration.

xmlDecl

public void xmlDecl(String version,
                    String encoding,
                    String standalone,
                    Augmentations augs)
            throws XNIException
XML declaration.

(C) Copyright 2002-2005, Andy Clark. All rights reserved.