Complementary approaches to representing differences between structured documents

By David T. Barnard and George M. Logan


Structured documents

Documents can be represented as structures with a hierarchical arrangement of text and non-text nodes, where nodes are labelled by category names such as “paragraph” and “section”. Representing documents this way is a natural consequence of using the Standard Generalized Markup Language (SGML) to encode the content and form of documents [10, 11, 7]. SGML is widely used. HTML, the encoding used for World Wide Web documents, is an application of SGML [6]; although HTML is used to build hypertext networks of documents rather than hierarchies, each document is itself a hierarchy with explicitly coded links to build the network. The Text Encodin

Topics: structured documents, SGML
Year: 2013
OAI identifier: oai:CiteSeerX.psu:
Provided by: CiteSeerX