Complementary approaches to representing differences between structured documents

By David T. Barnard and George M. Logan

Abstract

Structured documents

Documents can be represented as structures with a hierarchical arrangement of text and non-text nodes, where nodes are labelled by category names such as “paragraph” and “section”. Representing documents this way is a natural consequence of using the Standard Generalized Markup Language (SGML) to encode the content and form of documents [10, 11, 7]. SGML is widely used. HTML, the encoding used for World Wide Web documents, is an application of SGML [6]; although HTML is used to build hypertext networks of documents rather than hierarchies, each document is itself a hierarchy with explicitly coded links to build the network. The Text Encodin
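
The hierarchical representation described in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical `Node` type with a category label, optional text, and an ordered list of children; these names and the `outline` helper are illustrative only and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One node of a document tree (hypothetical type, not from the paper)."""
    label: str                      # category name, e.g. "section" or "paragraph"
    text: Optional[str] = None      # set only on text-bearing leaf nodes
    children: List["Node"] = field(default_factory=list)


def outline(node: Node, depth: int = 0) -> List[str]:
    """Return one indented line per node, in document order."""
    lines = ["  " * depth + node.label]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


# A tiny document: one section containing two paragraphs.
doc = Node("document", children=[
    Node("section", children=[
        Node("paragraph", text="First paragraph."),
        Node("paragraph", text="Second paragraph."),
    ]),
])

print("\n".join(outline(doc)))
```

Calling `outline(doc)` walks the tree depth-first, so the labels appear in the same order as the corresponding elements would in the encoded document.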

Topics: structured documents, SGML
Year: 2013
OAI identifier: oai:CiteSeerX.psu:10.1.1.332.2284
Provided by: CiteSeerX