Non-hierarchical Structures: How to Model and Index Overlaps?
Overlap is a common phenomenon seen when structural components of a digital
object are neither disjoint nor nested inside each other. Overlapping
components resist reduction to a structural hierarchy, and tree-based indexing
and query processing techniques cannot be used for them. Our solution to this
data modeling problem is TGSA (Tree-like Graph for Structural Annotations), a
novel extension of the XML data model for non-hierarchical structures. We
introduce an algorithm for constructing TGSA from annotated documents; the
algorithm can efficiently process non-hierarchical structures and is accompanied
by formal proofs that the transformation of the document to the data
model is valid. To enable high-performance query analysis in large data
repositories, we further introduce an extension of XML pre-post indexing for
non-hierarchical structures, which can process both reachability and
overlapping relationships.
Comment: The paper has been accepted at the Balisage 2014 conference.
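
To make the two relationships named in the abstract concrete, here is a minimal sketch of the classic pre-post numbering used for ordinary XML trees, together with a simple test for overlapping spans. It is not the TGSA construction algorithm or the paper's extended index, neither of which is described in the abstract. The names Node, assign_pre_post, is_ancestor and spans_overlap are hypothetical, and representing an annotation as a half-open (start, end) character span is an assumption made only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node in an ordinary, strictly hierarchical annotation tree."""
    name: str
    children: list["Node"] = field(default_factory=list)
    pre: int = -1   # preorder rank, filled in by assign_pre_post
    post: int = -1  # postorder rank, filled in by assign_pre_post


def assign_pre_post(root: Node) -> None:
    """Label every node with its preorder and postorder rank."""
    counters = {"pre": 0, "post": 0}

    def visit(node: Node) -> None:
        node.pre = counters["pre"]
        counters["pre"] += 1
        for child in node.children:
            visit(child)
        node.post = counters["post"]
        counters["post"] += 1

    visit(root)


def is_ancestor(a: Node, d: Node) -> bool:
    """Reachability in a tree: a starts before d and finishes after it."""
    return a.pre < d.pre and a.post > d.post


def spans_overlap(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """True if two half-open spans share text but neither contains the other."""
    (a_start, a_end), (b_start, b_end) = a, b
    return (a_start < b_start < a_end < b_end
            or b_start < a_start < b_end < a_end)


# Hypothetical document: a page containing a line, followed by a sentence.
line = Node("line")
page = Node("page", [line])
sentence = Node("sentence")
doc = Node("doc", [page, sentence])
assign_pre_post(doc)

reaches = is_ancestor(doc, line)            # doc contains line
unrelated = is_ancestor(page, sentence)     # siblings, no containment
crossing = spans_overlap((0, 10), (5, 15))  # e.g. a sentence crossing a page break
nested = spans_overlap((0, 10), (2, 5))     # containment is not overlap
```

In a plain tree the ancestor test is the only structural relationship the index has to answer. Crossing spans such as (0, 10) and (5, 15) cannot be placed in one tree without splitting one of them, which is the situation the abstract says the extended index is meant to handle.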