Fast and Tiny Structural Self-Indexes for XML
XML document markup is highly repetitive and therefore well compressible
using dictionary-based methods such as DAGs or grammars. In the context of
selectivity estimation, grammar-compressed trees have previously been used as synopses
for structural XPath queries. Here a fully-fledged index over such grammars is
presented. The index makes it possible to execute arbitrary tree algorithms with a
slow-down that is comparable to the space improvement. More interestingly,
certain algorithms execute much faster over the index (because no decompression
occurs). For example, structural XPath count queries evaluated over the index run
faster than in previous XPath implementations, often by two orders of magnitude.
The index also serializes XML results (including text content) faster than
previous systems, by a factor of about 2 to 3. This is due to efficient copy
handling of grammar repetitions, and because materialization is avoided entirely.
To compare with twig join implementations, we implemented a
materializer that writes out the pre-order numbers of result nodes, and show that it
is competitive.
Comment: 13 page
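The point that some algorithms avoid decompression can be illustrated with a minimal Python sketch. It assumes the simplest dictionary-based scheme, a DAG obtained by sharing identical subtrees through hash-consing, rather than the grammars the index actually uses. `build_dag`, `count_matching` and `make_doc` are hypothetical names introduced here. A label count over the DAG memoizes one result per shared subtree, which is roughly what a structural query such as `count(//c)` needs.

```python
# A tree node is a pair (label, children), where children is a tuple of nodes.

def build_dag(tree):
    """Hash-cons `tree` so that structurally identical subtrees get one shared id.

    Returns (root_id, nodes), where nodes[i] = (label, child_ids).
    Children always receive smaller ids than their parents.
    """
    table = {}   # (label, child_ids) -> id
    nodes = []   # id -> (label, child_ids)

    def visit(t):
        label, children = t
        key = (label, tuple(visit(c) for c in children))
        if key not in table:
            table[key] = len(nodes)
            nodes.append(key)
        return table[key]

    return visit(tree), nodes


def count_matching(root, nodes, label=None):
    """Count nodes of the *unfolded* tree with the given label (all nodes if None).

    Each shared DAG node is evaluated once, so the work is proportional to
    the DAG size rather than to the size of the original tree.
    """
    memo = {}

    def cnt(i):
        if i not in memo:
            lab, kids = nodes[i]
            hit = 1 if (label is None or lab == label) else 0
            memo[i] = hit + sum(cnt(k) for k in kids)
        return memo[i]

    return cnt(root)


def make_doc(n):
    """Hypothetical repetitive document: <a> with n copies of <b><c/><c/></b>."""
    item = ("b", (("c", ()), ("c", ())))
    return ("a", tuple(item for _ in range(n)))


root, nodes = build_dag(make_doc(1000))
print(len(nodes))                        # 3 distinct subtrees
print(count_matching(root, nodes))       # 3001 nodes in the unfolded tree
print(count_matching(root, nodes, "c"))  # 2000
```

The highly repetitive markup collapses to three shared nodes, and the count never expands the 3001-node tree.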
Compression vs Queryability - A Case Study
Some compromise on compression is known to be necessary if the relative positions of the information stored in semi-structured documents are to remain accessible under queries. With this in view, we compare, on an example, the 'query-friendliness' of XML documents when compressed into straight-line tree grammars that are either regular or context-free. The queries considered are in a limited fragment of XPath, corresponding to a certain type of pattern; each such query naturally defines a non-deterministic, bottom-up 'query automaton' that runs just as well on a tree as on its compressed DAG.
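Why a bottom-up automaton "runs just as well" on the DAG can be shown with a small Python sketch. The state computed at a node depends only on its label and its children's states, so it can be computed once per shared node. This sketch assumes a deterministic automaton whose states are sets of flags, chosen here for the hypothetical pattern `//a[.//b]`; the paper's query automata are non-deterministic, which this sketch sidesteps. `delta`, `run_tree` and `run_dag` are illustrative names.

```python
# Trees are (label, children) pairs; the DAG uses hash-consing of identical subtrees.

def build_dag(tree):
    """Share identical subtrees; returns (root_id, nodes) with nodes[i] = (label, child_ids)."""
    table, nodes = {}, []

    def visit(t):
        key = (t[0], tuple(visit(c) for c in t[1]))
        if key not in table:
            table[key] = len(nodes)
            nodes.append(key)
        return table[key]

    return visit(tree), nodes


def delta(label, child_states):
    """Hypothetical bottom-up transition for the pattern //a[.//b].

    A state is a frozenset of flags:
      'B': a b-node occurs at or below this node;
      'M': an a-node with a b-descendant occurs at or below this node.
    """
    below = set().union(*child_states)
    out = set()
    if label == "b" or "B" in below:
        out.add("B")
    if "M" in below or (label == "a" and "B" in below):
        out.add("M")
    return frozenset(out)


def run_tree(t):
    """Run the automaton directly on the uncompressed tree."""
    label, kids = t
    return delta(label, [run_tree(k) for k in kids])


def run_dag(root, nodes):
    """Run the same automaton on the DAG, computing each shared node's state once."""
    memo = {}

    def go(i):
        if i not in memo:
            label, kids = nodes[i]
            memo[i] = delta(label, [go(k) for k in kids])
        return memo[i]

    return go(root)


def leaf(name):
    return (name, ())


doc = ("r", tuple(("a", (leaf("c"),)) for _ in range(3)) + (("a", (leaf("b"),)),))
print(run_tree(doc) == run_dag(*build_dag(doc)))  # True
print("M" in run_dag(*build_dag(doc)))            # True: the last <a> has a <b> child
```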
XQuery Streaming by Forest Transducers
Streaming of XML transformations is a challenging task, and very few
systems support streaming. Research approaches generally define custom
fragments of XQuery and XPath that are amenable to streaming, and then design
custom algorithms for each fragment. These languages have several shortcomings.
Here we take a more principled approach to the problem of streaming
XQuery-based transformations. We start with an elegant transducer model for
which many static analysis problems are well understood: the Macro Forest
Transducer (MFT). We show that a large fragment of XQuery can be translated
into MFTs. This fragment can express important features that are missing
from other XQuery stream engines such as GCX: it supports XPath predicates
and let-statements. We then rely on a
streaming execution engine for MFTs, one which uses a well-founded set of
optimizations from functional programming, such as strictness analysis and
deforestation. Our prototype achieves time and memory efficiency comparable to
the fastest known engine for XQuery streaming, GCX. This is surprising because
our engine relies on the built-in OCaml garbage collector and does not use any
specialized buffer management, while GCX's efficiency is due to clever and
explicit buffer management.
Comment: Full version of the paper in the Proceedings of the 30th IEEE
International Conference on Data Engineering (ICDE 2014)
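The transducer model itself can be made concrete with a toy MFT written as a Python function. This illustrates the model only, not the XQuery translation or the streaming engine. The rules are a textbook-style example chosen here as an assumption and are not taken from the paper. The parameter `y` accumulates output, which is the "macro" feature that distinguishes MFTs from plain top-down forest transducers. The names `q` and `reverse_forest` are hypothetical. This direct recursive evaluation is not streaming.

```python
# A forest is a tuple of trees; a tree is (label, forest).

def q(forest, y):
    """Toy macro forest transducer state q with one accumulating parameter y.

    Rules (x1 = children of the first tree, x2 = its following siblings):
        q(eps, y)          = y
        q(sigma(x1) x2, y) = q(x2, sigma(q(x1, eps)) y)
    The effect is to reverse the order of siblings at every level.
    """
    if not forest:
        return y
    (label, x1), x2 = forest[0], forest[1:]
    return q(x2, ((label, q(x1, ())),) + y)


def reverse_forest(forest):
    """Initial call: start state q with an empty accumulator."""
    return q(forest, ())


doc = (("a", (("b", ()), ("c", ()))), ("d", ()))
print(reverse_forest(doc))
# (('d', ()), ('a', (('c', ()), ('b', ()))))
```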
08261 Executive Summary -- Structure-Based Compression of Complex Massive Data
From 22 June to 27 June 2008, the Dagstuhl Seminar
08261 "Structure-Based Compression of
Complex Massive Data" took place at the
Conference and Research Center (IBFI) in Dagstuhl.
22 researchers with interests in the theory and application
of compression and computation on compressed structures
met to present their current work and to discuss
future directions.
08261 Abstracts Collection -- Structure-Based Compression of Complex Massive Data
From June 22, 2008 to June 27, 2008, the Dagstuhl Seminar 08261 "Structure-Based Compression of Complex Massive Data" was held at the International Conference and Research Center (IBFI), Schloss Dagstuhl.
During the seminar, several participants presented their current
research, and ongoing work and open problems were discussed. Abstracts of
the presentations given during the seminar, as well as abstracts of
seminar results and ideas, are collected in this paper. The first section
describes the seminar topics and goals in general.
Links to extended abstracts or full papers are provided where available.
XPath Node Selection over Grammar-Compressed Trees
XML document markup is highly repetitive and therefore well compressible
using grammar-based compression. Downward, navigational XPath can be executed
over grammar-compressed trees in PTIME: the query is translated into an
automaton which is executed in one pass over the grammar. This result is
well known and has been mentioned before. Here we give precise big-O bounds
on the time complexity of this problem. For a given
grammar and XPath query, we consider three different tasks: (1) to count the
number of nodes selected by the query, (2) to materialize the pre-order numbers
of the selected nodes, and (3) to serialize the subtrees at the selected nodes.
Comment: In Proceedings TTATT 2013, arXiv:1311.505
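Tasks (1) and (2) can be sketched over a DAG, which is the special case of a straight-line tree grammar whose nonterminals take no parameters. This restriction is an assumption made here to keep the code short; the paper works with general grammars, and this sketch does not claim to meet its bounds. The query is reduced to a single label test, roughly `//c`. One bottom-up pass stores the unfolded size and the match count of each shared node. The pre-order numbers are then produced by jumping over whole subtrees and skipping those with no match. `size_and_count` and `select_preorder` are hypothetical names. Task (3) is not shown.

```python
def build_dag(tree):
    """Share identical subtrees; nodes[i] = (label, child_ids), children before parents."""
    table, nodes = {}, []

    def visit(t):
        key = (t[0], tuple(visit(c) for c in t[1]))
        if key not in table:
            table[key] = len(nodes)
            nodes.append(key)
        return table[key]

    return visit(tree), nodes


def size_and_count(nodes, label):
    """One bottom-up pass: unfolded subtree size and number of `label` nodes per DAG node."""
    size, count = [], []
    for lab, kids in nodes:  # children have smaller ids, so they are already filled in
        size.append(1 + sum(size[k] for k in kids))
        count.append((1 if lab == label else 0) + sum(count[k] for k in kids))
    return size, count


def select_preorder(root, nodes, label):
    """Task (1): number of `label` nodes; task (2): their pre-order numbers."""
    size, count = size_and_count(nodes, label)
    out = []

    def emit(i, pre):
        lab, kids = nodes[i]
        if lab == label:
            out.append(pre)
        p = pre + 1              # pre-order number of the first child
        for k in kids:
            if count[k]:         # skip shared subtrees containing no selected node
                emit(k, p)
            p += size[k]         # jump over the whole unfolded subtree

    emit(root, 0)
    return count[root], out


item = ("b", (("c", ()), ("c", ())))
doc = ("a", (item, item, item))
print(select_preorder(*build_dag(doc), "c"))  # (6, [2, 3, 5, 6, 8, 9])
```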
Logics for Unranked Trees: An Overview
Labeled unranked trees are used as a model of XML documents, and logical
languages for them have been studied actively over the past several years. Such
logics have different purposes: some are better suited for extracting data,
some for expressing navigational properties, and some make it easy to relate
complex properties of trees to the existence of tree automata for those
properties. Furthermore, logics differ significantly in their model-checking
properties, their automata models, and their behavior on ordered and unordered
trees. In this paper we present a survey of logics for unranked trees.
Managing Compressed Structured Text
[Definition]: Compressing structured text is the problem of creating a reduced-space representation from which the original
data can be re-created exactly. Compared to plain text compression, the goal is to take advantage of the structural
properties of the data. A more ambitious goal is to be able to manipulate this text in compressed form,
without decompressing it. This entry focuses on compressing, navigating, and searching structured text, as those
are the areas where the most progress has been made.
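Navigating structure without expanding it into pointer-based nodes can be illustrated with the balanced-parentheses encoding of tree shape. This is one common building block of succinct and compressed tree representations. Treating it as representative of what the entry covers is an assumption. In this sketch the labels are kept uncompressed, and the navigation primitives use naive linear scans where real succinct structures use small auxiliary indexes. All function names are hypothetical.

```python
def to_bp(tree):
    """Balanced-parentheses shape plus labels in pre-order for a (label, children) tree."""
    bits, labels = [], []

    def walk(t):
        label, kids = t
        bits.append("(")
        labels.append(label)
        for k in kids:
            walk(k)
        bits.append(")")

    walk(tree)
    return "".join(bits), labels


def find_close(bp, i):
    """Position of the ')' matching the '(' at i (naive linear scan)."""
    depth = 0
    for j in range(i, len(bp)):
        depth += 1 if bp[j] == "(" else -1
        if depth == 0:
            return j
    raise ValueError("unbalanced parentheses")


def first_child(bp, i):
    """A node's first child, if any, opens right after the node's own '('."""
    return i + 1 if bp[i + 1] == "(" else None


def next_sibling(bp, i):
    """The next sibling, if any, opens right after this node's matching ')'."""
    j = find_close(bp, i) + 1
    return j if j < len(bp) and bp[j] == "(" else None


def label_at(bp, labels, i):
    """Pre-order rank of node i = number of '(' before i (a naive rank query)."""
    return labels[bp.count("(", 0, i)]


def child_labels(bp, labels, i):
    out, c = [], first_child(bp, i)
    while c is not None:
        out.append(label_at(bp, labels, c))
        c = next_sibling(bp, c)
    return out


bp, labels = to_bp(("a", (("b", (("c", ()),)), ("d", ()))))
print(bp)                           # ((())())
print(child_labels(bp, labels, 0))  # ['b', 'd']
```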