852 research outputs found
A model for querying semistructured data through the exploitation of regular sub-structures
Much research has been undertaken in order to speed up the processing of semistructured data in general and XML in particular. Many approaches for storage, compression, indexing and querying exist, e.g. [1, 2]. We do not present yet another such algorithm but a unifying model in which these algorithm can be understood. The key idea behind this research is the assumption, that most practical queries are based on a particular pattern of data that can be deduced from the query and which can then be captured using a regular structure amendable to efficient processing techniques
Quasi-SLCA based Keyword Query Processing over Probabilistic XML Data
The probabilistic threshold query is one of the most common queries in
uncertain databases, where a result satisfying the query must be also with
probability meeting the threshold requirement. In this paper, we investigate
probabilistic threshold keyword queries (PrTKQ) over XML data, which is not
studied before. We first introduce the notion of quasi-SLCA and use it to
represent results for a PrTKQ with the consideration of possible world
semantics. Then we design a probabilistic inverted (PI) index that can be used
to quickly return the qualified answers and filter out the unqualified ones
based on our proposed lower/upper bounds. After that, we propose two efficient
and comparable algorithms: Baseline Algorithm and PI index-based Algorithm. To
accelerate the performance of algorithms, we also utilize probability density
function. An empirical study using real and synthetic data sets has verified
the effectiveness and the efficiency of our approaches
Adding Logical Operators to Tree Pattern Queries on Graph-Structured Data
As data are increasingly modeled as graphs for expressing complex
relationships, the tree pattern query on graph-structured data becomes an
important type of queries in real-world applications. Most practical query
languages, such as XQuery and SPARQL, support logical expressions using
logical-AND/OR/NOT operators to define structural constraints of tree patterns.
In this paper, (1) we propose generalized tree pattern queries (GTPQs) over
graph-structured data, which fully support propositional logic of structural
constraints. (2) We make a thorough study of fundamental problems including
satisfiability, containment and minimization, and analyze the computational
complexity and the decision procedures of these problems. (3) We propose a
compact graph representation of intermediate results and a pruning approach to
reduce the size of intermediate results and the number of join operations --
two factors that often impair the efficiency of traditional algorithms for
evaluating tree pattern queries. (4) We present an efficient algorithm for
evaluating GTPQs using 3-hop as the underlying reachability index. (5)
Experiments on both real-life and synthetic data sets demonstrate the
effectiveness and efficiency of our algorithm, from several times to orders of
magnitude faster than state-of-the-art algorithms in terms of evaluation time,
even for traditional tree pattern queries with only conjunctive operations.Comment: 16 page
Boosting XML Filtering with a Scalable FPGA-based Architecture
The growing amount of XML encoded data exchanged over the Internet increases
the importance of XML based publish-subscribe (pub-sub) and content based
routing systems. The input in such systems typically consists of a stream of
XML documents and a set of user subscriptions expressed as XML queries. The
pub-sub system then filters the published documents and passes them to the
subscribers. Pub-sub systems are characterized by very high input ratios,
therefore the processing time is critical. In this paper we propose a "pure
hardware" based solution, which utilizes XPath query blocks on FPGA to solve
the filtering problem. By utilizing the high throughput that an FPGA provides
for parallel processing, our approach achieves drastically better throughput
than the existing software or mixed (hardware/software) architectures. The
XPath queries (subscriptions) are translated to regular expressions which are
then mapped to FPGA devices. By introducing stacks within the FPGA we are able
to express and process a wide range of path queries very efficiently, on a
scalable environment. Moreover, the fact that the parser and the filter
processing are performed on the same FPGA chip, eliminates expensive
communication costs (that a multi-core system would need) thus enabling very
fast and efficient pipelining. Our experimental evaluation reveals more than
one order of magnitude improvement compared to traditional pub/sub systems.Comment: CIDR 200
Fast and Tiny Structural Self-Indexes for XML
XML document markup is highly repetitive and therefore well compressible
using dictionary-based methods such as DAGs or grammars. In the context of
selectivity estimation, grammar-compressed trees were used before as synopsis
for structural XPath queries. Here a fully-fledged index over such grammars is
presented. The index allows to execute arbitrary tree algorithms with a
slow-down that is comparable to the space improvement. More interestingly,
certain algorithms execute much faster over the index (because no decompression
occurs). E.g., for structural XPath count queries, evaluating over the index is
faster than previous XPath implementations, often by two orders of magnitude.
The index also allows to serialize XML results (including texts) faster than
previous systems, by a factor of ca. 2-3. This is due to efficient copy
handling of grammar repetitions, and because materialization is totally
avoided. In order to compare with twig join implementations, we implemented a
materializer which writes out pre-order numbers of result nodes, and show its
competitiveness.Comment: 13 page
- …