571 research outputs found
Implementation and Performance Evaluation of a Parallel Transitive Closure Algorithms on PRISMA/DB
This paper is one of the first to discuss actual implementation of and experimentation with parallel transitive closure operations on a full-fledged relational database system. It brings two research efforts together; the development of an efficient execution strategy for parallel computation of path problems, called Disconnection Set Approach, and the development and implementation of a parallel, main-memory DBMS, called PRISMA/DB. First, we report on the implementation of the disconnection set approach on PRISMA/DB, showing how the latter's design allowed us to easily extend the functionality of the system. Second, we investigate the disconnection set approach's parallel behavior and performance by means of extensive experimentation. It is shown that the parallel implementation of the disconnection set approach yields very good performance characteristics, and that (super)linear speedup w.r.t. a special implementation of semi-naive is achieved for regular, so-called linear fragmenta..
A survey of parallel execution strategies for transitive closure and logic programs
An important feature of database technology of the nineties is the use of parallelism for speeding up the execution of complex queries. This technology is being tested in several experimental database architectures and a few commercial systems for conventional select-project-join queries. In particular, hash-based fragmentation is used to distribute data to disks under the control of different processors in order to perform selections and joins in parallel. With the development of new query languages, and in particular with the definition of transitive closure queries and of more general logic programming queries, the new dimension of recursion has been added to query processing. Recursive queries are complex; at the same time, their regular structure is particularly suited for parallel execution, and parallelism may give a high efficiency gain. We survey the approaches to parallel execution of recursive queries that have been presented in the recent literature. We observe that research on parallel execution of recursive queries is separated into two distinct subareas, one focused on the transitive closure of Relational Algebra expressions, the other one focused on optimization of more general Datalog queries. Though the subareas seem radically different because of the approach and formalism used, they have many common features. This is not surprising, because most typical Datalog queries can be solved by means of the transitive closure of simple algebraic expressions. We first analyze the relationship between the transitive closure of expressions in Relational Algebra and Datalog programs. We then review sequential methods for evaluating transitive closure, distinguishing iterative and direct methods. We address the parallelization of these methods, by discussing various forms of parallelization. Data fragmentation plays an important role in obtaining parallel execution; we describe hash-based and semantic fragmentation. Finally, we consider Datalog queries, and present general methods for parallel rule execution; we recognize the similarities between these methods and the methods reviewed previously, when the former are applied to linear Datalog queries. We also provide a quantitative analysis that shows the impact of the initial data distribution on the performance of methods
Models of higher-order, type-safe, distributed computation over autonomous persistent object stores
A remote procedure call (RPC) mechanism permits the calling of procedures in another
address space. RPC is a simple but highly effective mechanism for interprocess communication
and enjoys nowadays a great popularity as a tool for building distributed applications.
This popularity is partly a result of their overall simplicity but also partly a consequence
of more than 20 years of research in transpaxent distribution that have failed to deliver
systems that meet the expectations of real-world application programmers.
During the same 20 years, persistent systems have proved their suitability for building
complex database applications by seamlessly integrating features traditionally found in
database management systems into the programming language itself. Some research. effort
has been invested on distributed persistent systems, but the outcomes commonly suffer
from the same problems found with transparent distribution.
In this thesis I claim that a higher-order persistent RPC is useful for building distributed
persistent applications. The proposed mechanism is: realistic in the sense that it uses
current technology and tolerates partial failures; understandable by application programmers;
and general to support the development of many classes of distributed persistent
applications.
In order to demonstrate the validity of these claims, I propose and have implemented three
models for distributed higher-order computation over autonomous persistent stores. Each
model has successively exposed new problems which have then been overcome by the next
model. Together, the three models provide a general yet simple higher-order persistent
RPC that is able to operate in realistic environments with partial failures.
The real strength of this thesis is the demonstration of realism and simplicity. A higherorder
persistent RPC was not only implemented but also used by programmers without
experience of programming distributed applications. Furthermore, a distributed persistent
application has been built using these models which would not have been feasible with a
traditional (non-persistent) programming language
Optimization of Regular Path Queries in Graph Databases
Regular path queries offer a powerful navigational mechanism in graph databases. Recently, there has been renewed interest in such queries in the context of the Semantic Web. The extension of SPARQL in version 1.1 with property paths offers a type of regular path query for RDF graph databases. While eminently useful, such queries are difficult to optimize and evaluate efficiently, however. We design and implement a cost-based optimizer we call Waveguide for SPARQL queries with property paths. Waveguide builds a query planwhich we call a waveplan (WP)which guides the query evaluation. There are numerous choices in the con- struction of a plan, and a number of optimization methods, so the space of plans for a query can be quite large. Execution costs of plans for the same query can vary by orders of magnitude with the best plan often offering excellent performance. A WPs costs can be estimated, which opens the way to cost-based optimization. We demonstrate that Waveguide properly subsumes existing techniques and that the new plans it adds are relevant. We analyze the effective plan space which is enabled by Waveguide and design an efficient enumerator for it. We implement a pro- totype of a Waveguide cost-based optimizer on top of an open-source relational RDF store. Finally, we perform a comprehensive performance study of the state of the art for evaluation of SPARQL property paths and demonstrate the significant performance gains that Waveguide offers
Transitive closure algorithm MEMTC and its performance analysis
AbstractIn this paper, we present a new algorithm for computing the full transitive closure designed for operation in layered memories. The algorithm is based on strongly connected component detection and on a very compact representation of data. We analyze the average-case performance of the algorithm experimentally in an environment where two layers of memory of different speed are used. In our analysis, we use trace-based simulation of memory operations
Causal Fourier Analysis on Directed Acyclic Graphs and Posets
We present a novel form of Fourier analysis, and associated signal processing
concepts, for signals (or data) indexed by edge-weighted directed acyclic
graphs (DAGs). This means that our Fourier basis yields an eigendecomposition
of a suitable notion of shift and convolution operators that we define. DAGs
are the common model to capture causal relationships between data values and in
this case our proposed Fourier analysis relates data with its causes under a
linearity assumption that we define. The definition of the Fourier transform
requires the transitive closure of the weighted DAG for which several forms are
possible depending on the interpretation of the edge weights. Examples include
level of influence, distance, or pollution distribution. Our framework is
different from prior GSP: it is specific to DAGs and leverages, and extends,
the classical theory of Moebius inversion from combinatorics. For a
prototypical application we consider DAGs modeling dynamic networks in which
edges change over time. Specifically, we model the spread of an infection on
such a DAG obtained from real-world contact tracing data and learn the
infection signal from samples assuming sparsity in the Fourier domain.Comment: 13 pages, 11 figure
RDF Querying
Reactive Web systems, Web services, and Web-based publish/
subscribe systems communicate events as XML messages, and in
many cases require composite event detection: it is not sufficient to react
to single event messages, but events have to be considered in relation to
other events that are received over time.
Emphasizing language design and formal semantics, we describe the
rule-based query language XChangeEQ for detecting composite events.
XChangeEQ is designed to completely cover and integrate the four complementary
querying dimensions: event data, event composition, temporal
relationships, and event accumulation. Semantics are provided as
model and fixpoint theories; while this is an established approach for rule
languages, it has not been applied for event queries before
- ā¦