571 research outputs found

    Implementation and Performance Evaluation of a Parallel Transitive Closure Algorithms on PRISMA/DB

    Get PDF
    This paper is one of the first to discuss actual implementation of and experimentation with parallel transitive closure operations on a full-fledged relational database system. It brings two research efforts together; the development of an efficient execution strategy for parallel computation of path problems, called Disconnection Set Approach, and the development and implementation of a parallel, main-memory DBMS, called PRISMA/DB. First, we report on the implementation of the disconnection set approach on PRISMA/DB, showing how the latter's design allowed us to easily extend the functionality of the system. Second, we investigate the disconnection set approach's parallel behavior and performance by means of extensive experimentation. It is shown that the parallel implementation of the disconnection set approach yields very good performance characteristics, and that (super)linear speedup w.r.t. a special implementation of semi-naive is achieved for regular, so-called linear fragmenta..

    A survey of parallel execution strategies for transitive closure and logic programs

    Get PDF
    An important feature of database technology of the nineties is the use of parallelism for speeding up the execution of complex queries. This technology is being tested in several experimental database architectures and a few commercial systems for conventional select-project-join queries. In particular, hash-based fragmentation is used to distribute data to disks under the control of different processors in order to perform selections and joins in parallel. With the development of new query languages, and in particular with the definition of transitive closure queries and of more general logic programming queries, the new dimension of recursion has been added to query processing. Recursive queries are complex; at the same time, their regular structure is particularly suited for parallel execution, and parallelism may give a high efficiency gain. We survey the approaches to parallel execution of recursive queries that have been presented in the recent literature. We observe that research on parallel execution of recursive queries is separated into two distinct subareas, one focused on the transitive closure of Relational Algebra expressions, the other one focused on optimization of more general Datalog queries. Though the subareas seem radically different because of the approach and formalism used, they have many common features. This is not surprising, because most typical Datalog queries can be solved by means of the transitive closure of simple algebraic expressions. We first analyze the relationship between the transitive closure of expressions in Relational Algebra and Datalog programs. We then review sequential methods for evaluating transitive closure, distinguishing iterative and direct methods. We address the parallelization of these methods, by discussing various forms of parallelization. Data fragmentation plays an important role in obtaining parallel execution; we describe hash-based and semantic fragmentation. Finally, we consider Datalog queries, and present general methods for parallel rule execution; we recognize the similarities between these methods and the methods reviewed previously, when the former are applied to linear Datalog queries. We also provide a quantitative analysis that shows the impact of the initial data distribution on the performance of methods

    Models of higher-order, type-safe, distributed computation over autonomous persistent object stores

    Get PDF
    A remote procedure call (RPC) mechanism permits the calling of procedures in another address space. RPC is a simple but highly effective mechanism for interprocess communication and enjoys nowadays a great popularity as a tool for building distributed applications. This popularity is partly a result of their overall simplicity but also partly a consequence of more than 20 years of research in transpaxent distribution that have failed to deliver systems that meet the expectations of real-world application programmers. During the same 20 years, persistent systems have proved their suitability for building complex database applications by seamlessly integrating features traditionally found in database management systems into the programming language itself. Some research. effort has been invested on distributed persistent systems, but the outcomes commonly suffer from the same problems found with transparent distribution. In this thesis I claim that a higher-order persistent RPC is useful for building distributed persistent applications. The proposed mechanism is: realistic in the sense that it uses current technology and tolerates partial failures; understandable by application programmers; and general to support the development of many classes of distributed persistent applications. In order to demonstrate the validity of these claims, I propose and have implemented three models for distributed higher-order computation over autonomous persistent stores. Each model has successively exposed new problems which have then been overcome by the next model. Together, the three models provide a general yet simple higher-order persistent RPC that is able to operate in realistic environments with partial failures. The real strength of this thesis is the demonstration of realism and simplicity. A higherorder persistent RPC was not only implemented but also used by programmers without experience of programming distributed applications. Furthermore, a distributed persistent application has been built using these models which would not have been feasible with a traditional (non-persistent) programming language

    Optimization of Regular Path Queries in Graph Databases

    Get PDF
    Regular path queries offer a powerful navigational mechanism in graph databases. Recently, there has been renewed interest in such queries in the context of the Semantic Web. The extension of SPARQL in version 1.1 with property paths offers a type of regular path query for RDF graph databases. While eminently useful, such queries are difficult to optimize and evaluate efficiently, however. We design and implement a cost-based optimizer we call Waveguide for SPARQL queries with property paths. Waveguide builds a query planwhich we call a waveplan (WP)which guides the query evaluation. There are numerous choices in the con- struction of a plan, and a number of optimization methods, so the space of plans for a query can be quite large. Execution costs of plans for the same query can vary by orders of magnitude with the best plan often offering excellent performance. A WPs costs can be estimated, which opens the way to cost-based optimization. We demonstrate that Waveguide properly subsumes existing techniques and that the new plans it adds are relevant. We analyze the effective plan space which is enabled by Waveguide and design an efficient enumerator for it. We implement a pro- totype of a Waveguide cost-based optimizer on top of an open-source relational RDF store. Finally, we perform a comprehensive performance study of the state of the art for evaluation of SPARQL property paths and demonstrate the significant performance gains that Waveguide offers

    Transitive closure algorithm MEMTC and its performance analysis

    Get PDF
    AbstractIn this paper, we present a new algorithm for computing the full transitive closure designed for operation in layered memories. The algorithm is based on strongly connected component detection and on a very compact representation of data. We analyze the average-case performance of the algorithm experimentally in an environment where two layers of memory of different speed are used. In our analysis, we use trace-based simulation of memory operations

    Causal Fourier Analysis on Directed Acyclic Graphs and Posets

    Full text link
    We present a novel form of Fourier analysis, and associated signal processing concepts, for signals (or data) indexed by edge-weighted directed acyclic graphs (DAGs). This means that our Fourier basis yields an eigendecomposition of a suitable notion of shift and convolution operators that we define. DAGs are the common model to capture causal relationships between data values and in this case our proposed Fourier analysis relates data with its causes under a linearity assumption that we define. The definition of the Fourier transform requires the transitive closure of the weighted DAG for which several forms are possible depending on the interpretation of the edge weights. Examples include level of influence, distance, or pollution distribution. Our framework is different from prior GSP: it is specific to DAGs and leverages, and extends, the classical theory of Moebius inversion from combinatorics. For a prototypical application we consider DAGs modeling dynamic networks in which edges change over time. Specifically, we model the spread of an infection on such a DAG obtained from real-world contact tracing data and learn the infection signal from samples assuming sparsity in the Fourier domain.Comment: 13 pages, 11 figure

    RDF Querying

    Get PDF
    Reactive Web systems, Web services, and Web-based publish/ subscribe systems communicate events as XML messages, and in many cases require composite event detection: it is not sufficient to react to single event messages, but events have to be considered in relation to other events that are received over time. Emphasizing language design and formal semantics, we describe the rule-based query language XChangeEQ for detecting composite events. XChangeEQ is designed to completely cover and integrate the four complementary querying dimensions: event data, event composition, temporal relationships, and event accumulation. Semantics are provided as model and fixpoint theories; while this is an established approach for rule languages, it has not been applied for event queries before
    • ā€¦
    corecore