224 research outputs found

    A survey of parallel execution strategies for transitive closure and logic programs

    An important feature of database technology of the nineties is the use of parallelism for speeding up the execution of complex queries. This technology is being tested in several experimental database architectures and a few commercial systems for conventional select-project-join queries. In particular, hash-based fragmentation is used to distribute data to disks under the control of different processors in order to perform selections and joins in parallel. With the development of new query languages, and in particular with the definition of transitive closure queries and of more general logic programming queries, the new dimension of recursion has been added to query processing. Recursive queries are complex; at the same time, their regular structure is particularly suited to parallel execution, and parallelism may give a high efficiency gain. We survey the approaches to parallel execution of recursive queries that have been presented in the recent literature. We observe that research on parallel execution of recursive queries is separated into two distinct subareas: one focused on the transitive closure of Relational Algebra expressions, the other on the optimization of more general Datalog queries. Though the subareas seem radically different in approach and formalism, they have many common features. This is not surprising, because most typical Datalog queries can be solved by means of the transitive closure of simple algebraic expressions. We first analyze the relationship between the transitive closure of expressions in Relational Algebra and Datalog programs. We then review sequential methods for evaluating transitive closure, distinguishing iterative and direct methods, and address their parallelization by discussing the various forms it can take. Data fragmentation plays an important role in obtaining parallel execution; we describe hash-based and semantic fragmentation. Finally, we consider Datalog queries and present general methods for parallel rule execution; we recognize the similarities between these methods and those reviewed previously when the former are applied to linear Datalog queries. We also provide a quantitative analysis that shows the impact of the initial data distribution on the performance of these methods.
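    The iterative methods the survey distinguishes are variations on semi-naive evaluation: each round joins only the newly derived tuples (the delta) against the base relation, until a fixpoint is reached. A minimal sketch in Python, assuming the relation is given as a set of pairs (the function name and representation are illustrative, not taken from the survey):

```python
from collections import defaultdict

def transitive_closure(edges):
    """Semi-naive transitive closure of a binary relation: each round
    extends only the newly derived pairs (delta), so the work per round
    shrinks as the fixpoint is approached."""
    succ = defaultdict(set)                 # index: node -> direct successors
    for a, b in edges:
        succ[a].add(b)
    closure = set(edges)
    delta = set(edges)
    while delta:
        new = {(a, c) for (a, b) in delta for c in succ[b]}
        delta = new - closure               # keep only genuinely new pairs
        closure |= delta
    return closure

# A chain 1 -> 2 -> 3 -> 4 yields all six reachable pairs.
assert transitive_closure({(1, 2), (2, 3), (3, 4)}) == {
    (1, 2), (2, 3), (3, 4), (1, 3), (2, 4), (1, 4)}
```

    Hash-based fragmentation, as described in the survey, parallelizes exactly this kind of join by distributing tuples to processors on a hash of the join column, so each round's delta join can proceed independently per fragment.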

    Higher-Order, Data-Parallel Structured Deduction

    State-of-the-art Datalog engines include expressive features such as ADTs (structured heap values), stratified aggregation and negation, various primitive operations, and the opportunity for further extension using FFIs. Current parallelization approaches for state-of-the-art Datalogs target shared-memory locking data structures using conventional multi-threading, or use the map-reduce model for distributed computing. Furthermore, current state-of-the-art approaches cannot scale to formal systems which pervasively manipulate structured data, due to their lack of indexing for structured data stored in the heap. In this paper, we describe a new approach to data-parallel structured deduction that involves a key semantic extension of Datalog to permit first-class facts and higher-order relations via defunctionalization, an implementation approach that enables parallelism uniformly both across sets of disjoint facts and over individual facts with nested structure. We detail a core language, DL_s, whose key invariant (subfact closure) ensures that each subfact is materialized as a top-class fact. We extend DL_s to Slog, a fully-featured language whose forms facilitate leveraging subfact closure to rapidly implement expressive, high-performance formal systems. We demonstrate Slog by systematically building a family of control-flow analyses from abstract machines, along with several implementations of classical type systems (such as STLC and LF). We performed experiments on EC2, Azure, and ALCF's Theta at up to 1000 threads, showing orders-of-magnitude scalability improvements versus competing state-of-the-art systems.
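    The subfact-closure invariant can be pictured as interning every nested subterm as its own top-level fact with a fresh identifier, so that indexing, joins, and parallel partitioning apply uniformly to inner structure. A toy sketch of that idea in Python (the representation and names are hypothetical, not Slog's actual implementation):

```python
from itertools import count

class FactStore:
    """Interns nested facts so that every subfact is materialized as a
    top-level fact with its own id -- a toy picture of the subfact-closure
    invariant, not Slog's real data structures."""
    def __init__(self):
        self.ids = {}                 # (relation, flattened args) -> fact id
        self.fresh = count()

    def intern(self, fact):
        rel, *args = fact
        # Replace each nested subfact by the id of its own top-level entry.
        flat = tuple(self.intern(a) if isinstance(a, tuple) else a
                     for a in args)
        key = (rel, flat)
        if key not in self.ids:
            self.ids[key] = next(self.fresh)
        return self.ids[key]

store = FactStore()
# eval(app(lam("x"), ref("x"))): four facts are materialized, innermost first.
store.intern(("eval", ("app", ("lam", "x"), ("ref", "x"))))
for (rel, args), fid in store.ids.items():
    print(fid, rel, args)
```

    Because every subterm now lives in a flat, id-keyed relation, hash partitioning can distribute inner structure across workers the same way it distributes disjoint facts.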

    Reasoning in description logics using resolution and deductive databases


    Declarative Data Analytics: a Survey

    The area of declarative data analytics explores the application of the declarative paradigm to data science and machine learning. It proposes declarative languages for expressing data analysis tasks and develops systems which optimize programs written in those languages. The execution engine can be either centralized or distributed, as the declarative paradigm advocates independence from particular physical implementations. The survey explores a wide range of declarative data analysis frameworks by examining both the programming model and the optimization techniques used, in order to draw conclusions on the current state of the art in the area and identify open challenges.

    Incremental Processing and Optimization of Update Streams

    Over recent years, we have seen an increasing number of applications in networking, sensor networks, cloud computing, and environmental monitoring, which monitor, plan, control, and make decisions over data streams from multiple sources. We are interested in extending traditional stream processing techniques to meet the new challenges of these applications. Generally, in order to support genuine continuous query optimization and processing over data streams, we need to systematically understand how to address incremental optimization and processing of update streams for a rich class of queries commonly used in these applications. Our general thesis is that efficient incremental processing and re-optimization of update streams can be achieved by various incremental view maintenance techniques, if we cast the problems as incremental view maintenance problems over data streams. We focus on two challenges in incremental processing of update streams that are not addressed in existing work on stream query processing: incremental processing of transitive closure queries over data streams, and incremental re-optimization of queries. In addition to addressing these specific challenges, we also develop a working prototype system, Aspen, which serves as an end-to-end stream processing system and has been deployed as the foundation for a case study of our SmartCIS application. We validate our solutions both analytically and empirically on top of Aspen, over a variety of benchmark workloads such as the TPC-H and Linear Road benchmarks.
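    Casting stream processing as view maintenance means that each arriving update triggers recomputation of only the view's delta, not the whole query. A minimal sketch in Python of maintaining a transitive-closure view under edge insertions (the names are illustrative, not from the thesis; deletions, which update streams also carry, are substantially harder and omitted here):

```python
from collections import defaultdict

def insert_edge(closure, succ, pred, a, b):
    """Incrementally maintain a transitive-closure view when edge (a, b)
    arrives on the update stream: every node that reaches `a` now also
    reaches everything reachable from `b`, so only those pairs are derived."""
    sources = pred[a] | {a}          # nodes that can reach a
    targets = succ[b] | {b}          # nodes reachable from b
    new = {(x, y) for x in sources for y in targets} - closure
    closure |= new
    for x, y in new:                 # keep the reachability indexes in sync
        succ[x].add(y)
        pred[y].add(x)
    return new                       # the delta emitted downstream

closure, succ, pred = set(), defaultdict(set), defaultdict(set)
for edge in [(1, 2), (2, 3), (3, 4)]:      # a toy update stream
    print("insert", edge, "->", sorted(insert_edge(closure, succ, pred, *edge)))
```

    Each insertion does work proportional to the new reachable pairs it creates, rather than recomputing the closure from scratch per update.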

    Parallelization of JStar Programs on a Distributed Computer

    In the past, the performance of sequential programs grew exponentially as the performance of CPUs increased with Moore’s Law. Since 2005, however, performance improvements have come in the form of more parallel CPU cores. Writing parallel programs using existing programming languages can be difficult and error-prone. JStar is a new programming language that allows programs to be written in a naturally parallel way. The JStar project aims to produce compilers that can produce executables for a variety of architectures (such as many-core, GPUs, and distributed computers). This thesis proposes a process for compiling these programs into distributed executables, and investigates various trade-offs and techniques for implementing JStar programs on a distributed computer. In this process, a parallel design is first created; this design is then expressed as a separate set of distribute statements that are combined with the original program to create a distributed program. The expressiveness and effectiveness of this approach are investigated for two case-study JStar programs: (1) a prime-number counting program and (2) a version of Conway’s Game of Life. Various designs were hand-translated into distributed Java programs, and benchmarks were run to assess the performance of the different designs. For each of these case studies, parallel designs were found that achieved high levels of speedup.
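    The kind of parallel design evaluated for Game of Life can be pictured as a row-block distribution: each worker owns a strip of the grid and computes its next generation independently. A rough Python sketch of that design (not JStar syntax or the thesis's generated Java; the worker count and grid are illustrative):

```python
from concurrent.futures import ProcessPoolExecutor

def next_strip(args):
    """Compute one generation for rows [lo, hi) of a toroidal Game of Life
    grid. Each worker reads the whole current grid but writes only its own
    strip, mirroring a row-block distribution design."""
    grid, lo, hi = args
    rows, cols = len(grid), len(grid[0])
    strip = []
    for r in range(lo, hi):
        row = []
        for c in range(cols):
            live = sum(grid[(r + dr) % rows][(c + dc) % cols]
                       for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                       if (dr, dc) != (0, 0))
            row.append(1 if live == 3 or (grid[r][c] and live == 2) else 0)
        strip.append(row)
    return strip

def step(grid, workers=4):
    rows = len(grid)
    bounds = [(i * rows // workers, (i + 1) * rows // workers)
              for i in range(workers)]
    with ProcessPoolExecutor(workers) as pool:
        strips = list(pool.map(next_strip, [(grid, lo, hi) for lo, hi in bounds]))
    return [row for strip in strips for row in strip]

if __name__ == "__main__":
    glider = [[0] * 8 for _ in range(8)]
    for r, c in [(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)]:
        glider[r][c] = 1
    print(step(glider))
```

    The trade-off such designs expose is communication versus balance: shipping the whole grid each generation is simple but costly, while exchanging only boundary rows between neighbouring strips cuts communication at the price of extra coordination.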

    Computational and human-based methods for knowledge discovery over knowledge graphs

    The modern world has evolved, accompanied by a huge exploitation of data and information. Daily, increasing volumes of data from various sources and formats are stored, making it challenging to manage and integrate them to discover new knowledge. The appropriate use of data in various sectors of society, such as education, healthcare, e-commerce, and industry, provides advantages for decision support in these areas. However, knowledge discovery becomes challenging since data may come from heterogeneous sources with important information hidden. Thus, new approaches that adapt to the new challenges of knowledge discovery in such heterogeneous data environments are required. The semantic web and knowledge graphs (KGs) are becoming increasingly relevant on the road to knowledge discovery. This thesis tackles the problem of knowledge discovery over KGs built from heterogeneous data sources. We provide a neuro-symbolic artificial intelligence system that integrates symbolic and sub-symbolic frameworks to exploit the semantics encoded in a KG and its structure. The symbolic system relies on existing approaches of deductive databases to make explicit the implicit knowledge encoded in a KG. The proposed deductive database, DSDS, can derive new statements for ego networks given an abstract target prediction; DSDS thus minimizes data sparsity in KGs. In addition, the sub-symbolic system relies on knowledge graph embedding (KGE) models. KGE models are commonly applied to the KG completion task, representing the entities of a KG in a low-dimensional vector space. However, KGE models are known to suffer from data sparsity, and the symbolic system assists in overcoming this fact. The proposed approach discovers knowledge given a target prediction in a KG and extracts unknown implicit information related to the target prediction. As a proof of concept, we have implemented the neuro-symbolic system on top of a KG for lung cancer to predict polypharmacy treatment effectiveness. The symbolic system implements a deductive system to deduce pharmacokinetic drug-drug interactions encoded in a set of rules expressed as a Datalog program. Additionally, the sub-symbolic system predicts treatment effectiveness using a KGE model, which preserves the KG structure. An ablation study of the components of our approach is conducted, considering state-of-the-art KGE methods. The observed results provide evidence for the benefits of the neuro-symbolic integration of our approach, where the neuro-symbolic system exhibits improved results for an abstract target prediction. The enhancement occurs because the symbolic system increases the prediction capacity of the sub-symbolic system. Moreover, the proposed neuro-symbolic artificial intelligence system is evaluated in Industry 4.0 (I4.0), demonstrating its effectiveness in determining relatedness among standards and analyzing their properties to detect unknown relations in the I4.0 KG. The results achieved allow us to conclude that the proposed neuro-symbolic approach for an abstract target prediction improves the prediction capability of KGE models by minimizing data sparsity in KGs.
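    The sub-symbolic side of such a system scores candidate triples geometrically over learned embeddings. As a concrete illustration, a minimal TransE-style scorer in Python (TransE is one common KGE model, chosen here for brevity; the abstract does not name the specific models used, and the entities and embeddings below are made up):

```python
import numpy as np

def transe_score(h, r, t):
    """TransE scores a triple (head, relation, tail) by how closely the
    relation vector translates the head embedding onto the tail embedding:
    a smaller ||h + r - t|| means a more plausible triple."""
    return -np.linalg.norm(h + r - t)

rng = np.random.default_rng(0)
dim = 50
# Hypothetical embeddings; in practice these are learned from the KG.
emb = {name: rng.normal(size=dim) for name in
       ("Drug_A", "interactsWith", "Drug_B", "Drug_C")}

# Rank candidate tails for the completion query (Drug_A, interactsWith, ?).
for tail in ("Drug_B", "Drug_C"):
    s = transe_score(emb["Drug_A"], emb["interactsWith"], emb[tail])
    print(tail, round(float(s), 3))
```

    The symbolic component's role in the thesis is to densify the KG with deduced facts before such a model is trained, which is how it mitigates the data sparsity that KGE models suffer from.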