126 research outputs found

    A survey of parallel execution strategies for transitive closure and logic programs

    Get PDF
    An important feature of database technology of the nineties is the use of parallelism for speeding up the execution of complex queries. This technology is being tested in several experimental database architectures and a few commercial systems for conventional select-project-join queries. In particular, hash-based fragmentation is used to distribute data to disks under the control of different processors in order to perform selections and joins in parallel. With the development of new query languages, and in particular with the definition of transitive closure queries and of more general logic programming queries, the new dimension of recursion has been added to query processing. Recursive queries are complex; at the same time, their regular structure is particularly suited for parallel execution, and parallelism may give a high efficiency gain. We survey the approaches to parallel execution of recursive queries that have been presented in the recent literature. We observe that research on parallel execution of recursive queries is separated into two distinct subareas, one focused on the transitive closure of Relational Algebra expressions, the other one focused on optimization of more general Datalog queries. Though the subareas seem radically different because of the approach and formalism used, they have many common features. This is not surprising, because most typical Datalog queries can be solved by means of the transitive closure of simple algebraic expressions. We first analyze the relationship between the transitive closure of expressions in Relational Algebra and Datalog programs. We then review sequential methods for evaluating transitive closure, distinguishing iterative and direct methods. We address the parallelization of these methods, by discussing various forms of parallelization. Data fragmentation plays an important role in obtaining parallel execution; we describe hash-based and semantic fragmentation. Finally, we consider Datalog queries, and present general methods for parallel rule execution; we recognize the similarities between these methods and the methods reviewed previously, when the former are applied to linear Datalog queries. We also provide a quantitative analysis that shows the impact of the initial data distribution on the performance of methods

    Query Rewriting and Optimization for Ontological Databases

    Full text link
    Ontological queries are evaluated against a knowledge base consisting of an extensional database and an ontology (i.e., a set of logical assertions and constraints which derive new intensional knowledge from the extensional database), rather than directly on the extensional database. The evaluation and optimization of such queries is an intriguing new problem for database research. In this paper, we discuss two important aspects of this problem: query rewriting and query optimization. Query rewriting consists of the compilation of an ontological query into an equivalent first-order query against the underlying extensional database. We present a novel query rewriting algorithm for rather general types of ontological constraints which is well-suited for practical implementations. In particular, we show how a conjunctive query against a knowledge base, expressed using linear and sticky existential rules, that is, members of the recently introduced Datalog+/- family of ontology languages, can be compiled into a union of conjunctive queries (UCQ) against the underlying database. Ontological query optimization, in this context, attempts to improve this rewriting process so to produce possibly small and cost-effective UCQ rewritings for an input query.Comment: arXiv admin note: text overlap with arXiv:1312.5914 by other author

    A Parallel semantics for normal logic programs plus time

    Get PDF
    It is proposed that Normal Logic Programs with an explicit time ordering are a suitable basis for a general purpose parallel programming language. Examples show that such a language can accept real-time external inputs and outputs, and mimic assignment, all without departing from its pure logical semantics. This paper describes a fully incremental bottom-up interpreter that supports a wide range of parallel execution strategies and can extract significant potential parallelism from programs with complex dependencies

    Schema Independent Relational Learning

    Full text link
    Learning novel concepts and relations from relational databases is an important problem with many applications in database systems and machine learning. Relational learning algorithms learn the definition of a new relation in terms of existing relations in the database. Nevertheless, the same data set may be represented under different schemas for various reasons, such as efficiency, data quality, and usability. Unfortunately, the output of current relational learning algorithms tends to vary quite substantially over the choice of schema, both in terms of learning accuracy and efficiency. This variation complicates their off-the-shelf application. In this paper, we introduce and formalize the property of schema independence of relational learning algorithms, and study both the theoretical and empirical dependence of existing algorithms on the common class of (de) composition schema transformations. We study both sample-based learning algorithms, which learn from sets of labeled examples, and query-based algorithms, which learn by asking queries to an oracle. We prove that current relational learning algorithms are generally not schema independent. For query-based learning algorithms we show that the (de) composition transformations influence their query complexity. We propose Castor, a sample-based relational learning algorithm that achieves schema independence by leveraging data dependencies. We support the theoretical results with an empirical study that demonstrates the schema dependence/independence of several algorithms on existing benchmark and real-world datasets under (de) compositions

    Programming with Purity Reflection: Peaceful Coexistence of Effects, Laziness, and Parallelism

    Get PDF
    We present purity reflection, a programming language feature that enables higher-order functions to inspect the purity of their function arguments and to vary their behavior based on this information. The upshot is that operations on data structures can selectively use lazy and/or parallel evaluation while ensuring that side effects are never lost or re-ordered. The technique builds on a recent Hindley-Milner style type and effect system based on Boolean unification which supports both effect polymorphism and complete type inference. We illustrate that avoiding the so-called \u27poisoning problem\u27 is crucial to support purity reflection. We propose several new data structures that use purity reflection to switch between eager and lazy, sequential and parallel evaluation. We propose a DelayList, which is maximally lazy but switches to eager evaluation for impure operations. We also propose a DelayMap which is maximally lazy in its values, but also exploits eager and parallel evaluation. We implement purity reflection as an extension of the Flix programming language. We present a new effect-aware form of monomorphization that eliminates purity reflection at compile-time. And finally, we evaluate the cost of this new monomorphization on compilation time and on code size, and determine that it is minimal

    Datalog as a parallel general purpose programming language

    Get PDF
    The increasing available parallelism of computers demands new programming languages that make parallel programming dramatically easier and less error prone. It is proposed that datalog with negation and timestamps is a suitable basis for a general purpose programming language for sequential, parallel and distributed computers. This paper develops a fully incremental bottom-up interpreter for datalog that supports a wide range of execution strategies, with trade-offs affecting efficiency, parallelism and control of resource usage. Examples show how the language can accept real-time external inputs and outputs, and mimic assignment, all without departing from its pure logical semantics

    Logic Programming as Constructivism

    Get PDF
    The features of logic programming that seem unconventional from the viewpoint of classical logic can be explained in terms of constructivistic logic. We motivate and propose a constructivistic proof theory of non-Horn logic programming. Then, we apply this formalization for establishing results of practical interest. First, we show that 'stratification can be motivated in a simple and intuitive way. Relying on similar motivations, we introduce the larger classes of 'loosely stratified' and 'constructively consistent' programs. Second, we give a formal basis for introducing quantifiers into queries and logic programs by defining 'constructively domain independent* formulas. Third, we extend the Generalized Magic Sets procedure to loosely stratified and constructively consistent programs, by relying on a 'conditional fixpoini procedure

    The Family of MapReduce and Large Scale Data Processing Systems

    Full text link
    In the last two decades, the continuous increase of computational power has produced an overwhelming flow of data which has called for a paradigm shift in the computing architecture and large scale data processing mechanisms. MapReduce is a simple and powerful programming model that enables easy development of scalable parallel applications to process vast amounts of data on large clusters of commodity machines. It isolates the application from the details of running a distributed program such as issues on data distribution, scheduling and fault tolerance. However, the original implementation of the MapReduce framework had some limitations that have been tackled by many research efforts in several followup works after its introduction. This article provides a comprehensive survey for a family of approaches and mechanisms of large scale data processing mechanisms that have been implemented based on the original idea of the MapReduce framework and are currently gaining a lot of momentum in both research and industrial communities. We also cover a set of introduced systems that have been implemented to provide declarative programming interfaces on top of the MapReduce framework. In addition, we review several large scale data processing systems that resemble some of the ideas of the MapReduce framework for different purposes and application scenarios. Finally, we discuss some of the future research directions for implementing the next generation of MapReduce-like solutions.Comment: arXiv admin note: text overlap with arXiv:1105.4252 by other author

    Spinning Fast Iterative Data Flows

    Full text link
    Parallel dataflow systems are a central part of most analytic pipelines for big data. The iterative nature of many analysis and machine learning algorithms, however, is still a challenge for current systems. While certain types of bulk iterative algorithms are supported by novel dataflow frameworks, these systems cannot exploit computational dependencies present in many algorithms, such as graph algorithms. As a result, these algorithms are inefficiently executed and have led to specialized systems based on other paradigms, such as message passing or shared memory. We propose a method to integrate incremental iterations, a form of workset iterations, with parallel dataflows. After showing how to integrate bulk iterations into a dataflow system and its optimizer, we present an extension to the programming model for incremental iterations. The extension alleviates for the lack of mutable state in dataflows and allows for exploiting the sparse computational dependencies inherent in many iterative algorithms. The evaluation of a prototypical implementation shows that those aspects lead to up to two orders of magnitude speedup in algorithm runtime, when exploited. In our experiments, the improved dataflow system is highly competitive with specialized systems while maintaining a transparent and unified dataflow abstraction.Comment: VLDB201
    corecore