40 research outputs found

    Spinning Fast Iterative Data Flows

    Parallel dataflow systems are a central part of most analytic pipelines for big data. The iterative nature of many analysis and machine learning algorithms, however, is still a challenge for current systems. While certain types of bulk iterative algorithms are supported by novel dataflow frameworks, these systems cannot exploit computational dependencies present in many algorithms, such as graph algorithms. As a result, these algorithms are executed inefficiently, which has led to specialized systems based on other paradigms, such as message passing or shared memory. We propose a method to integrate incremental iterations, a form of workset iterations, with parallel dataflows. After showing how to integrate bulk iterations into a dataflow system and its optimizer, we present an extension to the programming model for incremental iterations. The extension compensates for the lack of mutable state in dataflows and allows for exploiting the sparse computational dependencies inherent in many iterative algorithms. The evaluation of a prototypical implementation shows that exploiting those aspects yields speedups of up to two orders of magnitude in algorithm runtime. In our experiments, the improved dataflow system is highly competitive with specialized systems while maintaining a transparent and unified dataflow abstraction. Comment: VLDB201
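As a rough illustration of the workset idea in the abstract above, the following single-machine sketch computes connected components by propagating the smallest vertex id. It is not the paper's implementation: the function name, the `solution`/`workset` variables and the choice of connected components as the example are assumptions made here. The point it shows is that each superstep touches only the neighbors of vertices whose label changed, rather than recomputing every vertex.

```python
from collections import defaultdict


def connected_components(edges):
    """Label each vertex with the smallest vertex id in its component.

    Hypothetical sketch of a workset (incremental) iteration; it runs on
    one machine and only mimics the structure of the dataflow version.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    # Solution set: current label per vertex, initially its own id.
    solution = {v: v for v in neighbors}
    # Workset: vertices whose label changed in the previous superstep.
    workset = set(solution)
    supersteps = 0

    while workset:
        # Only changed vertices send their label to their neighbors.
        candidates = {}
        for v in workset:
            label = solution[v]
            for n in neighbors[v]:
                if label < candidates.get(n, solution[n]):
                    candidates[n] = label
        # Apply the delta; the updated vertices form the next workset.
        for v, label in candidates.items():
            solution[v] = label
        workset = set(candidates)
        supersteps += 1

    return solution, supersteps
```

Calling `connected_components([(1, 2), (2, 3), (4, 5)])` labels vertices 1, 2 and 3 with 1 and vertices 4 and 5 with 4. The workset shrinks every superstep, which is the sparse dependency the abstract refers to.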

    FICCS; A Fact Integrity Constraint Checking System


    Declarative operations on nets

    To increase the expressiveness of knowledge representations, the graph-theoretical basis of semantic networks is reconsidered. Directed labeled graphs are generalized to directed recursive labelnode hypergraphs, which permit a most natural representation of multi-level structures and n-ary relationships. This net formalism is embedded into the relational/functional programming language RELFUN. Operations on (generalized) graphs are specified in a declarative fashion to enhance readability and maintainability. For this, nets are represented as nested RELFUN terms kept in a normal form by rules associated directly with their constructors. These rules rely on equational axioms postulated in the formal definition of the generalized graphs as a constructor algebra. Certain kinds of sharing in net diagrams are mirrored by binding common subterms to logical variables. A package of declarative transformations on net terms is developed. It includes generalized set operations, structure-reducing operations, and extended path searching. The generation of parts lists is given as an application in mechanical engineering. Finally, imperative net storage and retrieval operations are discussed

    Natively probabilistic computation

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Brain and Cognitive Sciences, 2009. Includes bibliographical references (leaves 129-135). I introduce a new set of natively probabilistic computing abstractions, including probabilistic generalizations of Boolean circuits, backtracking search and pure Lisp. I show how these tools let one compactly specify probabilistic generative models, generalize and parallelize widely used sampling algorithms like rejection sampling and Markov chain Monte Carlo, and solve difficult Bayesian inference problems. I first introduce Church, a probabilistic programming language for describing probabilistic generative processes that induce distributions, which generalizes Lisp, a language for describing deterministic procedures that induce functions. I highlight the ways randomness meshes with the reflectiveness of Lisp to support the representation of structured, uncertain knowledge, including nonparametric Bayesian models from the current literature, programs for decision making under uncertainty, and programs that learn very simple programs from data. I then introduce systematic stochastic search, a recursive algorithm for exact and approximate sampling that generalizes a popular form of backtracking search to the broader setting of stochastic simulation and recovers widely used particle filters as a special case. I use it to solve probabilistic reasoning problems from statistical physics, causal reasoning and stereo vision. Finally, I introduce stochastic digital circuits that model the probability algebra just as traditional Boolean circuits model the Boolean algebra. I show how these circuits can be used to build massively parallel, fault-tolerant machines for sampling and allow one to efficiently run Markov chain Monte Carlo methods on models with hundreds of thousands of variables in real time. I emphasize the ways in which these ideas fit together into a coherent software and hardware stack for natively probabilistic computing, organized around distributions and samplers rather than deterministic functions. I argue that by building uncertainty and randomness into the foundations of our programming languages and computing machines, we may arrive at ones that are more powerful, flexible and efficient than deterministic designs, and are in better alignment with the needs of computational science, statistics and artificial intelligence. By Vikash Kumar Mansinghka. Ph.D
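The abstract above names rejection sampling as one of the algorithms that a probabilistic program can generalize. A minimal sketch of the idea, written in Python rather than Church, is given here. The names `two_coins`, `rejection_query` and `estimate` are hypothetical and do not come from the thesis. A generative model is run repeatedly, and a sample is kept only if it satisfies the condition.

```python
import random


def two_coins(rng):
    """Hypothetical generative model: flip two fair coins."""
    return (rng.random() < 0.5, rng.random() < 0.5)


def rejection_query(model, condition, rng, max_tries=100_000):
    """Draw one sample from `model` conditioned on `condition`.

    Samples that violate the condition are discarded and redrawn.
    """
    for _ in range(max_tries):
        sample = model(rng)
        if condition(sample):
            return sample
    raise RuntimeError("condition was never satisfied")


def estimate(n, seed=0):
    """Estimate P(both heads | at least one head) from n accepted samples."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        a, b = rejection_query(two_coins, lambda s: s[0] or s[1], rng)
        hits += a and b
    return hits / n
```

With enough samples, `estimate` approaches the exact conditional probability of 1/3. The cost of each accepted sample grows as the condition becomes rarer, which is one reason more elaborate samplers are of interest.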

    Workshop on Database Programming Languages

    These are the revised proceedings of the Workshop on Database Programming Languages held at Roscoff, Finistère, France in September of 1987. The last few years have seen an enormous amount of activity in the development of new programming languages and new programming environments for databases. The purpose of the workshop was to bring together researchers from both databases and programming languages to discuss recent developments in the two areas in the hope of overcoming some of the obstacles that appear to prevent the construction of a uniform database programming environment. The workshop, which follows a previous workshop held in Appin, Scotland in 1985, was extremely successful. The organizers were delighted with both the quality and volume of the submissions for this meeting, and it was regrettable that more papers could not be accepted. Both the stimulating discussions and the excellent food and scenery of the Brittany coast made the meeting thoroughly enjoyable. There were three main foci for this workshop: the type systems suitable for databases (especially object-oriented and complex-object databases), the representation and manipulation of persistent structures, and extensions to deductive databases that allow for more general and flexible programming. Many of the papers describe recent results, or work in progress, and are indicative of the latest research trends in database programming languages. The organizers are extremely grateful for the financial support given by CRAI (Italy), Altaïr (France) and AT&T (USA). We would also like to acknowledge the organizational help provided by Florence Deshors, Hélène Gans and Pauline Turcaud of Altaïr, and by Karen Carter of the University of Pennsylvania

    Context Exploitation in Data Fusion

    Complex and dynamic environments constitute a challenge for existing tracking algorithms. For this reason, modern solutions try to utilize any available information that could help to constrain, improve or explain the measurements. So-called Context Information (CI) is understood as information that surrounds an element of interest, whose knowledge may help in understanding the (estimated) situation and in reacting to that situation. However, context discovery and exploitation are still largely unexplored research topics. Until now, context has been extensively exploited as a parameter in system and measurement models, which has led to the development of numerous approaches for linear or non-linear constrained estimation and target tracking. More specifically, the spatial or static context is the most common source of ambient information, i.e. features, utilized for recursive enhancement of the state variables in either the prediction or the measurement update of the filters. In the case of multiple model estimators, context can be related not only to the state but also to a certain mode of the filter. Common practice for multiple model scenarios is to represent states and context as a joint distribution of Gaussian mixtures. These approaches are commonly referred to as joint tracking and classification. Alternatively, the usefulness of context has also been demonstrated in aiding measurement data association. The process of formulating a hypothesis that assigns a particular measurement to a track is traditionally governed by empirical knowledge of the noise characteristics of the sensors and the operating environment, i.e. probability of detection, false alarm, clutter noise, which can be further enhanced by conditioning on context. We believe that interactions between the environment and the object could be classified into actions, activities and intents, and formed into structured graphs with contextual links translated into arcs. By learning the environment model we will be able to make predictions about the target's future actions based on its past observations. The probability of a target's future action could be utilized in the fusion process to adjust tracker confidence in measurements. By incorporating contextual knowledge of the environment, in the form of a likelihood function, in the filter measurement update step, we have been able to reduce uncertainties of the tracking solution and improve the consistency of the track. The promising results demonstrate that the fusion of CI brings a significant performance improvement in comparison to the regular tracking approaches
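To make the idea of a context likelihood in the measurement update concrete, here is a small sketch on a discrete grid. The abstract does not say which filter the authors use, so the grid-based Bayes update, the function names and the example numbers are all assumptions chosen for illustration. The posterior is taken as proportional to prior times sensor likelihood times context likelihood.

```python
def normalize(weights):
    """Scale non-negative weights so that they sum to one."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]


def measurement_update(prior, sensor_likelihood, context_likelihood=None):
    """Grid-based Bayes measurement update with an optional context term.

    Hypothetical sketch: each list holds one value per grid cell. With no
    context, this reduces to the ordinary Bayes update.
    """
    if context_likelihood is None:
        context_likelihood = [1.0] * len(prior)
    unnormalized = [
        p * s * c
        for p, s, c in zip(prior, sensor_likelihood, context_likelihood)
    ]
    return normalize(unnormalized)


# Assumed example: four cells, an ambiguous sensor reading over cells 1 and 2,
# and context (say, a map) indicating that cell 2 is unlikely to be occupied.
prior = [0.25, 0.25, 0.25, 0.25]
sensor = [0.1, 0.4, 0.4, 0.1]
context = [1.0, 1.0, 0.1, 1.0]

plain = measurement_update(prior, sensor)
with_context = measurement_update(prior, sensor, context)
```

In this example the plain update leaves cells 1 and 2 equally likely, while the context term concentrates the posterior on cell 1. This is the kind of uncertainty reduction the abstract describes, shown here only on toy numbers.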

    Programming Languages and Systems

    This open access book constitutes the proceedings of the 30th European Symposium on Programming, ESOP 2021, which was held from March 27 to April 1, 2021, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2021. The conference was planned to take place in Luxembourg and changed to an online format due to the COVID-19 pandemic. The 24 papers included in this volume were carefully reviewed and selected from 79 submissions. They deal with fundamental issues in the specification, design, analysis, and implementation of programming languages and systems

    Query Answering in Probabilistic Data and Knowledge Bases

    Probabilistic data and knowledge bases are becoming increasingly important in academia and industry. They are continuously extended with new data, powered by modern information extraction tools that associate probabilities with knowledge base facts. The state of the art to store and process such data is founded on probabilistic database systems, which are widely and successfully employed. Beyond all the success stories, however, such systems still lack the fundamental machinery to convey some of the valuable knowledge hidden in them to the end user, which limits their potential applications in practice. In particular, in their classical form, such systems are typically based on strong, unrealistic limitations, such as the closed-world assumption, the closed-domain assumption, the tuple-independence assumption, and the lack of commonsense knowledge. These limitations not only lead to unwanted consequences, but also put such systems on weak footing in important tasks, query answering being a very central one. In this thesis, we enhance probabilistic data and knowledge bases with more realistic data models, thereby allowing for better means for querying them. Building on the long endeavor of unifying logic and probability, we develop different rigorous semantics for probabilistic data and knowledge bases, analyze their computational properties, identify sources of (in)tractability, and design practical scalable query answering algorithms whenever possible. To achieve this, the current work brings together some recent paradigms from logics, probabilistic inference, and database theory

    Studies related to the process of program development

    The submitted work consists of a collection of publications arising from research carried out at Rhodes University (1970-1980) and at Heriot-Watt University (1980-1992). The theme of this research is the process of program development, i.e. the process of creating a computer program to solve some particular problem. The papers presented cover a number of different topics which relate to this process, viz. (a) Programming methodology: aspects of structured programming. (b) Properties of programming languages. (c) Formal specification of programming languages. (d) Compiler techniques. (e) Declarative programming languages. (f) Program development aids. (g) Automatic program generation. (h) Databases. (i) Algorithms and applications