    Computer Aided Verification

    This open access two-volume set LNCS 10980 and 10981 constitutes the refereed proceedings of the 30th International Conference on Computer Aided Verification, CAV 2018, held in Oxford, UK, in July 2018. The 52 full and 13 tool papers presented together with 3 invited papers and 2 tutorials were carefully reviewed and selected from 215 submissions. The papers cover a wide range of topics and techniques, from algorithmic and logical foundations of verification to practical applications in distributed, networked, cyber-physical, and autonomous systems. They are organized in topical sections on model checking, program analysis using polyhedra, synthesis, learning, runtime verification, hybrid and timed systems, tools, probabilistic systems, static analysis, theory and security, SAT, SMT and decision procedures, concurrency, and CPS, hardware, and industrial applications.

    Query Answering in Probabilistic Data and Knowledge Bases

    Probabilistic data and knowledge bases are becoming increasingly important in academia and industry. They are continuously extended with new data, powered by modern information extraction tools that associate probabilities with knowledge base facts. The state of the art for storing and processing such data is founded on probabilistic database systems, which are widely and successfully employed. Beyond all the success stories, however, such systems still lack the fundamental machinery to convey some of the valuable knowledge hidden in them to the end user, which limits their potential applications in practice. In particular, in their classical form, such systems are typically based on strong, unrealistic limitations, such as the closed-world assumption, the closed-domain assumption, the tuple-independence assumption, and the lack of commonsense knowledge. These limitations not only lead to unwanted consequences, but also put such systems on weak footing in important tasks, query answering being a central one. In this thesis, we enhance probabilistic data and knowledge bases with more realistic data models, thereby allowing for better means of querying them. Building on the long endeavor of unifying logic and probability, we develop different rigorous semantics for probabilistic data and knowledge bases, analyze their computational properties, identify sources of (in)tractability, and design practical, scalable query answering algorithms whenever possible. To achieve this, the current work brings together recent paradigms from logic, probabilistic inference, and database theory.
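    As background for the tuple-independence assumption discussed above, the following is a minimal, illustrative sketch (not taken from the thesis; the relations, facts, and probabilities are invented) of query answering over a tuple-independent probabilistic database: every tuple is present independently with its marginal probability, and the probability of a Boolean query is the total weight of the possible worlds in which the query holds.

```python
from itertools import product

# Hypothetical tuple-independent probabilistic database (illustration only):
# each tuple is present independently with the given marginal probability.
lives_in = {("alice", "oxford"): 0.9, ("bob", "leeds"): 0.6}
located_in = {("oxford", "uk"): 0.8, ("leeds", "uk"): 0.7}

def query_holds(world_lives, world_located):
    """Boolean query Q: does some person live in a city located in the UK?"""
    return any(city == c and country == "uk"
               for (_, city) in world_lives
               for (c, country) in world_located)

def query_probability():
    """Brute-force P(Q): sum the weight of every possible world where Q holds."""
    tuples = list(lives_in.items()) + list(located_in.items())
    total = 0.0
    for bits in product([True, False], repeat=len(tuples)):
        weight = 1.0
        world = []
        for (tup, p), present in zip(tuples, bits):
            weight *= p if present else (1.0 - p)
            if present:
                world.append(tup)
        world_lives = [t for t in world if t in lives_in]
        world_located = [t for t in world if t in located_in]
        if query_holds(world_lives, world_located):
            total += weight
    return total

print(round(query_probability(), 4))  # 0.8376
```

    For this particular (safe) query the probability also factorizes in closed form, 1 - (1 - 0.9 * 0.8) * (1 - 0.6 * 0.7) = 0.8376, but the brute-force version makes the exponential possible-worlds semantics explicit; the thesis develops semantics and algorithms that move beyond exactly this kind of closed-world, tuple-independent setting.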

    Is the most likely model likely to be the correct model?

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009. Cataloged from PDF version of thesis. Includes bibliographical references (p. 89-93). In this work, I test the hypothesis that the 2-dimensional dependencies of a deterministic model can be correctly recovered via hypothesis enumeration and Bayesian selection for a linear sequence, and ask what degree of 'ignorance' or 'uncertainty' about the properties of the model and data Bayesian selection can tolerate. The experiment tests the data created by a number of rules of size 3 and compares the implied dependency map to the (correct) dependencies of the various generating rules, then extends this to a composition of 2 rules of total size 5. I found that 'causal' belief networks do not map directly to the dependencies of actual causal structures. For deterministic rules satisfying the condition of multiple involvement (two tails), the correct model is not likely to be retrieved without augmenting the model selection with a prior high enough to suggest that the desired dependency model is already known; simply restricting the class of models to trees, and placing other restrictions (such as ordering), is not sufficient. Second, the map from identified models to correct models is not 1 to 1: in the rule cases where the correct model is identified, the identified model could just as easily have been produced by a different rule. Third, I discovered that uncertainty concerning the identification of observations directly resulted in the loss of existing information and made model selection the product of pure chance (such as the last observation). How to read and identify observations had to be agreed upon a priori by both the rule and the learner for there to be any consistency in model identification. Finally, I discovered that it is not the rule observations that discriminate between models, but rather the noise, or uncaptured observations, that govern the identified model. In analysis, I found that in enumeration of hypotheses (as dependency graphs) the differentiating space is very small. With representations of conditional independence, the equivalent factorizations of the graphs make the differentiating space even smaller. As Bayesian model identification relies on convergence to the differentiating space, if those spaces diminish in size relative to the observation sequence (as the model size is allowed to grow), then maximizing the likelihood of a particular hypothesis may fail to converge on the correct one. Overall, I found that if a learning mechanism does not know either how to read observations or which dependencies it is looking for a priori, then it is not likely to identify them probabilistically. I also confirmed existing results: model selection always prefers increasingly connected models over independent models, and several conditional-independence graphs have equivalent factorizations. Shannon's Asymptotic Equipartition Property was likewise confirmed to apply both for novel observations and for an increasing model/parameter space size. These results are applicable to a number of domains: natural language processing and language induction by statistical means, bioinformatics and statistical identification and merging of ontologies, and induction of real-world causal dependencies. by Beracah Yankama. S.M.
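    To make the idea of Bayesian selection between enumerated dependency hypotheses concrete, here is a minimal, self-contained illustration (it is not the thesis's experimental setup, and the sequence and models are invented): an alternating binary sequence is scored under an independent-observations model and under a first-order dependency model, using marginal likelihoods with uniform Beta priors, and the model that captures the dependency receives the higher score.

```python
from math import lgamma, log

def log_beta(a, b):
    """Log of the Beta function, via log-gamma."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_marginal_iid(seq):
    """Log marginal likelihood of a binary sequence under an i.i.d. Bernoulli
    model with a Beta(1, 1) prior on the success probability."""
    ones = sum(seq)
    zeros = len(seq) - ones
    return log_beta(1 + ones, 1 + zeros) - log_beta(1, 1)

def log_marginal_markov(seq):
    """Log marginal likelihood under a first-order Markov model: one Beta(1, 1)
    prior per previous symbol; the first symbol is treated as uniform."""
    counts = {0: [0, 0], 1: [0, 0]}    # counts[prev] = [#next=0, #next=1]
    for prev, nxt in zip(seq, seq[1:]):
        counts[prev][nxt] += 1
    total = log(0.5)                   # first symbol
    for prev in (0, 1):
        zeros, ones = counts[prev]
        total += log_beta(1 + ones, 1 + zeros) - log_beta(1, 1)
    return total

seq = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]   # strongly alternating: depends on the past
print("iid   ", round(log_marginal_iid(seq), 3))     # about -7.927
print("markov", round(log_marginal_markov(seq), 3))  # about -4.094, preferred
```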

    Program analysis of probabilistic programs

    Probabilistic programming is a growing area that strives to make statistical analysis more accessible, by separating probabilistic modelling from probabilistic inference. In practice this decoupling is difficult. The performance of inference methods is sensitive to both the underlying model and the observed data. Different inference techniques are applicable to different classes of models, have different advantages and shortcomings, and require different optimisation and diagnostics techniques to ensure robustness and reliability. No single inference algorithm can be used as a probabilistic programming back-end that is simultaneously reliable, efficient, black-box, and general. Probabilistic programming languages often choose a single algorithm to apply to a given problem, thus inheriting its limitations. While substantial work has been done both to formalise probabilistic programming and to improve the efficiency of inference, there has been little work that makes use of the available program structure, by formally analysing it, to better utilise the underlying inference algorithm. My thesis is that it is possible to improve probabilistic programming using program analysis, and I present three novel techniques (both static and dynamic), which analyse a probabilistic program and adapt it to make inference more efficient, sometimes in a way that would have been tedious or impossible to do by hand. Part I of the thesis focuses on static analysis and gives the first formal treatment of the popular probabilistic programming language Stan. While efficient, Stan constrains the space of programs expressible in the language. Programs must be written according to Stan's block syntax, which reduces compositionality. In addition, Stan does not support the explicit use of discrete parameters. Part I introduces the probabilistic programming language SlicStan: a compositional, self-optimising version of Stan, which supports both discrete and continuous parameters. SlicStan uses information flow analysis and type inference to capture conditional independence relationships in the program and transform it for inference in Stan. The result can be seen as a hybrid inference algorithm, where different parameters are inferred according to different inference algorithms for efficiency. Part II shows an example of dynamic analysis. The performance of inference algorithms can be dramatically affected by the parameterisation used to express a model. It is difficult to know in advance what parameterisation is suitable, as it depends on the properties of the observed data. This part demonstrates that reparameterisation can be automated by combining effect handlers in the probabilistic programming language Edward2 with a variational inference preprocessing step that searches over a space of possible parameterisations.
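    The kind of reparameterisation that Part II automates can be illustrated with the standard centered versus non-centered form of a hierarchical normal model. The sketch below is plain NumPy rather than SlicStan or Edward2 code, and the model and numbers are invented for illustration; it checks that the two parameterisations define the same distribution over theta, even though in a full hierarchical model they present very different geometry to a gradient-based sampler.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy hierarchical setting (illustration only): group-level mean and scale,
# with per-group effects theta_i ~ Normal(mu, tau).
mu, tau, n = 1.0, 2.0, 100_000

# Centered parameterisation: sample theta directly from its conditional prior.
theta_centered = rng.normal(mu, tau, size=n)

# Non-centered parameterisation: sample a standard normal "raw" variable and
# transform it; the resulting distribution of theta is identical to the
# centered one, but the sampled variable itself is a well-conditioned
# standard normal.
theta_raw = rng.normal(0.0, 1.0, size=n)
theta_noncentered = mu + tau * theta_raw

for name, draws in [("centered", theta_centered), ("non-centered", theta_noncentered)]:
    print(f"{name:13s} mean={draws.mean():.3f} std={draws.std():.3f}")
```

    Which form works better depends on how strongly the observed data constrain each theta_i, which is exactly why the thesis searches over a space of possible parameterisations automatically instead of fixing one by hand.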

    On the Discovery of Semantically Meaningful SQL Constraints from Armstrong Samples: Foundations, Implementation, and Evaluation

    A database is said to be C-Armstrong for a finite set Σ of data dependencies in a class C if the database satisfies all data dependencies in Σ and violates all data dependencies in C that are not implied by Σ. Therefore, Armstrong databases are concise, user-friendly representations of abstract data dependencies that can be used to judge, justify, convey, and test the understanding of database design choices. Indeed, an Armstrong database satisfies exactly those data dependencies that are considered meaningful by the current design choice Σ. Structural and computational properties of Armstrong databases have been deeply investigated in Codd’s Turing Award winning relational model of data. Armstrong databases have been incorporated in approaches towards relational database design. They have also been found useful for the elicitation of requirements, the semantic sampling of existing databases, and the specification of schema mappings. This research establishes a toolbox of Armstrong databases for SQL data. This is challenging, as SQL data can contain null marker occurrences in columns declared NULL, and may contain duplicate rows. Thus, the existing theory of Armstrong databases only applies to idealized instances of SQL data, that is, instances without null marker occurrences and without duplicate rows. For the thesis, two popular interpretations of null markers are considered: the no information interpretation used in SQL, and the exists but unknown interpretation by Codd. Furthermore, the study is limited to the popular class C of functional dependencies. However, the presence of duplicate rows means that the class of uniqueness constraints is no longer subsumed by the class of functional dependencies, in contrast to the relational model of data. As a first contribution, a provably correct algorithm is developed that computes Armstrong databases for an arbitrarily given finite set of uniqueness constraints and functional dependencies. This contribution is based on axiomatic, algorithmic, and logical characterizations of the associated implication problem that are also established in this thesis. While the problem of deciding whether a given database is Armstrong for a given set of such constraints is precisely exponential, our algorithm computes an Armstrong database with a number of rows that is at most quadratic in the number of rows of a minimum-sized Armstrong database. As a second contribution, the algorithms are implemented in the form of a design tool. Users of the tool can therefore inspect Armstrong databases to analyze their current design choice Σ. Intuitively, Armstrong databases are useful for the acquisition of semantically meaningful constraints if users can recognize the actual meaningfulness of constraints that they incorrectly perceived as meaningless before inspecting an Armstrong database. As a final contribution, measures are introduced that formalize the term “useful”, and it is shown by detailed experiments that Armstrong tables, as computed by the tool, are indeed useful. In summary, this research establishes a toolbox of Armstrong databases that can be applied by database designers to concisely visualize constraints on SQL data. Such support can lead to database designs that guarantee efficient data management in practice.
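    As background for the implication problem mentioned above, here is a sketch of the classical attribute-closure test for functional dependency implication in the idealized relational setting that the thesis generalizes to SQL data. It is illustrative only: the schema and dependencies are hypothetical, and it handles neither null markers, nor duplicate rows, nor uniqueness constraints, which are exactly the features the thesis addresses.

```python
def attribute_closure(attrs, fds):
    """Closure of a set of attributes under a set of functional dependencies,
    each given as an (lhs, rhs) pair of frozensets of attribute names."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure

def implies(fds, fd):
    """Does the set fds imply the single functional dependency fd = (lhs, rhs)?"""
    lhs, rhs = fd
    return rhs <= attribute_closure(lhs, fds)

# Hypothetical design choice: Sigma = {A -> B, B -> C}.
fds = [(frozenset("A"), frozenset("B")), (frozenset("B"), frozenset("C"))]
print(implies(fds, (frozenset("A"), frozenset("C"))))   # True:  A -> C is implied
print(implies(fds, (frozenset("C"), frozenset("A"))))   # False: C -> A is not
```

    In the terminology of the abstract, an Armstrong database for this Σ would satisfy A -> B, B -> C, and every consequence such as A -> C, while violating every functional dependency that is not implied, such as C -> A.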