
    On derived dependencies and connected databases

    This paper introduces a new class of deductive databases (connected databases) for which SLDNF-resolution never flounders and always computes ground answers. The class of connected databases properly includes that of allowed databases. Moreover, the definition of connected databases enables evaluable predicates to be included in a uniform way. An algorithm is described which, for each predicate defined in a normal database, derives a propositional formula (a groundness formula) describing dependencies between the arguments of that predicate. Groundness formulae are used to determine whether a database is connected. They are also used to identify goals for which SLDNF-resolution will never flounder and will always compute ground answers on a connected database.
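
    As an illustration of how such formulae can be used, here is a minimal Python sketch, assuming the standard Boolean encoding in which one propositional variable per argument position means "that argument is ground"; the append/3 formula and the brute-force truth-table check are illustrative, not the paper's derivation algorithm.

        from itertools import product

        def entails(premise, conclusion, n):
            """True iff premise -> conclusion holds for every truth
            assignment over argument-position variables 1..n."""
            for bits in product([False, True], repeat=n):
                env = dict(enumerate(bits, start=1))
                if premise(env) and not conclusion(env):
                    return False
            return True

        # Groundness formula for append(Xs, Ys, Zs): g3 <-> (g1 and g2),
        # i.e. the result is ground iff both input lists are ground.
        append_f = lambda e: e[3] == (e[1] and e[2])

        # Goal with a ground third argument: does every computed answer
        # also bind the first two arguments to ground terms?
        print(entails(lambda e: e[3] and append_f(e),
                      lambda e: e[1] and e[2], n=3))   # -> True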

    Probabilistic Relational Model Benchmark Generation

    The validation of any database mining methodology goes through an evaluation process where the availability of benchmarks is essential. In this paper, we aim to randomly generate relational database benchmarks that allow probabilistic dependencies among the attributes to be checked. We are particularly interested in Probabilistic Relational Models (PRMs), which extend Bayesian Networks (BNs) to a relational data mining context and enable effective and robust reasoning over relational data. Even though a number of works have focused, separately, on the generation of random Bayesian networks and of random relational databases, no such work exists for PRMs. This paper provides an algorithmic approach for generating random PRMs from scratch to fill this gap. The proposed method generates PRMs, as well as synthetic relational data, from a randomly generated relational schema and a random set of probabilistic dependencies. This can be of interest not only to machine learning researchers, who can evaluate their proposals in a common framework, but also to database designers, who can evaluate the effectiveness of the components of a database management system.
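
    A minimal Python sketch of the first two generation steps described above, with illustrative table names, size parameters, and a simplified notion of dependency candidates (the actual method also generates the PRM's probability distributions and the synthetic data):

        import random

        def random_schema(n_tables=3, max_attrs=3, seed=0):
            rng = random.Random(seed)
            schema = {}
            for t in range(n_tables):
                name = f"table{t}"
                attrs = [f"{name}_a{i}" for i in range(rng.randint(1, max_attrs))]
                # Every table after the first references a random earlier
                # table, so the generated schema is always acyclic.
                fk = f"table{rng.randrange(t)}" if t > 0 else None
                schema[name] = {"attrs": attrs, "fk": fk}
            return schema

        def random_dependencies(schema, p=0.5, seed=0):
            """Pick a random parent set for each attribute, drawing from
            earlier attributes of the same table and, via the foreign
            key, from the attributes of the referenced table."""
            rng = random.Random(seed)
            deps = {}
            for name, tbl in schema.items():
                fk_attrs = schema[tbl["fk"]]["attrs"] if tbl["fk"] else []
                for i, attr in enumerate(tbl["attrs"]):
                    candidates = tbl["attrs"][:i] + fk_attrs
                    deps[attr] = [c for c in candidates if rng.random() < p]
            return deps

        schema = random_schema()
        print(schema)
        print(random_dependencies(schema))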

    Securing Databases from Probabilistic Inference

    Databases can leak confidential information when users combine query results with probabilistic data dependencies and prior knowledge. Current research offers mechanisms that either handle a limited class of dependencies or lack tractable enforcement algorithms. We propose a foundation for Database Inference Control based on ProbLog, a probabilistic logic programming language. We leverage this foundation to develop Angerona, a provably secure enforcement mechanism that prevents information leakage in the presence of probabilistic dependencies. We then provide a tractable inference algorithm for a practically relevant fragment of ProbLog. We empirically evaluate Angerona's performance, showing that it scales to relevant security-critical problems. A short version of this paper has been accepted at the 30th IEEE Computer Security Foundations Symposium (CSF 2017).
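
    A minimal sketch of the kind of check such an enforcement mechanism must perform, written against the ProbLog 2 Python API (pip install problog); the model, the attacker's prior, and the 0.5 policy threshold are illustrative assumptions, not Angerona's actual policy language.

        from problog.program import PrologString
        from problog import get_evaluatable

        model = """
        person(bob).
        0.1::flu(bob).                 % attacker's prior belief
        0.8::fever(X) :- flu(X).       % probabilistic data dependency
        0.05::fever(X) :- person(X).   % fever from other causes

        % Suppose the database has answered that bob has fever:
        evidence(fever(bob), true).
        query(flu(bob)).
        """

        posterior = get_evaluatable().create_from(PrologString(model)).evaluate()
        for atom, prob in posterior.items():
            # Refuse the query if the posterior belief in the secret
            # would exceed the policy threshold (here roughly 0.64 > 0.5).
            print(atom, prob, "deny" if prob > 0.5 else "allow")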

    Indeterministic Handling of Uncertain Decisions in Duplicate Detection

    In current research, duplicate detection is usually treated as a deterministic process in which tuples are either declared to be duplicates or not. Most often, however, it is not completely clear whether two tuples represent the same real-world entity. Deterministic approaches ignore this uncertainty, which in turn can lead to false decisions. In this paper, we present an indeterministic approach for handling uncertain decisions in a duplicate detection process by using a probabilistic target schema. Thus, instead of deciding between multiple possible worlds, all of these worlds can be modeled in the resulting data. This approach minimizes the negative impact of false decisions. Furthermore, the duplicate detection process becomes almost fully automatic, and human effort can be reduced to a large extent. Unfortunately, a fully indeterministic approach is by definition too expensive (in time as well as in storage) and hence impractical. For that reason, we additionally introduce several semi-indeterministic methods for heuristically reducing the set of indeterministically handled decisions in a meaningful way.
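
    A minimal Python sketch of the semi-indeterministic idea: clear-cut pairs are decided deterministically, and only pairs in an uncertain band are kept as two weighted possible worlds; the similarity function and the thresholds are illustrative assumptions.

        from difflib import SequenceMatcher

        def decide(a, b, low=0.4, high=0.9):
            score = SequenceMatcher(None, a, b).ratio()
            if score >= high:
                return [("duplicate", 1.0)]        # certain: merge
            if score <= low:
                return [("distinct", 1.0)]         # certain: keep both
            # Uncertain: keep both possible worlds, weighted by score.
            return [("duplicate", score), ("distinct", 1.0 - score)]

        print(decide("Jon Smith", "John Smith"))   # clear duplicate
        print(decide("Jon Smith", "John Smyth"))   # two possible worlds
        print(decide("Jon Smith", "Alice Jones"))  # clearly distinct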

    Structuring the process of integrity maintenance (extended version)

    Two different approaches have traditionally been considered for dealing with the process of integrity constraint enforcement: integrity checking and integrity maintenance. However, while previous research on the first approach has mainly addressed efficiency issues, research on the second has mainly concentrated on being able to generate all possible repairs that falsify an integrity constraint violation. In this paper we address efficiency issues during the process of integrity maintenance. In this sense, we propose a technique that improves the efficiency of existing methods by defining the order in which maintenance of integrity constraints should be performed. Moreover, we also use this technique to handle the integrity constraints in an integrated way.
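
    A minimal Python sketch of the idea of ordering constraint maintenance so that repairing one constraint does not re-violate one already maintained; the constraint names and the dependency graph are illustrative assumptions, whereas the paper derives such an order from the constraint definitions themselves.

        from graphlib import TopologicalSorter   # Python 3.9+

        # predecessors[c] = constraints whose repairs may violate c,
        # so they must be maintained before c is checked.
        predecessors = {
            "c_domain": set(),
            "c_referential": {"c_domain"},
            "c_cardinality": {"c_referential"},
        }

        order = list(TopologicalSorter(predecessors).static_order())
        print(order)   # ['c_domain', 'c_referential', 'c_cardinality']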