On derived dependencies and connected databases
This paper introduces a new class of deductive databases (connected databases) for which SLDNF-resolution never flounders and always computes ground answers. The class of connected databases properly includes that of allowed databases. Moreover, the definition of connected databases enables evaluable predicates to be included in a uniform way. An algorithm is described which, for each predicate defined in a normal database, derives a propositional formula (a groundness formula) describing dependencies between the arguments of that predicate. Groundness formulae are used to determine whether a database is connected. They are also used to identify goals for which SLDNF-resolution will never flounder and will always compute ground answers on a connected database.
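To make the groundness-formula idea concrete, here is a minimal Python sketch of how such argument-dependency information could be used. This is an illustration of the general idea only, not the paper's algorithm; the `append/3` formula and all function names are assumptions made for the example.

```python
# A minimal sketch of groundness formulas, assuming a simplified setting:
# each predicate maps to implications "if these argument positions are
# ground, those positions become ground". All names are illustrative.

# Hypothetical groundness formula: for append/3, groundness of args 1 and 2
# implies groundness of arg 3, and groundness of arg 3 implies 1 and 2.
GROUNDNESS = {
    "append/3": [({1, 2}, {3}), ({3}, {1, 2})],
}

def ground_positions(pred, initially_ground):
    """Close the set of ground argument positions under the predicate's
    groundness formula (a fixpoint over the implications)."""
    ground = set(initially_ground)
    changed = True
    while changed:
        changed = False
        for premises, conclusions in GROUNDNESS.get(pred, []):
            if premises <= ground and not conclusions <= ground:
                ground |= conclusions
                changed = True
    return ground

def computes_ground_answers(pred, arity, bound_args):
    """A goal is safe (in this toy model) if every argument position
    ends up ground once the bound arguments are ground."""
    return ground_positions(pred, bound_args) == set(range(1, arity + 1))

# Example: calling append with the first two arguments bound grounds the third.
print(computes_ground_answers("append/3", 3, {1, 2}))  # True
print(computes_ground_answers("append/3", 3, {1}))     # False
```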
Probabilistic Relational Model Benchmark Generation
The validation of any database mining methodology goes through an evaluation process in which the availability of benchmarks is essential. In this paper, we aim to randomly generate relational database benchmarks that make it possible to check probabilistic dependencies among the attributes. We are particularly interested in Probabilistic Relational Models (PRMs), which extend Bayesian Networks (BNs) to a relational data mining context and enable effective and robust reasoning over relational data. Even though many works have focused, separately, on the generation of random Bayesian networks and of random relational databases, no work has been identified that does the same for PRMs. This paper provides an algorithmic approach for generating random PRMs from scratch to fill this gap. The proposed method generates PRMs as well as synthetic relational data from a randomly generated relational schema and a random set of probabilistic dependencies. This can be of interest not only to machine learning researchers, who can evaluate their proposals in a common framework, but also to database designers, who can evaluate the effectiveness of the components of a database management system.
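As a rough illustration of what such a generator involves (this is not the authors' algorithm; every name, parameter, and probability below is invented for the example), the following Python sketch draws a random relational schema and a random acyclic set of attribute dependencies:

```python
# A rough sketch of a PRM-style benchmark generator: produce a random
# relational schema and a random DAG of probabilistic dependencies
# between attributes. Names and parameters are made up.

import random

def random_schema(n_tables=3, max_attrs=4, seed=0):
    rng = random.Random(seed)
    schema = {}
    for t in range(n_tables):
        attrs = [f"T{t}.a{i}" for i in range(rng.randint(1, max_attrs))]
        # Link every table after the first to a random earlier one
        # via a foreign key, so the schema stays connected.
        fk = f"T{rng.randrange(t)}" if t > 0 else None
        schema[f"T{t}"] = {"attributes": attrs, "fk": fk}
    return schema

def random_dependencies(schema, p=0.3, seed=0):
    """Pick edges only from earlier to later attributes in a fixed
    ordering, which guarantees the dependency structure is a DAG
    (as a Bayesian-network-style PRM requires)."""
    rng = random.Random(seed)
    attrs = [a for t in schema.values() for a in t["attributes"]]
    return [(attrs[i], attrs[j])
            for i in range(len(attrs))
            for j in range(i + 1, len(attrs))
            if rng.random() < p]

schema = random_schema()
for table, spec in schema.items():
    print(table, spec)
print(random_dependencies(schema))
```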
Securing Databases from Probabilistic Inference
Databases can leak confidential information when users combine query results with probabilistic data dependencies and prior knowledge. Current research offers mechanisms that either handle a limited class of dependencies or lack tractable enforcement algorithms. We propose a foundation for Database Inference Control based on ProbLog, a probabilistic logic programming language. We leverage this foundation to develop Angerona, a provably secure enforcement mechanism that prevents information leakage in the presence of probabilistic dependencies. We then provide a tractable inference algorithm for a practically relevant fragment of ProbLog. We empirically evaluate Angerona's performance, showing that it scales to relevant security-critical problems.
Comment: A short version of this paper has been accepted at the 30th IEEE Computer Security Foundations Symposium (CSF 2017).
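The enforcement idea can be pictured with a toy possible-worlds computation. The sketch below is not Angerona and does not use the actual ProbLog system; the facts, probabilities, and threshold are all invented. It merely shows how a mechanism might deny an answer whose release would push an attacker's posterior belief in a secret above a policy threshold.

```python
# An illustrative sketch only: enumerate the possible worlds of a tiny
# ProbLog-style program and deny a query answer if it would raise the
# attacker's posterior belief in a secret above a threshold.

from itertools import product

# Independent probabilistic facts (fact name -> probability of being true).
FACTS = {"has_disease": 0.1, "visited_clinic_given_disease": 0.9,
         "visited_clinic_anyway": 0.2}

def worlds():
    """Yield (assignment, probability) for each truth assignment."""
    names = list(FACTS)
    for bits in product([True, False], repeat=len(names)):
        w = dict(zip(names, bits))
        p = 1.0
        for n, b in w.items():
            p *= FACTS[n] if b else 1 - FACTS[n]
        yield w, p

def visited(w):
    # Derived predicate: a clinic visit happens via either rule.
    return ((w["has_disease"] and w["visited_clinic_given_disease"])
            or w["visited_clinic_anyway"])

def posterior_secret_given_visit():
    num = sum(p for w, p in worlds() if w["has_disease"] and visited(w))
    den = sum(p for w, p in worlds() if visited(w))
    return num / den

THRESHOLD = 0.5
post = posterior_secret_given_visit()
# The enforcement decision: release the answer only if the posterior
# belief in the secret stays below the policy threshold.
print(f"P(secret | visit) = {post:.3f}",
      "-> answer" if post < THRESHOLD else "-> deny")
```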
Indeterministic Handling of Uncertain Decisions in Duplicate Detection
In current research, duplicate detection is usually treated as a deterministic process in which tuples are either declared to be duplicates or not. However, it is often not completely clear whether two tuples represent the same real-world entity. Deterministic approaches ignore this uncertainty, which in turn can lead to false decisions. In this paper, we present an indeterministic approach for handling uncertain decisions in a duplicate detection process by using a probabilistic target schema. Thus, instead of deciding between multiple possible worlds, all these worlds can be modeled in the resulting data. This approach minimizes the negative impact of false decisions. Furthermore, the duplicate detection process becomes almost fully automatic, and human effort can be reduced to a large extent. Unfortunately, a fully indeterministic approach is by definition too expensive (in time as well as in storage) and hence impractical. For that reason, we additionally introduce several semi-indeterministic methods for heuristically reducing the set of indeterministically handled decisions in a meaningful way.
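A small Python sketch of the semi-indeterministic idea (the thresholds and scores below are assumptions for the example, not values from the paper): pairs whose match score is clearly high or clearly low are decided deterministically, and only the uncertain middle band survives as indeterministic decisions, which bounds the number of possible worlds that must be stored.

```python
# Sketch of a semi-indeterministic duplicate-detection step: resolve
# confident decisions outright and keep only the uncertain band as
# probabilistic (indeterministic) decisions. Thresholds are made up.

def classify_pairs(scored_pairs, lo=0.2, hi=0.8):
    """Split scored candidate pairs into deterministic matches,
    deterministic non-matches, and indeterministic decisions."""
    matches, non_matches, uncertain = [], [], []
    for pair, score in scored_pairs:
        if score >= hi:
            matches.append(pair)
        elif score <= lo:
            non_matches.append(pair)
        else:
            # Kept as a probabilistic decision: both possible worlds
            # (duplicate / not duplicate) survive, weighted by the score.
            uncertain.append((pair, score))
    return matches, non_matches, uncertain

pairs = [(("t1", "t2"), 0.95), (("t1", "t3"), 0.55), (("t2", "t4"), 0.05)]
m, n, u = classify_pairs(pairs)
print("match:", m)            # [('t1', 't2')]
print("non-match:", n)        # [('t2', 't4')]
print("indeterministic:", u)  # [(('t1', 't3'), 0.55)]
```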
Structuring the process of integrity maintenance (extended version)
Two different approaches have traditionally been considered for dealing with the process of integrity constraint enforcement: integrity checking and integrity maintenance. However, while previous research on the first approach has mainly addressed efficiency issues, research on the second approach has mainly concentrated on being able to generate all possible repairs of an integrity constraint violation. In this paper we address efficiency issues during the process of integrity maintenance. In this sense, we propose a technique which improves the efficiency of existing methods by defining the order in which maintenance of integrity constraints should be performed. Moreover, we also use this technique to handle the integrity constraints in an integrated way.
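One way to picture the ordering idea is the minimal sketch below, under the assumption that the technique can be phrased as a precedence relation between constraints (the constraint names and the may_violate relation are invented): if repairing constraint A can violate constraint B, then A should be maintained before B, so a valid maintenance order is a topological sort of that relation.

```python
# A minimal sketch: derive an order for integrity maintenance from a
# "repairs of A may violate B" relation. Constraint names are illustrative.

from graphlib import TopologicalSorter

# may_violate[X] = constraints whose repairs can violate X,
# i.e. all of them should be maintained before X.
may_violate = {
    "referential_integrity": {"primary_key"},
    "salary_range": {"referential_integrity"},
    "primary_key": set(),
}

order = list(TopologicalSorter(may_violate).static_order())
print(order)  # ['primary_key', 'referential_integrity', 'salary_range']
```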