1,842 research outputs found
Towards Intelligent Databases
This article is a presentation of the objectives and techniques
of deductive databases. The deductive approach to databases aims at extending
with intensional definitions other database paradigms that describe
applications extensionaUy. We first show how constructive specifications can
be expressed with deduction rules, and how normative conditions can be defined
using integrity constraints. We outline the principles of bottom-up and
top-down query answering procedures and present the techniques used for
integrity checking. We then argue that it is often desirable to manage with
a database system not only database applications, but also specifications of
system components. We present such meta-level specifications and discuss
their advantages over conventional approaches
Time-Aware Probabilistic Knowledge Graphs
The emergence of open information extraction as a tool for constructing and expanding knowledge graphs has aided the growth of temporal data, for instance, YAGO, NELL and Wikidata. While YAGO and Wikidata maintain the valid time of facts, NELL records the time point at which a fact is retrieved from some Web corpora. Collectively, these knowledge graphs (KG) store facts extracted from Wikipedia and other sources. Due to the imprecise nature of the extraction tools that are used to build and expand KG, such as NELL, the facts in the KG are weighted (a confidence value representing the correctness of a fact). Additionally, NELL can be considered as a transaction time KG because every fact is associated with extraction date. On the other hand, YAGO and Wikidata use the valid time model because they maintain facts together with their validity time (temporal scope). In this paper, we propose a bitemporal model (that combines transaction and valid time models) for maintaining and querying bitemporal probabilistic knowledge graphs. We study coalescing and scalability of marginal and MAP inference. Moreover, we show that complexity of reasoning tasks in atemporal probabilistic KG carry over to the bitemporal setting. Finally, we report our evaluation results of the proposed model
Prioritized Repairing and Consistent Query Answering in Relational Databases
A consistent query answer in an inconsistent database is an answer obtained
in every (minimal) repair. The repairs are obtained by resolving all conflicts
in all possible ways. Often, however, the user is able to provide a preference
on how conflicts should be resolved. We investigate here the framework of
preferred consistent query answers, in which user preferences are used to
narrow down the set of repairs to a set of preferred repairs. We axiomatize
desirable properties of preferred repairs. We present three different families
of preferred repairs and study their mutual relationships. Finally, we
investigate the complexity of preferred repairing and computing preferred
consistent query answers.Comment: Accepted to the special SUM'08 issue of AMA
An epistemic approach to model uncertainty in data-graphs
Graph databases are becoming widely successful as data models that allow to
effectively represent and process complex relationships among various types of
data. As with any other type of data repository, graph databases may suffer
from errors and discrepancies with respect to the real-world data they intend
to represent. In this work we explore the notion of probabilistic unclean graph
databases, previously proposed for relational databases, in order to capture
the idea that the observed (unclean) graph database is actually the noisy
version of a clean one that correctly models the world but that we know
partially. As the factors that may be involved in the observation can be many,
e.g, all different types of clerical errors or unintended transformations of
the data, we assume a probabilistic model that describes the distribution over
all possible ways in which the clean (uncertain) database could have been
polluted. Based on this model we define two computational problems: data
cleaning and probabilistic query answering and study for both of them their
corresponding complexity when considering that the transformation of the
database can be caused by either removing (subset) or adding (superset) nodes
and edges.Comment: 25 pages, 3 figure
A Formal Framework for Probabilistic Unclean Databases
Most theoretical frameworks that focus on data errors and inconsistencies follow logic-based reasoning. Yet, practical data cleaning tools need to incorporate statistical reasoning to be effective in real-world data cleaning tasks. Motivated by empirical successes, we propose a formal framework for unclean databases, where two types of statistical knowledge are incorporated: The first represents a belief of how intended (clean) data is generated, and the second represents a belief of how noise is introduced in the actual observed database. To capture this noisy channel model, we introduce the concept of a Probabilistic Unclean Database (PUD), a triple that consists of a probabilistic database that we call the intention, a probabilistic data transformator that we call the realization and captures how noise is introduced, and an observed unclean database that we call the observation. We define three computational problems in the PUD framework: cleaning (infer the most probable intended database, given a PUD), probabilistic query answering (compute the probability of an answer tuple over the unclean observed database), and learning (estimate the most likely intention and realization models of a PUD, given examples as training data). We illustrate the PUD framework on concrete representations of the intention and realization, show that they generalize traditional concepts of repairs such as cardinality and value repairs, draw connections to consistent query answering, and prove tractability results. We further show that parameters can be learned in some practical instantiations, and in fact, prove that under certain conditions we can learn a PUD directly from a single dirty database without any need for clean examples
- …