Defining and Mining Functional Dependencies in Probabilistic Databases
Functional dependencies -- traditional, approximate and conditional -- are of
critical importance in relational databases, as they inform us about the
relationships between attributes. They are useful in schema normalization, data
rectification and source selection. Most of these, however, were developed in the
context of deterministic data. Although uncertain databases have started
receiving attention, these dependencies have not been defined for them, nor are
fast algorithms available to evaluate their confidences. This paper defines the
logical extensions of various forms of functional dependencies for
probabilistic databases and explores the connections between them. We propose a
pruning-based exact algorithm to evaluate the confidence of functional
dependencies, a Monte-Carlo based algorithm to evaluate the confidence of
approximate functional dependencies and algorithms for their conditional
counterparts in probabilistic databases. Experiments are performed on both
synthetic and real data evaluating the performance of these algorithms in
assessing the confidence of dependencies and mining them from data. We believe
that having these dependencies and algorithms available for probabilistic
databases will drive adoption of probabilistic data storage in the industry.
Comment: 9 pages, 10 figures
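To make the notion of "confidence of a functional dependency" concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a tuple-independent probabilistic relation (each tuple is present independently with a given probability) and takes the confidence of an FD to be the probability that a sampled possible world satisfies it. The function names, the `zip`/`city` attributes and the probabilities are all hypothetical, chosen only for illustration.

```python
import random


def satisfies_fd(rows, lhs, rhs):
    """Return True if all rows that agree on lhs also agree on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores val for a new key and returns the stored value
        if seen.setdefault(key, val) != val:
            return False
    return True


def estimate_fd_confidence(prob_rows, lhs, rhs, samples=10000, seed=0):
    """Monte-Carlo estimate of P(a possible world satisfies lhs -> rhs).

    prob_rows is a list of (row_dict, probability) pairs. Assumption:
    tuples are independent, so a world is drawn by one coin flip per tuple.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        world = [row for row, p in prob_rows if rng.random() < p]
        if satisfies_fd(world, lhs, rhs):
            hits += 1
    return hits / samples


# Hypothetical data: the first two tuples conflict on zip -> city.
prob_rows = [
    ({"zip": "10001", "city": "New York"}, 0.9),
    ({"zip": "10001", "city": "Newark"}, 0.2),
    ({"zip": "94105", "city": "San Francisco"}, 0.8),
]
estimate = estimate_fd_confidence(prob_rows, ["zip"], ["city"])
```

In this toy relation the FD fails only when both conflicting tuples are present, so under the independence assumption the exact confidence is 1 - 0.9 * 0.2 = 0.82, and the estimate should land close to that value.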
Υ-DB: Managing scientific hypotheses as uncertain data
In view of the paradigm shift that makes science ever more data-driven, we
consider deterministic scientific hypotheses as uncertain data. This vision
comprises a probabilistic database (p-DB) design methodology for the systematic
construction and management of U-relational hypothesis DBs, viz.,
Υ-DBs. It introduces hypothesis management as a promising new class of
applications for p-DBs. We illustrate the potential of Υ-DB as a tool
for deep predictive analytics.
Comment: To appear in PVLDB 201
Managing large-scale scientific hypotheses as uncertain and probabilistic data
In view of the paradigm shift that makes science ever more data-driven, in
this thesis we propose a synthesis method for encoding and managing large-scale
deterministic scientific hypotheses as uncertain and probabilistic data.
In the form of mathematical equations, hypotheses symmetrically relate
aspects of the studied phenomena. For computing predictions, however,
deterministic hypotheses can be abstracted as functions. We build upon Simon's
notion of structural equations in order to efficiently extract the (so-called)
causal ordering between variables, implicit in a hypothesis structure (set of
mathematical equations).
We show how to process the hypothesis predictive structure effectively
through original algorithms for encoding it into a set of functional
dependencies (fd's) and then performing causal reasoning in terms of acyclic
pseudo-transitive reasoning over fd's. Such reasoning reveals important causal
dependencies implicit in the hypothesis predictive data and guides our synthesis
of a probabilistic database. As in the field of graphical models in AI, such
a probabilistic database should be normalized so that the uncertainty arising
from competing hypotheses is decomposed into factors and propagated properly
onto predictive data by recovering its joint probability distribution through a
lossless join. That is motivated as a design-theoretic principle for
data-driven hypothesis management and predictive analytics.
The method is applicable to both quantitative and qualitative deterministic
hypotheses and demonstrated in realistic use cases from computational science.
Comment: 145 pages, 61 figures, 1 table. PhD thesis, National Laboratory for
Scientific Computing (LNCC), Brazil, February 201
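As a rough illustration of reasoning over fd's derived from equations, the sketch below computes the textbook attribute closure under a set of functional dependencies. This is the standard closure procedure, not the thesis's acyclic pseudo-transitive algorithm, and the two fd's and their variable names (`g`, `t`, `v`, `s`) are hypothetical stand-ins for an encoded hypothesis, with each equation read as "inputs determine output".

```python
def attribute_closure(attrs, fds):
    """Return every attribute determined by attrs under fds.

    fds is a list of (lhs, rhs) pairs, each side a set of attribute names.
    """
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire an fd once its whole left-hand side is already derived.
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


# Hypothetical encoding: each equation becomes one fd "inputs -> output".
fds = [
    ({"g", "t"}, {"v"}),  # e.g. v = g * t
    ({"v", "t"}, {"s"}),  # e.g. s = v * t / 2
]
closure = attribute_closure({"g", "t"}, fds)
```

Here `s` is reached only through `v`, which mirrors the kind of indirect dependency that chained reasoning over fd's is meant to expose.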
The Fourth International VLDB Workshop on Management of Uncertain Data
This is the fourth edition of the international VLDB workshop on Management of Uncertain Data. Previous editions of this workshop took place in New Zealand, Austria and France. Research on uncertain data has grown over the past few years. Besides workshops dedicated to uncertain data, sessions on the same topic are now also organized at large conferences such as VLDB. In this edition, we have ten research talks in four sessions, addressing different topics in uncertain data. In addition, we start the workshop with an invited talk by Peter Haas from IBM Research, entitled "From MUD to MIRE: Managing Inherent Risk in the Enterprise". We would like to thank the reviewers for their time and effort. We would also like to thank the Centre for Telematics and Information Technology for sponsoring the proceedings of the workshop.
Ander de Keijzer, Maurice van Keule
Schema Design for Uncertain Databases
We address schema design in uncertain databases. Since uncertain data is relational in nature, decomposition becomes a key issue in design. Decomposition relies on dependency theory, and primarily on functional dependencies. We study the theory of functional dependencies (FDs) for uncertain relations. We define several kinds of horizontal FDs and vertical FDs, each of which is consistent with conventional FDs when an uncertain relation doesn’t contain any uncertainty. In addition to standard forms of decompositions allowed by ordinary relations, our FDs allow more complex decompositions specific to uncertain data. We show how our theory of FDs can be used for lossless decomposition of uncertain relations. We then present algorithms and complexity results for three fundamental problems with respect to FDs over ordinary and uncertain relations: (1) Testing whether a relation instance satisfies an FD; (2) Finding all FDs satisfied by a relation instance; and (3) Inferring all FDs that hold in the result of a query over uncertain relations with FDs. We also give a sound and complete axiomatization of horizontal and vertical FDs. We look at keys as a special case of FDs. Finally, we briefly consider uncertain data that contains confidence values.
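Problems (1) and (2) above can be illustrated for ordinary (certain) relations with a small brute-force sketch. It is not the paper's algorithm and does not cover horizontal or vertical FDs over uncertain relations. The function names and the `emp`/`dept`/`floor` attributes are hypothetical, and the enumeration is exponential in the number of attributes, so it only suits toy inputs.

```python
from itertools import combinations


def holds(rows, lhs, target):
    """Problem (1): does the FD lhs -> target hold on this instance?"""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        if seen.setdefault(key, row[target]) != row[target]:
            return False
    return True


def discover_fds(rows, attrs):
    """Problem (2): list minimal non-trivial FDs lhs -> target by brute force."""
    found = []
    for target in attrs:
        others = [a for a in attrs if a != target]
        minimal = []
        # Try smaller left-hand sides first so supersets can be skipped.
        for size in range(len(others) + 1):
            for lhs in combinations(others, size):
                if any(set(m) <= set(lhs) for m in minimal):
                    continue  # a subset of lhs already determines target
                if holds(rows, lhs, target):
                    minimal.append(lhs)
                    found.append((lhs, target))
    return found


# Hypothetical instance: dept and floor determine each other, emp is a key.
rows = [
    {"emp": "ann", "dept": "db", "floor": 2},
    {"emp": "bob", "dept": "db", "floor": 2},
    {"emp": "cat", "dept": "ai", "floor": 3},
]
fds = discover_fds(rows, ["emp", "dept", "floor"])
```

On this instance nothing determines `emp`, while `emp` determines both other attributes, which matches the abstract's remark that keys are a special case of FDs.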