5 research outputs found

    Defining and Mining Functional Dependencies in Probabilistic Databases

    Functional dependencies (traditional, approximate and conditional) are of critical importance in relational databases, as they inform us about the relationships between attributes. They are useful in schema normalization, data rectification and source selection. Most of these were, however, developed in the context of deterministic data. Although uncertain databases have started receiving attention, these dependencies have not been defined for them, nor are fast algorithms available to evaluate their confidences. This paper defines the logical extensions of various forms of functional dependencies for probabilistic databases and explores the connections between them. We propose a pruning-based exact algorithm to evaluate the confidence of functional dependencies, a Monte-Carlo based algorithm to evaluate the confidence of approximate functional dependencies, and algorithms for their conditional counterparts in probabilistic databases. Experiments are performed on both synthetic and real data, evaluating the performance of these algorithms in assessing the confidence of dependencies and mining them from data. We believe that having these dependencies and algorithms available for probabilistic databases will drive adoption of probabilistic data storage in the industry. Comment: 9 pages, 10 figures

    Υ-DB: Managing scientific hypotheses as uncertain data

    In view of the paradigm shift that makes science ever more data-driven, we consider deterministic scientific hypotheses as uncertain data. This vision comprises a probabilistic database (p-DB) design methodology for the systematic construction and management of U-relational hypothesis DBs, viz., Υ-DBs. It introduces hypothesis management as a promising new class of applications for p-DBs. We illustrate the potential of Υ-DB as a tool for deep predictive analytics. Comment: To appear in PVLDB 201

    Managing large-scale scientific hypotheses as uncertain and probabilistic data

    In view of the paradigm shift that makes science ever more data-driven, in this thesis we propose a synthesis method for encoding and managing large-scale deterministic scientific hypotheses as uncertain and probabilistic data. In the form of mathematical equations, hypotheses symmetrically relate aspects of the studied phenomena. For computing predictions, however, deterministic hypotheses can be abstracted as functions. We build upon Simon's notion of structural equations in order to efficiently extract the (so-called) causal ordering between variables, implicit in a hypothesis structure (set of mathematical equations). We show how to process the hypothesis predictive structure effectively through original algorithms for encoding it into a set of functional dependencies (fd's) and then performing causal reasoning in terms of acyclic pseudo-transitive reasoning over fd's. Such reasoning reveals important causal dependencies implicit in the hypothesis predictive data and guides our synthesis of a probabilistic database. As in the field of graphical models in AI, such a probabilistic database should be normalized so that the uncertainty arising from competing hypotheses is decomposed into factors and propagated properly onto predictive data by recovering its joint probability distribution through a lossless join. That is motivated as a design-theoretic principle for data-driven hypothesis management and predictive analytics. The method is applicable to both quantitative and qualitative deterministic hypotheses and is demonstrated in realistic use cases from computational science. Comment: 145 pages, 61 figures, 1 table. PhD thesis, National Laboratory for Scientific Computing (LNCC), Brazil, February 201

    The Fourth International VLDB Workshop on Management of Uncertain Data

    This is the fourth edition of the international VLDB workshop on Management of Uncertain Data. Previous editions of this workshop took place in New Zealand, Austria and France. Research on uncertain data has grown over the past few years. Besides workshops on uncertain data, sessions on the same topic are also organized at large conferences such as VLDB. This edition has ten research talks, in four sessions, addressing different topics in uncertain data. In addition, we start the workshop with an invited talk by Peter Haas from IBM Research, entitled "From MUD to MIRE: Managing Inherent Risk in the Enterprise". We would like to thank the reviewers for their time and effort. We would also like to thank the Centre Telematics and Information Technology for sponsoring the proceedings of the workshop. Ander de Keijzer, Maurice van Keule

    Schema Design for Uncertain Databases

    We address schema design in uncertain databases. Since uncertain data is relational in nature, decomposition becomes a key issue in design. Decomposition relies on dependency theory, and primarily on functional dependencies. We study the theory of functional dependencies (FDs) for uncertain relations. We define several kinds of horizontal FDs and vertical FDs, each of which is consistent with conventional FDs when an uncertain relation doesn't contain any uncertainty. In addition to standard forms of decompositions allowed by ordinary relations, our FDs allow more complex decompositions specific to uncertain data. We show how our theory of FDs can be used for lossless decomposition of uncertain relations. We then present algorithms and complexity results for three fundamental problems with respect to FDs over ordinary and uncertain relations: (1) testing whether a relation instance satisfies an FD; (2) finding all FDs satisfied by a relation instance; and (3) inferring all FDs that hold in the result of a query over uncertain relations with FDs. We also give a sound and complete axiomatization of horizontal and vertical FDs. We look at keys as a special case of FDs. Finally, we briefly consider uncertain data that contains confidence values.