Tuple-Independent Representations of Infinite Probabilistic Databases
Probabilistic databases (PDBs) are probability spaces over database
instances. They provide a framework for handling uncertainty in databases, as
occurs due to data integration, noisy data, data from unreliable sources or
randomized processes. Most of the existing theoretical literature has investigated
finite, tuple-independent PDBs (TI-PDBs) where the occurrences of tuples are
independent events. Only recently, Grohe and Lindner (PODS '19) introduced
independence assumptions for PDBs beyond the finite domain assumption. In the
finite setting, a major argument for discussing the theoretical properties of TI-PDBs
is that they can be used to represent any finite PDB via views. This is no
longer the case once the number of tuples is countably infinite. In this paper,
we systematically study the representability of infinite PDBs in terms of
TI-PDBs and the related block-independent disjoint PDBs.
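As a point of reference for the finite case, the following minimal sketch shows how tuple independence determines the probability of each database instance: it is the product of p for every fact present and 1 - p for every fact absent. The toy facts, their marginal probabilities, and the helper names `instance_probability` and `all_instances` are illustrative assumptions and do not come from the paper.

```python
from itertools import combinations

# Hypothetical toy TI-PDB: each fact maps to its marginal probability.
marginals = {
    ("R", 1): 0.5,
    ("R", 2): 0.25,
    ("S", 1, 2): 0.8,
}

def instance_probability(instance, marginals):
    """Probability of one instance under tuple independence:
    multiply p for each fact present and (1 - p) for each fact absent."""
    prob = 1.0
    for fact, p in marginals.items():
        prob *= p if fact in instance else 1.0 - p
    return prob

def all_instances(marginals):
    """Enumerate every subset of the facts as a candidate instance."""
    facts = list(marginals)
    for k in range(len(facts) + 1):
        for subset in combinations(facts, k):
            yield frozenset(subset)

# The instance probabilities form a distribution over all subsets of facts.
total = sum(instance_probability(d, marginals) for d in all_instances(marginals))
```

Enumerating all subsets only works because the set of facts is finite; with countably many facts this enumeration is no longer available, which is the setting the paper addresses.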
The central question is which infinite PDBs are representable as first-order
views over tuple-independent PDBs. We give a necessary condition for the
representability of PDBs and provide a sufficient criterion for
representability in terms of the probability distribution of a PDB. With
various examples, we explore the limits of our criteria. We show that
conditioning on first-order properties yields no additional power in terms of
expressivity. Finally, we discuss the relation between purely logical and
arithmetic reasons for (non-)representability.