Generative Datalog with Continuous Distributions
Arguing for the need to combine declarative and probabilistic programming,
Bárány et al. (TODS 2017) recently introduced a probabilistic extension of
Datalog as a "purely declarative probabilistic programming language." We
revisit this language and propose a more principled approach towards defining
its semantics based on stochastic kernels and Markov processes, which are standard
notions from probability theory. This allows us to extend the semantics to
continuous probability distributions, thereby settling an open problem posed by
Bárány et al.
We show that our semantics is fairly robust, allowing both parallel execution
and arbitrary chase orders when evaluating a program. We cast our semantics in
the framework of infinite probabilistic databases (Grohe and Lindner, ICDT
2020), and show that the semantics remains meaningful even when the input of a
probabilistic Datalog program is an arbitrary probabilistic database.
Comment: Extended Version
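To make the kernel-and-Markov-process view a little more concrete, here is a minimal Python sketch. It is not the construction from the paper: the toy program, its rule syntax, and the names `triggers`, `apply_trigger`, and `chase` are hypothetical. One probabilistic rule draws a temperature from a normal distribution and one ordinary rule derives an alarm. Each chase step takes the current database to a random successor database (loosely, a stochastic kernel on databases), and repeating the step until no trigger is left gives a Markov process whose terminal state is the output.

```python
import random
from typing import FrozenSet, List, Set, Tuple

Fact = Tuple               # e.g. ("Sensor", "s1") or ("Temp", "s1", 21.7)
Trigger = Tuple[str, str]  # (rule name, value bound to x)

# Hypothetical toy program in a Generative-Datalog-like style:
#   Temp(x, Normal<20, 5>) :- Sensor(x).          (probabilistic rule)
#   Alarm(x)               :- Temp(x, y), y > 30.  (ordinary rule)


def triggers(db: Set[Fact]) -> List[Trigger]:
    """All rule applications whose body is satisfied in db."""
    ts = [("temp", f[1]) for f in db if f[0] == "Sensor"]
    ts += [("alarm", f[1]) for f in db if f[0] == "Temp" and f[2] > 30.0]
    return ts


def apply_trigger(t: Trigger, value_rng: random.Random) -> Fact:
    """Head fact of a trigger; the probabilistic rule samples y ~ Normal(20, 5)."""
    rule, x = t
    if rule == "temp":
        return ("Temp", x, value_rng.gauss(20.0, 5.0))
    return ("Alarm", x)


def chase(edb: Set[Fact], value_rng: random.Random,
          order_rng: random.Random) -> FrozenSet[Fact]:
    """Run the chase to a terminal state; each loop iteration is one Markov step."""
    db: Set[Fact] = set(edb)
    fired: Set[Trigger] = set()  # every trigger fires at most once
    while True:
        pending = sorted(t for t in triggers(db) if t not in fired)
        if not pending:  # no applicable trigger left: terminal state reached
            return frozenset(db)
        t = order_rng.choice(pending)  # chase order chosen arbitrarily
        fired.add(t)
        db.add(apply_trigger(t, value_rng))
```

Changing the `order_rng` seed changes which trigger fires first, yet every run ends with exactly one `Temp` fact per sensor and an `Alarm` exactly for the sensors whose sampled value exceeds 30. This only loosely mirrors the order-independence result, which concerns the distribution of outputs rather than individual runs.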
Probabilistic Data with Continuous Distributions
Statistical models of real world data typically involve continuous
probability distributions such as normal, Laplace, or exponential
distributions. Such distributions are supported by many probabilistic modelling
formalisms, including probabilistic database systems. Yet, the traditional
theoretical framework of probabilistic databases focusses entirely on finite
probabilistic databases.
Only recently have we set out to develop the mathematical theory of infinite
probabilistic databases. The present paper is an exposition of two recent
papers which are cornerstones of this theory. In (Grohe, Lindner; ICDT 2020) we
propose a very general framework for probabilistic databases, possibly
involving continuous probability distributions, and show that queries have a
well-defined semantics in this framework. In (Grohe, Kaminski, Katoen, Lindner;
PODS 2020) we extend the declarative probabilistic programming language
Generative Datalog, proposed by Bárány et al. (2017) for discrete
probability distributions, to continuous probability distributions and show
that such programs yield generative models of continuous probabilistic
databases.
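The claim that queries have a well-defined semantics over continuous probabilistic databases can be illustrated with a small Monte Carlo sketch. Everything here is a hypothetical assumption, not taken from the papers: a database of n facts `Reading(i, v_i)` whose values are independent and exponentially distributed with rate λ, and the Boolean query "some reading exceeds t". The query maps every sampled instance to true or false, so its answer has a probability; in this toy case it has the closed form 1 - (1 - e^{-λt})^n, which a sampling estimate can be checked against.

```python
import math
import random
from typing import Set, Tuple

Fact = Tuple[str, int, float]


def sample_instance(n: int, rate: float, rng: random.Random) -> Set[Fact]:
    """Draw one database instance: Reading(i, v_i) with v_i ~ Exponential(rate)."""
    return {("Reading", i, rng.expovariate(rate)) for i in range(n)}


def query(db: Set[Fact], t: float) -> bool:
    """Boolean query: does some Reading(i, v) have v > t?"""
    return any(v > t for (_, _, v) in db)


def estimate(n: int, rate: float, t: float, samples: int, seed: int = 0) -> float:
    """Monte Carlo estimate of P(query is true) under the assumed distribution."""
    rng = random.Random(seed)
    hits = sum(query(sample_instance(n, rate, rng), t) for _ in range(samples))
    return hits / samples


def exact(n: int, rate: float, t: float) -> float:
    """Closed form: 1 - P(all v_i <= t) = 1 - (1 - exp(-rate * t)) ** n."""
    return 1.0 - (1.0 - math.exp(-rate * t)) ** n
```

For example, with n = 3, λ = 1 and t = 2 the closed form is about 0.354, and `estimate(3, 1.0, 2.0, 20000)` should land close to it.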
Generative Datalog with Continuous Distributions
Arguing for the need to combine declarative and probabilistic programming, Bárány et al. (TODS 2017) recently introduced a probabilistic extension of Datalog as a "purely declarative probabilistic programming language." We revisit this language and propose a more foundational approach towards defining its semantics. It is based on standard notions from probability theory known as stochastic kernels and Markov processes. This allows us to extend the semantics to continuous probability distributions, thereby settling an open problem posed by Bárány et al. We show that our semantics is fairly robust, allowing both parallel execution and arbitrary chase orders when evaluating a program. We cast our semantics in the framework of infinite probabilistic databases (Grohe and Lindner, ICDT 2020), and we show that the semantics remains meaningful even when the input of a probabilistic Datalog program is an arbitrary probabilistic database.