Query Answering in Probabilistic Data and Knowledge Bases
Probabilistic data and knowledge bases are becoming increasingly important in academia and industry. They are continuously extended with new data, powered by modern information extraction tools that associate probabilities with knowledge base facts. The state of the art for storing and processing such data is founded on probabilistic database systems, which are widely and successfully employed. Beyond all the success stories, however, such systems still lack the fundamental machinery to convey some of the valuable knowledge hidden in them to the end user, which limits their potential applications in practice. In particular, in their classical form, such systems are typically based on strong, unrealistic limitations, such as the closed-world assumption, the closed-domain assumption, the tuple-independence assumption, and the lack of commonsense knowledge. These limitations not only lead to unwanted consequences but also put such systems on weak footing for important tasks, query answering being a central one. In this thesis, we enhance probabilistic data and knowledge bases with more realistic data models, thereby allowing for better means of querying them. Building on the long endeavor of unifying logic and probability, we develop different rigorous semantics for probabilistic data and knowledge bases, analyze their computational properties, identify sources of (in)tractability, and design practical, scalable query answering algorithms whenever possible. To achieve this, the current work brings together some recent paradigms from logics, probabilistic inference, and database theory.
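As a concrete illustration of the tuple-independence assumption mentioned above (our sketch with hypothetical facts, not taken from the thesis): each fact is present independently with its marginal probability, and the probability of a Boolean query is the total probability of the possible worlds that satisfy it.

```python
from itertools import product

# Toy tuple-independent probabilistic database (hypothetical facts):
# each fact is present independently with its marginal probability.
facts = [
    ("Author", ("alice", "p1"), 0.9),
    ("Author", ("bob",   "p1"), 0.6),
    ("Cites",  ("p1",    "p2"), 0.7),
]

def q(world):
    # Boolean conjunctive query: exists x. Author(x, p1) AND Cites(p1, p2)
    has_author = any(rel == "Author" and args[1] == "p1" for rel, args in world)
    cites = any(rel == "Cites" and args == ("p1", "p2") for rel, args in world)
    return has_author and cites

# P(q) = sum of the probabilities of the possible worlds satisfying q.
prob = 0.0
for mask in product([False, True], repeat=len(facts)):
    world = [(rel, args) for (rel, args, _), keep in zip(facts, mask) if keep]
    w = 1.0
    for (_, _, p), keep in zip(facts, mask):
        w *= p if keep else 1.0 - p
    if q(world):
        prob += w

print(f"P(q) = {prob:.3f}")  # (1 - 0.1 * 0.4) * 0.7 = 0.672
```

Enumerating worlds is exponential in the number of facts; when and how this blowup can be avoided is exactly what the complexity analysis in the thesis addresses.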
Learning to Reason: Leveraging Neural Networks for Approximate DNF Counting
Weighted model counting (WMC) has emerged as a prevalent approach for probabilistic inference. In its most general form, WMC is #P-hard. Weighted DNF counting (weighted #DNF) is a special case, where approximations with probabilistic guarantees are obtained in O(nm) time, where n denotes the number of variables and m the number of clauses of the input DNF, but this is not scalable in practice. In this paper, we propose a neural model counting approach for weighted #DNF that combines approximate model counting with deep learning, and accurately approximates model counts in linear time when width is bounded. We conduct experiments to validate our method, and show that our model learns and generalizes very well to large-scale #DNF instances.
Comment: To appear in Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20). Code and data available at: https://github.com/ralphabb/NeuralDNF
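For reference, the O(nm)-time baseline mentioned above is the classical Karp-Luby-style Monte Carlo scheme for DNF counting. Below is a minimal unweighted sketch of that classical algorithm (our illustration, not the paper's neural model; the sample budget is arbitrary).

```python
import random

def karp_luby_dnf_count(clauses, n, samples=100_000):
    """Estimate the number of satisfying assignments of a DNF over n variables.
    clauses: list of dicts {variable_index: required_boolean_value}."""
    # Clause i is satisfied by exactly 2^(n - |C_i|) assignments.
    weights = [2 ** (n - len(c)) for c in clauses]
    total = sum(weights)
    hits = 0
    for _ in range(samples):
        # Sample a clause proportionally to its number of satisfying
        # assignments, then a uniform assignment satisfying it.
        i = random.choices(range(len(clauses)), weights=weights)[0]
        assignment = {v: random.random() < 0.5 for v in range(n)}
        assignment.update(clauses[i])
        # Count the sample only if i is the FIRST satisfied clause, so that
        # every satisfying assignment is counted exactly once overall.
        first = next(j for j, c in enumerate(clauses)
                     if all(assignment[v] == val for v, val in c.items()))
        hits += (first == i)
    return total * hits / samples

# (x0 AND x1) OR (NOT x2) over 3 variables; the exact count is 5.
print(karp_luby_dnf_count([{0: True, 1: True}, {2: False}], n=3))
```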
The Dichotomy of Evaluating Homomorphism-Closed Queries on Probabilistic Graphs
We study the problem of probabilistic query evaluation on probabilistic graphs, namely, tuple-independent probabilistic databases over signatures of arity two. Our focus is the class of queries that is closed under homomorphisms or, equivalently, the infinite unions of conjunctive queries. Our main result states that all unbounded queries from this class are #P-hard for probabilistic query evaluation. As bounded queries from this class are equivalent to a union of conjunctive queries, they are already classified by the dichotomy of Dalvi and Suciu (2012). Hence, our result and theirs imply a complete data complexity dichotomy, between polynomial time and #P-hardness, for evaluating infinite unions of conjunctive queries over probabilistic graphs. This dichotomy covers, in particular, all fragments of infinite unions of conjunctive queries such as negation-free (disjunctive) Datalog, regular path queries, and a large class of ontology-mediated queries on arity-two signatures. Our result is shown by reducing from counting the valuations of positive partitioned 2-DNF formulae for some queries, or from the source-to-target reliability problem in an undirected graph for other queries, depending on properties of minimal models. The presented dichotomy result applies even to a special case of probabilistic query evaluation called generalized model counting, where fact probabilities must be 0, 0.5, or 1.
Comment: 30 pages. Journal version of the ICDT'20 paper https://drops.dagstuhl.de/opus/volltexte/2020/11939/. Submitted to LMCS. The previous version (version 2) was the same as the ICDT'20 paper with some minor formatting tweaks and 7 extra pages of technical appendix.
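To make the first reduction target concrete (our illustrative brute-force code, not part of the paper): a positive partitioned 2-DNF formula is a disjunction of conjunctions x_i AND y_j over two disjoint variable sets, and counting its satisfying valuations (#PP2DNF) is a canonical #P-hard problem.

```python
from itertools import product

def count_pp2dnf(pairs, n_x, n_y):
    """Count valuations satisfying the PP2DNF formula
    OR over (i, j) in pairs of (x_i AND y_j).
    Brute force, exponential in n_x + n_y; no polynomial algorithm
    is expected, since the problem is #P-hard."""
    count = 0
    for xs in product([False, True], repeat=n_x):
        for ys in product([False, True], repeat=n_y):
            if any(xs[i] and ys[j] for i, j in pairs):
                count += 1
    return count

# (x0 AND y0) OR (x1 AND y1): 7 of the 16 valuations are satisfying.
print(count_pp2dnf([(0, 0), (1, 1)], n_x=2, n_y=2))
```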
Most Probable Explanations for Probabilistic Database Queries: Extended Version
Forming the foundations of large-scale knowledge bases, probabilistic databases have been widely studied in the literature. In particular, probabilistic query evaluation has been investigated intensively as a central inference mechanism. However, despite its power, query evaluation alone cannot extract all the relevant information encompassed in large-scale knowledge bases. To exploit this potential, we study two inference tasks, namely, finding the most probable database and the most probable hypothesis for a given query. As natural counterparts of most probable explanations (MPE) and maximum a posteriori hypotheses (MAP) in probabilistic graphical models, they can be used in a variety of applications that involve prediction or diagnosis tasks. We investigate these problems relative to a variety of query languages, ranging from conjunctive queries to ontology-mediated queries, and provide a detailed complexity analysis.
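For intuition, the most probable database task can be phrased by brute force as follows (our sketch with hypothetical facts and query, not the paper's algorithm): among all possible worlds that satisfy the query, return the one of maximum probability.

```python
from itertools import product

# Hypothetical tuple-independent database: (fact, marginal probability).
facts = [("R(a)", 0.3), ("R(b)", 0.8), ("S(a)", 0.6)]

def satisfies(world):
    # Query q: exists x. R(x) AND S(x); here only x = a can be a witness.
    return "R(a)" in world and "S(a)" in world

best_world, best_prob = None, 0.0
for mask in product([False, True], repeat=len(facts)):
    world = {f for (f, _), keep in zip(facts, mask) if keep}
    prob = 1.0
    for (_, p), keep in zip(facts, mask):
        prob *= p if keep else 1.0 - p
    if satisfies(world) and prob > best_prob:
        best_world, best_prob = world, prob

# Best world: {R(a), R(b), S(a)} with probability 0.3 * 0.8 * 0.6 = 0.144
# (R(b) is kept because its presence is more likely than its absence).
print(best_world, best_prob)
```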
Approximate weighted model integration on DNF structures
Weighted model counting consists of computing the weighted sum of all satisfying assignments of a propositional formula. Weighted model counting is well known to be #P-hard to solve exactly, but it admits a fully polynomial randomized approximation scheme when restricted to DNF structures. In this work, we study weighted model integration, a generalization of weighted model counting that involves real variables in addition to propositional variables, and pose the following question: does weighted model integration on DNF structures admit a fully polynomial randomized approximation scheme? Building on classical results from approximate weighted model counting and approximate volume computation, we show that weighted model integration on DNF structures can indeed be approximated for a class of weight functions. Our approximation algorithm is based on three subroutines, each of which can be a weak (i.e., approximate) or a strong (i.e., exact) oracle, and in all cases it comes with accuracy guarantees. We experimentally verify our approach over randomly generated DNF instances of varying sizes, and show that our algorithm scales to large problem instances, involving up to 1K variables, which are currently out of reach for existing general-purpose weighted model integration solvers.
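To illustrate what weighted model integration computes (a naive Monte Carlo sketch of ours with an illustrative formula and weight, not the paper's three-oracle algorithm): over a mixed Boolean/real domain, one sums, per Boolean assignment, the integral of the weight function over the real values satisfying the formula.

```python
import random

def phi(b, x):
    # Illustrative DNF over one Boolean b and one real x in [0, 1]:
    # (b AND x < 0.5) OR (NOT b AND x > 0.8)
    return (b and x < 0.5) or (not b and x > 0.8)

def weight(b, x):
    return 2.0 * x  # illustrative weight density

# WMI(phi, w) = sum over b of the integral of w over {x : phi(b, x)}.
# Naive Monte Carlo over the joint domain {0, 1} x [0, 1] (measure 2).
samples = 200_000
acc = 0.0
for _ in range(samples):
    b = random.random() < 0.5
    x = random.random()
    if phi(b, x):
        acc += weight(b, x)

print(2.0 * acc / samples)  # exact: 0.5**2 + (1 - 0.8**2) = 0.25 + 0.36 = 0.61
```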
Ontology-Mediated Query Answering over Log-Linear Probabilistic Data: Extended Version
Large-scale knowledge bases are at the heart of modern information systems. Their knowledge is inherently uncertain, and hence they are often materialized as probabilistic databases. However, probabilistic database management systems typically lack the capability to incorporate implicit background knowledge and, consequently, fail to capture some intuitive query answers. Ontology-mediated query answering is a popular paradigm for encoding commonsense knowledge, which can provide more complete answers to user queries. We propose a new data model that integrates the paradigm of ontology-mediated query answering with probabilistic databases, employing a log-linear probability model. We compare our approach to existing proposals, and provide supporting computational results.
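To sketch the log-linear idea concretely (a generic log-linear sketch of ours with hypothetical facts and weights, not the paper's full data model): each possible world receives a score equal to the sum of the weights of the facts it contains, and its probability is proportional to the exponential of that score.

```python
import math
from itertools import product

# Hypothetical weighted facts; a possible world is any subset of them.
weighted_facts = [("Bird(tweety)", 2.0), ("Flies(tweety)", 1.5)]

def score(world):
    return sum(w for f, w in weighted_facts if f in world)

worlds = [frozenset(f for (f, _), keep in zip(weighted_facts, mask) if keep)
          for mask in product([False, True], repeat=len(weighted_facts))]

Z = sum(math.exp(score(w)) for w in worlds)          # partition function
probs = {w: math.exp(score(w)) / Z for w in worlds}  # P(world) = exp(score)/Z

# The probability of a query, e.g. Flies(tweety), is the total mass of the
# worlds in which it holds.
print(sum(p for w, p in probs.items() if "Flies(tweety)" in w))
```

With weights on individual facts only, this reduces to a tuple-independent database; coupling between facts arises once features span several facts or an ontology restricts the admissible worlds.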