    Ontology-Mediated Query Answering over Log-Linear Probabilistic Data: Extended Version

    Large-scale knowledge bases are at the heart of modern information systems. Their knowledge is inherently uncertain, and hence they are often materialized as probabilistic databases. However, probabilistic database management systems typically lack the capability to incorporate implicit background knowledge and, consequently, fail to capture some intuitive query answers. Ontology-mediated query answering is a popular paradigm for encoding commonsense knowledge, which can provide more complete answers to user queries. We propose a new data model that integrates the paradigm of ontology-mediated query answering with probabilistic databases, employing a log-linear probability model. We compare our approach to existing proposals, and provide supporting computational results.
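
    To make "log-linear probability model" concrete, here is a minimal sketch of a generic log-linear (Markov-logic-style) distribution over possible worlds, where a soft implication Cat(x) -> Animal(x) stands in for background knowledge. This is an illustration under stated assumptions, not the data model proposed in the paper: the facts, feature weights and helper names (FACTS, FEATURES, probability) are all hypothetical. Each world W gets probability exp(sum of weights of features W satisfies) / Z.

```python
import itertools
import math

# Hypothetical candidate facts (illustrative only, not from the paper).
FACTS = ["Cat(tom)", "Animal(tom)"]

# Hypothetical weighted features; each maps a world (frozenset of facts) to True/False.
FEATURES = [
    (1.5, lambda w: "Cat(tom)" in w),     # evidence that tom is a cat
    (0.2, lambda w: "Animal(tom)" in w),  # weak direct evidence for Animal(tom)
    (3.0, lambda w: "Cat(tom)" not in w or "Animal(tom)" in w),  # soft rule Cat -> Animal
]


def score(world):
    """Unnormalised log-linear score: exp(sum of weights of satisfied features)."""
    return math.exp(sum(wt for wt, f in FEATURES if f(world)))


def worlds():
    """Enumerate every subset of FACTS as a possible world."""
    for bits in itertools.product([False, True], repeat=len(FACTS)):
        yield frozenset(f for f, b in zip(FACTS, bits) if b)


def probability(query):
    """P(query) = (sum of scores of worlds satisfying query) / Z."""
    all_worlds = list(worlds())
    z = sum(score(w) for w in all_worlds)
    return sum(score(w) for w in all_worlds if query(w)) / z


print(probability(lambda w: "Animal(tom)" in w))
```

    Dropping the third feature would make Animal(tom) independent with probability about 0.55 (the logistic function of 0.2); with the soft rule, the evidence for Cat(tom) also raises the probability of Animal(tom). Enumeration is exponential in the number of facts, so this only serves as a reference semantics.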

    The Dichotomy of Evaluating Homomorphism-Closed Queries on Probabilistic Graphs

    We study the problem of probabilistic query evaluation on probabilistic graphs, namely, tuple-independent probabilistic databases on signatures of arity two. Our focus is the class of queries that is closed under homomorphisms, or equivalently, the infinite unions of conjunctive queries. Our main result states that all unbounded queries from this class are #P-hard for probabilistic query evaluation. As bounded queries from this class are equivalent to a union of conjunctive queries, they are already classified by the dichotomy of Dalvi and Suciu (2012). Hence, our result and theirs imply a complete data complexity dichotomy, between polynomial time and #P-hardness, for evaluating infinite unions of conjunctive queries over probabilistic graphs. This dichotomy covers in particular all fragments of infinite unions of conjunctive queries such as negation-free (disjunctive) Datalog, regular path queries, and a large class of ontology-mediated queries on arity-two signatures. Our result is shown by reducing from counting the valuations of positive partitioned 2-DNF formulae for some queries, or from the source-to-target reliability problem in an undirected graph for other queries, depending on properties of minimal models. The presented dichotomy result applies even to a special case of probabilistic query evaluation called generalized model counting, where fact probabilities must be 0, 0.5, or 1.
    Comment: 30 pages. Journal version of the ICDT'20 paper https://drops.dagstuhl.de/opus/volltexte/2020/11939/. Submitted to LMCS. The previous version (version 2) was the same as the ICDT'20 paper with some minor formatting tweaks and 7 extra pages of technical appendix.
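
    As a reference point for what "probabilistic query evaluation on a tuple-independent graph" computes, the sketch below evaluates an s-t reachability query by brute force: it enumerates every possible world (subset of edges), weights it by the product of independent edge probabilities, and sums the weights of worlds where t is reachable from s. Treating edges as undirected makes this exactly the source-to-target reliability problem the abstract mentions. The function names are hypothetical, and the enumeration is exponential, which is consistent with (not a demonstration of) the #P-hardness result.

```python
import itertools
from collections import deque


def reachable(edges, source, target):
    """Breadth-first search over an undirected edge set."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def query_probability(prob_edges, source, target):
    """Sum the probabilities of all possible worlds in which target is reachable.

    prob_edges maps each edge (u, v) to its independent probability.
    Exponential in the number of edges: a brute-force reference only.
    """
    edges = list(prob_edges)
    total = 0.0
    for bits in itertools.product([False, True], repeat=len(edges)):
        weight = 1.0
        present = []
        for edge, keep in zip(edges, bits):
            p = prob_edges[edge]
            weight *= p if keep else 1.0 - p
            if keep:
                present.append(edge)
        if reachable(present, source, target):
            total += weight
    return total
```

    With all probabilities set to 0.5, as in the generalized model counting setting, two disjoint two-edge paths from s to t give 1 - (1 - 0.25)^2 = 0.4375.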

    Approximate weighted model integration on DNF structures

    Weighted model counting consists of computing the weighted sum of all satisfying assignments of a propositional formula. Weighted model counting is well-known to be #P-hard for exact solving, but admits a fully polynomial randomized approximation scheme when restricted to DNF structures. In this work, we study weighted model integration, a generalization of weighted model counting which involves real variables in addition to propositional variables, and pose the following question: Does weighted model integration on DNF structures admit a fully polynomial randomized approximation scheme? Building on classical results from approximate weighted model counting and approximate volume computation, we show that weighted model integration on DNF structures can indeed be approximated for a class of weight functions. Our approximation algorithm is based on three subroutines, each of which can be a weak (i.e., approximate) or a strong (i.e., exact) oracle, and in all cases the algorithm comes with accuracy guarantees. We experimentally verify our approach over randomly generated DNF instances of varying sizes, and show that our algorithm scales to large problem instances, involving up to 1K variables, which are currently out of reach for existing, general-purpose weighted model integration solvers.
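
    The discrete special case can be made concrete with a small sketch: exact weighted model counting of a DNF by enumeration, next to a Karp-Luby style Monte Carlo estimator of the kind that underlies the classical approximation scheme for DNF counting. Assumptions: weights are literal probabilities, w(x) = p[x] and w(not x) = 1 - p[x], so the weighted count equals the probability that the formula is true. This handles only propositional variables and is not the paper's weighted model integration algorithm; all function names are hypothetical.

```python
import itertools
import math
import random

# A DNF is a list of clauses; each clause maps a variable name to its required value.
# Assumption: w(x=True) = p[x] and w(x=False) = 1 - p[x].


def clause_prob(clause, p):
    """Weight of all assignments satisfying a single clause."""
    prob = 1.0
    for var, val in clause.items():
        prob *= p[var] if val else 1.0 - p[var]
    return prob


def satisfies(assignment, clause):
    return all(assignment[v] == val for v, val in clause.items())


def exact_wmc(dnf, p):
    """Sum the weights of all satisfying assignments (exponential enumeration)."""
    variables = sorted(p)
    total = 0.0
    for bits in itertools.product([False, True], repeat=len(variables)):
        a = dict(zip(variables, bits))
        if any(satisfies(a, c) for c in dnf):
            total += math.prod(p[v] if a[v] else 1.0 - p[v] for v in variables)
    return total


def karp_luby(dnf, p, samples, rng):
    """Unbiased estimate of the weighted count via the 'first satisfied clause' trick."""
    weights = [clause_prob(c, p) for c in dnf]
    s = sum(weights)
    hits = 0
    for _ in range(samples):
        # Pick clause i with probability proportional to its weight.
        i = rng.choices(range(len(dnf)), weights=weights)[0]
        # Sample an assignment conditioned on clause i being true.
        a = {v: rng.random() < p[v] for v in p}
        a.update(dnf[i])
        # Count the sample only if i is the first clause that a satisfies.
        first = next(j for j, c in enumerate(dnf) if satisfies(a, c))
        if first == i:
            hits += 1
    return s * hits / samples
```

    For example, with dnf = [{"x": True, "y": True}, {"y": True, "z": True}] and all probabilities 0.5, exact_wmc returns 0.5 * 0.75 = 0.375, and karp_luby(dnf, p, 20000, random.Random(0)) should land close to it. The estimator's relative error depends on the number of clauses rather than on how small the true count is, which is what makes the DNF case approximable.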

    Ontology-mediated query answering over log-linear probabilistic data

    Large-scale knowledge bases are at the heart of modern information systems. Their knowledge is inherently uncertain, and hence they are often materialized as probabilistic databases. However, probabilistic database management systems typically lack the capability to incorporate implicit background knowledge and, consequently, fail to capture some intuitive query answers. Ontology-mediated query answering is a popular paradigm for encoding commonsense knowledge, which can provide more complete answers to user queries. We propose a new data model that integrates the paradigm of ontology-mediated query answering with probabilistic databases, employing a log-linear probability model. We compare our approach to existing proposals, and provide supporting computational results.

    Infinite Probabilistic Databases

    Probabilistic databases (PDBs) model uncertainty in data in a quantitative way. In the established formal framework, probabilistic (relational) databases are finite probability spaces over relational database instances. This finiteness can clash with intuitive query behavior (Ceylan et al., KR 2016), and with application scenarios that are better modeled by continuous probability distributions (Dalvi et al., CACM 2009). We formally introduced infinite PDBs in (Grohe and Lindner, PODS 2019) with a primary focus on countably infinite spaces. However, an extension beyond countable probability spaces raises nontrivial foundational issues concerned with the measurability of events and queries and ultimately with the question whether queries have a well-defined semantics. We argue that finite point processes are an appropriate model from probability theory for dealing with general probabilistic databases. This allows us to construct suitable (uncountable) probability spaces of database instances in a systematic way. Our main technical results are measurability statements for relational algebra queries as well as aggregate queries and Datalog queries.
    Comment: This is the full version of the paper "Infinite Probabilistic Databases" presented at ICDT 2020 (arXiv:1904.06766)
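
    One way to picture a finite point process as a probabilistic database is the sketch below, which assumes a Poisson point process: the number of facts in a hypothetical unary relation Reading(value) is Poisson(lam), and each value is an independent normal draw, so the space of instances is uncountable. It samples instances and estimates the probability of the Boolean query "some reading exceeds a threshold". For this particular process, the readings above the threshold again form a Poisson process, giving the closed form 1 - exp(-lam * q) with q = P(value > threshold). The relation, parameters and function names are illustrative assumptions; the sketch does not address the measurability questions studied in the paper.

```python
import math
import random
from statistics import NormalDist


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for small lam."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def sample_instance(lam, mu, sigma, rng):
    """One random instance: a Poisson number of Reading(value) facts, values i.i.d. normal."""
    n = sample_poisson(lam, rng)
    return [rng.gauss(mu, sigma) for _ in range(n)]


def estimate_exists_above(lam, mu, sigma, threshold, samples, rng):
    """Monte Carlo estimate of P(some reading exceeds threshold)."""
    hits = sum(
        any(v > threshold for v in sample_instance(lam, mu, sigma, rng))
        for _ in range(samples)
    )
    return hits / samples


def exact_exists_above(lam, mu, sigma, threshold):
    """Closed form for the Poisson case: readings above threshold are Poisson(lam * q)."""
    q = 1.0 - NormalDist(mu, sigma).cdf(threshold)
    return 1.0 - math.exp(-lam * q)


rng = random.Random(42)
print(estimate_exists_above(3.0, 20.0, 5.0, 25.0, 20000, rng))
print(exact_exists_above(3.0, 20.0, 5.0, 25.0))
```

    The two printed values should agree to about two decimal places; the Monte Carlo route is what remains available when no closed form exists, for instance for more complex queries over the sampled instances.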