Search CORE


Revisiting the formal foundation of Probabilistic Databases

Author: Keulen Maurice van
Wanders Brend
Publication venue: Atlantis Press
Publication date: 01/01/2015

One of the core problems in soft computing is dealing with uncertainty in data. In this paper, we revisit the formal foundation of a class of probabilistic databases with the aim to (1) obtain data model independence, (2) separate metadata on uncertainty and probabilities from the raw data, (3) better understand aggregation, and (4) create more opportunities for optimization. The paper presents the formal framework and validates data model independence by showing how to obtain a probabilistic Datalog as well as a probabilistic relational algebra by applying the framework to their non-probabilistic counterparts. We conclude with a discussion of the latter three goals.

Crossref

University of Twente Research Information
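Goal (2) of the paper above, separating uncertainty metadata and probabilities from the raw data, can be pictured with a small possible-worlds sketch. It is only an illustration under assumptions chosen here: each raw tuple is tagged with a condition "variable = value" over independent discrete random variables, and the probabilities are stored in a separate structure. This resembles common probabilistic-database designs but is not claimed to be the paper's formalism; all data and variable names are hypothetical.

```python
from itertools import product

# Raw data: plain tuples with no probabilities attached (hypothetical example data).
raw = {
    "t1": ("Alice", "Amsterdam"),
    "t2": ("Alice", "Berlin"),
    "t3": ("Bob", "Paris"),
}

# Uncertainty metadata, kept apart from the raw data: each tuple is tagged with a
# condition "variable = value" over discrete random variables (an assumption).
conditions = {"t1": ("x", 1), "t2": ("x", 2), "t3": ("y", 1)}

# Probabilities live in a third structure: one distribution per random variable.
distributions = {"x": {1: 0.6, 2: 0.4}, "y": {1: 0.7, 2: 0.3}}


def worlds():
    """Enumerate every assignment of the (independent) random variables with its probability."""
    names = sorted(distributions)
    for values in product(*(distributions[n].items() for n in names)):
        assignment = {n: v for n, (v, _) in zip(names, values)}
        prob = 1.0
        for _, p in values:
            prob *= p
        yield assignment, prob


def instance(assignment):
    """The ordinary, non-probabilistic database instance selected by one assignment."""
    return {raw[t] for t, (var, val) in conditions.items() if assignment[var] == val}


def query_probability(boolean_query):
    """Sum the probabilities of all worlds in which an ordinary Boolean query holds."""
    return sum(p for a, p in worlds() if boolean_query(instance(a)))


# The query is plain set logic; it knows nothing about probabilities.
print(query_probability(lambda db: ("Alice", "Amsterdam") in db))
```

Because the query is evaluated on ordinary instances, the same probability computation would work for any query language that can be evaluated on the raw data model.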

Infinite Probabilistic Databases

Author: Grohe Martin
Lindner Peter
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 23rd International Conference on Database Theory (ICDT 2020)
Publication date: 01/01/2020

Probabilistic databases (PDBs) are used to model uncertainty in data in a quantitative way. In the standard formal framework, PDBs are finite probability spaces over relational database instances. It has been argued convincingly that this is compatible neither with an open-world semantics (Ceylan et al., KR 2016) nor with application scenarios that are modeled by continuous probability distributions (Dalvi et al., CACM 2009). We recently introduced a model of PDBs as infinite probability spaces that addresses these issues (Grohe and Lindner, PODS 2019). While that work was mainly concerned with countably infinite probability spaces, our focus here is on uncountable spaces. Such an extension is necessary to model typical continuous probability distributions that appear in many applications. However, an extension beyond countable probability spaces raises nontrivial foundational issues concerned with the measurability of events and queries and ultimately with the question of whether queries have a well-defined semantics. It turns out that so-called finite point processes are the appropriate model from probability theory for dealing with probabilistic databases. This model allows us to construct suitable (uncountable) probability spaces of database instances in a systematic way. Our main technical results are measurability statements for relational algebra queries as well as aggregate queries and Datalog queries.

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server
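The Grohe and Lindner abstract argues that finite point processes are the right model once attribute values come from continuous distributions. The sketch below uses one simple finite point process, assumed here for illustration: the number of tuples is Poisson distributed and each tuple carries one normally distributed value, so every instance is finite while the space of instances is uncountable. It estimates the probability of the Boolean query "some tuple exceeds a threshold" by sampling instances and compares the result with the closed form obtained from Poisson thinning. The parameters are hypothetical and the construction is not taken from the paper.

```python
import math
import random

# Hypothetical parameters: the number of tuples is Poisson(RATE), and each tuple
# carries one real-valued attribute drawn from Normal(MU, SIGMA).
RATE, MU, SIGMA = 3.0, 20.0, 5.0
THRESHOLD = 30.0


def poisson(rng, lam):
    """Knuth's method: count uniform draws until their product drops to e^-lam or below."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def sample_instance(rng):
    """Draw one database instance: a finite list of one-attribute tuples."""
    return [(rng.gauss(MU, SIGMA),) for _ in range(poisson(rng, RATE))]


def event(instance):
    """Boolean query: does some tuple have a value above THRESHOLD?"""
    return any(v > THRESHOLD for (v,) in instance)


def monte_carlo(n, seed=0):
    """Estimate P(event) as the fraction of n sampled instances satisfying the query."""
    rng = random.Random(seed)
    return sum(event(sample_instance(rng)) for _ in range(n)) / n


def exact():
    # Thinning a Poisson process keeps it Poisson: tuples above the threshold occur at
    # rate RATE * P(value > THRESHOLD), so P(at least one) = 1 - exp(-that rate).
    tail = 0.5 * math.erfc((THRESHOLD - MU) / (SIGMA * math.sqrt(2)))
    return 1.0 - math.exp(-RATE * tail)


print(round(exact(), 4), round(monte_carlo(50_000), 4))
```

The event "some value exceeds the threshold" is a measurable set of instances here, which is exactly the kind of property whose general validity for queries the papers set out to establish.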

Incremental data uncertainty handling using evidence combination: a case study on maritime data reasoning

Author: Flokstra Jan
Habib Mena B.
Keulen Maurice van
Wanders Brend
Publication venue: IEEE Computer Society
Publication date: 01/09/2015

Semantic incompatibility is a conflict in the meaning of data. In this paper, we propose a data-cleaning approach that resolves semantic incompatibility. Our approach enhances data quality dynamically and incrementally: it checks newly recorded facts/relations for coherence or conflict with the existing ones, reasons over the existing information, and derives newly discovered facts/relations. We use maritime data cleaning as a validation scenario.

University of Twente Research Information
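The maritime paper's abstract describes checking new facts for conflict with existing ones and combining evidence incrementally, but does not say which combination rule is used. As one standard option, the sketch below applies Dempster's rule of combination from Dempster–Shafer theory; the conflicting mass K it computes is a natural measure of how strongly a new fact clashes with what is already believed. The vessel types and mass values are invented for illustration.

```python
from itertools import product

# Hypothetical frame of discernment: the possible types of an observed vessel.
FRAME = frozenset({"cargo", "tanker", "fishing"})


def combine(m1, m2):
    """Dempster's rule: multiply the masses of every pair of focal sets, keep the
    non-empty intersections, and renormalise by 1 - K, where K is the conflicting mass."""
    combined, conflict = {}, 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + x * y
        else:
            conflict += x * y  # evidence pairs that contradict each other
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence cannot be combined")
    return {s: v / (1.0 - conflict) for s, v in combined.items()}, conflict


# Existing belief from earlier facts, and one newly recorded (hypothetical) fact.
existing = {frozenset({"cargo"}): 0.6, FRAME: 0.4}
new_fact = {frozenset({"fishing"}): 0.5, frozenset({"cargo", "tanker"}): 0.3, FRAME: 0.2}

belief, k = combine(existing, new_fact)
print(f"conflict K = {k:.2f}")
for focal, mass in sorted(belief.items(), key=lambda item: -item[1]):
    print(sorted(focal), round(mass, 3))
```

For incremental use, the returned `belief` can serve as `existing` when the next fact arrives, and a large K could flag a candidate incompatibility for cleaning.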

On the Measurement of Privacy as an Attacker's Estimation Error

Author: Diaz Claudia
Forné Jordi
Parra-Arnau Javier
Rebollo-Monedero David
Publication date: 01/01/2012

A wide variety of privacy metrics have been proposed in the literature to evaluate the level of protection offered by privacy-enhancing technologies. Most of these metrics are specific to concrete systems and adversarial models, and are difficult to generalize or translate to other contexts. Furthermore, a better understanding of the relationships between the different privacy metrics is needed to enable a more grounded and systematic approach to measuring privacy, as well as to assist system designers in selecting the most appropriate metric for a given application. In this work we propose a theoretical framework for privacy-preserving systems, endowed with a general definition of privacy in terms of the estimation error incurred by an attacker who aims to disclose the private information that the system is designed to conceal. We show that our framework permits interpreting and comparing a number of well-known metrics under a common perspective. The arguments behind these interpretations are based on fundamental results related to the theories of information, probability, and Bayes decision.

Comment: This paper has 18 pages and 17 figures.

arXiv.org e-Print Archive

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC
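To make "privacy as an attacker's estimation error" concrete, the sketch below takes a discrete private attribute, a known joint distribution with an observation, and a 0-1 (Hamming) loss; all three are assumptions made here, not the paper's general definition. Under 0-1 loss, Bayes decision theory says the optimal attacker guesses the most probable private value for each observation, so the minimum expected error is 1 - Σ_y max_x p(x, y). The diagnoses, observations, and probabilities are hypothetical.

```python
# Hypothetical joint distribution p(x, y): x is the private attribute the system
# hides, y is what the attacker observes. The probabilities sum to 1.
JOINT = {
    ("flu", "pharmacy"): 0.30, ("flu", "gym"): 0.10,
    ("cold", "pharmacy"): 0.15, ("cold", "gym"): 0.15,
    ("none", "pharmacy"): 0.05, ("none", "gym"): 0.25,
}


def bayes_estimator(joint):
    """Map each observation y to the MAP guess argmax_x p(x, y), which minimises 0-1 loss."""
    best = {}
    for (x, y), p in joint.items():
        if y not in best or p > best[y][1]:
            best[y] = (x, p)
    return {y: x for y, (x, _) in best.items()}


def privacy_with_observation(joint):
    """Minimum expected 0-1 error of an attacker who sees y: 1 - sum_y max_x p(x, y)."""
    estimator = bayes_estimator(joint)
    correct = sum(p for (x, y), p in joint.items() if estimator[y] == x)
    return 1.0 - correct


def privacy_without_observation(joint):
    """Error of the best blind guess, made from the marginal p(x) alone."""
    marginal = {}
    for (x, _), p in joint.items():
        marginal[x] = marginal.get(x, 0.0) + p
    return 1.0 - max(marginal.values())


print(bayes_estimator(JOINT))
print(round(privacy_without_observation(JOINT), 2), round(privacy_with_observation(JOINT), 2))
```

With these numbers the attacker's error drops from 0.60 without the observation to 0.45 with it; the gap is one way to read how much the observation leaks.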

Context Aware Computing for The Internet of Things: A Survey

Author: Arkady Zaslavsky
Charith Perera
Dimitrios Georgakopoulos
Peter Christen
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 03/05/2013

As we are moving towards the Internet of Things (IoT), the number of sensors deployed around the world is growing at a rapid pace. Market research has shown a significant growth of sensor deployments over the past decade and has predicted a significant increase in the growth rate in the future. These sensors continuously generate enormous amounts of data. However, in order to add value to raw sensor data we need to understand it. Collection, modelling, reasoning, and distribution of context in relation to sensor data play a critical role in this challenge. Context-aware computing has proven to be successful in understanding sensor data. In this paper, we survey context awareness from an IoT perspective. We begin by introducing the IoT paradigm and context-aware fundamentals as background. Then we provide an in-depth analysis of the context life cycle. Based on our own taxonomy, we evaluate a subset of 50 projects that represent the majority of research and commercial solutions proposed in the field of context-aware computing over the last decade (2001-2011). Finally, based on our evaluation, we highlight the lessons to be learnt from the past and some possible directions for future research. The survey addresses a broad range of techniques, methods, models, functionalities, systems, applications, and middleware solutions related to context awareness and IoT. Our goal is not only to analyse, compare, and consolidate past research work but also to appreciate their findings and discuss their applicability towards the IoT.

Comment: IEEE Communications Surveys & Tutorials Journal, 201

arXiv.org e-Print Archive

CiteSeerX

Deakin Research Online

Online Research @ Cardiff

The Australian National University
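The survey abstract names four stages for handling context: collection, modelling, reasoning, and distribution. The sketch below chains them for a single sensor reading to show how raw data gains meaning at each step. The sensor id, registry layout, and 30 °C rule are hypothetical, and the stage boundaries are a simplification rather than the survey's own taxonomy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    """Modelled context: a raw sensor value plus the meaning needed to interpret it."""
    entity: str
    attribute: str
    value: float
    unit: str


def collect(raw_line):
    """Collection: parse a hypothetical 'sensor_id,value' reading."""
    sensor_id, value = raw_line.split(",")
    return sensor_id, float(value)


def model(reading, registry):
    """Modelling: attach entity, attribute, and unit from a sensor registry."""
    sensor_id, value = reading
    entity, attribute, unit = registry[sensor_id]
    return Context(entity, attribute, value, unit)


def reason(ctx):
    """Reasoning: derive higher-level context with a simple (hypothetical) rule."""
    if ctx.attribute == "temperature" and ctx.value > 30.0:
        return f"{ctx.entity} is overheating"
    return f"{ctx.entity} is normal"


def distribute(situation, subscribers):
    """Distribution: push the derived context to every interested consumer."""
    for callback in subscribers:
        callback(situation)


registry = {"s17": ("server-room", "temperature", "°C")}
received = []
distribute(reason(model(collect("s17,34.5"), registry)), [received.append])
print(received)
```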

VerdictDB: Universalizing Approximate Query Processing

Author: Bickel P. J.
Canty A. J.
Condie T.
Eykholt K.
Flajolet P.
Ganti V.
Hall P.
Kleiner A.
Mayo D. G.
Meliou A.
Mozafari B.
Olston C.
Park Y.
Politis D. N.
Sidirourgos L.
Su H.
Vrbsky S.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 08/11/2018

Despite 25 years of research in academia, approximate query processing (AQP) has had little industrial adoption. One of the major causes of this slow adoption is the reluctance of traditional vendors to make radical changes to their legacy codebases, and the preoccupation of newer vendors (e.g., SQL-on-Hadoop products) with implementing standard features. Additionally, the few AQP engines that are available are each tied to a specific platform and require users to completely abandon their existing databases, which is an unrealistic expectation given the infancy of AQP technology. Therefore, we argue that a universal solution is needed: a database-agnostic approximation engine that will widen the reach of this emerging technology across various platforms. Our proposal, called VerdictDB, uses a middleware architecture that requires no changes to the backend database and thus can work with all off-the-shelf engines. Operating at the driver level, VerdictDB intercepts analytical queries issued to the database and rewrites them into another query that, if executed by any standard relational engine, will yield sufficient information for computing an approximate answer. VerdictDB uses the returned result set to compute an approximate answer and error estimates, which are then passed on to the user or application. However, lack of access to the query execution layer introduces significant challenges in terms of generality, correctness, and efficiency. This paper shows how VerdictDB overcomes these challenges and delivers up to 171× speedup (18.45× on average) for a variety of existing engines, such as Impala, Spark SQL, and Amazon Redshift, while incurring less than 2.6% relative error. VerdictDB is open-sourced under the Apache License.

Comment: Extended technical report of the paper that appeared in Proceedings of the 2018 International Conference on Management of Data, pp. 1461-1476. ACM, 201

arXiv.org e-Print Archive

Crossref
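The VerdictDB abstract describes a driver-level rewrite: an analytical query becomes another query that any standard engine can run and whose result suffices to compute an approximate answer with an error estimate. The sketch below illustrates only that general idea, not VerdictDB's own rewriting or error-estimation technique. It assumes a pre-built 1% uniform sample table (the names `sales` and `sales_sample` are hypothetical), rewrites an AVG into COUNT, SUM, and sum of squares over that sample, and derives a central-limit-theorem error bound on the client side, with SQLite standing in for an unmodified backend.

```python
import math
import random
import sqlite3

# Hypothetical setup: a base table `sales` and a 1% uniform sample `sales_sample`
# built ahead of time. Both are ordinary tables in an unmodified engine (SQLite here).
rng = random.Random(42)
rows = [(rng.uniform(10.0, 110.0),) for _ in range(100_000)]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (price REAL)")
conn.execute("CREATE TABLE sales_sample (price REAL)")
conn.executemany("INSERT INTO sales VALUES (?)", rows)
conn.executemany("INSERT INTO sales_sample VALUES (?)", rng.sample(rows, 1_000))


def rewrite_avg(table, column):
    """Rewrite `SELECT AVG(column) FROM table` into a query over the sample that returns
    sufficient statistics. Identifiers are assumed trusted (no escaping is done)."""
    return (f"SELECT COUNT({column}), SUM({column}), SUM({column} * {column}) "
            f"FROM {table}_sample")


def approximate_avg(conn, table, column, z=1.96):
    """Run the rewritten query, then compute the estimate and a CLT-based error bound."""
    n, total, total_sq = conn.execute(rewrite_avg(table, column)).fetchone()
    mean = total / n
    variance = (total_sq - n * mean * mean) / (n - 1)  # unbiased sample variance
    return mean, z * math.sqrt(variance / n)  # half-width of an approx. 95% interval


estimate, error = approximate_avg(conn, "sales", "price")
exact = conn.execute("SELECT AVG(price) FROM sales").fetchone()[0]
print(f"approx {estimate:.2f} ± {error:.2f}, exact {exact:.2f}")
```

Because the rewritten query uses only COUNT and SUM, it should run unchanged on most SQL engines; the estimate and its error bound come entirely from client-side arithmetic on the returned row.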

Infinite Probabilistic Databases

Author: Grohe Martin
Lindner Peter
Publication date: 29/07/2021

Probabilistic databases (PDBs) model uncertainty in data in a quantitative way. In the established formal framework, probabilistic (relational) databases are finite probability spaces over relational database instances. This finiteness can clash with intuitive query behavior (Ceylan et al., KR 2016), and with application scenarios that are better modeled by continuous probability distributions (Dalvi et al., CACM 2009). We formally introduced infinite PDBs in earlier work (Grohe and Lindner, PODS 2019) with a primary focus on countably infinite spaces. However, an extension beyond countable probability spaces raises nontrivial foundational issues concerned with the measurability of events and queries and ultimately with the question of whether queries have a well-defined semantics. We argue that finite point processes are an appropriate model from probability theory for dealing with general probabilistic databases. This allows us to construct suitable (uncountable) probability spaces of database instances in a systematic way. Our main technical results are measurability statements for relational algebra queries as well as aggregate queries and Datalog queries.

Comment: This is the full version of the paper "Infinite Probabilistic Databases" presented at ICDT 2020 (arXiv:1904.06766)

arXiv.org e-Print Archive

Episciences.org

Welcome to SIGMOD 2019 - The 2019 ACM SIGMOD International Conference on the Management of Data!

Author: Ailamaki A. (Anastasia)
Boncz P.A. (Peter)
Manegold S. (Stefan)
Publication date: 30/06/2019

CWI's Institutional Repository