10 research outputs found

Provenance and Probabilities in Relational Databases: From Theory to Practice

Author: Senellart Pierre
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/12/2017

We review the basics of data provenance in relational databases. We describe different provenance formalisms, from Boolean provenance to provenance semirings and beyond, that can be used for a wide variety of purposes, to obtain additional information on the output of a query. We discuss representation systems for data provenance, circuits in particular, with a focus on practical implementation. Finally, we explain how provenance is practically used for probabilistic query evaluation in probabilistic databases.

INRIA a CCSD electronic archive server

An Indexing Framework for Queries on Probabilistic Graphs

Author: Cheng CK
Maniu S
Senellart P
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2017

postprint

HKU Scholars Hub

The Fourth International VLDB Workshop on Management of Uncertain Data

Author: de Keijzer Ander
van Keulen Maurice
Publication venue: Centre for Telematics and Information Technology (CTIT)
Publication date: 13/09/2010

University of Twente Research Information

End-to-End Entity Resolution for Big Data: A Survey

Author: Christophides Vassilis
Efthymiou Vasilis
Palpanas Themis
Papadakis George
Stefanidis Kostas
Publication date: 01/02/1988

One of the most important tasks for improving data quality and the reliability of data analytics results is Entity Resolution (ER). ER aims to identify different descriptions that refer to the same real-world entity, and remains a challenging problem. While previous works have studied specific aspects of ER (and mostly in traditional settings), in this survey, we provide for the first time an end-to-end view of modern ER workflows, and of the novel aspects of entity indexing and matching methods in order to cope with more than one of the Big Data characteristics simultaneously. We present the basic concepts, processing steps and execution strategies that have been proposed by different communities, i.e., database, semantic Web and machine learning, in order to cope with the loose structuredness, extreme diversity, high speed and large scale of entity descriptions used by real-world applications. Finally, we provide a synthetic discussion of the existing approaches, and conclude with a detailed presentation of open research directions.

arXiv.org e-Print Archive

University of Richmond

Database Learning: Toward a Database that Becomes Smarter Every Time

Author: Acharya S.
Agrawal S.
Bishop C. M.
Carbonell J. G.
Carlson A.
Condie T.
Ganti V.
Idreos S.
Lawrence N.
Meliou A.
Micchelli C. A.
Mozafari B.
Olston C.
Park Y.
Rusu F.
Sarawagi S.
Sidirourgos L.
Skilling J.
Wasserman L.
Williams C. K.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 28/03/2017

In today's databases, previous query answers rarely benefit answering future queries. For the first time, to the best of our knowledge, we change this paradigm in an approximate query processing (AQP) context. We make the following observation: the answer to each query reveals some degree of knowledge about the answer to another query because their answers stem from the same underlying distribution that has produced the entire dataset. Exploiting and refining this knowledge should allow us to answer queries more analytically, rather than by reading enormous amounts of raw data. Also, processing more queries should continuously enhance our knowledge of the underlying distribution, and hence lead to increasingly faster response times for future queries. We call this novel idea, learning from past query answers, Database Learning. We exploit the principle of maximum entropy to produce answers, which are in expectation guaranteed to be more accurate than existing sample-based approximations. Empowered by this idea, we build a query engine on top of Spark SQL, called Verdict. We conduct extensive experiments on real-world query traces from a large customer of a major database vendor. Our results demonstrate that Verdict supports 73.7% of these queries, speeding them up by up to 23.0x for the same accuracy level compared to existing AQP systems. Comment: This manuscript is an extended report of the work published in ACM SIGMOD conference 201

arXiv.org e-Print Archive

Crossref

Fundamental Approaches to Software Engineering

Publication venue: 'Springer Science and Business Media LLC'
Publication date: 10/02/2021

This open access book constitutes the proceedings of the 23rd International Conference on Fundamental Approaches to Software Engineering, FASE 2020, which took place in Dublin, Ireland, in April 2020, and was held as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2020. The 23 full papers, 1 tool paper and 6 testing competition papers presented in this volume were carefully reviewed and selected from 81 submissions. The papers cover topics such as requirements engineering, software architectures, specification, software quality, validation, verification of functional and non-functional properties, model-driven development and model transformation, software processes, security and software evolution.

Directory of Open Access Books (DOAB)

A distributed in-memory database system for large-scale spatial-temporal trajectory data

Author: Alves Peixoto Douglas
Publication venue: 'University of Queensland Library'
Publication date: 16/08/2019

University of Queensland eSpace