
6 research outputs found

Faster Query Answering in Probabilistic Databases using Read-Once Functions

Authors: Vittorio Perduca, Sudeepa Roy, Val Tannen
Publication date: 04/12/2010

A Boolean expression is in read-once form if each of its variables appears exactly once. When the variables denote independent events in a probability space, the probability of the event denoted by the whole expression in read-once form can be computed in polynomial time (whereas the general problem for arbitrary expressions is #P-complete). Known approaches to checking the read-once property seem to require putting these expressions in disjunctive normal form (DNF). In this paper, we tell a better story for a large subclass of Boolean event expressions: those generated by conjunctive queries without self-joins on tuple-independent probabilistic databases. We first show that, given a tuple-independent representation and the provenance graph of an SPJ query plan without self-joins, we can efficiently compute the co-occurrence graph of a result event expression without using its DNF. From this graph, the read-once form, if it exists, can already be computed efficiently using existing techniques. Our second and key contribution is a complete, efficient, and simple-to-implement algorithm for computing the read-once forms (whenever they exist) directly, using a new concept, the co-table graph, which can be significantly smaller than the co-occurrence graph. Comment: Accepted in ICDT 201
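The polynomial-time claim for read-once forms can be illustrated with a minimal sketch. It is not the algorithm from the paper (which concerns finding a read-once form); it only evaluates the probability of an expression that is already in read-once form, assuming independent variables. The class names `Var`, `And`, `Or` and the functions `is_read_once_form` and `probability` are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


# Hypothetical expression tree types used only for this illustration.
@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class And:
    children: Tuple["Expr", ...]


@dataclass(frozen=True)
class Or:
    children: Tuple["Expr", ...]


Expr = Union[Var, And, Or]


def variables(expr: Expr) -> List[str]:
    """Return every variable occurrence, in order, with repetitions."""
    if isinstance(expr, Var):
        return [expr.name]
    names: List[str] = []
    for child in expr.children:
        names.extend(variables(child))
    return names


def is_read_once_form(expr: Expr) -> bool:
    """True if each variable occurs exactly once in the expression."""
    names = variables(expr)
    return len(names) == len(set(names))


def _prob(expr: Expr, p: Dict[str, float]) -> float:
    if isinstance(expr, Var):
        return p[expr.name]
    if isinstance(expr, And):
        # Children share no variables, so they are independent events.
        result = 1.0
        for child in expr.children:
            result *= _prob(child, p)
        return result
    # Or: one minus the probability that every child is false.
    none_true = 1.0
    for child in expr.children:
        none_true *= 1.0 - _prob(child, p)
    return 1.0 - none_true


def probability(expr: Expr, p: Dict[str, float]) -> float:
    """Probability of a read-once expression over independent variables."""
    if not is_read_once_form(expr):
        raise ValueError("expression is not in read-once form")
    return _prob(expr, p)


# (x1 OR x2) AND y, each variable true with probability 0.5
expr = And((Or((Var("x1"), Var("x2"))), Var("y")))
print(probability(expr, {"x1": 0.5, "x2": 0.5, "y": 0.5}))  # 0.375
```

Each node is visited once, so the evaluation is linear in the size of the expression. The product rules are valid only because no variable is shared between sibling subexpressions, which is exactly what the read-once form guarantees.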

arXiv.org e-Print Archive

CiteSeerX

Crossref

System architecture for approximate query processing

Authors: Ezio Lefons, Filippo Tangorra, Francesco Di Tria
Publication date: 01/01/2016

Decision making is an activity that addresses the problem of extracting knowledge and information from data stored in data warehouses, in order to improve the business processes of information systems. Usually, decision making is based on On-Line Analytical Processing, data mining, or approximate query processing. In the last case, answers to analytical queries are provided quickly, although they are affected by a small percentage of error. In this paper, we present the architecture of an approximate query answering system. Then, we illustrate our ADAP (Analytical Data Profile) system, which is based on an engine able to provide fast responses to the main statistical functions by using orthogonal polynomial series to approximate the data distribution of multidimensional relations. Moreover, several experimental results measuring the approximation error are shown, and the response time to analytical queries is reported.
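The idea of answering aggregate queries from an orthogonal polynomial series, rather than from the data, can be sketched for a single attribute. This is an illustration under stated assumptions and not the ADAP engine: it assumes Legendre polynomials as the orthogonal family, values already rescaled to [-1, 1], and hypothetical function names (`legendre`, `fit_moments`, `estimate_fraction`). The stored synopsis is just the list of sample means of P_k(x).

```python
from typing import List, Sequence


def legendre(max_degree: int, x: float) -> List[float]:
    """Values P_0(x)..P_max_degree(x) via the three-term recurrence."""
    values = [1.0, x]
    for n in range(1, max_degree):
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values[: max_degree + 1]


def fit_moments(data: Sequence[float], degree: int) -> List[float]:
    """Synopsis: sample mean of P_k over the data, for k = 0..degree.

    Assumes every value in `data` lies in [-1, 1].
    """
    sums = [0.0] * (degree + 1)
    for x in data:
        for k, value in enumerate(legendre(degree, x)):
            sums[k] += value
    return [s / len(data) for s in sums]


def estimate_fraction(moments: Sequence[float], a: float, b: float) -> float:
    """Estimated fraction of tuples with a <= x <= b, from the synopsis only.

    The density is approximated by sum_k (2k+1)/2 * m_k * P_k(x); its
    integral uses the identity (2k+1) * P_k = P'_{k+1} - P'_{k-1}.
    """
    degree = len(moments) - 1
    pa = legendre(degree + 1, a)
    pb = legendre(degree + 1, b)
    total = (b - a) / 2.0  # k = 0 term (m_0 is always 1)
    for k in range(1, degree + 1):
        total += moments[k] / 2.0 * ((pb[k + 1] - pb[k - 1]) - (pa[k + 1] - pa[k - 1]))
    return total


# 1000 evenly spread values in [-1, 1]; about half fall in [-0.5, 0.5].
data = [(i + 0.5) / 500.0 - 1.0 for i in range(1000)]
synopsis = fit_moments(data, degree=4)
print(estimate_fraction(synopsis, -0.5, 0.5))
```

A COUNT over a range would then be the estimated fraction times the number of tuples. The query touches only `degree + 1` stored numbers, which is the source of the speed-up; the price is an approximation error that depends on the degree and on how smooth the data distribution is.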

Crossref

Archivio istituzionale della ricerca - Università di Bari

Open Access Repository

Deriving predicate statistics in Datalog

Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2010

Crossref

Rank-aware, Approximate Query Processing on the Semantic Web

Author: Andreas Josef Wagner
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2014

Search over the Semantic Web corpus frequently leads to queries with large result sets. To discover relevant data elements, users must therefore rely on ranking techniques that sort results according to their relevance. At the same time, applications often deal with information needs that do not require complete and exact results. In this thesis, we address the problem of how to process queries over Web data in an approximate and rank-aware fashion.

KITopen

Graph-Based Synopses for Relational Selectivity Estimation

Author: Joshua Spiegel et al.
Publication date: 01/01/2006

This paper introduces the Tuple Graph (TuG) synopses, a new class of data summaries that enable accurate selectivity estimates for complex relational queries. The proposed summarization framework adopts a “semi-structured” view of the relational database, modeling a relational data set as a graph of tuples and join queries as graph traversals. The key idea is to approximate the structure of the induced data graph in a concise synopsis, and to estimate the selectivity of a query by performing the corresponding traversal over the summarized graph. We detail the TuG synopsis model that is based on this novel approach, and we describe an efficient and scalable construction algorithm for building accurate TuGs within a specific storage budget. We validate the performance of TuGs with an extensive experimental study on real-life and synthetic data sets. Our results verify the effectiveness of TuGs in generating accurate selectivity estimates for complex join queries, and demonstrate their benefits over existing summarization techniques.

CiteSeerX