Search CORE

17,227 research outputs found

Explaining inference queries with Bayesian optimization

Author: Lockhart Brandon
Publication venue
Publication date: 08/04/2021
Field of study

Obtaining an explanation for an SQL query result can enrich the analysis experience, reveal data errors, and provide deeper insight into the data. Inference query explanation seeks to explain unexpected aggregate query results on inference data; such queries are challenging to explain because an explanation may need to be derived from the source, training, or inference data in an ML pipeline. In this work, we model an objective function as a black-box function and propose BOExplain, a novel framework for explaining inference queries using Bayesian optimization (BO). An explanation is a predicate defining the input tuples that should be removed so that the query result of interest is significantly affected. BO - a technique for finding the global optimum of a black-box function - is used to find the best predicate. We develop two new techniques (individual contribution encoding and warm start) to handle categorical variables. We perform experiments showing that the predicates found by BOExplain have a higher degree of explanation compared to those found by the state-of-the-art query explanation engines. We also show that BOExplain is effective at deriving explanations for inference queries from source and training data on three real-world datasets

Simon Fraser University Institutional Repository

A network-aware framework for energy-efficient data acquisition in wireless sensor networks

Author: Andreou
Andreou
Chao
Demetrios Zeinalipour-Yazti
Diallo
Fagin
George S. Samaras
Heinzelman
Hitha
Liao
Luu
Panayiotis G. Andreou
Panos K. Chrysanthis
Roantree
Sharaf
Shen
Souto
Virmani
Yu
Publication venue: 'Elsevier BV'
Publication date: 16/09/2014
Field of study

Wireless sensor networks enable users to monitor the physical world at an extremely high fidelity. In order to collect the data generated by these tiny-scale devices, the data management community has proposed the utilization of declarative data-acquisition frameworks. While these frameworks have facilitated the energy-efficient retrieval of data from the physical environment, they were agnostic of the underlying network topology and also did not support advanced query processing semantics. In this paper we present KSpot+, a distributed network-aware framework that optimizes network efficiency by combining three components: (i) the tree balancing module, which balances the workload of each sensor node by constructing efficient network topologies; (ii) the workload balancing module, which minimizes data reception inefficiencies by synchronizing the sensor network activity intervals; and (iii) the query processing module, which supports advanced query processing semantics. In order to validate the efficiency of our approach, we have developed a prototype implementation of KSpot+ in nesC and JAVA. In our experimental evaluation, we thoroughly assess the performance of KSpot+ using real datasets and show that KSpot+ provides significant energy reductions under a variety of conditions, thus significantly prolonging the longevity of a WSN

CLoK

Crossref

Efficient Iterative Processing in the SciDB Parallel Array Engine

Author: Balazinska Magdalena
Connolly Andrew
Krughoff Simon
Soroush Emad
Publication venue
Publication date: 31/05/2015
Field of study

Many scientific data-intensive applications perform iterative computations on array data. There exist multiple engines specialized for array processing. These engines efficiently support various types of operations, but none includes native support for iterative processing. In this paper, we develop a model for iterative array computations and a series of optimizations. We evaluate the benefits of an optimized, native support for iterative array processing on the SciDB engine and real workloads from the astronomy domain

arXiv.org e-Print Archive

CiteSeerX

Crossref

Statistical structures for internet-scale data management

Author: Ntarmos N.
Triantafillou P.
Weikum G.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009
Field of study

Efficient query processing in traditional database management systems relies on statistics on base data. For centralized systems, there is a rich body of research results on such statistics, from simple aggregates to more elaborate synopses such as sketches and histograms. For Internet-scale distributed systems, on the other hand, statistics management still poses major challenges. With the work in this paper we aim to endow peer-to-peer data management over structured overlays with the power associated with such statistical information, with emphasis on meeting the scalability challenge. To this end, we first contribute efficient, accurate, and decentralized algorithms that can compute key aggregates such as Count, CountDistinct, Sum, and Average. We show how to construct several types of histograms, such as simple Equi-Width, Average-Shifted Equi-Width, and Equi-Depth histograms. We present a full-fledged open-source implementation of these tools for distributed statistical synopses, and report on a comprehensive experimental performance evaluation, evaluating our contributions in terms of efficiency, accuracy, and scalability

CiteSeerX

Springer - Publisher Connector

Enlighten

MPG.PuRe