Distributed top-k aggregation queries at large

Author: A. Marian
Gerhard Weikum
H. David
I.F. Ilyas
K. Church
K. Schnaitter
Matthias Bender
N. Bruno
Peter Triantafillou
R. Akbarinia
R. Fagin
Ralf Schenkel
S. Chaudhuri
S. Madden
Sebastian Michel
T. Cormen
Thomas Neumann
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

Top-k query processing is a fundamental building block for efficient ranking in a large number of applications. Efficiency is a central issue, especially in distributed settings, where the data is spread across different nodes in a network. This paper introduces novel optimization methods for top-k aggregation queries in such distributed environments. The optimizations can be applied to all algorithms that fall into the frameworks of the prior TPUT and KLEE methods. The optimizations address three degrees of freedom: 1) hierarchically grouping input lists into top-k operator trees and optimizing the tree structure, 2) computing data-adaptive scan depths for different input sources, and 3) data-adaptive sampling of a small subset of input sources in scenarios with hundreds or thousands of query-relevant network nodes. All optimizations are based on a statistical cost model that utilizes local synopses, e.g., in the form of histograms, efficiently computed convolutions, and estimators based on order statistics. The paper presents comprehensive experiments with three different real-life datasets, using the ns-2 network simulator for a packet-level simulation of a large Internet-style network.
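For readers unfamiliar with the TPUT framework that the abstract builds on, the following is a minimal sketch of the three-phase scheme as it is commonly described, not the paper's optimized variant. It assumes non-negative scores and sum aggregation; the function names (`tput_top_k`, `kth_largest`) and the in-memory dictionaries standing in for network nodes are hypothetical.

```python
import heapq
from collections import defaultdict


def kth_largest(values, k):
    """Return the k-th largest value, or 0.0 if fewer than k values exist."""
    top = heapq.nlargest(k, values)
    return top[-1] if len(top) == k else 0.0


def tput_top_k(nodes, k):
    """Sketch of a TPUT-style top-k by summed score.

    nodes: list of dicts mapping item -> non-negative local score
           (each dict stands in for one network node; an assumption).
    """
    m = len(nodes)
    partial = defaultdict(float)   # partial sums known to the coordinator
    seen = defaultdict(set)        # which nodes have reported each item

    # Phase 1: every node reports its local top-k items.
    for i, node in enumerate(nodes):
        for item, score in heapq.nlargest(k, node.items(), key=lambda kv: kv[1]):
            partial[item] += score
            seen[item].add(i)
    threshold = kth_largest(partial.values(), k) / m

    # Phase 2: every node reports all items scoring at least the threshold.
    for i, node in enumerate(nodes):
        for item, score in node.items():
            if score >= threshold and i not in seen[item]:
                partial[item] += score
                seen[item].add(i)
    tau2 = kth_largest(partial.values(), k)

    # Prune: an unreported local score is below the threshold, so
    # partial + threshold * (missing nodes) is an upper bound on the total.
    candidates = [
        item for item, p in partial.items()
        if p + threshold * (m - len(seen[item])) >= tau2
    ]

    # Phase 3: fetch exact totals for the surviving candidates only.
    totals = {c: sum(node.get(c, 0) for node in nodes) for c in candidates}
    return heapq.nlargest(k, totals.items(), key=lambda kv: kv[1])


if __name__ == "__main__":
    nodes = [
        {"a": 10, "b": 8, "c": 1},
        {"a": 9, "c": 7, "b": 2},
        {"b": 9, "c": 6, "a": 1},
    ]
    print(tput_top_k(nodes, 2))
```

The fixed number of round trips is the point of this scheme; the optimizations listed in the abstract (operator trees, adaptive scan depths, sampling) change how the lists are grouped and how deep each node scans, which this sketch does not attempt.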

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Springer - Publisher Connector

Enlighten

MPG.PuRe

A network-aware framework for energy-efficient data acquisition in wireless sensor networks

Author: Andreou
Andreou
Chao
Demetrios Zeinalipour-Yazti
Diallo
Fagin
George S. Samaras
Heinzelman
Hitha
Liao
Luu
Panayiotis G. Andreou
Panos K. Chrysanthis
Roantree
Sharaf
Shen
Souto
Virmani
Yu
Publication venue: 'Elsevier BV'
Publication date: 16/09/2014

Wireless sensor networks enable users to monitor the physical world at an extremely high fidelity. In order to collect the data generated by these tiny-scale devices, the data management community has proposed the utilization of declarative data-acquisition frameworks. While these frameworks have facilitated the energy-efficient retrieval of data from the physical environment, they were agnostic of the underlying network topology and also did not support advanced query processing semantics. In this paper we present KSpot+, a distributed network-aware framework that optimizes network efficiency by combining three components: (i) the tree balancing module, which balances the workload of each sensor node by constructing efficient network topologies; (ii) the workload balancing module, which minimizes data reception inefficiencies by synchronizing the sensor network activity intervals; and (iii) the query processing module, which supports advanced query processing semantics. In order to validate the efficiency of our approach, we have developed a prototype implementation of KSpot+ in nesC and Java. In our experimental evaluation, we thoroughly assess the performance of KSpot+ using real datasets and show that KSpot+ provides significant energy reductions under a variety of conditions, thus significantly prolonging the longevity of a WSN.

CLoK

Crossref

PF-OLA: A High-Performance Framework for Parallel On-Line Aggregation

Author: Qin Chengjie
Rusu Florin
Publication date: 20/02/2013

Online aggregation provides estimates to the final result of a computation during the actual processing. The user can stop the computation as soon as the estimate is accurate enough, typically early in the execution. This allows for the interactive data exploration of the largest datasets. In this paper we introduce the first framework for parallel online aggregation in which the estimation virtually does not incur any overhead on top of the actual execution. We define a generic interface to express any estimation model that abstracts completely the execution details. We design a novel estimator specifically targeted at parallel online aggregation. When executed by the framework over a massive 8 TB TPC-H instance, the estimator provides accurate confidence bounds early in the execution even when the cardinality of the final result is seven orders of magnitude smaller than the dataset size and without incurring overhead. Comment: 36 page
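The general idea of online aggregation can be illustrated with a small single-threaded sketch. It uses a textbook normal-approximation estimator over rows read in random order, not the parallel estimator introduced in the paper. The function name `online_sum_estimates`, its parameters, and the choice to ignore the finite-population correction are all assumptions made for illustration.

```python
import math
import random


def online_sum_estimates(values, report_every, z=1.96, seed=0):
    """Yield (rows_seen, estimated_sum, half_width) while scanning.

    Rows are visited in random order so that the rows seen so far form a
    uniform sample. The estimate scales the running mean to the full table;
    the half-width is a normal-approximation bound that ignores the
    finite-population correction (a simplifying assumption).
    """
    rows = list(values)
    random.Random(seed).shuffle(rows)
    total_rows = len(rows)

    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations (Welford's method)
    for x in rows:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n % report_every == 0 or n == total_rows:
            variance = m2 / (n - 1) if n > 1 else 0.0
            half_width = z * total_rows * math.sqrt(variance / n)
            yield n, total_rows * mean, half_width


if __name__ == "__main__":
    for seen, estimate, half_width in online_sum_estimates(range(1, 1001), 250):
        print(f"{seen:5d} rows: SUM is about {estimate:.0f} +/- {half_width:.0f}")
```

A caller would stop consuming the generator once the half-width is small enough for its purpose, which mirrors the early-termination behaviour described in the abstract.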

arXiv.org e-Print Archive

Crossref

eScholarship - University of California

Efficient Processing of Exact Top-k Queries over Disk-Resident Sorted Lists

Author: A. Marian
A. Silberschatz
A. Spink
B. Arai
B. Bloom
Baihua Zheng
D.D. Lewis
F. Korn
G. Adomavicius
H.P. Hung
HweeHwa Pang
K. Yi
L. Zhu
M. Hua
M. Theobald
M.A. Soliman
M.L. Yiu
N. Bruno
N. Mamoulis
R. Baeza-Yates
R. Fagin
S. Brin
S. Chaudhuri
S. Hwang
Xuhua Ding
Y. Tao
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/06/2010

Crossref

Institutional Knowledge at Singapore Management University

Interactive querying and data visualization for abuse detection in social network sites

Author: De Turck Filip
Ordonez Ante Leandro
Van Seghbroeck Gregory
Vanhove Thomas
Wauters Tim
Publication date: 01/01/2016

Crossref

Ghent University Academic Bibliography