GreedyDual-Join: Locality-Aware Buffer Management for Approximate Join Processing Over Data Streams

Authors: Chang Ching; Li Feifei; Bestavros Azer; Kollios
Publication venue: Boston University Computer Science Department
Publication date: 01/01/1997

We investigate adaptive buffer management techniques for approximate evaluation of sliding window joins over multiple data streams. In many applications, data stream processing systems have limited memory or have to deal with very high-speed data streams. In both cases, computing the exact results of joins between these streams may not be feasible, mainly because the buffers used to compute the joins contain far fewer tuples than the sliding windows do. A stream buffer management policy is therefore needed. We show that the buffer replacement policy is an important determinant of the quality of the produced results. To that end, we propose GreedyDual-Join (GDJ), an adaptive and locality-aware buffering technique for managing these buffers. GDJ exploits the temporal correlations (at both long and short time scales) that we found to be prevalent in many real data streams. We note that our algorithm is readily applicable to multiple data streams and multiple joins and requires almost no additional system resources. We report results of an experimental study using both synthetic and real-world data sets. Our results demonstrate the superiority and flexibility of our approach compared with other recently proposed techniques.

Boston University Institutional Repository (OpenBU)

Sampling-Based Query Re-Optimization

Authors: Bruno N.; Graefe G.; Ioannidis Y. E.; Poosala V.; Reddy N.; Stillger M.
Publication date: 21/01/2016

Despite decades of work, query optimizers still make mistakes on "difficult" queries because of bad cardinality estimates, often due to the interaction of multiple predicates and correlations in the data. In this paper, we propose a low-cost post-processing step that can take a plan produced by the optimizer, detect when it is likely to have made such a mistake, and take steps to fix it. Specifically, our solution is a sampling-based iterative procedure that requires almost no changes to the original query optimizer or query evaluation mechanism of the system. We show that this indeed imposes low overhead and catches cases where three widely used optimizers (PostgreSQL and two commercial systems) make large errors.

Comment: This is the extended version of a paper with the same title and authors that appears in the Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD 2016).
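
The effect of correlated predicates on cardinality estimates can be illustrated with a minimal sketch. It does not reproduce the paper's procedure: the table, the two predicates, and the helper names `independence_estimate` and `sample_estimate` are hypothetical, and a systematic sample (every 7th row) stands in for random sampling so the numbers are reproducible.

```python
from fractions import Fraction


def independence_estimate(rows, predicates):
    """Estimate the matching row count by multiplying per-predicate selectivities."""
    n = len(rows)
    estimate = Fraction(n)
    for pred in predicates:
        estimate *= Fraction(sum(1 for r in rows if pred(r)), n)
    return estimate


def sample_estimate(rows, predicates, step):
    """Estimate the matching row count from a systematic sample (every step-th row)."""
    sample = rows[::step]
    hits = sum(1 for r in sample if all(p(r) for p in predicates))
    return Fraction(hits * len(rows), len(sample))


# Hypothetical table: column b is a copy of column a, so the two
# predicates below are fully correlated.
rows = [(i % 10, i % 10) for i in range(1000)]
predicates = [lambda r: r[0] == 0, lambda r: r[1] == 0]

true_count = sum(1 for r in rows if all(p(r) for p in predicates))
independent = independence_estimate(rows, predicates)
sampled = sample_estimate(rows, predicates, step=7)

print(f"true={true_count} independence={float(independent):.1f} sampled={float(sampled):.1f}")
```

On this toy table the independence assumption underestimates the result size by a factor of ten, while the sample-based estimate stays within a few percent of the true count, because evaluating both predicates on the same sampled rows preserves their correlation.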

arXiv.org e-Print Archive

Crossref

Communication Steps for Parallel Query Processing

Authors: Beame Paul; Koutris Paraschos; Suciu Dan
Publication date: 01/01/2013

We consider the problem of computing a relational query $q$ on a large input database of size $n$, using a large number $p$ of servers. The computation is performed in rounds, and each server can receive only $O(n/p^{1-\varepsilon})$ bits of data, where $\varepsilon \in [0,1]$ is a parameter that controls replication. We examine how many global communication steps are needed to compute $q$. We establish both lower and upper bounds, in two settings. For a single round of communication, we give lower bounds in the strongest possible model, where arbitrary bits may be exchanged; we show that any algorithm requires $\varepsilon \geq 1-1/\tau^*$, where $\tau^*$ is the fractional vertex cover of the hypergraph of $q$. We also give an algorithm that matches the lower bound for a specific class of databases. For multiple rounds of communication, we present lower bounds in a model where routing decisions for a tuple are tuple-based. We show that for the class of tree-like queries there exists a tradeoff between the number of rounds and the space exponent $\varepsilon$. The lower bounds for multiple rounds are the first of their kind. Our results also imply that transitive closure cannot be computed in $O(1)$ rounds of communication.
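
As a worked illustration of the one-round bound, consider the triangle query. The query and its relation names $R$, $S$, $T$ are chosen here as an example and do not appear in the abstract. Assigning weight $1/2$ to each variable covers every atom with total weight 1, and no fractional vertex cover of this hypergraph has smaller total weight, so:

$$
\begin{aligned}
q(x,y,z) &= R(x,y),\ S(y,z),\ T(z,x) \\
\tau^* &= \tfrac12 + \tfrac12 + \tfrac12 = \tfrac32 \\
\varepsilon &\geq 1 - \frac{1}{\tau^*} = 1 - \frac{2}{3} = \frac13 \\
O\!\left(n/p^{1-\varepsilon}\right) &= O\!\left(n/p^{2/3}\right) \quad \text{at } \varepsilon = \tfrac13
\end{aligned}
$$

Under the stated bound, a one-round algorithm for this query therefore needs each server to be able to receive on the order of $n/p^{2/3}$ bits rather than $n/p$.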

arXiv.org e-Print Archive

CiteSeerX

Crossref

Fast Parallel Operations on Search Trees

Authors: Akhremtsev Yaroslav; Sanders Peter
Publication date: 11/05/2016

Using (a,b)-trees as an example, we show how to perform a parallel split with logarithmic latency and parallel join, bulk updates, intersection, union (or merge), and (symmetric) set difference with logarithmic latency and with information-theoretically optimal work. We present both asymptotically optimal solutions and simplified versions that perform well in practice; the latter are several times faster than previous implementations.

arXiv.org e-Print Archive

Crossref

Stochastic order results and equilibrium joining rules for the Bernoulli Feedback Queue

Authors: Brooms Anthony C.; Collins E.J.
Publication venue: Birkbeck College, University of London
Publication date: 01/09/2013

We consider customer joining behaviour for a system that consists of an FCFS queue with Bernoulli feedback. A consequence of the feedback characteristic is that the sojourn time of a customer already in the system depends on the joining decisions taken by future arrivals to the system. By establishing stochastic order results for coupled versions of the system, we establish the existence of homogeneous Nash equilibrium joining policies for both single and multiple customer types, which are distinguished through distinct quality of service preference parameters. Further, it is shown that for a single customer type, the homogeneous policy is unique.

Birkbeck Institutional Research Online