Simple Multi-Pass Streaming Algorithms for Skyline Points and Extreme Points
In this paper, we present simple randomized multi-pass streaming algorithms for the fundamental computational geometry problems of finding the skyline (maximal) points and the extreme points of the convex hull. For the skyline problem, one of our algorithms occupies O(h) space and performs O(log n) passes, where h is the number of skyline points. This improves the space bound of the currently best known result, by Das Sarma, Lall, Nanongkai, and Xu [VLDB'09], by a logarithmic factor. For the extreme points problem, we present the first non-trivial result for any constant dimension d greater than two: an O(h log^{O(1)} n)-space, O(log^d n)-pass algorithm, where h is the number of extreme points. Finally, we argue why randomization seems unavoidable for these problems by proving lower bounds on the performance of deterministic algorithms for the related problem of finding maximal elements in a poset.
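For intuition about the pass/space trade-off, here is a minimal sketch of the folklore 2D baseline (an illustration of mine, not the paper's algorithm): each pass over the stream extracts one more skyline point, so it uses O(h) space but h + 1 passes, which is precisely the pass count the paper's randomized O(log n)-pass algorithms improve.

```python
def skyline_multipass(make_stream):
    """Folklore 2D baseline: O(h) space, one skyline point found per pass.
    make_stream() must return a fresh iterator over the same point stream."""
    skyline = []
    while True:
        best = None
        for p in make_stream():  # one full pass over the stream
            # Skip points dominated by (or equal to) an already-found skyline point.
            if any(s[0] >= p[0] and s[1] >= p[1] for s in skyline):
                continue
            # The lexicographic maximum of the remaining points is maximal.
            if best is None or (p[0], p[1]) > (best[0], best[1]):
                best = p
        if best is None:          # no undominated points left
            return skyline
        skyline.append(best)

# Example: (2, 2) is dominated by (3, 3); the other four points are skyline points.
print(skyline_multipass(lambda: iter([(1, 5), (3, 3), (2, 4), (5, 1), (2, 2)])))
```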
I/O-Efficient Planar Range Skyline and Attrition Priority Queues
In the planar range skyline reporting problem, we store a set P of n 2D
points in a structure such that, given a query rectangle Q = [a_1, a_2] x [b_1,
b_2], the maxima (a.k.a. skyline) of P \cap Q can be reported efficiently. The
query is 3-sided if an edge of Q is grounded, giving rise to two variants:
top-open (b_2 = \infty) and left-open (a_1 = -\infty) queries.
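To make the query semantics concrete, the following naive in-memory routine (an illustration of mine, not one of the paper's structures) answers a top-open query Q = [a_1, a_2] x [b_1, \infty) by filtering and a right-to-left sweep:

```python
def top_open_skyline(points, a1, a2, b1):
    """Report the maxima (skyline) of P intersected with [a1, a2] x [b1, +inf).
    Naive O(n log n) reference semantics; the paper's structures answer the
    same query I/O-efficiently."""
    inside = [(x, y) for (x, y) in points if a1 <= x <= a2 and y >= b1]
    inside.sort(key=lambda p: (-p[0], -p[1]))   # scan right to left
    skyline, best_y = [], float("-inf")
    for x, y in inside:
        if y > best_y:       # nothing to its right is higher => maximal
            skyline.append((x, y))
            best_y = y
    return skyline
```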
All our results are in external memory under the O(n/B) space budget, for
both the static and dynamic settings:
* For static P, we give structures that answer top-open queries in O(log_B n + k/B), O(log log_B U + k/B), and O(1 + k/B) I/Os when the universe is R^2, a U x U grid, and a rank-space grid [O(n)]^2, respectively (where k is the number of reported points). The query complexity is optimal in all cases.
* We show that the left-open case is harder: any linear-size structure must incur \Omega((n/B)^\epsilon + k/B) I/Os per query. In fact, this case is as difficult as general 4-sided queries, for which we give a static structure with the optimal query cost O((n/B)^\epsilon + k/B).
* We give a dynamic structure that supports top-open queries in O(log_{2B^\epsilon}(n/B) + k/B^{1-\epsilon}) I/Os and updates in O(log_{2B^\epsilon}(n/B)) I/Os, for any \epsilon satisfying 0 \le \epsilon \le 1. This leads to a dynamic structure for 4-sided queries with optimal query cost O((n/B)^\epsilon + k/B) and amortized update cost O(log(n/B)).
As a contribution of independent interest, we propose an I/O-efficient version of a fundamental structure, the priority queue with attrition (PQA). Our PQA supports FindMin, DeleteMin, and InsertAndAttrite in O(1) worst-case I/Os and O(1/B) amortized I/Os per operation.
We also add a new CatenateAndAttrite operation that catenates two PQAs in O(1) worst-case and O(1/B) amortized I/Os. This operation is a non-trivial extension to the classic PQA of Sundar, even in internal memory.
Comment: Appeared at PODS 2013, New York; 19 pages, 10 figures.
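To make the attrition semantics concrete, here is a minimal internal-memory PQA in the spirit of Sundar's structure (a sketch of mine; the >= attrition convention is an assumption, and none of the I/O-efficiency or CatenateAndAttrite machinery is attempted). Because InsertAndAttrite deletes every stored item at least as large as the new one, the contents stay increasing in insertion order, so a monotone deque suffices:

```python
from collections import deque

class PQASketch:
    """Internal-memory priority queue with attrition.
    Invariant: the deque is strictly increasing from front to back,
    so the front is always the minimum."""

    def __init__(self):
        self._q = deque()

    def find_min(self):
        return self._q[0] if self._q else None

    def delete_min(self):
        return self._q.popleft() if self._q else None

    def insert_and_attrite(self, x):
        # Attrition: drop every stored item >= x (assumed convention).
        while self._q and self._q[-1] >= x:
            self._q.pop()
        self._q.append(x)
```

Each element is pushed and popped at most once, so all three operations take O(1) amortized time; the paper's version achieves O(1) worst-case and O(1/B) amortized I/Os and additionally supports catenation.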
High Throughput Heavy Hitter Aggregation
Heavy hitters are data items that occur at high frequency in a data set. They are among the most important items for an organization to summarize and understand during analytical processing. In data sets with sufficient skew, the number of heavy hitters can be relatively small. We take advantage of this small footprint to compute aggregate functions for the heavy hitters in fast cache memory. We design cache-resident, shared-nothing structures that hold only the most frequent elements from the table. Our approach works in three phases. It first samples and picks heavy hitter candidates. It then builds a hash table and computes the exact aggregates of these candidates. Finally, if necessary, a validation step identifies the true heavy hitters from among the candidates based on the query specification. We identify trade-offs between hash table capacity and performance: capacity determines how many candidates can be aggregated. We optimize performance through perfect hashing and SIMD instructions, using SIMD in novel ways to minimize cache accesses, beyond simple vectorized operations. We use bucketized and cuckoo hash tables to increase capacity and to adapt to different datasets and query constraints. Our method is an order of magnitude faster than in-memory aggregation over a complete set of items when those items cannot be cache resident. Even for item sets that are cache resident, our SIMD techniques enable significant performance improvements over previous work.
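A compact sketch of the three-phase flow (my own Python, with a COUNT aggregate and plain dictionaries standing in for the paper's cache-resident, SIMD-friendly perfect/bucketized/cuckoo hash tables; sample_rate and threshold are illustrative parameters, not from the paper):

```python
import random
from collections import Counter

def heavy_hitter_counts(stream, sample_rate=0.01, threshold=0.001):
    """Three phases: sample candidates, aggregate candidates exactly,
    then validate against the query's frequency threshold."""
    data = list(stream)

    # Phase 1: sample, and keep items frequent within the sample as candidates.
    sample = [x for x in data if random.random() < sample_rate]
    cutoff = max(1, threshold * len(sample))
    candidates = {k for k, c in Counter(sample).items() if c >= cutoff}

    # Phase 2: exact aggregation, but only for candidate keys.
    counts = dict.fromkeys(candidates, 0)
    for x in data:
        if x in counts:
            counts[x] += 1

    # Phase 3: validation -- discard sampling false positives.
    return {k: c for k, c in counts.items() if c >= threshold * len(data)}
```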
The 31st International Very Large Data Bases Conference
A report on the 31st international Very Large Data Bases conference, which has taken place.