1,125 research outputs found

    Memory-Adjustable Navigation Piles with Applications to Sorting and Convex Hulls

    Get PDF
    We consider space-bounded computations on a random-access machine (RAM) where the input is given on a read-only random-access medium, the output is to be produced to a write-only sequential-access medium, and the available workspace allows random reads and writes but is of limited capacity. The length of the input is NN elements, the length of the output is limited by the computation, and the capacity of the workspace is O(S)O(S) bits for some predetermined parameter SS. We present a state-of-the-art priority queue---called an adjustable navigation pile---for this restricted RAM model. Under some reasonable assumptions, our priority queue supports minimum\mathit{minimum} and insert\mathit{insert} in O(1)O(1) worst-case time and extract\mathit{extract} in O(N/S+lgS)O(N/S + \lg{} S) worst-case time for any SlgNS \geq \lg{} N. We show how to use this data structure to sort NN elements and to compute the convex hull of NN points in the two-dimensional Euclidean space in O(N2/S+NlgS)O(N^2/S + N \lg{} S) worst-case time for any SlgNS \geq \lg{} N. Following a known lower bound for the space-time product of any branching program for finding unique elements, both our sorting and convex-hull algorithms are optimal. The adjustable navigation pile has turned out to be useful when designing other space-efficient algorithms, and we expect that it will find its way to yet other applications.Comment: 21 page

    Randomized Algorithms for Tracking Distributed Count, Frequencies, and Ranks

    Full text link
    We show that randomization can lead to significant improvements for a few fundamental problems in distributed tracking. Our basis is the {\em count-tracking} problem, where there are kk players, each holding a counter nin_i that gets incremented over time, and the goal is to track an \eps-approximation of their sum n=inin=\sum_i n_i continuously at all times, using minimum communication. While the deterministic communication complexity of the problem is \Theta(k/\eps \cdot \log N), where NN is the final value of nn when the tracking finishes, we show that with randomization, the communication cost can be reduced to \Theta(\sqrt{k}/\eps \cdot \log N). Our algorithm is simple and uses only O(1) space at each player, while the lower bound holds even assuming each player has infinite computing power. Then, we extend our techniques to two related distributed tracking problems: {\em frequency-tracking} and {\em rank-tracking}, and obtain similar improvements over previous deterministic algorithms. Both problems are of central importance in large data monitoring and analysis, and have been extensively studied in the literature.Comment: 19 pages, 1 figur

    Improved Algorithms for Time Decay Streams

    Get PDF
    In the time-decay model for data streams, elements of an underlying data set arrive sequentially with the recently arrived elements being more important. A common approach for handling large data sets is to maintain a coreset, a succinct summary of the processed data that allows approximate recovery of a predetermined query. We provide a general framework that takes any offline-coreset and gives a time-decay coreset for polynomial time decay functions. We also consider the exponential time decay model for k-median clustering, where we provide a constant factor approximation algorithm that utilizes the online facility location algorithm. Our algorithm stores O(k log(h Delta)+h) points where h is the half-life of the decay function and Delta is the aspect ratio of the dataset. Our techniques extend to k-means clustering and M-estimators as well
    corecore