4,369 research outputs found

    Running Median Algorithm and Implementation for Integer Streaming Applications

    Get PDF
    A novel algorithm is proposed to compute the median of a running window of m integers in O(lg lg m) time. For a new window, the new median value is computed as a simple decision based on the previous median and the values removed and inserted into the window. This facilitates implementations based on data structures that support fast ordinal predecessor/successor operations. The results show accelerations of up to factors of six for integer data streaming in typical embedded processors

    Selection from read-only memory with limited workspace

    Full text link
    Given an unordered array of NN elements drawn from a totally ordered set and an integer kk in the range from 11 to NN, in the classic selection problem the task is to find the kk-th smallest element in the array. We study the complexity of this problem in the space-restricted random-access model: The input array is stored on read-only memory, and the algorithm has access to a limited amount of workspace. We prove that the linear-time prune-and-search algorithm---presented in most textbooks on algorithms---can be modified to use Θ(N)\Theta(N) bits instead of Θ(N)\Theta(N) words of extra space. Prior to our work, the best known algorithm by Frederickson could perform the task with Θ(N)\Theta(N) bits of extra space in O(NlgN)O(N \lg^{*} N) time. Our result separates the space-restricted random-access model and the multi-pass streaming model, since we can surpass the Ω(NlgN)\Omega(N \lg^{*} N) lower bound known for the latter model. We also generalize our algorithm for the case when the size of the workspace is Θ(S)\Theta(S) bits, where lg3NSN\lg^3{N} \leq S \leq N. The running time of our generalized algorithm is O(Nlg(N/S)+N(lgN)/lgS)O(N \lg^{*}(N/S) + N (\lg N) / \lg{} S), slightly improving over the O(Nlg(N(lgN)/S)+N(lgN)/lgS)O(N \lg^{*}(N (\lg N)/S) + N (\lg N) / \lg{} S) bound of Frederickson's algorithm. To obtain the improvements mentioned above, we developed a new data structure, called the wavelet stack, that we use for repeated pruning. We expect the wavelet stack to be a useful tool in other applications as well.Comment: 16 pages, 1 figure, Preliminary version appeared in COCOON-201

    Tracking the l_2 Norm with Constant Update Time

    Get PDF
    The l_2 tracking problem is the task of obtaining a streaming algorithm that, given access to a stream of items a_1,a_2,a_3,... from a universe [n], outputs at each time t an estimate to the l_2 norm of the frequency vector f^{(t)}in R^n (where f^{(t)}_i is the number of occurrences of item i in the stream up to time t). The previous work [Braverman-Chestnut-Ivkin-Nelson-Wang-Woodruff, PODS 2017] gave a streaming algorithm with (the optimal) space using O(epsilon^{-2}log(1/delta)) words and O(epsilon^{-2}log(1/delta)) update time to obtain an epsilon-accurate estimate with probability at least 1-delta. We give the first algorithm that achieves update time of O(log 1/delta) which is independent of the accuracy parameter epsilon, together with the nearly optimal space using O(epsilon^{-2}log(1/delta)) words. Our algorithm is obtained using the Count Sketch of [Charilkar-Chen-Farach-Colton, ICALP 2002]

    Fast and Lean Immutable Multi-Maps on the JVM based on Heterogeneous Hash-Array Mapped Tries

    Get PDF
    An immutable multi-map is a many-to-many thread-friendly map data structure with expected fast insert and lookup operations. This data structure is used for applications processing graphs or many-to-many relations as applied in static analysis of object-oriented systems. When processing such big data sets the memory overhead of the data structure encoding itself is a memory usage bottleneck. Motivated by reuse and type-safety, libraries for Java, Scala and Clojure typically implement immutable multi-maps by nesting sets as the values with the keys of a trie map. Like this, based on our measurements the expected byte overhead for a sparse multi-map per stored entry adds up to around 65B, which renders it unfeasible to compute with effectively on the JVM. In this paper we propose a general framework for Hash-Array Mapped Tries on the JVM which can store type-heterogeneous keys and values: a Heterogeneous Hash-Array Mapped Trie (HHAMT). Among other applications, this allows for a highly efficient multi-map encoding by (a) not reserving space for empty value sets and (b) inlining the values of singleton sets while maintaining a (c) type-safe API. We detail the necessary encoding and optimizations to mitigate the overhead of storing and retrieving heterogeneous data in a hash-trie. Furthermore, we evaluate HHAMT specifically for the application to multi-maps, comparing them to state-of-the-art encodings of multi-maps in Java, Scala and Clojure. We isolate key differences using microbenchmarks and validate the resulting conclusions on a real world case in static analysis. The new encoding brings the per key-value storage overhead down to 30B: a 2x improvement. With additional inlining of primitive values it reaches a 4x improvement
    corecore