Tight bounds for data stream algorithms and communication problems

Abstract

In this thesis, we give efficient algorithms and near-tight lower bounds for the following problems in the streaming model. Improving on the works of Monemizadeh and Woodruff from SODA '10 and Andoni, Krauthgamer and Onak from FOCS '11, we give $L_p$-samplers requiring $O(\epsilon^{-p}\log^2 n)$ space for $p \in (1,2)$. Our algorithm also works for $p \in [0,1]$, taking $\tilde{O}(\epsilon^{-1}\log^2 n)$ space. As an application of our sampler, we give an $O(\log^2 n)$ space algorithm for finding duplicates in data streams, improving the algorithms of Gopalan and Radhakrishnan from SODA '09. Given a stream that consists of a pattern of length $m$ and a text of length $n$, the pattern matching problem is to output all occurrences of the pattern. Improving on the results of Porat and Porat from FOCS '09, we give an $O(\log n \log m)$ space algorithm that works entirely in the streaming model. Finally, we show several near-tight lower bounds for the above problems through new results in communication complexity.
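
To make the pattern matching problem above concrete, here is a minimal Python sketch of a textbook baseline using Karp-Rabin rolling fingerprints. It is not the algorithm of the thesis: it keeps the pattern and the last $m$ text symbols in memory, so it uses $O(m)$ space rather than $O(\log n \log m)$. The function name `stream_matches` and the constants `MOD` and `BASE` are assumptions chosen for illustration, and the stream is assumed to deliver at least $m \ge 1$ pattern symbols before the text.

```python
from collections import deque
from itertools import islice

MOD = (1 << 61) - 1  # assumed fingerprint modulus (a Mersenne prime)
BASE = 257           # assumed base; in practice it would be chosen at random


def stream_matches(stream, m):
    """Yield 0-based text positions where the pattern occurs.

    `stream` delivers the m pattern symbols first, then the text symbols.
    Baseline only: stores the pattern and a window of m symbols (O(m) space).
    """
    it = iter(stream)
    pattern = list(islice(it, m))

    # Fingerprint of the pattern.
    target = 0
    for c in pattern:
        target = (target * BASE + ord(c)) % MOD

    top = pow(BASE, m - 1, MOD)  # weight of the oldest symbol in the window
    window = deque()
    h = 0
    for pos, c in enumerate(it):
        if len(window) == m:
            # Drop the contribution of the symbol leaving the window.
            h = (h - ord(window.popleft()) * top) % MOD
        window.append(c)
        h = (h * BASE + ord(c)) % MOD
        # Compare fingerprints, then verify to rule out hash collisions.
        if len(window) == m and h == target and list(window) == pattern:
            yield pos - m + 1
```

For example, `list(stream_matches("aba" + "ababa", 3))` reports the overlapping occurrences at text positions 0 and 2. The difficulty a fully streaming algorithm must address is visible in the `window` buffer: sliding the fingerprint requires knowing the symbol that leaves the window, which this baseline handles by simply storing it.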
