4 research outputs found

    Streaming Robust Submodular Maximization: A Partitioned Thresholding Approach

    Get PDF
    We study the classical problem of maximizing a monotone submodular function subject to a cardinality constraint k, with two additional twists: (i) elements arrive in a streaming fashion, and (ii) m items from the algorithm's memory are removed after the stream is finished. We develop a robust submodular algorithm STAR-T. It is based on a novel partitioning structure and an exponentially decreasing thresholding rule. STAR-T makes one pass over the data and retains a short but robust summary. We show that after the removal of any m elements from the obtained summary, a simple greedy algorithm STAR-T-GREEDY that runs on the remaining elements achieves a constant-factor approximation guarantee. In two different data summarization tasks, we demonstrate that it matches or outperforms existing greedy and streaming methods, even if they are allowed the benefit of knowing the removed subset in advance.Comment: To appear in NIPS 201

    When Stuck, Flip a Coin:New Algorithms for Large-Scale Tasks

    Get PDF
    Many modern services need to routinely perform tasks on a large scale. This prompts us to consider the following question: How can we design efficient algorithms for large-scale computation? In this thesis, we focus on devising a general strategy to address the above question. Our approaches use tools from graph theory and convex optimization, and prove to be very effective on a number of problems that exhibit locality. A recurring theme in our work is to use randomization to obtain simple and practical algorithms. The techniques we developed enabled us to make progress on the following questions: - Parallel Computation of Approximately Maximum Matchings. We put forth a new approach to computing O(1)O(1)-approximate maximum matchings in the Massively Parallel Computation (MPC) model. In the regime in which the memory per machine is Θ(n)\Theta(n), i.e., linear in the size of the vertex-set, our algorithm requires only O((loglogn)2)O((\log \log{n})^2) rounds of computations. This is an almost exponential improvement over the barrier of Ω(logn)\Omega(\log {n}) rounds that all the previous results required in this regime. - Parallel Computation of Maximal Independent Sets. We propose a simple randomized algorithm that constructs maximal independent sets in the MPC model. If the memory per machine is Θ(n)\Theta(n) our algorithm runs in O(loglogn)O(\log \log{n}) MPC-rounds. In the same regime, all the previously known algorithms required O(logn)O(\log{n}) rounds of computation. - Network Routing under Link Failures. We design a new protocol for stateless message-routing in kk-connected graphs. Our routing scheme has two important features: (1) each router performs the routing decisions based only on the local information available to it; and, (2) a message is delivered successfully even if arbitrary k1k-1 links have failed. This significantly improves upon the previous work of which the routing schemes tolerate only up to k/21k/2 - 1 failed links in kk-connected graphs. - Streaming Submodular Maximization under Element Removals. We study the problem of maximizing submodular functions subject to cardinality constraint kk, in the context of streaming algorithms. In a regime in which up to mm elements can be removed from the stream, we design an algorithm that provides a constant-factor approximation for this problem. At the same time, the algorithm stores only O(klog2k+mlog3k)O(k \log^2{k} + m \log^3{k}) elements. Our algorithm improves quadratically upon the prior work, that requires storing O(km)O(k \cdot m) many elements to solve the same problem. - Fast Recovery for the Separated Sparsity Model. In the context of compressed sensing, we put forth two recovery algorithms of nearly-linear time for the separated sparsity signals (that naturally model neural spikes). This improves upon the previous algorithm that had a quadratic running time. We also derive a refined version of the natural dynamic programming (DP) approach to the recovery of the separated sparsity signals. This DP approach leads to a recovery algorithm that runs in linear time for an important class of separated sparsity signals. Finally, we consider a generalization of these signals into two dimensions, and we show that computing an exact projection for the two-dimensional model is NP-hard
    corecore