
    Constant Approximation for k-Median and k-Means with Outliers via Iterative Rounding

    In this paper, we present a new iterative rounding framework for many clustering problems. Using it, we obtain an (α_1 + ε ≤ 7.081 + ε)-approximation algorithm for k-median with outliers, greatly improving upon the large implicit constant approximation ratio of Chen [Chen, SODA 2008]. For k-means with outliers, we give an (α_2 + ε ≤ 53.002 + ε)-approximation, which is the first O(1)-approximation for this problem. The iterative algorithm framework is very versatile; we show how it can be used to give α_1- and (α_1 + ε)-approximation algorithms for the matroid and knapsack median problems respectively, improving upon the previous best approximation ratios of 8 [Swamy, ACM Trans. Algorithms] and 17.46 [Byrka et al., ESA 2015]. The natural LP relaxation for the k-median/k-means with outliers problem has an unbounded integrality gap. In spite of this negative result, our iterative rounding framework shows that we can round an LP solution to an almost-integral solution of small cost, in which at most two facilities are fractionally open. Thus, the LP integrality gap arises from the gap between almost-integral and fully-integral solutions. Then, using a pre-processing procedure, we show how to convert an almost-integral solution to a fully-integral solution while losing only a constant factor in the approximation ratio. By further using a sparsification technique, the additive factor loss incurred by the conversion can be reduced to any ε > 0.
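
    For concreteness, here is a minimal sketch (in assumed notation, not taken from the paper) of the natural LP relaxation for k-median with at most m outliers that the abstract refers to, with facility set F, client set C, connection costs c_ij, opening variables y_i, and assignment variables x_ij:

```latex
\begin{align*}
\min\ & \textstyle\sum_{i \in F}\sum_{j \in C} c_{ij}\, x_{ij} \\
\text{s.t.}\ & \textstyle\sum_{i \in F} x_{ij} \le 1 \quad \forall j \in C
  && \text{(each client assigned at most once)} \\
& x_{ij} \le y_i \quad \forall i \in F,\ j \in C
  && \text{(assign only to open facilities)} \\
& \textstyle\sum_{i \in F} y_i \le k
  && \text{(at most $k$ open facilities)} \\
& \textstyle\sum_{j \in C}\sum_{i \in F} x_{ij} \ge |C| - m
  && \text{(at most $m$ clients left as outliers)} \\
& 0 \le x_{ij},\ y_i \le 1
\end{align*}
```

    An integral solution has every y_i in {0, 1}; the almost-integral solutions the framework produces leave at most two of the y_i fractional. For k-means, c_ij is the squared distance.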

    Placement Algorithms for Hierarchical Cooperative Caching and other . . .

    In a large-scale information system, such as a digital library or the world wide web, a set of distributed caches can improve their effectiveness by cooperating with one another, both in serving each other's requests and in deciding what to store. This dissertation explores the potential of such cooperative caching and provides basic placement algorithms with which the caches can coordinate their storage decisions. The first part of the dissertation focuses on variants of the placement problem involving a single object. The best known of these variants are the facility location problems, which have received considerable attention in the operations research literature due to their widespread applicability. We prove that a simple local search heuristic, proposed about 25 years ago, yields polynomial-time constant-factor approximations for several metric facility location problems. The second part of the dissertation addresses the simultaneous placement of a collection of objects in hierarchical networks. We provide both exact and approximate polynomial-time algorithms for this hierarchical placement problem. Our exact algorithm is based on a reduction to min-cost flow and does not appear to be practical for large problem sizes; hence we are motivated to look for simpler approximation algorithms. Our main result is a simple constant-factor approximation algorithm that admits an efficient distributed implementation.
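
    The local search heuristic in question maintains a set of open facilities and repeatedly applies improving add, drop, or swap moves. A minimal sketch for uncapacitated facility location, with assumed data structures (dist[c][f] for metric distances, fcost[f] for opening costs; names are illustrative, not from the dissertation):

```python
import itertools

def total_cost(open_facs, clients, dist, fcost):
    """Facility-opening cost plus each client's distance to its nearest open facility."""
    conn = sum(min(dist[c][f] for f in open_facs) for c in clients)
    return sum(fcost[f] for f in open_facs) + conn

def local_search_ufl(facilities, clients, dist, fcost):
    """Add/drop/swap local search for uncapacitated facility location."""
    current = {facilities[0]}                        # any nonempty starting set
    best = total_cost(current, clients, dist, fcost)
    improved = True
    while improved:
        improved = False
        # candidate moves: add one facility, drop one, or swap a pair
        adds  = [current | {f} for f in facilities if f not in current]
        drops = [current - {f} for f in current if len(current) > 1]
        swaps = [(current - {f}) | {g}
                 for f in current for g in facilities if g not in current]
        for cand in itertools.chain(adds, drops, swaps):
            c = total_cost(cand, clients, dist, fcost)
            if c < best:                             # take the first improving move
                current, best, improved = cand, c, True
                break
    return current, best
```

    To guarantee polynomial running time, the analysis only accepts moves that improve the cost by at least a (1 - ε) factor; the plain version above may take a long sequence of tiny improving steps.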

    Quasi-Fully Dynamic Algorithms for Two-Connectivity, Cycle Equivalence and Related Problems

    In this paper we introduce a new class of dynamic graph algorithms called quasi-fully dynamic algorithms, which are much more general than backtracking algorithms and much simpler than fully dynamic algorithms. These algorithms are especially suitable for applications in which a certain core connected portion of the graph remains fixed, and fully dynamic updates occur on the remaining edges. We present very simple quasi-fully dynamic algorithms with O(log n) worst-case time per operation for 2-edge connectivity and cycle equivalence; the former is deterministic while the latter is Monte Carlo randomized. For 2-vertex connectivity, we give a randomized Las Vegas algorithm with O(log^4 n) expected amortized time per operation. We introduce the concept of quasi-k-edge-connectivity, a slightly relaxed version of k-edge connectivity, and show that it can be maintained in O(log n) worst-case time per operation. We also analyze the performance of a natural extension of our quasi-fully dynamic algorithms to fully dynamic algorithms. The quasi-fully dynamic algorithm we present for cycle equivalence (which has several applications in optimizing compilers) is of special interest, since the algorithm is quite simple and no special-purpose incremental or backtracking algorithm is known for this problem.
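
    To pin down the model the abstract describes, here is a naive executable sketch of the quasi-fully dynamic interface: the core edge set never changes, non-core edges come and go, and 2-edge-connectivity queries are answered by brute force. This only illustrates the semantics of the operations, not the paper's O(log n) data structures; all names are illustrative and a simple graph is assumed.

```python
from collections import defaultdict

class QuasiDynamic2EC:
    """Naive model of the quasi-fully dynamic setting: a fixed core edge
    set plus fully dynamic non-core edges; queries recompute from scratch."""

    def __init__(self, core_edges):
        self.core = {frozenset(e) for e in core_edges}  # fixed forever
        self.extra = set()                              # dynamic edges

    def insert(self, u, v):
        self.extra.add(frozenset((u, v)))

    def delete(self, u, v):
        e = frozenset((u, v))
        assert e not in self.core, "core edges never change"
        self.extra.discard(e)

    def two_edge_connected(self, s, t):
        """True iff s and t stay connected after removing any one edge."""
        edges = self.core | self.extra
        return (self._connected(s, t, edges) and
                all(self._connected(s, t, edges - {e}) for e in edges))

    @staticmethod
    def _connected(s, t, edges):
        if s == t:
            return True
        adj = defaultdict(set)
        for e in edges:
            u, v = tuple(e)
            adj[u].add(v); adj[v].add(u)
        seen, stack = {s}, [s]
        while stack:
            for y in adj[stack.pop()] - seen:   # depth-first search
                if y == t:
                    return True
                seen.add(y); stack.append(y)
        return False
```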

    Coordinated Placement and Replacement for Large-Scale Distributed Caches

    In a large-scale information system such as a digital library or the web, a set of distributed caches can improve their effectiveness by coordinating their data placement decisions. Using simulation, we examine three practical cooperative placement algorithms, including one that is provably close to optimal, and we compare these algorithms to the optimal placement algorithm and to several cooperative and non-cooperative replacement algorithms. We draw five conclusions from these experiments: (1) cooperative placement can significantly improve performance compared to local replacement algorithms, particularly when the size of individual caches is small compared to the universe of objects; (2) although the Amortized Placement algorithm is only guaranteed to be within 14 times the optimal, in practice it provides an excellent approximation of the optimal; (3) in a cooperative caching scenario, the recent GreedyDual local replacement algorithm performs much better than the other local replacement algorithms; (4) our Hierarchical GreedyDual replacement algorithm yields further improvements over the GreedyDual algorithm, especially when there are idle caches in the system; and (5) a key challenge for coordinated placement algorithms is generating good predictions of access patterns based on past accesses.
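
    Since conclusions (3) and (4) hinge on GreedyDual, here is a minimal sketch of the classic GreedyDual policy for uniform-size objects; the paper's Hierarchical GreedyDual variant for cooperating caches is not reproduced here, and all names below are illustrative:

```python
import heapq

class GreedyDualCache:
    """GreedyDual replacement: each cached object carries a credit H.
    On a hit the credit is reset to L + cost; on eviction the floor L
    rises to the evicted object's credit, aging everything else."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.L = 0.0
        self.H = {}        # object -> current credit
        self.heap = []     # (credit, object) min-heap with lazy deletion

    def access(self, obj, cost):
        """Returns True on a hit, False on a miss (obj is then cached)."""
        hit = obj in self.H
        if not hit and len(self.H) >= self.capacity:
            # evict the object with minimum credit; L becomes that credit
            while True:
                h, victim = heapq.heappop(self.heap)
                if self.H.get(victim) == h:     # skip stale heap entries
                    del self.H[victim]
                    self.L = h
                    break
        self.H[obj] = self.L + cost
        heapq.heappush(self.heap, (self.H[obj], obj))
        return hit
```

    The GreedyDual-Size variant commonly used for web caches charges cost/size instead of cost; the paper's Hierarchical variant extends the idea to a hierarchy of cooperating caches.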

    Server-storage virtualization: Integration and load balancing in data centers

    We describe the design of an agile data center with integrated server and storage virtualization technologies. Such data centers form a key building block for new cloud computing architectures. We also show how to leverage this integrated agility for non-disruptive load balancing in data centers across multiple resource layers: servers, switches, and storage. We propose a novel load balancing algorithm called VectorDot for handling the hierarchical and multi-dimensional resource constraints in such systems. The algorithm, inspired by the successful Toyoda method for multi-dimensional knapsacks, is the first of its kind. We evaluate our system on a range of synthetic and real data center testbeds comprising VMware ESX servers, IBM SAN Volume Controller, and Cisco and Brocade switches. Experiments under varied conditions demonstrate the end-to-end validity of our system and the ability of VectorDot to efficiently remove overloads on server, switch, and storage nodes.
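
    To make the dot-product scoring idea concrete, here is a hedged sketch of VectorDot-style node selection; the published algorithm additionally handles hierarchical path constraints across switches and overload thresholds, which this sketch omits, and all names are illustrative:

```python
import numpy as np

def vectordot_choose(item_req, nodes, caps, loads):
    """Among nodes that can fit the item's multi-dimensional requirement,
    pick the one minimizing the dot product of the item's normalized
    requirement vector with the node's normalized load vector, which
    steers load away from a node's already-hot dimensions.

    item_req: np.array of per-dimension demand (e.g., CPU, memory, I/O)
    caps[n], loads[n]: np.arrays of node n's capacities and current loads
    """
    best, best_score = None, float("inf")
    for n in nodes:
        frac_req  = item_req / caps[n]           # demand per dimension
        frac_load = loads[n] / caps[n]           # current utilization
        if np.any(frac_load + frac_req > 1.0):   # capacity check
            continue
        score = float(np.dot(frac_req, frac_load))
        if score < best_score:
            best, best_score = n, score
    return best
```

    Intuitively, a demand that is heavy in some dimension scores poorly against a node already loaded in that same dimension, so placements spread pressure across dimensions rather than piling onto one.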

    Coupled Placement in Modern Data Centers

    We introduce the coupled placement problem for modern data centers: placing application computation and data among available server and storage resources. While the two have traditionally been addressed independently in data centers, two modern trends make it beneficial to consider them together: (a) the rise of virtualization technologies, which enable applications packaged as VMs to run on any server in the data center with spare compute resources, and (b) the rise of multi-purpose hardware devices in the data center, which provide compute resources of varying capabilities at different proximities from the storage nodes. We present a novel framework called CPA for addressing such coupled placement of application data and computation in modern data centers. Based on two well-studied problems, Stable Marriage and Knapsacks, the CPA framework is simple, fast, and versatile, and automatically enables high-throughput applications to be placed on nearby server and storage node pairs. While a theoretical proof of CPA's worst-case approximation guarantee remains an open question, we use extensive experimental analysis to evaluate CPA on large synthetic data centers, comparing it to Linear Programming based methods and other traditional methods. Experiments show that CPA is consistently, and surprisingly, within 0 to 4% of the Linear Programming based optimal values for various data center topologies and workload patterns. At the same time it is one to two orders of magnitude faster than the LP based methods and scales to much larger problem sizes. The fast running time of CPA makes it highly suitable for large data center environments, where hundreds to thousands of server and storage nodes are common; LP based approaches are prohibitively slow in such environments. CPA is also suitable for fast interactive analysis during consolidation of such environments from physical to virtual resources.
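
    The abstract names Stable Marriage and Knapsacks as the two building blocks; the following is an illustrative deferred-acceptance loop in that spirit (proposals ordered by an assumed affinity score, capacity enforced knapsack-style by evicting the least-preferred tenants). It conveys the flavor only and is not the published CPA algorithm; all names and parameters are illustrative.

```python
def coupled_place(apps, pairs, affinity, demand, capacity):
    """apps: application IDs; pairs: candidate (server, storage) pairs;
    affinity[a][p]: how well app a fits pair p (e.g., proximity/throughput);
    demand[a]: app a's resource demand; capacity[p]: pair p's capacity."""
    prefs = {a: sorted(pairs, key=lambda p: -affinity[a][p]) for a in apps}
    nxt = {a: 0 for a in apps}            # next pair each app will propose to
    placed = {p: [] for p in pairs}       # current tenants of each pair
    free = list(apps)
    while free:
        a = free.pop()
        if nxt[a] >= len(prefs[a]):
            continue                      # a has exhausted its list: unplaced
        p = prefs[a][nxt[a]]; nxt[a] += 1
        placed[p].append(a)
        # over capacity: evict lowest-affinity tenants, who propose again
        placed[p].sort(key=lambda x: affinity[x][p], reverse=True)
        while sum(demand[x] for x in placed[p]) > capacity[p]:
            free.append(placed[p].pop())
    return placed
```

    As in Gale-Shapley, each app proposes down its preference list and evicted apps re-enter the pool, so the loop terminates after at most |apps| x |pairs| proposals.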