2,840 research outputs found

    Fully Dynamic Consistent Facility Location

    We consider classic clustering problems in fully dynamic data streams, where data elements can be both inserted and deleted. In this context, several parameters are of importance: (1) the quality of the solution after each insertion or deletion, (2) the time it takes to update the solution, and (3) how different consecutive solutions are. The question of obtaining efficient algorithms in this context for facility location, k-median and k-means was raised in a recent paper by Hubert-Chan et al. [WWW'18] and also appears as a natural follow-up to the online model with recourse studied by Lattanzi and Vassilvitskii [ICML'17] (i.e., in insertion-only streams). In this paper, we focus on general metric spaces and mainly on the facility location problem. We give an arguably simple algorithm that maintains a constant-factor approximation, with O(n log n) update time and O(n) total recourse. This improves over the naive algorithm, which recomputes a solution at each time step and can take up to O(n^2) update time and O(n^2) total recourse. These bounds are nearly optimal: in a general metric space, inserting a point takes O(n) time just to describe its distances to the other points, and we give a simple lower bound of Ω(n) for the recourse. Moreover, we generalize this result to the k-median and k-means problems: our algorithm maintains a constant-factor approximation in time Õ(n + k^2). We complement our analysis with experiments showing that the cost of the solution maintained by our algorithm at any time t is very close to the cost of a solution obtained by quickly recomputing from scratch at time t, while having a much better running time.
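    To make parameters (1) and (3) concrete, here is a minimal Python sketch of how solution quality and recourse could be measured. It is not the algorithm from the paper. It assumes a uniform opening cost per facility, Euclidean points standing in for a general metric, and recourse counted as the number of facilities opened or closed between two consecutive solutions. The names `facility_location_cost` and `recourse` are hypothetical.

```python
from math import dist


def facility_location_cost(clients, facilities, opening_cost):
    """Opening cost per facility plus each client's distance to its nearest open facility.

    Assumption: uniform opening cost and Euclidean distance (illustration only).
    """
    if not facilities:
        raise ValueError("at least one facility must be open")
    connection = sum(min(dist(c, f) for f in facilities) for c in clients)
    return opening_cost * len(facilities) + connection


def recourse(previous, current):
    """Facilities opened or closed between consecutive solutions (assumed definition)."""
    return len(set(previous) ^ set(current))


clients = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
before = [(0.0, 0.0)]              # one open facility
after = [(0.0, 0.0), (10.0, 0.0)]  # a second facility is opened

cost_before = facility_location_cost(clients, before, opening_cost=3.0)
cost_after = facility_location_cost(clients, after, opening_cost=3.0)
changes = recourse(before, after)
```

    In this toy instance, opening the second facility lowers the cost from 14 to 7 at the price of one unit of recourse. A consistent algorithm in the sense of the abstract has to keep the total number of such changes small over the whole update sequence.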

    UDDSketch: Accurate Tracking of Quantiles in Data Streams

    I. Epicoco, C. Melle, M. Cafaro, M. Pulimeno, G. Morleo

    Streaming Facility Location in High Dimension via New Geometric Hashing

    In Euclidean Uniform Facility Location, the input is a set of clients in $\mathbb{R}^d$ and the goal is to place facilities to serve them, so as to minimize the total cost of opening facilities plus connecting the clients. We study the classical setting of dynamic geometric streams, where the clients are presented as a sequence of insertions and deletions of points in the grid $\{1,\ldots,\Delta\}^d$, and we focus on the high-dimensional regime, where the algorithm's space complexity must be polynomial (and certainly not exponential) in $d\cdot\log\Delta$. We present a new algorithmic framework, based on importance sampling from the stream, for $O(1)$-approximation of the optimal cost using only $\mathrm{poly}(d\cdot\log\Delta)$ space. This framework is easy to implement in two passes, one for sampling points and the other for estimating their contribution. Over random-order streams, we can extend this to a one-pass algorithm by using the two halves of the stream separately. Our main result, for arbitrary-order streams, computes $O(d^{1.5})$-approximation in one pass by using the new framework but combining the two passes differently. This improves upon previous algorithms that either need space exponential in $d$ or only guarantee $O(d\cdot\log^2\Delta)$-approximation, and therefore our algorithms for high-dimensional streams are the first to avoid the $O(\log\Delta)$-factor in approximation that is inherent to the widely-used quadtree decomposition. Our improvement is achieved by introducing a novel geometric hashing scheme that maps points in $\mathbb{R}^d$ into buckets of bounded diameter, with the key property that every point set of small-enough diameter is hashed into at most $\mathrm{poly}(d)$ distinct buckets. Finally, we complement our results by showing $1.085$-approximation requires space exponential in $\mathrm{poly}(d\cdot\log\Delta)$, even for insertion-only streams. Comment: The abstract is shortened to meet the length constraint of arXi
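    The idea of hashing points into buckets of bounded diameter can be illustrated with a plain randomly shifted grid. This sketch is not the geometric hashing scheme of the paper, only a standard baseline under stated assumptions: cells are axis-aligned cubes of side `width`, so any two points in one bucket are at distance at most `width * sqrt(d)`. The names `grid_bucket` and `bucket_diameter_bound` are hypothetical.

```python
import math
import random


def grid_bucket(point, width, shift):
    """Bucket id: index of the axis-aligned cell of side `width` containing the shifted point."""
    return tuple(math.floor((x + s) / width) for x, s in zip(point, shift))


def bucket_diameter_bound(dimension, width):
    """Upper bound on the Euclidean distance between two points in the same cell."""
    return width * math.sqrt(dimension)


rng = random.Random(0)  # fixed seed so the example is reproducible
d, width = 3, 4.0
shift = [rng.uniform(0, width) for _ in range(d)]

# Points from the grid {1, ..., 16}^d, matching the stream model in the abstract.
points = [tuple(rng.randint(1, 16) for _ in range(d)) for _ in range(200)]

buckets = {}
for p in points:
    buckets.setdefault(grid_bucket(p, width, shift), []).append(p)
```

    A plain grid gives the bounded-diameter property but not the second one stated in the abstract: a small point set sitting near a cell corner can be split across up to 2^d cells, which is exponential in d rather than poly(d).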