2,840 research outputs found
Fully Dynamic Consistent Facility Location
We consider classic clustering problems in fully dynamic data streams, where data elements can be both inserted and deleted. In this context, several parameters are of importance: (1) the quality of the solution after each insertion or deletion, (2) the time it takes to update the solution, and (3) how different consecutive solutions are. The question of obtaining efficient algorithms in this context for facility location, k-median and k-means was raised in a recent paper by Hubert-Chan et al. [WWW'18] and also appears as a natural follow-up to the online model with recourse studied by Lattanzi and Vassilvitskii [ICML'17] (i.e., in insertion-only streams). In this paper, we focus on general metric spaces and mainly on the facility location problem. We give an arguably simple algorithm that maintains a constant-factor approximation, with O(n log n) update time and total recourse O(n). This improves over the naive algorithm, which recomputes a solution at each time step and can take up to O(n^2) update time and O(n^2) total recourse. These bounds are nearly optimal: in general metric spaces, inserting a point takes O(n) time just to describe its distances to the other points, and we give a simple lower bound of Ω(n) for the recourse. Moreover, we generalize this result to the k-median and k-means problems: our algorithm maintains a constant-factor approximation in time Õ(n+k^2). We complement our analysis with experiments showing that the cost of the solution maintained by our algorithm at any time t is very close to the cost of a solution obtained by quickly recomputing a solution from scratch at time t, while having a much better running time.
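As a rough illustration of parameters (1) and (3) above, the sketch below evaluates a facility location solution and counts how many facilities change between two consecutive solutions. It is not the algorithm from the paper. The function names, the uniform `opening_cost`, the use of Euclidean distance in place of a general metric, and the symmetric-difference notion of recourse are all assumptions made for illustration.

```python
from math import dist  # Euclidean distance, standing in for a general metric (assumption)


def facility_location_cost(clients, facilities, opening_cost):
    """Opening cost per facility plus each client's distance to its nearest open facility."""
    if not facilities:
        raise ValueError("at least one facility must be open")
    connection = sum(min(dist(c, f) for f in facilities) for c in clients)
    return opening_cost * len(facilities) + connection


def recourse(previous, current):
    """Number of facilities opened or closed between two consecutive solutions (assumed definition)."""
    return len(set(previous) ^ set(current))


clients = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
before = [(0.0, 0.0)]               # hypothetical solution before an update
after = [(0.0, 0.0), (10.0, 0.0)]   # hypothetical solution after an update

print(facility_location_cost(clients, before, opening_cost=2.0))  # 13.0
print(facility_location_cost(clients, after, opening_cost=2.0))   # 5.0
print(recourse(before, after))                                    # 1
```

In this toy instance, opening a second facility lowers the total cost from 13.0 to 5.0 at a recourse of one change.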
UDDSketch: Accurate Tracking of Quantiles in Data Streams
I. Epicoco, C. Melle, M. Cafaro, M. Pulimeno, G. Morleo
Streaming Facility Location in High Dimension via New Geometric Hashing
In Euclidean Uniform Facility Location, the input is a set of clients in R^d
and the goal is to place facilities to serve them, so as to
minimize the total cost of opening facilities plus connecting the clients. We
study the classical setting of dynamic geometric streams, where the clients are
presented as a sequence of insertions and deletions of points in the grid
, and we focus on the high-dimensional regime, where the
algorithm's space complexity must be polynomial (and certainly not exponential)
in .
We present a new algorithmic framework, based on importance sampling from the
stream, for -approximation of the optimal cost using only
space. This framework is easy to implement in
two passes, one for sampling points and the other for estimating their
contribution. Over random-order streams, we can extend this to a one-pass
algorithm by using the two halves of the stream separately. Our main result,
for arbitrary-order streams, computes -approximation in one pass by
using the new framework but combining the two passes differently. This improves
upon previous algorithms that either need space exponential in d or only
guarantee -approximation, and therefore our algorithms
for high-dimensional streams are the first to avoid the -factor
in approximation that is inherent to the widely-used quadtree decomposition.
Our improvement is achieved by introducing a novel geometric hashing scheme
that maps points in R^d into buckets of bounded diameter, with the
key property that every point set of small-enough diameter is hashed into at
most distinct buckets.
Finally, we complement our results by showing -approximation requires
space exponential in , even for insertion-only
streams.
Comment: The abstract is shortened to meet the length constraint of arXi
- …