2 research outputs found

    Faster Balanced Clusterings in High Dimension

    The problem of constrained clustering has attracted significant attention in the past decades. In this paper, we study the balanced $k$-center, $k$-median, and $k$-means clustering problems where the size of each cluster is constrained by the given lower and upper bounds. The problems are motivated by the applications in processing large-scale data in high dimension. Existing methods often need to compute complicated matchings (or min cost flows) to satisfy the balance constraint, and thus suffer from high complexities especially in high dimension. We develop an effective framework for the three balanced clustering problems to address this issue, and our method is based on a novel spatial partition idea in geometry. For the balanced $k$-center clustering, we provide a $4$-approximation algorithm that improves the existing approximation factors; for the balanced $k$-median and $k$-means clusterings, our algorithms yield constant and $(1+\epsilon)$-approximation factors with any $\epsilon>0$. More importantly, our algorithms achieve linear or nearly linear running times when $k$ is a constant, and significantly improve the existing ones. Our results can be easily extended to metric balanced clusterings and the running times are sub-linear in terms of the complexity of $n$-point metric

    Static and Streaming Data Structures for Fréchet Distance Queries

    Given a curve $P$ with points in $\mathbb{R}^d$ in a streaming fashion, and parameters $\varepsilon>0$ and $k$, we construct a distance oracle that uses $O(\frac{1}{\varepsilon})^{kd}\log\varepsilon^{-1}$ space, and given a query curve $Q$ with $k$ points in $\mathbb{R}^d$, returns in $\tilde{O}(kd)$ time a $1+\varepsilon$ approximation of the discrete Fréchet distance between $Q$ and $P$. In addition, we construct simplifications in the streaming model, oracle for distance queries to a sub-curve (in the static setting), and introduce the zoom-in problem. Our algorithms work in any dimension $d$, and therefore we generalize some useful tools and algorithms for curves under the discrete Fréchet distance to work efficiently in high dimensions