Faster Balanced Clusterings in High Dimension
The problem of constrained clustering has attracted significant attention in
the past decades. In this paper, we study the balanced $k$-center, $k$-median,
and $k$-means clustering problems, where the size of each cluster is constrained
by given lower and upper bounds. The problems are motivated by
applications in processing large-scale data in high dimension. Existing methods
often need to compute complicated matchings (or min-cost flows) to satisfy the
balance constraint, and thus suffer from high complexity, especially in high
dimension. We develop an effective framework for the three balanced clustering
problems to address this issue, and our method is based on a novel spatial
partition idea in geometry. For the balanced $k$-center clustering, we provide
an approximation algorithm that improves the existing approximation factors;
for the balanced $k$-median and $k$-means clusterings, our algorithms yield
constant and $(1+\epsilon)$-approximation factors with any $\epsilon > 0$. More
importantly, our algorithms achieve linear or nearly linear running times when
$k$ is a constant, and significantly improve the existing ones. Our results can
be easily extended to metric balanced clusterings, and the running times are
sub-linear in terms of the complexity of an $n$-point metric.
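The balance constraint can be made concrete with a minimal sketch. This is not the paper's algorithm. It only evaluates a clustering that is already given: it checks that every cluster size lies between the lower and upper bounds, then returns the $k$-center, $k$-median, or $k$-means cost. The function name `balanced_cost`, its parameters, and the use of Euclidean distance are illustrative assumptions.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def balanced_cost(
    points: Sequence[Point],
    centers: Sequence[Point],
    assignment: Sequence[int],
    lower: int,
    upper: int,
    objective: str = "means",
) -> float:
    """Cost of a balanced clustering (hypothetical helper, not from the paper).

    assignment[i] is the index of the center serving points[i]. Raises
    ValueError if any cluster size lies outside [lower, upper].
    """
    if len(points) != len(assignment):
        raise ValueError("need exactly one center index per point")
    # Count how many points each center serves.
    sizes: List[int] = [0] * len(centers)
    for a in assignment:
        sizes[a] += 1
    # Balance constraint: lower <= |C_j| <= upper for every cluster j.
    for j, s in enumerate(sizes):
        if not lower <= s <= upper:
            raise ValueError(f"cluster {j} has size {s}, outside [{lower}, {upper}]")
    # Euclidean distance from each point to its assigned center.
    dists = [math.dist(p, centers[a]) for p, a in zip(points, assignment)]
    if objective == "center":  # k-center: largest radius
        return max(dists)
    if objective == "median":  # k-median: sum of distances
        return sum(dists)
    if objective == "means":  # k-means: sum of squared distances
        return sum(d * d for d in dists)
    raise ValueError(f"unknown objective: {objective!r}")


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (0, 1), (2, 0), (10, 10), (11, 10)]
    ctrs = [(0, 0), (10, 10)]
    nearest = [0, 0, 0, 0, 1, 1]   # nearest-center assignment: sizes 4 and 2
    balanced = [0, 0, 0, 1, 1, 1]  # (2, 0) moved so both clusters have size 3
    try:
        balanced_cost(pts, ctrs, nearest, lower=3, upper=3)
    except ValueError as err:
        print("infeasible:", err)
    print("k-center radius:", balanced_cost(pts, ctrs, balanced, 3, 3, "center"))
```

In the demo, assigning every point to its nearest center gives cluster sizes 4 and 2, which violates the bounds $L = U = 3$. Moving the point $(2, 0)$ to the far center restores balance, but the radius becomes larger. With the centers fixed, finding the cheapest balanced assignment is the step for which existing methods compute matchings or min-cost flows.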
Static and Streaming Data Structures for Fr\'echet Distance Queries
Given a curve whose points in $\mathbb{R}^d$ arrive in a streaming fashion, and
two parameters, we construct a distance oracle that, given a query
curve with points in $\mathbb{R}^d$, returns an approximation of the discrete
Fr\'echet distance between the query curve and the input curve.
In addition, we construct simplifications in the streaming model and an oracle
for distance queries to a sub-curve (in the static setting), and we introduce the
zoom-in problem. Our algorithms work in any dimension $d$; therefore, we
generalize some useful tools and algorithms for curves under the discrete
Fr\'echet distance to work efficiently in high dimensions.
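The quantity that such an oracle approximates can be computed exactly with the classical dynamic program of Eiter and Mannila. That program runs in time proportional to the product of the two curve lengths. The sketch below is that textbook algorithm, not the paper's streaming oracle. The names `discrete_frechet`, `p`, and `q` are hypothetical, and Euclidean distance between points in $\mathbb{R}^d$ is assumed.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def discrete_frechet(p: Sequence[Point], q: Sequence[Point]) -> float:
    """Exact discrete Fréchet distance between two curves given as point
    sequences in R^d, via the O(len(p) * len(q)) dynamic program."""
    if not p or not q:
        raise ValueError("curves must be non-empty")
    n, m = len(p), len(q)
    # dp[i][j] = discrete Fréchet distance between prefixes p[:i+1] and q[:j+1]
    dp: List[List[float]] = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                dp[i][j] = d
            elif i == 0:  # only q can advance
                dp[i][j] = max(dp[0][j - 1], d)
            elif j == 0:  # only p can advance
                dp[i][j] = max(dp[i - 1][0], d)
            else:  # advance p, q, or both; keep the cheapest coupling so far
                best_prev = min(dp[i - 1][j], dp[i - 1][j - 1], dp[i][j - 1])
                dp[i][j] = max(best_prev, d)
    return dp[-1][-1]


if __name__ == "__main__":
    a = [(0, 0), (1, 0), (2, 0)]
    b = [(0, 1), (1, 1), (2, 1)]
    print(discrete_frechet(a, b))  # prints 1.0
```

`math.dist` accepts points of any dimension, so the same code runs unchanged in $\mathbb{R}^d$. However, every query pays a cost that grows with the length of the input curve, which is the kind of cost a precomputed distance oracle aims to avoid.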