Fast Hierarchical Clustering and Other Applications of Dynamic Closest Pairs
We develop data structures for dynamic closest pair problems with arbitrary
distance functions, that do not necessarily come from any geometric structure
on the objects. Based on a technique previously used by the author for
Euclidean closest pairs, we show how to insert and delete objects from an
n-object set, maintaining the closest pair, in O(n log^2 n) time per update and
O(n) space. With quadratic space, we can instead use a quadtree-like structure
to achieve an optimal time bound, O(n) per update. We apply these data
structures to hierarchical clustering, greedy matching, and TSP heuristics, and
discuss other potential applications in machine learning, Groebner bases, and
local improvement algorithms for partition and placement problems. Experiments
show our new methods to be faster in practice than previously used heuristics.
Comment: 20 pages, 9 figures. A preliminary version of this paper appeared at
the 9th ACM-SIAM Symp. on Discrete Algorithms, San Francisco, 1998, pp.
619-628. For source code and experimental results, see
http://www.ics.uci.edu/~eppstein/projects/pairs
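To make the problem statement above concrete, here is a minimal Python sketch of the interface such a structure exposes (insert, delete, query the closest pair under an arbitrary distance function) and of how it drives greedy hierarchical clustering. It is a brute-force baseline, not the data structure from the paper: every query rescans all pairs. The names `NaiveClosestPair` and `single_linkage` are hypothetical, and single linkage is assumed only as one example of a cluster distance.

```python
from itertools import combinations


class NaiveClosestPair:
    """Brute-force baseline: closest pair under an arbitrary distance function."""

    def __init__(self, dist):
        self.dist = dist      # any symmetric function dist(a, b) -> number
        self.items = set()    # objects must be hashable

    def insert(self, obj):
        self.items.add(obj)

    def delete(self, obj):
        self.items.remove(obj)

    def closest_pair(self):
        # Scans every pair, so each query costs O(n^2) distance evaluations.
        return min(combinations(self.items, 2), key=lambda p: self.dist(*p))


def single_linkage(points, dist):
    """Greedy agglomerative clustering; returns the merges in order."""

    def cluster_dist(a, b):
        # Single-linkage distance between two clusters (frozensets of points).
        return min(dist(x, y) for x in a for y in b)

    pairs = NaiveClosestPair(cluster_dist)
    for p in points:
        pairs.insert(frozenset([p]))

    merges = []
    while len(pairs.items) > 1:
        a, b = pairs.closest_pair()
        # Each merge is two deletions and one insertion on the structure.
        pairs.delete(a)
        pairs.delete(b)
        pairs.insert(a | b)
        merges.append((a, b))
    return merges
```

Each merge step is exactly the update pattern the abstract describes, so swapping the brute-force class for a faster dynamic closest-pair structure would leave `single_linkage` unchanged.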
Computing Similarity between a Pair of Trajectories
With recent advances in sensing and tracking technology, trajectory data is
becoming increasingly pervasive and analysis of trajectory data is becoming
exceedingly important. A fundamental problem in analyzing trajectory data is
that of identifying common patterns between pairs or among groups of
trajectories. In this paper, we consider the problem of identifying similar
portions between a pair of trajectories, each observed as a sequence of points
sampled from it.
We present new measures of trajectory similarity --- both local and global
--- between a pair of trajectories to distinguish between similar and
dissimilar portions. Our model is robust under noise and outliers, makes no
assumptions about the sampling rates of either trajectory, and works even if
the trajectories are partially observed. The model also yields a
scalar similarity score which can be used to rank multiple pairs of
trajectories according to similarity, e.g. in clustering applications. We also
present efficient algorithms for computing the similarity under our measures;
the worst-case running time is quadratic in the number of sample points.
Finally, we present an extensive experimental study evaluating the
effectiveness of our approach on real datasets, comparing it with earlier
approaches, and illustrating many issues that arise in trajectory data. Our
experiments show that our approach is highly accurate in distinguishing similar
and dissimilar portions as compared to earlier methods, even with sparse
sampling.
Incremental and Decremental Maintenance of Planar Width
We present an algorithm for maintaining the width of a planar point set
dynamically, as points are inserted or deleted. Our algorithm takes time
O(kn^epsilon) per update, where k is the amount of change the update causes in
the convex hull, n is the number of points in the set, and epsilon is any
arbitrarily small constant. For incremental or decremental update sequences,
the amortized time per update is O(n^epsilon).
Comment: 7 pages; 2 figures. A preliminary version of this paper was presented
at the 10th ACM/SIAM Symp. Discrete Algorithms (SODA '99); this is the
journal version, and will appear in J. Algorithm
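For readers unfamiliar with the quantity being maintained, the following Python sketch computes the width of a planar point set (the smallest distance between two parallel lines enclosing all points) from scratch. It is a static baseline, not the dynamic algorithm from the paper: it rebuilds the convex hull and checks every hull edge against every hull vertex. The function names are hypothetical, and points are assumed to be `(x, y)` tuples.

```python
import math


def cross(o, a, b):
    """Twice the signed area of triangle o, a, b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counterclockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def width(points):
    """Width of a planar point set, recomputed from scratch."""
    hull = convex_hull(points)
    if len(hull) < 3:
        return 0.0  # fewer than three hull vertices: all points are collinear
    best = math.inf
    for i in range(len(hull)):
        a, b = hull[i], hull[(i + 1) % len(hull)]
        edge_len = math.hypot(b[0] - a[0], b[1] - a[1])
        # The width is attained with one supporting line along a hull edge,
        # so take the farthest hull vertex from each edge's line.
        far = max(abs(cross(a, b, p)) for p in hull) / edge_len
        best = min(best, far)
    return best
```

Calling `width` after every insertion or deletion costs at least a full hull rebuild per update, which is the cost the dynamic algorithm is designed to avoid.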
Sketch-based Randomized Algorithms for Dynamic Graph Regression
A well-known problem in data science and machine learning is linear
regression, which has recently been extended to dynamic graphs. Existing exact
algorithms for updating the solution of the dynamic graph regression problem
require at least linear time in the size of the graph.
However, this time complexity might be intractable in practice. In the current
paper, we utilize the subsampled randomized Hadamard transform and
CountSketch to propose the first randomized algorithms. Suppose that
we are given an matrix embedding of the graph, where .
Let be the number of samples required for a guaranteed approximation error,
which is a sublinear function of . Our first algorithm reduces time
complexity of pre-processing to .
Then after an edge insertion or an edge deletion, it updates the approximate
solution in time. Our second algorithm reduces time complexity of
pre-processing to , where is the number of nonzero elements of . Then after
an edge insertion or an edge deletion or a node insertion or a node deletion,
it updates the approximate solution in time, with
. Finally, we show
that under some assumptions, if our first algorithm
outperforms our second algorithm and if our second
algorithm outperforms our first algorithm
- …