
    Fast Hierarchical Clustering and Other Applications of Dynamic Closest Pairs

    We develop data structures for dynamic closest pair problems with arbitrary distance functions that do not necessarily come from any geometric structure on the objects. Based on a technique previously used by the author for Euclidean closest pairs, we show how to insert and delete objects from an $n$-object set, maintaining the closest pair, in $O(n \log^2 n)$ time per update and $O(n)$ space. With quadratic space, we can instead use a quadtree-like structure to achieve an optimal time bound, $O(n)$ per update. We apply these data structures to hierarchical clustering, greedy matching, and TSP heuristics, and discuss other potential applications in machine learning, Groebner bases, and local improvement algorithms for partition and placement problems. Experiments show our new methods to be faster in practice than previously used heuristics. Comment: 20 pages, 9 figures. A preliminary version of this paper appeared at the 9th ACM-SIAM Symp. on Discrete Algorithms, San Francisco, 1998, pp. 619-628. For source code and experimental results, see http://www.ics.uci.edu/~eppstein/projects/pairs
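To make the problem statement concrete, here is a minimal Python sketch of a dynamic closest-pair baseline that caches each object's nearest neighbor. It is not the data structure described in the abstract and does not achieve its bounds; the class name `NaiveClosestPair` and its methods are hypothetical, and the distance function is assumed to be symmetric.

```python
class NaiveClosestPair:
    """Illustrative baseline only; NOT the paper's data structure.

    Assumes dist(x, y) == dist(y, x) and that objects are hashable.
    """

    def __init__(self, dist):
        self.dist = dist
        self.nn = {}  # object -> (distance, nearest neighbor), or None if alone

    def _recompute(self, x):
        # Linear scan for the nearest neighbor of x among all other objects.
        best = None
        for y in self.nn:
            if y != x:
                d = self.dist(x, y)
                if best is None or d < best[0]:
                    best = (d, y)
        self.nn[x] = best

    def insert(self, x):
        self.nn[x] = None
        self._recompute(x)
        # The new object may become the nearest neighbor of existing ones.
        for y in self.nn:
            if y != x:
                d = self.dist(x, y)
                if self.nn[y] is None or d < self.nn[y][0]:
                    self.nn[y] = (d, x)

    def delete(self, x):
        del self.nn[x]
        # Objects that pointed at x need a fresh scan (quadratic worst case).
        for y in list(self.nn):
            if self.nn[y] is not None and self.nn[y][1] == x:
                self._recompute(y)

    def closest_pair(self):
        # Returns (distance, a, b), or None with fewer than two objects.
        best = None
        for x, entry in self.nn.items():
            if entry is not None and (best is None or entry[0] < best[0]):
                best = (entry[0], x, entry[1])
        return best
```

Insertion costs a linear number of distance evaluations, but a deletion can trigger a rescan for many objects at once, which is the kind of worst case the structures in the abstract are designed to avoid.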

    Approximating the Held-Karp Bound for Metric TSP in Nearly Linear Time

    We give a nearly linear time randomized approximation scheme for the Held-Karp bound [Held and Karp, 1970] for metric TSP. Formally, given an undirected edge-weighted graph $G$ on $m$ edges and $\epsilon > 0$, the algorithm outputs in $O(m \log^4 n / \epsilon^2)$ time, with high probability, a $(1+\epsilon)$-approximation to the Held-Karp bound on the metric TSP instance induced by the shortest path metric on $G$. The algorithm can also be used to output a corresponding solution to the Subtour Elimination LP. We substantially improve upon the $O(m^2 \log^2(m)/\epsilon^2)$ running time achieved previously by Garg and Khandekar. The LP solution can be used to obtain a fast randomized $\big(\frac{3}{2} + \epsilon\big)$-approximation for metric TSP which improves upon the running time of previous implementations of Christofides' algorithm.
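For reference, the Subtour Elimination LP mentioned above is usually written as follows. This is the standard textbook formulation, not taken from the abstract; the symbols $c_e$ (edge cost in the shortest path metric), $x_e$ (fractional edge variable), and $\delta(S)$ (edges with exactly one endpoint in $S$) are notation assumed here for illustration.

```latex
\begin{align*}
\min \quad & \sum_{e} c_e x_e \\
\text{s.t.} \quad & \sum_{e \in \delta(v)} x_e = 2 && \forall v \in V, \\
& \sum_{e \in \delta(S)} x_e \ge 2 && \forall S \text{ with } \emptyset \ne S \subsetneq V, \\
& 0 \le x_e \le 1 && \forall e .
\end{align*}
```

The optimal value of this LP coincides with the Held-Karp bound, which is why an approximate LP solution yields an approximation of the bound.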

    Space- and Time-Efficient Algorithm for Maintaining Dense Subgraphs on One-Pass Dynamic Streams

    While in many graph mining applications it is crucial to handle a stream of updates efficiently in terms of both time and space, not much was known about achieving such algorithms. In this paper we study this issue for a problem that lies at the core of many graph mining applications, the densest subgraph problem. We develop an algorithm that achieves time- and space-efficiency for this problem simultaneously. It is one of the first of its kind for graph problems to the best of our knowledge. In a graph $G = (V, E)$, the "density" of a subgraph induced by a subset of nodes $S \subseteq V$ is defined as $|E(S)|/|S|$, where $E(S)$ is the set of edges in $E$ with both endpoints in $S$. In the densest subgraph problem, the goal is to find a subset of nodes that maximizes the density of the corresponding induced subgraph. For any $\epsilon > 0$, we present a dynamic algorithm that, with high probability, maintains a $(4+\epsilon)$-approximation to the densest subgraph problem under a sequence of edge insertions and deletions in a graph with $n$ nodes. It uses $\tilde O(n)$ space, and has an amortized update time of $\tilde O(1)$ and a query time of $\tilde O(1)$. Here, $\tilde O$ hides an $O(\operatorname{poly}\log_{1+\epsilon} n)$ term. The approximation ratio can be improved to $(2+\epsilon)$ at the cost of increasing the query time to $\tilde O(n)$. It can be extended to a $(2+\epsilon)$-approximation sublinear-time algorithm and a distributed-streaming algorithm. Our algorithm is the first streaming algorithm that can maintain the densest subgraph in one pass. The previously best algorithm in this setting required $O(\log n)$ passes [Bahmani, Kumar and Vassilvitskii, VLDB'12]. The space required by our algorithm is tight up to a polylogarithmic factor. Comment: A preliminary version of this paper appeared in STOC 201
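The density definition above can be illustrated with a short Python sketch. The `density` function follows the definition $|E(S)|/|S|$ directly. The `greedy_peel` function is the classical static greedy peeling heuristic (a known 2-approximation), swapped in here purely for illustration; it is not the dynamic streaming algorithm of the abstract. Both function names are hypothetical, and the graph is assumed to be simple (no self-loops or duplicate edges).

```python
def density(edges, subset):
    """|E(S)| / |S| for the subgraph induced by `subset`."""
    s = set(subset)
    if not s:
        return 0.0
    inside = sum(1 for u, v in edges if u in s and v in s)
    return inside / len(s)


def greedy_peel(edges):
    """Static greedy peeling: repeatedly remove a minimum-degree node and
    remember the densest intermediate node set. Assumes a simple graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    best_set = set(adj)
    best_density = m / len(adj) if adj else 0.0
    while len(adj) > 1:
        u = min(adj, key=lambda x: len(adj[x]))  # a minimum-degree node
        m -= len(adj[u])                         # its edges disappear
        for v in adj.pop(u):
            adj[v].discard(u)
        d = m / len(adj)
        if d > best_density:
            best_set, best_density = set(adj), d
    return best_set, best_density
```

On a 4-clique with one pendant node attached, peeling removes the pendant first, and the remaining clique has density 6/4, higher than the 7/5 of the whole graph.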

    Data Structures for Halfplane Proximity Queries and Incremental Voronoi Diagrams

    We consider preprocessing a set $S$ of $n$ points in convex position in the plane into a data structure supporting queries of the following form: given a point $q$ and a directed line $\ell$ in the plane, report the point of $S$ that is farthest from (or, alternatively, nearest to) the point $q$ among all points to the left of line $\ell$. We present two data structures for this problem. The first data structure uses $O(n^{1+\varepsilon})$ space and preprocessing time, and answers queries in $O(2^{1/\varepsilon} \log n)$ time, for any $0 < \varepsilon < 1$. The second data structure uses $O(n \log^3 n)$ space and polynomial preprocessing time, and answers queries in $O(\log n)$ time. These are the first solutions to the problem with $O(\log n)$ query time and $o(n^2)$ space. The second data structure uses a new representation of nearest- and farthest-point Voronoi diagrams of points in convex position. This representation supports the insertion of new points in clockwise order using only $O(\log n)$ amortized pointer changes, in addition to $O(\log n)$-time point-location queries, even though every such update may make $\Theta(n)$ combinatorial changes to the Voronoi diagram. This data structure is the first demonstration that deterministically and incrementally constructed Voronoi diagrams can be maintained in $o(n)$ amortized pointer changes per operation while keeping $O(\log n)$-time point-location queries. Comment: 17 pages, 6 figures. Various small improvements. To appear in Algorithmic
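The query itself can be pinned down with a brute-force Python sketch that scans all points in linear time per query. It only illustrates what is being asked and is not either of the data structures from the abstract. The function names and the representation of the directed line $\ell$ by two points `a` and `b` (direction from `a` to `b`) are assumptions made for illustration.

```python
def left_of(p, a, b):
    """True if point p lies strictly to the left of the directed line a -> b."""
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    return cross > 0


def halfplane_query(points, q, a, b, farthest=True):
    """Linear scan: farthest (or nearest) point from q among those left of a -> b.

    Returns None when no point lies to the left of the line.
    """
    candidates = [p for p in points if left_of(p, a, b)]
    if not candidates:
        return None

    def sq_dist(p):
        # Squared Euclidean distance preserves the ordering of distances.
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

    return max(candidates, key=sq_dist) if farthest else min(candidates, key=sq_dist)
```

For the corners of a square and a vertical line directed upward through its middle, only the two corners with smaller x-coordinate are candidates.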

    Linear-Time Algorithms for Computing Maximum-Density Sequence Segments with Bioinformatics Applications

    We study an abstract optimization problem arising from biomolecular sequence analysis. For a sequence $A$ of pairs $(a_i, w_i)$ for $i = 1, \ldots, n$ and $w_i > 0$, a segment $A(i,j)$ is a consecutive subsequence of $A$ starting with index $i$ and ending with index $j$. The width of $A(i,j)$ is $w(i,j) = \sum_{i \le k \le j} w_k$, and the density is $\left(\sum_{i \le k \le j} a_k\right) / w(i,j)$. The maximum-density segment problem takes $A$ and two values $L$ and $U$ as input and asks for a segment of $A$ with the largest possible density among those of width at least $L$ and at most $U$. When $U$ is unbounded, we provide a relatively simple, $O(n)$-time algorithm, improving upon the $O(n \log L)$-time algorithm by Lin, Jiang and Chao. When both $L$ and $U$ are specified, there are no previous nontrivial results. We solve the problem in $O(n)$ time if $w_i = 1$ for all $i$, and more generally in $O(n + n \log(U-L+1))$ time when $w_i \ge 1$ for all $i$. Comment: 23 pages, 13 figures. A significant portion of these results appeared under the title, "Fast Algorithms for Finding Maximum-Density Segments of a Sequence with Applications to Bioinformatics," in Proceedings of the Second Workshop on Algorithms in Bioinformatics (WABI), volume 2452 of Lecture Notes in Computer Science (Springer-Verlag, Berlin), R. Guigo and D. Gusfield editors, 2002, pp. 157--17
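As a reference point for the definitions above, the following Python sketch solves the maximum-density segment problem by brute force with prefix sums, in quadratic time. It is not the linear-time algorithm of the abstract, only a direct transcription of the problem statement that could serve as a correctness oracle. The function name is hypothetical, and indices are 0-based and inclusive rather than the 1-based indices used in the abstract.

```python
from itertools import accumulate


def max_density_segment(a, w, L, U=float("inf")):
    """Brute-force reference: best (density, i, j) with L <= width <= U.

    Assumes every w[k] > 0. Returns None if no segment is feasible.
    """
    n = len(a)
    pa = [0] + list(accumulate(a))  # pa[j + 1] - pa[i] = sum of a[i..j]
    pw = [0] + list(accumulate(w))  # pw[j + 1] - pw[i] = width of segment i..j
    best = None
    for i in range(n):
        for j in range(i, n):
            width = pw[j + 1] - pw[i]
            if width > U:
                break  # widths only grow with j because all weights are positive
            if width >= L:
                d = (pa[j + 1] - pa[i]) / width
                if best is None or d > best[0]:
                    best = (d, i, j)
    return best
```

With unit weights, `a = [1, 5, 2, 6, 1]`, `L = 2` and `U = 3`, the best segment covers the values 5, 2, 6 with density 13/3.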