    Fast Hierarchical Clustering and Other Applications of Dynamic Closest Pairs

    We develop data structures for dynamic closest pair problems with arbitrary distance functions that do not necessarily come from any geometric structure on the objects. Based on a technique previously used by the author for Euclidean closest pairs, we show how to insert and delete objects from an n-object set, maintaining the closest pair, in O(n log^2 n) time per update and O(n) space. With quadratic space, we can instead use a quadtree-like structure to achieve an optimal time bound, O(n) per update. We apply these data structures to hierarchical clustering, greedy matching, and TSP heuristics, and discuss other potential applications in machine learning, Groebner bases, and local improvement algorithms for partition and placement problems. Experiments show our new methods to be faster in practice than previously used heuristics. Comment: 20 pages, 9 figures. A preliminary version of this paper appeared at the 9th ACM-SIAM Symp. on Discrete Algorithms, San Francisco, 1998, pp. 619-628. For source code and experimental results, see http://www.ics.uci.edu/~eppstein/projects/pairs
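To make the interface concrete, here is a minimal sketch of a dynamic closest-pair structure driving greedy hierarchical clustering. It is a brute-force baseline, not the data structures of the paper: every query rescans all pairs. The class name `BruteForceClosestPair`, the function `agglomerate`, and the single-linkage cluster distance are assumptions made for illustration.

```python
from itertools import combinations


class BruteForceClosestPair:
    """Baseline dynamic closest pair for an arbitrary distance function.

    Hypothetical illustration only: updates are O(1), but each query
    rescans all O(n^2) pairs, unlike the structures in the abstract.
    """

    def __init__(self, dist):
        self.dist = dist
        self.items = {}  # dict used as an insertion-ordered set

    def insert(self, x):
        self.items[x] = None

    def delete(self, x):
        del self.items[x]

    def __len__(self):
        return len(self.items)

    def closest_pair(self):
        # Raises ValueError if fewer than two objects are stored.
        return min(combinations(self.items, 2), key=lambda p: self.dist(*p))


def agglomerate(points, point_dist, k):
    """Greedily merge the closest two clusters until k clusters remain.

    Assumes distinct, hashable points and k >= 1. Cluster distance is
    single linkage (minimum point distance), chosen here for simplicity.
    """

    def cluster_dist(a, b):
        return min(point_dist(p, q) for p in a for q in b)

    pairs = BruteForceClosestPair(cluster_dist)
    for p in points:
        pairs.insert(frozenset([p]))
    while len(pairs) > k:
        a, b = pairs.closest_pair()
        pairs.delete(a)
        pairs.delete(b)
        pairs.insert(a | b)
    return list(pairs.items)


clusters = agglomerate([0, 1, 10, 11], lambda p, q: abs(p - q), k=2)
```

Each merge is two deletions and one insertion, which is why the update time of the closest-pair structure governs the overall cost of the clustering.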

    Probability and Problems in Euclidean Combinatorial Optimization

    This article summarizes the current status of several streams of research that deal with the probability theory of problems of combinatorial optimization. There is a particular emphasis on functionals of finite point sets. The most famous example of such functionals is the length associated with the Euclidean traveling salesman problem (TSP), but closely related problems include the minimal spanning tree problem, minimal matching problems and others. Progress is also surveyed on (1) the approximation and determination of constants whose existence is known by subadditive methods, (2) the central limit problems for several functionals closely related to Euclidean functionals, and (3) analogies in the asymptotic behavior between worst-case and expected-case behavior of Euclidean problems. No attempt has been made in this survey to cover the many important applications of probability to linear programming, arrangement searching or other problems that focus on lines or planes.

    Asymptotically Optimal Algorithms for Pickup and Delivery Problems with Application to Large-Scale Transportation Systems

    The Stacker Crane Problem (SCP) is NP-hard, and the best known approximation algorithm only provides a 9/5 approximation ratio. The objective of this paper is threefold. First, by embedding the problem within a stochastic framework, we present a novel algorithm for the SCP that: (i) is asymptotically optimal, i.e., it produces, almost surely, a solution approaching the optimal one as the number of pickups/deliveries goes to infinity; and (ii) has computational complexity O(n^{2+ε}), where n is the number of pickup/delivery pairs and ε is an arbitrarily small positive constant. Second, we asymptotically characterize the length of the optimal SCP tour. Finally, we study a dynamic version of the SCP, whereby pickup and delivery requests arrive according to a Poisson process, and which serves as a model for large-scale demand-responsive transport (DRT) systems. For such a dynamic counterpart of the SCP, we derive a necessary and sufficient condition for the existence of stable vehicle routing policies, which depends only on the workspace geometry, the stochastic distributions of pickup and delivery points, the arrival rate of requests, and the number of vehicles. Our results leverage a novel connection between the Euclidean Bipartite Matching Problem and the theory of random permutations, and, for the dynamic setting, exhibit novel features that are absent in traditional spatially-distributed queueing systems. Comment: 27 pages, plus Appendix, 7 figures, extended version of paper being submitted to IEEE Transactions of Automatic Contro

    From Balls and Bins to Points and Vertices

    Given a graph G = (V, E) with |V| = n, we consider the following problem. Place m = n points on the vertices of G independently and uniformly at random. Once the points are placed, relocate them using a bijection from the points to the vertices that minimizes the maximum distance between the random positions of the points and their target vertices. We look for an upper bound on this maximum relocation distance that holds with high probability (over the initial placements of the points). For general graphs and in the case m ≤ n, we prove the #P-hardness of the problem and that the maximum relocation distance is O(√n) with high probability. We present a Fully Polynomial Randomized Approximation Scheme when the input graph admits a polynomial-size family of witness cuts, while for trees we provide a 2-approximation algorithm. Many applications concern the variation in which m = (1 − ε)n for some 0 < ε < 1. We provide several bounds for the maximum relocation distance according to different graph topologies.
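As a small illustration of the relocation objective, the sketch below handles only the simplest topology, a path graph with m = n. It relies on the standard fact that, on a line, matching the points in sorted order to the vertices in order minimizes the maximum displacement. The function names and the simulation parameters are assumptions for illustration and do not come from the paper.

```python
import random


def max_relocation_on_path(placements):
    """Optimal maximum relocation distance on a path with vertices 0..n-1.

    placements[i] is the vertex where point i landed, with len == n.
    On a line, the order-preserving bijection (sorted points to vertices
    0, 1, ..., n-1) is optimal for the bottleneck objective.
    """
    ordered = sorted(placements)
    return max(abs(vertex - target) for target, vertex in enumerate(ordered))


def average_max_relocation(n, trials, seed=0):
    """Monte Carlo estimate of the expected maximum relocation distance."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        placements = [rng.randrange(n) for _ in range(n)]
        total += max_relocation_on_path(placements)
    return total / trials


estimate = average_max_relocation(n=400, trials=50)
```

Running the estimate for increasing n gives a quick empirical feel for how the relocation distance grows on this one topology; it says nothing about general graphs.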

    Doctor of Philosophy

    Kernel smoothing provides a simple way of finding structures in data sets without the imposition of a parametric model, for example, nonparametric regression and density estimates. However, in many data-intensive applications, the data set could be large. Thus, evaluating a kernel density estimate or kernel regression over the data set directly can be prohibitively expensive. This dissertation studies how to efficiently find a smaller data set that can approximate the original data set with a theoretical guarantee in the kernel smoothing setting, and how to extend it to more general smooth range spaces. For kernel density estimates, we propose randomized and deterministic algorithms with quality guarantees that are orders of magnitude more efficient than previous algorithms; they do not require knowledge of the kernel or its bandwidth parameter and are easily parallelizable. Our algorithms are applicable to any large-scale data processing framework. We then further investigate how to measure the error between two kernel density estimates, which is usually measured either in L1 or L2 error. In this dissertation, we investigate the challenges in using a stronger error, L∞ (or worst-case) error. We present efficient solutions for how to estimate the L∞ error and how to choose the bandwidth parameter for a kernel density estimate built on a subsample of a large data set. We next extend smoothed versions of geometric range spaces from kernel range spaces to more general types of ranges, so that an element of the ground set can be contained in a range with a non-binary value in [0,1]. We investigate the approximation of these range spaces through ϵ-nets and ϵ-samples. Finally, we study coreset algorithms for kernel regression. The sizes of the coresets are independent of the size of the data set; rather, they depend only on the error guarantee, and in some cases on the size of the domain and the amount of smoothing. We evaluate our methods on very large time series and spatial data, demonstrate that they can be constructed extremely efficiently, and allow for great computational gains.
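The idea of comparing a kernel density estimate on the full data with one built on a subsample can be sketched as follows. This is only an illustration under stated assumptions: a one-dimensional Gaussian kernel, a uniform random subsample instead of the dissertation's coreset constructions, and an evaluation grid, so the reported value is a lower bound on the true L∞ error. All function names and parameter values are hypothetical.

```python
import math
import random


def kde(data, x, bandwidth):
    """Gaussian kernel density estimate of `data` evaluated at point x."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in data)


def linf_error_on_grid(full, sample, bandwidth, grid):
    """Largest gap between the two estimates over the grid points only.

    The true L-infinity error is a supremum over all x, so this grid
    maximum can only underestimate it.
    """
    return max(
        abs(kde(full, x, bandwidth) - kde(sample, x, bandwidth)) for x in grid
    )


def subsample_error(n, m, bandwidth, seed=0):
    """Draw n points from a two-component mixture, keep m at random,
    and report the grid L-infinity gap between the two estimates."""
    rng = random.Random(seed)
    full = [rng.gauss(-2.0, 1.0) if rng.random() < 0.5 else rng.gauss(3.0, 0.5)
            for _ in range(n)]
    sample = rng.sample(full, m)
    grid = [-6.0 + 0.25 * i for i in range(49)]  # covers [-6, 6]
    return linf_error_on_grid(full, sample, bandwidth, grid)


error = subsample_error(n=2000, m=200, bandwidth=0.3)
```

Evaluating the subsampled estimate costs time proportional to m rather than n per query, which is the motivation for keeping the approximating set small.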

    Equidistribution in All Dimensions of Worst-Case Point Sets for the TSP

    Given a set S of n points in the unit square [0, 1]^d, an optimal traveling salesman tour of S is a tour of S that is of minimum length. A worst-case point set for the Traveling Salesman Problem in the unit square is a point set S(n) whose optimal traveling salesman tour achieves the maximum possible length among all point sets S ⊂ [0, 1]^d, where |S| = n. An open problem is to determine the structure of S(n). We show that for any rectangular parallelepiped R contained in [0, 1]^d, the number of points in S(n) ∩ R is asymptotic to n times the volume of R. Analogous results are proved for the minimum spanning tree, minimum-weight matching, and rectilinear Steiner minimum tree. These equidistribution theorems are the first results concerning the structure of worst-case point sets like S(n).

    A powerful heuristic for telephone gossiping

    A refined heuristic for computing schedules for gossiping in the telephone model is presented. The heuristic is fast: for a network with n nodes and m edges, requiring R rounds for gossiping, the running time is O(R n log(n) m) for all tested classes of graphs. This moderate time consumption makes it possible to compute gossiping schedules for networks with more than 10,000 PUs and 100,000 connections. The heuristic is good: in practice the computed schedules never exceed the optimum by more than a few rounds. The heuristic is versatile: it can also be used for broadcasting and more general information dispersion patterns. It can handle both the unit-cost and the linear-cost model. In fact, the heuristic is so good that for CCC, shuffle-exchange, butterfly, de Bruijn, star and pancake networks the constructed gossiping schedules are better than the best theoretically derived ones. For example, for gossiping on a shuffle-exchange network with 2^{13} PUs, the former upper bound was 49 rounds, while our heuristic finds a schedule requiring 31 rounds. For broadcasting, too, the heuristic improves on many formerly known results. A second heuristic works even better for CCC, butterfly, star and pancake networks. For example, with this heuristic we found that gossiping on a pancake network with 7! PUs can be performed in 15 rounds, 2 fewer than achieved by the best theoretical construction. This second heuristic is less versatile than the first, but by refined search techniques it can tackle even larger problems, the main limitation being the storage capacity. Another advantage is that the constructed schedules can be represented concisely.
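For readers unfamiliar with the telephone model, the following sketch checks whether a given schedule completes gossiping: in each round every node takes part in at most one call, and the two endpoints of a call exchange everything they know. It does not implement the heuristic from the abstract. The function names are hypothetical, the checker assumes every call runs along an edge of the network (it does not verify this), and the hypercube schedule is a textbook example used only as test input.

```python
def run_gossip(n, rounds):
    """Simulate a telephone-model schedule on nodes 0..n-1.

    `rounds` is a list of rounds; each round is a list of calls (u, v).
    Returns True if every node knows all n pieces of information at the end.
    """
    known = [{i} for i in range(n)]
    for calls in rounds:
        busy = set()
        for u, v in calls:
            if u == v or u in busy or v in busy:
                raise ValueError("a node may take part in at most one call per round")
            busy.update((u, v))
        # Calls within a round are disjoint, so applying them one after
        # another gives the same result as applying them simultaneously.
        for u, v in calls:
            merged = known[u] | known[v]
            known[u] = merged
            known[v] = merged
    return all(len(k) == n for k in known)


def hypercube_schedule(d):
    """One round per dimension: each node calls its neighbour across that bit."""
    return [
        [(u, u ^ (1 << bit)) for u in range(2 ** d) if u < (u ^ (1 << bit))]
        for bit in range(d)
    ]


complete = run_gossip(8, hypercube_schedule(3))
```

A checker of this kind is useful for validating schedules produced by any heuristic, since the round count claimed for a schedule is only meaningful if the schedule actually completes.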

    Convex Combinatorial Optimization

    We introduce the convex combinatorial optimization problem, a far-reaching generalization of the standard linear combinatorial optimization problem. We show that it is solvable in strongly polynomial time over any edge-guaranteed family, and discuss several applications.