8 research outputs found

    A Combinatorial Algorithm for All-Pairs Shortest Paths in Directed Vertex-Weighted Graphs with Applications to Disc Graphs

    Full text link
    We consider the problem of computing all-pairs shortest paths in a directed graph with real weights assigned to vertices. For an n×nn\times n 0-1 matrix C,C, let KCK_{C} be the complete weighted graph on the rows of CC where the weight of an edge between two rows is equal to their Hamming distance. Let MWT(C)MWT(C) be the weight of a minimum weight spanning tree of KC.K_{C}. We show that the all-pairs shortest path problem for a directed graph GG on nn vertices with nonnegative real weights and adjacency matrix AGA_G can be solved by a combinatorial randomized algorithm in time O~(n2n+min{MWT(AG),MWT(AGt)})\widetilde{O}(n^{2}\sqrt {n + \min\{MWT(A_G), MWT(A_G^t)\}}) As a corollary, we conclude that the transitive closure of a directed graph GG can be computed by a combinatorial randomized algorithm in the aforementioned time. O~(n2n+min{MWT(AG),MWT(AGt)})\widetilde{O}(n^{2}\sqrt {n + \min\{MWT(A_G), MWT(A_G^t)\}}) We also conclude that the all-pairs shortest path problem for uniform disk graphs, with nonnegative real vertex weights, induced by point sets of bounded density within a unit square can be solved in time O~(n2.75)\widetilde{O}(n^{2.75})

    Squarepants in a Tree: Sum of Subtree Clustering and Hyperbolic Pants Decomposition

    Full text link
    We provide efficient constant factor approximation algorithms for the problems of finding a hierarchical clustering of a point set in any metric space, minimizing the sum of minimimum spanning tree lengths within each cluster, and in the hyperbolic or Euclidean planes, minimizing the sum of cluster perimeters. Our algorithms for the hyperbolic and Euclidean planes can also be used to provide a pants decomposition, that is, a set of disjoint simple closed curves partitioning the plane minus the input points into subsets with exactly three boundary components, with approximately minimum total length. In the Euclidean case, these curves are squares; in the hyperbolic case, they combine our Euclidean square pants decomposition with our tree clustering method for general metric spaces.Comment: 22 pages, 14 figures. This version replaces the proof of what is now Lemma 5.2, as the previous proof was erroneou

    Distributed PCP Theorems for Hardness of Approximation in P

    Get PDF
    We present a new distributed model of probabilistically checkable proofs (PCP). A satisfying assignment x{0,1}nx \in \{0,1\}^n to a CNF formula φ\varphi is shared between two parties, where Alice knows x1,,xn/2x_1, \dots, x_{n/2}, Bob knows xn/2+1,,xnx_{n/2+1},\dots,x_n, and both parties know φ\varphi. The goal is to have Alice and Bob jointly write a PCP that xx satisfies φ\varphi, while exchanging little or no information. Unfortunately, this model as-is does not allow for nontrivial query complexity. Instead, we focus on a non-deterministic variant, where the players are helped by Merlin, a third party who knows all of xx. Using our framework, we obtain, for the first time, PCP-like reductions from the Strong Exponential Time Hypothesis (SETH) to approximation problems in P. In particular, under SETH we show that there are no truly-subquadratic approximation algorithms for Bichromatic Maximum Inner Product over {0,1}-vectors, Bichromatic LCS Closest Pair over permutations, Approximate Regular Expression Matching, and Diameter in Product Metric. All our inapproximability factors are nearly-tight. In particular, for the first two problems we obtain nearly-polynomial factors of 2(logn)1o(1)2^{(\log n)^{1-o(1)}}; only (1+o(1))(1+o(1))-factor lower bounds (under SETH) were known before

    Subquadratic High-Dimensional Hierarchical Clustering

    Get PDF
    International audienceWe consider the widely-used average-linkage, single-linkage, and Ward's methods for computing hierarchical clusterings of high-dimensional Euclidean inputs. It is easy to show that there is no efficient implementation of these algorithms in high dimensional Euclidean space since it implicitly requires to solve the closest pair problem, a notoriously difficult problem. However, how fast can these algorithms be implemented if we allow approxima-tion? More precisely: these algorithms successively merge the clusters that are at closest average (for average-linkage), minimum distance (for single-linkage), or inducing the least sum-of-square error (for Ward's). We ask whether one could obtain a significant running-time improvement if the algorithm can merge γ-approximate closest clusters (namely, clusters that are at distance (average, minimum , or sum-of-square error) at most γ times the distance of the closest clusters). We show that one can indeed take advantage of the relaxation and compute the approximate hierarchical clustering tree using r Opnq γ-approximate nearest neighbor queries. This leads to an algorithm running in time r Opndq`n 1`Op1{γq for d-dimensional Euclidean space. We then provide experiments showing that these algorithms perform as well as the non-approximate version for classic classification tasks while achieving a significant speed-up

    Scribe: A Clustering Approach To Semantic Information Retrieval

    Get PDF
    Information retrieval is the process of fulfilling a user?s need for information by locating items in a data collection that are similar to a complex query that is often posed in natural language. Latent Semantic Indexing (LSI) was the predominant technique employed at the National Institute of Standards and Technology?s Text Retrieval Conference for many years until limitations of its scalability to large data sets were discovered. This thesis describes SCRIBE, a modification of LSI with improved scalability. SCRIBE clusters its semantic index into discrete volumes described by high-dimensional extensions to computer graphics data structures. SCRIBE?s clustering strategy limits the number of items that must be searched and provides for sub-linear time complexity in the number of documents. Experimental results with a large, natural language document collection demonstrate that SCRIBE achieves retrieval accuracy similar to LSI but requires 1/10 the time

    Subquadratic Approximation Algorithms For Clustering Problems in High Dimensional Spaces

    No full text
    One of the central problems in information retrieval, data mining, computational biology, statistical analysis, computer vision, geographic analysis, pattern recognition, distributed protocols is the question of classification of data according to some clustering rule. Often the data is noisy and even approximate classification is of extreme importance. The difficulty of such classification stems from the fact that usually the data has many incomparable attributes, and often results in the question of clustering problems in high dimensional spaces. Since they require measuring distance between every pair of data points, standard algorithms for computing the exact clustering solutions use quadratic or "nearly quadratic" running time; i.e., O(dn 2\Gammaff(d) ) time where n is the number of data points, d is the dimen- Computer Science Department, University of Toronto. Part of this work was done while visiting Bell Communications Research. y Bell Communications Research, MCC-1C365..

    Subquadratic Approximation Algorithms For Clustering Problems in High Dimensional Spaces

    No full text
    One of the central problems in information retrieval, data mining, computational biology, statistical analy-sis, computer vision, geographic analysis, pattern recognition, distributed protocols is the question of classification of data according to some clustering rule. Of-ten the data is noisy and even approximate classification is of extreme importance. The difficulty of such classi cation stems from the fact that usually the data has many incomparable attributes, and often results in the question of clustering problems in high dimensional spaces. Since they require measuring distance between every pair of data points, standard algorithms for computing the exact clustering solutions use quadratic or "nearly quadratic" running time; i.e., O(dn 2, (d) ) time where n is the number of data points, d is the dimen
    corecore