A Combinatorial Algorithm for All-Pairs Shortest Paths in Directed Vertex-Weighted Graphs with Applications to Disc Graphs

Author: Lingas Andrzej
Sledneu Dzmitry
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 28/11/2011

We consider the problem of computing all-pairs shortest paths in a directed graph with real weights assigned to vertices. For an $n \times n$ 0-1 matrix $C$, let $K_{C}$ be the complete weighted graph on the rows of $C$ where the weight of an edge between two rows is equal to their Hamming distance. Let $MWT(C)$ be the weight of a minimum weight spanning tree of $K_{C}$. We show that the all-pairs shortest path problem for a directed graph $G$ on $n$ vertices with nonnegative real weights and adjacency matrix $A_G$ can be solved by a combinatorial randomized algorithm in time $\widetilde{O}(n^{2}\sqrt{n + \min\{MWT(A_G), MWT(A_G^t)\}})$. As a corollary, we conclude that the transitive closure of a directed graph $G$ can be computed by a combinatorial randomized algorithm in the aforementioned time. We also conclude that the all-pairs shortest path problem for uniform disk graphs, with nonnegative real vertex weights, induced by point sets of bounded density within a unit square can be solved in time $\widetilde{O}(n^{2.75})$.
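The quantity MWT(C) defined in the abstract can be illustrated with a small sketch. The code below is not the paper's algorithm; it only computes the parameter itself by running a plain quadratic Prim's algorithm on the rows of a 0-1 matrix under Hamming distance. The function names `hamming` and `mwt` and the example matrix `A` are assumptions made for illustration.

```python
def hamming(u, v):
    """Number of positions where two equal-length 0-1 rows differ."""
    return sum(a != b for a, b in zip(u, v))


def mwt(C):
    """Weight of a minimum spanning tree of K_C (rows of C, Hamming-distance edges).

    Uses the dense O(n^2)-iteration form of Prim's algorithm; each distance
    evaluation costs time proportional to the row length.
    """
    n = len(C)
    if n <= 1:
        return 0
    in_tree = [False] * n
    best = [float("inf")] * n  # cheapest known edge from the tree to each row
    best[0] = 0
    total = 0
    for _ in range(n):
        # Pick the cheapest row that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        # Relax the edges from the newly added row.
        for v in range(n):
            if not in_tree[v]:
                d = hamming(C[u], C[v])
                if d < best[v]:
                    best[v] = d
    return total


# Hypothetical adjacency matrix: rows 0 and 2 are identical, row 1 differs in one entry.
A = [
    [0, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 1, 1, 0],
    [1, 0, 0, 0],
]
A_t = [list(col) for col in zip(*A)]  # transpose, for MWT(A^t)
print(min(mwt(A), mwt(A_t)))
```

When many rows are similar, as in this example, MWT(C) is small, which is the regime where the stated running time improves.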

arXiv.org e-Print Archive

Lund University Publications

Squarepants in a Tree: Sum of Subtree Clustering and Hyperbolic Pants Decomposition

Author: Alstrup S.
Aluru S.
Bern M. W.
David Eppstein
Erickson J.
Saitou N.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 24/02/2008

We provide efficient constant factor approximation algorithms for the problems of finding a hierarchical clustering of a point set in any metric space, minimizing the sum of minimum spanning tree lengths within each cluster, and in the hyperbolic or Euclidean planes, minimizing the sum of cluster perimeters. Our algorithms for the hyperbolic and Euclidean planes can also be used to provide a pants decomposition, that is, a set of disjoint simple closed curves partitioning the plane minus the input points into subsets with exactly three boundary components, with approximately minimum total length. In the Euclidean case, these curves are squares; in the hyperbolic case, they combine our Euclidean square pants decomposition with our tree clustering method for general metric spaces. Comment: 22 pages, 14 figures. This version replaces the proof of what is now Lemma 5.2, as the previous proof was erroneous.

arXiv.org e-Print Archive

Crossref

Distributed PCP Theorems for Hardness of Approximation in P

Author: Abboud Amir
Rubinstein Aviad
Williams Ryan
Publication date: 01/01/1952

We present a new distributed model of probabilistically checkable proofs (PCP). A satisfying assignment $x \in \{0,1\}^n$ to a CNF formula $\varphi$ is shared between two parties, where Alice knows $x_1, \dots, x_{n/2}$, Bob knows $x_{n/2+1}, \dots, x_n$, and both parties know $\varphi$. The goal is to have Alice and Bob jointly write a PCP that $x$ satisfies $\varphi$, while exchanging little or no information. Unfortunately, this model as-is does not allow for nontrivial query complexity. Instead, we focus on a non-deterministic variant, where the players are helped by Merlin, a third party who knows all of $x$. Using our framework, we obtain, for the first time, PCP-like reductions from the Strong Exponential Time Hypothesis (SETH) to approximation problems in P. In particular, under SETH we show that there are no truly-subquadratic approximation algorithms for Bichromatic Maximum Inner Product over {0,1}-vectors, Bichromatic LCS Closest Pair over permutations, Approximate Regular Expression Matching, and Diameter in Product Metric. All our inapproximability factors are nearly-tight. In particular, for the first two problems we obtain nearly-polynomial factors of $2^{(\log n)^{1-o(1)}}$; only $(1+o(1))$-factor lower bounds (under SETH) were known before.
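To make the Bichromatic Maximum Inner Product problem concrete, here is a minimal brute-force sketch: given two sets of {0,1}-vectors, it returns the largest inner product over all cross pairs. It is the quadratic-time exact baseline that the hardness results are measured against, not anything from the paper; the function name `bichromatic_max_ip` and the sample vectors are assumptions made for illustration.

```python
def bichromatic_max_ip(A, B):
    """Exact maximum of <a, b> over a in A and b in B, trying all |A| * |B| pairs."""
    return max(sum(x * y for x, y in zip(a, b)) for a in A for b in B)


# Hypothetical input: two small sets of {0,1}-vectors of dimension 4.
A = [(1, 0, 1, 1), (0, 1, 0, 0)]
B = [(1, 1, 0, 1), (0, 0, 1, 0)]
print(bichromatic_max_ip(A, B))  # → 2
```

With n vectors on each side and dimension d, this loop takes O(n^2 d) time; the abstract's claim is that, under SETH, even approximating the answer within the stated factors cannot be done in truly subquadratic time.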

arXiv.org e-Print Archive

Biblioteca Virtual del Patrimonio Bibliográfico (Virtual Library of Bibliographical Heritage)

Crossref

Subquadratic High-Dimensional Hierarchical Clustering

Author: Abboud Amir
Cohen-Addad Vincent
Houdrougé Hussein
Publication venue: HAL CCSD
Publication date: 08/12/2019

We consider the widely-used average-linkage, single-linkage, and Ward's methods for computing hierarchical clusterings of high-dimensional Euclidean inputs. It is easy to show that there is no efficient implementation of these algorithms in high-dimensional Euclidean space, since they implicitly require solving the closest pair problem, a notoriously difficult problem. However, how fast can these algorithms be implemented if we allow approximation? More precisely: these algorithms successively merge the clusters that are at closest average (for average-linkage), minimum distance (for single-linkage), or inducing the least sum-of-square error (for Ward's). We ask whether one could obtain a significant running-time improvement if the algorithm can merge γ-approximate closest clusters (namely, clusters that are at distance (average, minimum, or sum-of-square error) at most γ times the distance of the closest clusters). We show that one can indeed take advantage of the relaxation and compute the approximate hierarchical clustering tree using $\widetilde{O}(n)$ γ-approximate nearest neighbor queries. This leads to an algorithm running in time $\widetilde{O}(nd) + n^{1+O(1/\gamma)}$ for d-dimensional Euclidean space. We then provide experiments showing that these algorithms perform as well as the non-approximate version for classic classification tasks while achieving a significant speed-up.
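The γ-approximate merge rule described above can be sketched for single-linkage as follows. This is a brute-force illustration of the relaxed rule only: it recomputes all pairwise cluster distances at every step, so it gains no speed, and it does not use the approximate nearest neighbor queries on which the paper's running time relies. The names `single_link` and `approx_single_linkage` and the choice to take the first admissible pair are assumptions made for illustration.

```python
import math


def single_link(c1, c2):
    """Single-linkage distance: minimum Euclidean distance between two clusters."""
    return min(math.dist(p, q) for p in c1 for q in c2)


def approx_single_linkage(points, gamma=1.0):
    """Return the merge sequence of a gamma-approximate single-linkage clustering.

    At each step, any pair of clusters whose distance is at most gamma times
    the current closest-pair distance is an admissible merge; this sketch
    takes the first admissible pair it finds. gamma = 1 gives exact merges.
    """
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        pairs = [
            (single_link(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        d_min = min(d for d, _, _ in pairs)
        # Relaxed rule: accept any pair within a factor gamma of the closest pair.
        d, i, j = next(p for p in pairs if p[0] <= gamma * d_min)
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Hypothetical 2-D input; the two nearby points merge first when gamma = 1.
merges = approx_single_linkage([(0, 0), (0, 1), (5, 0)], gamma=1.0)
for left, right, dist in merges:
    print(left, right, dist)
```

A larger `gamma` widens the set of admissible merges at each step, which is the slack that an approximate nearest neighbor data structure could exploit.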

Scribe: A Clustering Approach To Semantic Information Retrieval

Author: Langley Joseph R
Publication venue: Scholars Junction
Publication date: 05/08/2006

Information retrieval is the process of fulfilling a user's need for information by locating items in a data collection that are similar to a complex query that is often posed in natural language. Latent Semantic Indexing (LSI) was the predominant technique employed at the National Institute of Standards and Technology's Text Retrieval Conference for many years until limitations of its scalability to large data sets were discovered. This thesis describes SCRIBE, a modification of LSI with improved scalability. SCRIBE clusters its semantic index into discrete volumes described by high-dimensional extensions to computer graphics data structures. SCRIBE's clustering strategy limits the number of items that must be searched and provides for sub-linear time complexity in the number of documents. Experimental results with a large, natural language document collection demonstrate that SCRIBE achieves retrieval accuracy similar to LSI but requires 1/10 the time.

Scholars Junction - Mississippi State University Institutional Repository

Subquadratic Approximation Algorithms For Clustering Problems in High Dimensional Spaces

Author: Allan Borodin
Rafail Ostrovsky
Yuval Rabani

One of the central problems in information retrieval, data mining, computational biology, statistical analysis, computer vision, geographic analysis, pattern recognition, and distributed protocols is the question of classification of data according to some clustering rule. Often the data is noisy and even approximate classification is of extreme importance. The difficulty of such classification stems from the fact that usually the data has many incomparable attributes, and often results in the question of clustering problems in high dimensional spaces. Since they require measuring distance between every pair of data points, standard algorithms for computing the exact clustering solutions use quadratic or "nearly quadratic" running time; i.e., $O(dn^{2-\alpha(d)})$ time where n is the number of data points, d is the dimension ...

Author footnotes: Computer Science Department, University of Toronto. Part of this work was done while visiting Bell Communications Research. Bell Communications Research, MCC-1C365..

CiteSeerX

Subquadratic Approximation Algorithms For Clustering Problems in High Dimensional Spaces

Author: Allan Borodin
Rafail Ostrovsky
Yuval Rabani

One of the central problems in information retrieval, data mining, computational biology, statistical analysis, computer vision, geographic analysis, pattern recognition, and distributed protocols is the question of classification of data according to some clustering rule. Often the data is noisy and even approximate classification is of extreme importance. The difficulty of such classification stems from the fact that usually the data has many incomparable attributes, and often results in the question of clustering problems in high dimensional spaces. Since they require measuring distance between every pair of data points, standard algorithms for computing the exact clustering solutions use quadratic or "nearly quadratic" running time; i.e., $O(dn^{2-\alpha(d)})$ time where n is the number of data points, d is the dimension ...

CiteSeerX