Asymptotic of geometrical navigation on a random set of points of the plane
A navigation on a set of points is a rule for choosing which point to
move to from the present point in order to progress toward a specified target.
We study some navigations in the plane where the set of points is a non-uniform
Poisson point process (in a finite domain) with intensity going to infinity. We show the
convergence of the traveller path lengths, the number of stages done, and the
geometry of the traveller trajectories, uniformly for all starting points and
targets, for several navigations of geometric nature. Other costs are also
considered. This leads to asymptotic results on the stretch factors of random
Yao-graphs and random θ-graphs.
On the connectivity of visibility graphs
The visibility graph of a finite set of points in the plane has the points as
vertices and an edge between two vertices if the line segment between them
contains no other points. This paper establishes bounds on the edge- and
vertex-connectivity of visibility graphs.
Unless all its vertices are collinear, a visibility graph has diameter at
most 2, and so it follows by a result of Plesník (1975) that its
edge-connectivity equals its minimum degree. We strengthen the result of
Plesník by showing that for any two vertices v and w in a graph of diameter
2, if deg(v) <= deg(w) then there exist deg(v) edge-disjoint vw-paths of length
at most 4. Furthermore, we find that in visibility graphs every minimum edge
cut is the set of edges incident to a vertex of minimum degree.
For vertex-connectivity, we prove that every visibility graph with n vertices
and at most l collinear vertices has connectivity at least (n-1)/(l-1), which
is tight. We also prove the qualitatively stronger result that the
vertex-connectivity is at least half the minimum degree. Finally, in the case
that l=4 we improve this bound to two thirds of the minimum degree.
Comment: 16 pages, 8 figures
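The definition of a visibility graph given in this abstract can be illustrated with a minimal sketch. It is not taken from the paper: the function names `cross`, `blocks` and `visibility_graph` are hypothetical, points are assumed to be distinct and to have integer coordinates (so the collinearity test is exact), and the brute-force cubic-time loop is chosen for clarity only.

```python
from itertools import combinations


def cross(o, a, b):
    """Cross product of vectors o->a and o->b; zero iff o, a, b are collinear."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def blocks(p, q, r):
    """Return True if point r lies on segment pq, other than at its endpoints."""
    if r == p or r == q or cross(p, q, r) != 0:
        return False
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))


def visibility_graph(points):
    """Edges (i, j), i < j, whose connecting segment contains no other point."""
    edges = set()
    for i, j in combinations(range(len(points)), 2):
        others = (points[k] for k in range(len(points)) if k not in (i, j))
        if not any(blocks(points[i], points[j], r) for r in others):
            edges.add((i, j))
    return edges


# Three collinear points plus one point off the line (illustrative data).
pts = [(0, 0), (1, 0), (2, 0), (0, 1)]
edges = visibility_graph(pts)
```

In this example the pair (0, 2) is not an edge because the point (1, 0) lies on the segment between them, while every other pair is mutually visible.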
Objective-Based Hierarchical Clustering of Deep Embedding Vectors
We initiate a comprehensive experimental study of objective-based
hierarchical clustering methods on massive datasets consisting of deep
embedding vectors from computer vision and NLP applications. This includes a
large variety of image embedding (ImageNet, ImageNetV2, NaBirds), word
embedding (Twitter, Wikipedia), and sentence embedding (SST-2) vectors from
several popular recent models (e.g. ResNet, ResNext, Inception V3, SBERT). Our
study includes datasets with up to million entries with embedding
dimensions up to .
In order to address the challenge of scaling up hierarchical clustering to
such large datasets we propose a new practical hierarchical clustering
algorithm B++&C. It gives a 5%/20% improvement on average for the popular
Moseley-Wang (MW) / Cohen-Addad et al. (CKMM) objectives (normalized) compared
to a wide range of classic methods and recent heuristics. We also introduce a
theoretical algorithm B2SAT&C which achieves a -approximation for the
CKMM objective in polynomial time. This is the first substantial improvement
over the trivial -approximation achieved by a random binary tree. Prior to
this work, the best poly-time approximation of was due
to Charikar et al. (SODA'19).
Nonparametric Feature Extraction from Dendrograms
We propose feature extraction from dendrograms in a nonparametric way. The
Minimax distance measures correspond to building a dendrogram with the single
linkage criterion, together with specific forms of a level function and a
distance function defined over it. We therefore extend this method to arbitrary
dendrograms. We develop a generalized framework wherein different distance
measures can be inferred from different types of dendrograms, level functions
and distance functions. Via an appropriate embedding, we compute a vector-based
representation of the inferred distances, in order to enable many numerical
machine learning algorithms to employ such distances. Then, to address the
model selection problem, we study the aggregation of different dendrogram-based
distances respectively in solution space and in representation space in the
spirit of deep representations. In the first approach, for example for the
clustering problem, we build a graph with positive and negative edge weights
according to the consistency of the clustering labels of different objects
among different solutions, in the context of ensemble methods. Then, we use an
efficient variant of correlation clustering to produce the final clusters. In
the second approach, we investigate the sequential combination of different
distances and features, in the spirit of multi-layered
architectures, to obtain the final features. Finally, we demonstrate the
effectiveness of our approach via several numerical studies.
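The Minimax distance mentioned in this abstract can be sketched as follows, under the usual definition (an assumption here, since the abstract does not spell it out): the Minimax distance between two objects is the smallest possible value of the largest edge along any path connecting them. The function name `minimax_distances` is hypothetical, and the Floyd-Warshall-style recursion is a simple cubic-time illustration, not the authors' method.

```python
def minimax_distances(dist):
    """All-pairs Minimax distances from a symmetric pairwise distance matrix.

    Uses a Floyd-Warshall-style update in which path cost is the largest
    edge on the path (max) and the best path is the one minimizing that cost.
    """
    n = len(dist)
    mm = [row[:] for row in dist]  # copy so the input is left untouched
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = max(mm[i][k], mm[k][j])
                if via_k < mm[i][j]:
                    mm[i][j] = via_k
    return mm


# Four points on a line at positions 0, 1, 2 and 10 (illustrative data).
xs = [0, 1, 2, 10]
dist = [[abs(a - b) for b in xs] for a in xs]
mm = minimax_distances(dist)
```

Here the points at 0 and 2 are at Minimax distance 1, because the path through the point at 1 uses only edges of length 1, whereas any path to the point at 10 must cross a gap of length 8.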
Comparative genomics: multiple genome rearrangement and efficient algorithm development
Multiple genome rearrangement by signed reversal is discussed: for a collection of genomes represented by signed permutations, reconstruct their evolutionary history by using signed reversals, i.e. find a bifurcating tree where sampled genomes are assigned to leaf nodes and ancestral genomes (i.e. signed permutations) are hypothesized at internal nodes such that the total reversal distance summed over all edges of the tree is minimized. This is equivalent to finding an optimal Steiner tree that connects the given genomes by signed reversal paths. The key to the problem is to reconstruct all optimal Steiner nodes/ancestral genomes.
The problem is NP-hard and can only be solved by efficient approximation algorithms. Various algorithms/programs have been designed to solve the problem, such as BPAnalysis, GRAPPA, the grid search algorithm, and the MGR greedy split algorithm (Chapter 1). However, they may have expensive computational costs or low inference accuracy. In this thesis, several new algorithms are developed, including a nearest path search algorithm (Chapter 2), a neighbor-perturbing algorithm (Chapter 3), a branch-and-bound algorithm (Chapter 3), a perturbing-improving algorithm (Chapter 4), a partitioning algorithm (Chapter 5), etc. With theoretical proofs, computer simulations, and biological applications, these algorithms are shown to be 2-approximation algorithms and more efficient than the existing algorithms.
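As an illustration of the basic operation in this abstract, the sketch below applies a signed reversal to a signed permutation (the chosen segment is reversed and every sign in it is flipped) and computes the reversal distance between two tiny genomes by breadth-first search. The function names are hypothetical, and the exhaustive search is only feasible for very small permutations; it is not one of the algorithms developed in the thesis.

```python
from collections import deque


def signed_reversal(perm, i, j):
    """Reverse the segment perm[i..j] (inclusive) and flip the sign of each gene."""
    return perm[:i] + tuple(-g for g in reversed(perm[i:j + 1])) + perm[j + 1:]


def reversal_distance(source, target):
    """Minimum number of signed reversals turning source into target.

    Brute-force breadth-first search over all signed permutations, so it is
    only practical for a handful of genes.
    """
    source, target = tuple(source), tuple(target)
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        perm, d = queue.popleft()
        if perm == target:
            return d
        for i in range(len(perm)):
            for j in range(i, len(perm)):
                nxt = signed_reversal(perm, i, j)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, d + 1))
    return None  # unreachable when both inputs are signed permutations of the same genes


one_step = signed_reversal((1, -3, -2, 4), 1, 2)
d_easy = reversal_distance((1, -3, -2, 4), (1, 2, 3, 4))
d_swap = reversal_distance((2, 1), (1, 2))
```

In the tree formulation described above, each edge of the tree would be weighted by such a distance between the genomes at its two endpoints.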
Light Euclidean Spanners with Steiner Points
The FOCS'19 paper of Le and Solomon, culminating a long line of research on
Euclidean spanners, proves that the lightness (normalized weight) of the greedy
-spanner in is for any
and any (where
hides polylogarithmic factors of ), and also shows the
existence of point sets in for which any -spanner
must have lightness . Given this tight bound on the
lightness, a natural arising question is whether a better lightness bound can
be achieved using Steiner points.
Our first result is a construction of Steiner spanners in with
lightness , where is the spread of the
point set. In the regime of , this provides an
improvement over the lightness bound of Le and Solomon [FOCS 2019]; this regime
of parameters is of practical interest, as point sets arising in real-life
applications (e.g., for various random distributions) have polynomially bounded
spread, while in spanner applications often controls the precision,
and it sometimes needs to be much smaller than . Moreover, for
spread polynomially bounded in , this upper bound provides a
quadratic improvement over the non-Steiner bound of Le and Solomon [FOCS 2019].
We then demonstrate that such a light spanner can be constructed in
time for polynomially bounded spread, where
hides a factor of . Finally, we extend the
construction to higher dimensions, proving a lightness upper bound of
for any and any .
Comment: 23 pages, 2 figures, to appear in ESA 2
NN-Steiner: A Mixed Neural-algorithmic Approach for the Rectilinear Steiner Minimum Tree Problem
Recent years have witnessed rapid advances in the use of neural networks to
solve combinatorial optimization problems. Nevertheless, designing the "right"
neural model that can effectively handle a given optimization problem can be
challenging, and often there is no theoretical understanding or justification
of the resulting neural model. In this paper, we focus on the rectilinear
Steiner minimum tree (RSMT) problem, which is of critical importance in IC
layout design and as a result has attracted numerous heuristic approaches in
the VLSI literature. Our contributions are two-fold. On the methodology front,
we propose NN-Steiner, which is a novel mixed neural-algorithmic framework for
computing RSMTs that leverages the celebrated PTAS algorithmic framework of
Arora to solve this problem (and other geometric optimization problems). Our
NN-Steiner replaces key algorithmic components within Arora's PTAS by suitable
neural components. In particular, NN-Steiner only needs four neural network
(NN) components that are called repeatedly within an algorithmic framework.
Crucially, each of the four NN components is only of bounded size independent
of input size, and thus easy to train. Furthermore, as the NN component is
learning a generic algorithmic step, once learned, the resulting mixed
neural-algorithmic framework generalizes to much larger instances not seen in
training. Our NN-Steiner, to our best knowledge, is the first neural
architecture of bounded size that has capacity to approximately solve RSMT (and
variants). On the empirical front, we show how NN-Steiner can be implemented
and demonstrate the effectiveness of our resulting approach, especially in
terms of generalization, by comparing with state-of-the-art methods (both
neural and non-neural based).
Comment: This paper is the complete version with appendix of the paper
accepted in AAAI'24 with the same title