Max-sum diversity via convex programming
Diversity maximization is an important concept in information retrieval,
computational geometry and operations research. Usually, it is a variant of the
following problem: given a ground set $X$, constraints, and a function $f$
that measures the diversity of a subset, the task is to select a feasible subset
$S \subseteq X$ such that $f(S)$ is maximized. The \emph{sum-dispersion} function
$f(S) = \sum_{u,v \in S} d(u,v)$, which is the sum of the pairwise distances in $S$, is
in this context a prominent diversification measure. The corresponding
diversity maximization problem is known as \emph{max-sum} or \emph{sum-sum diversification}.
Many recent results deal with the design of constant-factor approximation
algorithms for diversification problems involving the sum-dispersion function under
a matroid constraint. In this paper, we present a PTAS for the max-sum
diversification problem under a matroid constraint for distances
of \emph{negative type}. Distances of negative type include, for
example, metric distances stemming from the $\ell_1$ and $\ell_2$ norms, as well
as the cosine, spherical, and Jaccard distances, which are popular similarity
measures in web and image search.
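The sum-dispersion measure above is simple to state directly. A minimal sketch (function names and the Euclidean metric are illustrative choices, not the paper's notation):

```python
from itertools import combinations
import math

def euclidean(p, q):
    # Euclidean distance; any distance of negative type could be used here.
    return math.dist(p, q)

def sum_dispersion(S, dist=euclidean):
    """Sum-dispersion f(S): the sum of pairwise distances d(u, v)
    over all unordered pairs {u, v} of the subset S."""
    return sum(dist(u, v) for u, v in combinations(S, 2))

# Three corners of a unit right triangle: distances 1, 1, and sqrt(2).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print(sum_dispersion(points))  # 2 + sqrt(2)
```

Maximizing this quantity over feasible subsets (e.g. the independent sets of a matroid) is the optimization problem the abstract refers to; the sketch only evaluates the objective.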
Sublinear quasiconformality and the large-scale geometry of Heintze groups
This article analyzes sublinearly quasisymmetric homeomorphisms (generalized
quasisymmetric mappings), and draws applications to the sublinear large-scale
geometry of negatively curved groups and spaces. It is proven that those
homeomorphisms lack analytical properties but preserve a conformal dimension
and appropriate function spaces, distinguishing certain (nonsymmetric)
Riemannian negatively curved homogeneous spaces, and Fuchsian buildings, up to
sublinearly biLipschitz equivalence (generalized quasiisometry).
Comment: v1->v2: shortened, revised. Lemma 2.3 and definition of Cdim
corrected. Proof of main theorem simplified. Figure 4 added.
Contributions on metric spaces with applications in personalized medicine
This thesis aims to propose new distributional representations and
statistical methods in metric spaces to effectively model data arising
from the continuous monitoring of patients during their daily-life
activities. We propose new hypothesis tests for paired data, regression
models, uncertainty quantification algorithms, tests of statistical
independence, and clustering algorithms for the new distributional
representations and other complex statistical objects. The results
collected throughout the thesis show the advantages of the new proposals
over existing methods in terms of prediction, interpretability, and
modeling capacity.
Distance Measures for Embedded Graphs
We introduce new distance measures for comparing straight-line embedded
graphs based on the Fr\'echet distance and the weak Fr\'echet distance. These
graph distances are defined using continuous mappings and thus take the
combinatorial structure as well as the geometric embeddings of the graphs into
account. We present a general algorithmic approach for computing these graph
distances. Although we show that deciding the distances is NP-hard for general
embedded graphs, we prove that our approach yields polynomial time algorithms
if the graphs are trees, and for the distance based on the weak Fr\'echet
distance if the graphs are planar embedded. Moreover, we prove that deciding
the distances based on the Fr\'echet distance remains NP-hard for planar
embedded graphs and show how our general algorithmic approach yields an
exponential time algorithm and a polynomial time approximation algorithm for
this case.
Comment: 27 pages, 14 figures
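The graph distances above build on the Fr\'echet distance between curves. While the continuous version is involved, its discrete variant for polyline vertices is a short dynamic program (this is the standard Eiter-Mannila recurrence, shown for intuition, not the paper's graph algorithm):

```python
import math
from functools import lru_cache

def discrete_frechet(P, Q):
    """Discrete Frechet distance between polylines P and Q, given as
    lists of points. Standard Eiter-Mannila dynamic program: the coupling
    cost c(i, j) is the cheapest max-distance walk reaching (P[i], Q[j])."""
    @lru_cache(maxsize=None)
    def c(i, j):
        d = math.dist(P[i], Q[j])
        if i == 0 and j == 0:
            return d
        if i == 0:
            return max(c(0, j - 1), d)
        if j == 0:
            return max(c(i - 1, 0), d)
        # Advance on P, on Q, or on both; keep the cheapest predecessor.
        return max(min(c(i - 1, j), c(i - 1, j - 1), c(i, j - 1)), d)
    return c(len(P) - 1, len(Q) - 1)

# Two parallel horizontal segments at vertical offset 1.
print(discrete_frechet([(0, 0), (1, 0), (2, 0)], [(0, 1), (1, 1), (2, 1)]))  # 1.0
```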
Geometry-Aware Adaptation for Pretrained Models
Machine learning models -- including prominent zero-shot models -- are often
trained on datasets whose labels are only a small proportion of a larger label
space. Such spaces are commonly equipped with a metric that relates the labels
via distances between them. We propose a simple approach to exploit this
information to adapt the trained model to reliably predict new classes -- or,
in the case of zero-shot prediction, to improve its performance -- without any
additional training. Our technique is a drop-in replacement of the standard
prediction rule, swapping argmax with the Fr\'echet mean. We provide a
comprehensive theoretical analysis for this approach, studying (i)
learning-theoretic results trading off label space diameter, sample complexity,
and model dimension, (ii) characterizations of the full range of scenarios in
which it is possible to predict any unobserved class, and (iii) an optimal
active learning-like next class selection procedure to obtain optimal training
classes for when it is not possible to predict the entire range of unobserved
classes. Empirically, using easily-available external metrics, our proposed
approach, Loki, gains up to 29.7% relative improvement over SimCLR on ImageNet
and scales to hundreds of thousands of classes. When no such metric is
available, Loki can use self-derived metrics from class embeddings and obtains
a 10.5% improvement on pretrained zero-shot models such as CLIP.
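The drop-in swap of argmax for the Fr\'echet mean can be sketched in a few lines (the function name and distance-matrix interface are illustrative assumptions, not Loki's actual API): instead of taking the highest-scoring observed class, predict the candidate label minimizing the probability-weighted sum of squared label distances.

```python
import numpy as np

def frechet_mean_predict(probs, D):
    """Frechet-mean prediction rule.
    probs: shape (k,), model probabilities over the k observed classes.
    D: shape (m, k), metric distances from each of m candidate labels
    (possibly unobserved) to each observed class.
    Returns the index of the candidate minimizing sum_c probs[c] * D[y, c]^2."""
    cost = (D ** 2) @ probs
    return int(np.argmin(cost))

# Three candidate labels, two observed classes with equal probability.
# The middle candidate sits between both observed classes and wins.
D = np.array([[0.0, 2.0],
              [1.0, 1.0],
              [2.0, 0.0]])
probs = np.array([0.5, 0.5])
print(frechet_mean_predict(probs, D))  # 1
```

Note how this can return a label the model was never trained on, which is the mechanism by which new classes become predictable without additional training.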
F?D: On understanding the role of deep feature spaces on face generation evaluation
Perceptual metrics, like the Fr\'echet Inception Distance (FID), are widely
used to assess the similarity between synthetically generated and ground truth
(real) images. The key idea behind these metrics is to compute errors in a deep
feature space that captures perceptually and semantically rich image features.
Despite their popularity, the effect that different deep features and their
design choices have on a perceptual metric has not been well studied. In this
work, we perform a causal analysis linking differences in semantic attributes
and distortions between face image distributions to Fr\'echet distances (FD)
using several popular deep feature spaces. A key component of our analysis is
the creation of synthetic counterfactual faces using deep face generators. Our
experiments show that the FD is heavily influenced by its feature space's
training dataset and objective function. For example, FD using features
extracted from ImageNet-trained models heavily emphasize hats over regions like
the eyes and mouth. Moreover, FD using features from a face gender classifier
emphasize hair length more than distances in an identity (recognition) feature
space. Finally, we evaluate several popular face generation models across
feature spaces and find that StyleGAN2 consistently ranks higher than other
face generators, except with respect to identity (recognition) features. This
suggests the need for considering multiple feature spaces when evaluating
generative models and using feature spaces that are tuned to nuances of the
domain of interest.
Comment: Code and dataset to be released soon.
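The Fr\'echet distance between two Gaussians fitted to deep features, which underlies FID and the FDs studied above, has the closed form $\|\mu_1 - \mu_2\|^2 + \mathrm{Tr}(C_1 + C_2 - 2(C_1 C_2)^{1/2})$. A minimal NumPy sketch of this formula (illustrative, not the authors' evaluation code):

```python
import numpy as np

def frechet_distance(mu1, C1, mu2, C2):
    """Frechet distance between Gaussians N(mu1, C1) and N(mu2, C2):
    ||mu1 - mu2||^2 + Tr(C1 + C2 - 2 (C1 C2)^{1/2})."""
    # Symmetric square root of C1 via eigendecomposition.
    w, V = np.linalg.eigh(C1)
    s1 = V @ np.diag(np.sqrt(np.clip(w, 0, None))) @ V.T
    # Tr((C1 C2)^{1/2}) = Tr((s1 C2 s1)^{1/2}); s1 C2 s1 is symmetric PSD,
    # so its square-root trace is the sum of the roots of its eigenvalues.
    w2 = np.linalg.eigvalsh(s1 @ C2 @ s1)
    tr_sqrt = np.sum(np.sqrt(np.clip(w2, 0, None)))
    diff = np.asarray(mu1) - np.asarray(mu2)
    return float(diff @ diff + np.trace(C1) + np.trace(C2) - 2 * tr_sqrt)

# Unit-covariance Gaussians whose means differ by 1: FD = 1.
print(frechet_distance([0, 0], np.eye(2), [1, 0], np.eye(2)))  # 1.0
```

The paper's point is that the feature space producing the means and covariances, not this formula, is what drives the metric's sensitivities.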
Approximating Sparsest Cut in Low Rank Graphs via Embeddings from Approximately Low Dimensional Spaces
We consider the problem of embedding a finite set of points $x_1, \ldots, x_n \in \mathbb{R}^d$
that satisfy $\ell_2^2$ triangle inequalities into $\ell_1$, when the points are
approximately low-dimensional. Goemans (unpublished, appears in a work of Magen
and Moharrami (2008)) showed that such points residing in exactly $d$ dimensions
can be embedded into $\ell_1$ with distortion at most $\sqrt{d}$. We prove the
following robust analogue of this statement: if there exists an $r$-dimensional
subspace $\Pi$ such that the projections onto this subspace satisfy
$\sum_{i,j \in [n]} \|\Pi x_i - \Pi x_j\|_2^2 \geq \Omega(1) \cdot \sum_{i,j \in [n]} \|x_i - x_j\|_2^2$,
then there is an embedding of the points into $\ell_1$ with $O(\sqrt{r})$ average
distortion. A consequence of this result is that the integrality gap of the
well-known Goemans-Linial SDP relaxation for the Uniform Sparsest Cut problem is
$O(\sqrt{r})$ on graphs $G$ whose $r$-th smallest normalized eigenvalue of the
Laplacian satisfies $\lambda_r(G)/n \geq \Omega(1) \cdot \Phi_{SDP}(G)$. Our result
improves upon the previously known bound of $O(r)$ on the average distortion, and
on the integrality gap of the Goemans-Linial SDP under the same preconditions,
proven in [Deshpande and Venkat, 2014] and [Deshpande, Harsha and Venkat, 2016].
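The $\ell_2^2$ triangle inequality condition on the point set can be checked directly: squared Euclidean distances do not satisfy it in general, so it is a genuine restriction. A brute-force sketch (the function name is mine, for illustration):

```python
from itertools import permutations
import numpy as np

def satisfies_l22_triangle(X, tol=1e-9):
    """Check whether the squared Euclidean distances of the rows of X
    satisfy d2(i, k) <= d2(i, j) + d2(j, k) for every ordered triple,
    i.e. the l_2^2 triangle inequalities used in the Goemans-Linial SDP."""
    X = np.asarray(X, dtype=float)
    D2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    for i, j, k in permutations(range(len(X)), 3):
        if D2[i, k] > D2[i, j] + D2[j, k] + tol:
            return False
    return True

# An equilateral triangle satisfies the condition (all squared distances equal),
# but unevenly spaced collinear points violate it: 9 > 1 + 4.
print(satisfies_l22_triangle([[0.0, 0.0], [1.0, 0.0], [0.5, np.sqrt(3) / 2]]))  # True
print(satisfies_l22_triangle([[0.0], [1.0], [3.0]]))                            # False
```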