
    Max-sum diversity via convex programming

    Diversity maximization is an important concept in information retrieval, computational geometry and operations research. Usually, it is a variant of the following problem: given a ground set, constraints, and a function $f(\cdot)$ that measures the diversity of a subset, the task is to select a feasible subset $S$ such that $f(S)$ is maximized. The \emph{sum-dispersion} function $f(S) = \sum_{x,y \in S} d(x,y)$, the sum of the pairwise distances in $S$, is a prominent diversification measure in this context. The corresponding diversity maximization problem is called \emph{max-sum} or \emph{sum-sum diversification}. Many recent results deal with the design of constant-factor approximation algorithms for diversification problems involving the sum-dispersion function under a matroid constraint. In this paper, we present a PTAS for the max-sum diversification problem under a matroid constraint for distances $d(\cdot,\cdot)$ of \emph{negative type}. Distances of negative type include metric distances stemming from the $\ell_2$ and $\ell_1$ norms, as well as the cosine, spherical, and Jaccard distances, which are popular similarity measures in web and image search.
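    A minimal sketch of the objective (not the paper's convex-programming PTAS; the greedy heuristic and names below are illustrative assumptions) that evaluates sum-dispersion and builds a diverse subset under a cardinality constraint, a special case of a matroid constraint:

        import itertools
        import math

        def sum_dispersion(S, d):
            """Sum of pairwise distances over all unordered pairs in S."""
            return sum(d(x, y) for x, y in itertools.combinations(S, 2))

        def greedy_max_sum(ground_set, d, k):
            """Illustrative greedy heuristic for max-sum diversification:
            repeatedly add the point that most increases the objective."""
            S = []
            while len(S) < k:
                best = max(
                    (x for x in ground_set if x not in S),
                    key=lambda x: sum(d(x, y) for y in S),
                )
                S.append(best)
            return S

        # Example with Euclidean (hence negative-type) distances in the plane.
        points = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5)]
        dist = lambda p, q: math.hypot(p[0] - q[0], p[1] - q[1])
        S = greedy_max_sum(points, dist, 3)
        print(S, sum_dispersion(S, dist))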

    Sublinear quasiconformality and the large-scale geometry of Heintze groups

    This article analyzes sublinearly quasisymmetric homeomorphisms (generalized quasisymmetric mappings), and draws applications to the sublinear large-scale geometry of negatively curved groups and spaces. It is proven that those homeomorphisms lack analytical properties but preserve a conformal dimension and appropriate function spaces, distinguishing certain (nonsymmetric) Riemannian negatively curved homogeneous spaces, and Fuchsian buildings, up to sublinearly biLipschitz equivalence (generalized quasiisometry). Comment: v1->v2: shortened, revised. Lemma 2.3 and definition of Cdim corrected. Proof of main theorem simplified. Figure 4 added.
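    For background (a standard definition, not taken from the article): the classical notion that "sublinearly quasisymmetric" generalizes is that of an $\eta$-quasisymmetric homeomorphism $f \colon X \to Y$ between metric spaces, where $\eta \colon [0,\infty) \to [0,\infty)$ is a fixed homeomorphism and

        \[
            \frac{d_Y(f(x), f(a))}{d_Y(f(x), f(b))}
            \;\le\;
            \eta\!\left(\frac{d_X(x, a)}{d_X(x, b)}\right)
            \qquad \text{for all distinct } x, a, b \in X.
        \]

    The sublinear variant relaxes this uniform control, in a way made precise in the article, to match sublinearly biLipschitz equivalence.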

    Contributions on metric spaces with applications in personalized medicine

    This thesis aims to propose new distributional representations and statistical methods in metric spaces to effectively model data arising from the continuous monitoring of patients during their daily-life activities. We propose new hypothesis tests for paired data, regression models, uncertainty quantification algorithms, tests of statistical independence, and clustering algorithms for the new distributional representations and other complex statistical objects. The results collected throughout the thesis show the advantages of the new proposals over existing methods in terms of prediction, interpretability, and modelling capacity.
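    As a rough illustration of the kind of object involved (generic background, not the thesis's actual methodology, and the data below are hypothetical): a patient's continuous monitoring stream can be represented by its empirical distribution, and such distributional representations can be compared in a metric space of distributions, e.g. with the 1-Wasserstein distance.

        import numpy as np
        from scipy.stats import wasserstein_distance

        rng = np.random.default_rng(0)
        # Hypothetical continuous glucose readings for two patients.
        patient_a = rng.normal(loc=100, scale=15, size=500)
        patient_b = rng.normal(loc=115, scale=25, size=500)

        # Each patient is represented by the empirical distribution of their
        # readings; the representations are compared with the 1-Wasserstein metric.
        print(wasserstein_distance(patient_a, patient_b))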

    Distance Measures for Embedded Graphs

    We introduce new distance measures for comparing straight-line embedded graphs based on the Fr\'echet distance and the weak Fr\'echet distance. These graph distances are defined using continuous mappings and thus take the combinatorial structure as well as the geometric embeddings of the graphs into account. We present a general algorithmic approach for computing these graph distances. Although we show that deciding the distances is NP-hard for general embedded graphs, we prove that our approach yields polynomial-time algorithms if the graphs are trees, and, for the distance based on the weak Fr\'echet distance, if the graphs are planar embedded. Moreover, we prove that deciding the distances based on the Fr\'echet distance remains NP-hard for planar embedded graphs, and show how our general algorithmic approach yields an exponential-time algorithm and a polynomial-time approximation algorithm for this case. Comment: 27 pages, 14 figures.
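    For readers unfamiliar with the underlying primitive, here is a minimal sketch of the classical discrete Fr\'echet distance between two polygonal curves (the standard dynamic program; the paper's graph distances generalize the continuous Fr\'echet distance to embedded graphs and are substantially more involved):

        import numpy as np

        def discrete_frechet(P, Q):
            """Discrete Frechet distance between polygonal curves P and Q,
            given as sequences of points, via dynamic programming."""
            P, Q = np.asarray(P, float), np.asarray(Q, float)
            n, m = len(P), len(Q)
            D = np.full((n, m), np.inf)
            for i in range(n):
                for j in range(m):
                    d = np.linalg.norm(P[i] - Q[j])
                    if i == 0 and j == 0:
                        D[i, j] = d
                    else:
                        prev = min(
                            D[i - 1, j] if i > 0 else np.inf,
                            D[i, j - 1] if j > 0 else np.inf,
                            D[i - 1, j - 1] if i > 0 and j > 0 else np.inf,
                        )
                        D[i, j] = max(prev, d)
            return D[-1, -1]

        # Two zig-zag curves in the plane.
        print(discrete_frechet([(0, 0), (1, 1), (2, 0)], [(0, 1), (1, 2), (2, 1)]))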

    Geometry-Aware Adaptation for Pretrained Models

    Machine learning models -- including prominent zero-shot models -- are often trained on datasets whose labels are only a small proportion of a larger label space. Such spaces are commonly equipped with a metric that relates the labels via distances between them. We propose a simple approach to exploit this information to adapt the trained model to reliably predict new classes -- or, in the case of zero-shot prediction, to improve its performance -- without any additional training. Our technique is a drop-in replacement of the standard prediction rule, swapping argmax with the Fr\'echet mean. We provide a comprehensive theoretical analysis for this approach, studying (i) learning-theoretic results trading off label space diameter, sample complexity, and model dimension, (ii) characterizations of the full range of scenarios in which it is possible to predict any unobserved class, and (iii) an optimal active-learning-style procedure for selecting the next training classes when it is not possible to predict the entire range of unobserved classes. Empirically, using easily available external metrics, our proposed approach, Loki, gains up to 29.7% relative improvement over SimCLR on ImageNet and scales to hundreds of thousands of classes. When no such metric is available, Loki can use self-derived metrics from class embeddings and obtains a 10.5% improvement on pretrained zero-shot models such as CLIP.
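    A minimal sketch of the flavor of this prediction rule, swapping argmax for a weighted Fr\'echet mean over the label metric (the weighting scheme and names below are illustrative assumptions, not necessarily Loki's exact rule):

        import numpy as np

        def frechet_mean_predict(probs, observed, dist):
            """Predict the class, over ALL classes, minimizing the expected
            squared distance to the observed classes under the model's probabilities.

            probs:    shape (k,) probabilities over the k observed classes
            observed: shape (k,) indices of the observed classes in the full space
            dist:     shape (C, C) pairwise metric over the full label space
            """
            # Weighted Frechet-mean objective for each candidate class c:
            #   sum_j probs[j] * dist[c, observed[j]]**2
            objective = (dist[:, observed] ** 2) @ probs
            return int(np.argmin(objective))

        # Toy example: 5 classes on a line metric; the model saw classes {0, 4}.
        C = 5
        dist = np.abs(np.subtract.outer(np.arange(C), np.arange(C))).astype(float)
        probs = np.array([0.5, 0.5])  # model mass split between classes 0 and 4
        print(frechet_mean_predict(probs, np.array([0, 4]), dist))  # -> 2, the midpoint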

    F?D: On understanding the role of deep feature spaces on face generation evaluation

    Perceptual metrics, like the Fr\'echet Inception Distance (FID), are widely used to assess the similarity between synthetically generated and ground truth (real) images. The key idea behind these metrics is to compute errors in a deep feature space that captures perceptually and semantically rich image features. Despite their popularity, the effect that different deep features and their design choices have on a perceptual metric has not been well studied. In this work, we perform a causal analysis linking differences in semantic attributes and distortions between face image distributions to Fr\'echet distances (FD) using several popular deep feature spaces. A key component of our analysis is the creation of synthetic counterfactual faces using deep face generators. Our experiments show that the FD is heavily influenced by its feature space's training dataset and objective function. For example, FD using features extracted from ImageNet-trained models heavily emphasizes hats over regions like the eyes and mouth. Moreover, FD using features from a face gender classifier emphasizes hair length more than distances in an identity (recognition) feature space. Finally, we evaluate several popular face generation models across feature spaces and find that StyleGAN2 consistently ranks higher than other face generators, except with respect to identity (recognition) features. This suggests the need for considering multiple feature spaces when evaluating generative models and using feature spaces that are tuned to nuances of the domain of interest. Comment: Code and dataset to be released soon.
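    To recall what a Fr\'echet distance between feature distributions computes: under Gaussian assumptions it reduces to the closed form $\|\mu_1-\mu_2\|^2 + \mathrm{Tr}(\Sigma_1+\Sigma_2-2(\Sigma_1\Sigma_2)^{1/2})$. A generic sketch (not the paper's code):

        import numpy as np
        from scipy.linalg import sqrtm

        def frechet_distance(feats1, feats2):
            """Frechet distance between two feature sets modeled as Gaussians:
            ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^{1/2})."""
            mu1, mu2 = feats1.mean(axis=0), feats2.mean(axis=0)
            s1 = np.cov(feats1, rowvar=False)
            s2 = np.cov(feats2, rowvar=False)
            covmean = sqrtm(s1 @ s2)
            if np.iscomplexobj(covmean):  # drop tiny imaginary parts from sqrtm
                covmean = covmean.real
            diff = mu1 - mu2
            return float(diff @ diff + np.trace(s1 + s2 - 2.0 * covmean))

        # Toy usage with random 64-dimensional "features".
        rng = np.random.default_rng(0)
        print(frechet_distance(rng.normal(size=(200, 64)),
                               rng.normal(loc=0.5, size=(200, 64))))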

    Approximating Sparsest Cut in Low Rank Graphs via Embeddings from Approximately Low Dimensional Spaces

    We consider the problem of embedding a finite set of points $x_1, \ldots, x_n \in \mathbb{R}^d$ that satisfy $\ell_2^2$ triangle inequalities into $\ell_1$, when the points are approximately low-dimensional. Goemans (unpublished, appears in a work of Magen and Moharrami (2008)) showed that such points residing in exactly $d$ dimensions can be embedded into $\ell_1$ with distortion at most $\sqrt{d}$. We prove the following robust analogue of this statement: if there exists an $r$-dimensional subspace $\Pi$ such that the projections onto this subspace satisfy $\sum_{i,j \in [n]} \|\Pi x_i - \Pi x_j\|_2^2 \geq \Omega(1) \cdot \sum_{i,j \in [n]} \|x_i - x_j\|_2^2$, then there is an embedding of the points into $\ell_1$ with $O(\sqrt{r})$ average distortion. A consequence of this result is that the integrality gap of the well-known Goemans-Linial SDP relaxation for the Uniform Sparsest Cut problem is $O(\sqrt{r})$ on graphs $G$ whose $r$-th smallest normalized eigenvalue of the Laplacian satisfies $\lambda_r(G)/n \geq \Omega(1) \cdot \Phi_{SDP}(G)$. Our result improves upon the previously known bound of $O(r)$ on the average distortion and the integrality gap of the Goemans-Linial SDP under the same preconditions, proven in [Deshpande and Venkat, 2014] and [Deshpande, Harsha and Venkat, 2016].
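    The precondition, that some $r$-dimensional projection retains an $\Omega(1)$ fraction of the total pairwise squared distance, can be checked with PCA: the best such subspace is the top-$r$ principal subspace, and $\sum_{i,j} \|x_i - x_j\|_2^2 = 2n \sum_i \|x_i - \bar{x}\|_2^2$, so the retained fraction equals the fraction of variance in the top $r$ components. An illustrative sketch:

        import numpy as np

        def projection_fraction(X, r):
            """Fraction of the total pairwise squared distance captured by the
            best r-dimensional projection (the top-r principal subspace)."""
            Xc = X - X.mean(axis=0)
            # Eigenvalues of the scatter matrix, sorted in descending order.
            evals = np.clip(np.linalg.eigvalsh(Xc.T @ Xc)[::-1], 0, None)
            return evals[:r].sum() / evals.sum()

        # Nearly 2-dimensional point set in R^10.
        rng = np.random.default_rng(0)
        X = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 10))
        X += 0.01 * rng.normal(size=X.shape)
        print(projection_fraction(X, 2))  # close to 1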