The Burbea-Rao and Bhattacharyya centroids
We study the centroid with respect to the class of information-theoretic
Burbea-Rao divergences that generalize the celebrated Jensen-Shannon divergence
by measuring the non-negative Jensen difference induced by a strictly convex
and differentiable function. Although those Burbea-Rao divergences are
symmetric by construction, they are not metrics since they fail to satisfy the
triangle inequality. We first explain how a particular symmetrization of
Bregman divergences called Jensen-Bregman distances yields exactly those
Burbea-Rao divergences. We then define skew Burbea-Rao
divergences, and show that in limit cases skew Burbea-Rao divergences amount to
computing Bregman divergences. We then prove that Burbea-Rao centroids are
unique, and can be approximated arbitrarily finely by a generic iterative
concave-convex optimization algorithm with guaranteed convergence. In
the second part of the paper, we consider the Bhattacharyya distance that is
commonly used to measure the degree of overlap of probability distributions. We
show that Bhattacharyya distances on members of the same statistical
exponential family amount to calculating a Burbea-Rao divergence in disguise.
Thus we get an efficient algorithm for computing the Bhattacharyya centroid of
a set of parametric distributions belonging to the same exponential family,
improving over former specialized methods found in the literature that were
limited to univariate or "diagonal" multivariate Gaussians. To illustrate the
performance of our Bhattacharyya/Burbea-Rao centroid algorithm, we present
experimental performance results for k-means and hierarchical clustering
methods of Gaussian mixture models. Comment: 13 pages
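The link between the Bhattacharyya distance and the Burbea-Rao divergence stated in this abstract can be checked numerically. The sketch below is an illustration only, not the paper's algorithm: it assumes univariate Gaussians written in natural parameters (mu/sigma^2, -1/(2 sigma^2)) with the standard log-normalizer as the convex generator F, and compares the Jensen difference of F with the textbook closed-form Bhattacharyya distance. All function names are hypothetical.

```python
import math

def burbea_rao(F, p, q):
    """Jensen difference of a convex generator F: (F(p) + F(q)) / 2 - F((p + q) / 2)."""
    mid = [(a + b) / 2 for a, b in zip(p, q)]
    return (F(p) + F(q)) / 2 - F(mid)

def natural_params(mu, sigma):
    """Natural parameters of a univariate Gaussian (assumed parameterization)."""
    return [mu / sigma**2, -1 / (2 * sigma**2)]

def gaussian_log_normalizer(theta):
    """Log-normalizer of the univariate Gaussian family in natural parameters."""
    t1, t2 = theta
    return -t1 * t1 / (4 * t2) + 0.5 * math.log(-math.pi / t2)

def bhattacharyya_closed_form(mu1, s1, mu2, s2):
    """Textbook closed form for two univariate Gaussians, used as a cross-check."""
    v = s1**2 + s2**2
    return (mu1 - mu2) ** 2 / (4 * v) + 0.5 * math.log(v / (2 * s1 * s2))

via_burbea_rao = burbea_rao(
    gaussian_log_normalizer, natural_params(0.0, 1.0), natural_params(1.0, 2.0)
)
direct = bhattacharyya_closed_form(0.0, 1.0, 1.0, 2.0)
print(via_burbea_rao, direct)
```

Both values should agree up to floating-point error. The same `burbea_rao` helper works for any generator supplied as a function of a parameter list.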
Quantifying the Informativeness of Similarity Measurements
In this paper, we describe an unsupervised measure for quantifying the 'informativeness' of correlation matrices formed from the pairwise similarities or relationships among data instances. The measure quantifies the heterogeneity of the correlations and is defined as the distance between a correlation matrix and the nearest correlation matrix with constant off-diagonal entries. This non-parametric notion generalizes existing test statistics for equality of correlation coefficients by allowing for alternative distance metrics, such as the Bures and other distances from quantum information theory. For several distance and dissimilarity metrics, we derive closed-form expressions of informativeness, which can be applied as objective functions for machine learning applications. Empirically, we demonstrate that informativeness is a useful criterion for selecting kernel parameters, choosing the dimension for kernel-based nonlinear dimensionality reduction, and identifying structured graphs. We also consider the problem of finding a maximally informative correlation matrix around a target matrix, and explore parameterizing the optimization in terms of the coordinates of the sample or through a lower-dimensional embedding. In the latter case, we find that maximizing the Bures-based informativeness measure, which is maximal for centered rank-1 correlation matrices, is equivalent to minimizing a specific matrix norm, and present an algorithm to solve the minimization problem using the norm's proximal operator. The proposed correlation denoising algorithm consistently improves spectral clustering. Overall, we find informativeness to be a novel and useful criterion for identifying non-trivial correlation structure.
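As a rough illustration of the definition above, the following sketch computes the distance to the nearest constant off-diagonal correlation matrix under the Frobenius norm only. This is an assumed special case chosen for simplicity: for that norm the nearest constant value is the mean of the off-diagonal entries. The Bures-based measure and the paper's closed-form expressions are not reproduced here, and the function name is hypothetical.

```python
import math

def frobenius_informativeness(C):
    """Return (distance, rho) for a correlation matrix C given as a list of rows.

    rho is the mean off-diagonal entry, which minimizes the Frobenius distance
    to a matrix with unit diagonal and constant off-diagonal entries.
    """
    n = len(C)
    off = [C[i][j] for i in range(n) for j in range(n) if i != j]
    rho = sum(off) / len(off)
    distance = math.sqrt(sum((c - rho) ** 2 for c in off))
    return distance, rho

# A matrix with one strongly correlated pair is more heterogeneous than a constant one.
structured = [[1.0, 0.9, 0.0], [0.9, 1.0, 0.0], [0.0, 0.0, 1.0]]
constant = [[1.0, 0.3, 0.3], [0.3, 1.0, 0.3], [0.3, 0.3, 1.0]]
print(frobenius_informativeness(structured))
print(frobenius_informativeness(constant))
```

Under this choice of distance, a matrix whose off-diagonal entries are all equal scores zero, and the score grows as the correlations become more heterogeneous.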
A precise bare simulation approach to the minimization of some distances. Foundations
In information theory -- as well as in the adjacent fields of statistics,
machine learning, artificial intelligence, signal processing and pattern
recognition -- many flexibilizations of the omnipresent Kullback-Leibler
information distance (relative entropy) and of the closely related Shannon
entropy have become frequently used tools. The main goal of this paper is to
tackle the corresponding constrained minimization (respectively, maximization)
problems with a newly developed dimension-free bare (pure) simulation method.
Almost no assumptions (like convexity) on the set of constraints are needed,
within our discrete setup of arbitrary dimension, and our method is precise
(i.e., converges in the limit). As a side effect, we also derive an innovative
way of constructing new useful distances/divergences. To illustrate the core of
our approach, we present numerous examples. The potential for widespread
applicability is indicated, too; in particular, we deliver many recent
references for uses of the involved distances/divergences and entropies in
various research fields (which may also serve as an interdisciplinary
interface).
Stability and Expressiveness of Deep Generative Models
In recent years, deep learning has revolutionized both machine learning and computer vision. Many classical computer vision tasks (e.g. object detection and semantic segmentation), which traditionally were very challenging, can now be solved using supervised deep learning techniques. While supervised learning is a powerful tool when labeled data is available and the task under consideration has a well-defined output, these conditions are not always satisfied. One promising approach in this case is generative modeling. In contrast to purely discriminative models, generative models can deal with uncertainty and learn powerful models even when labeled training data is not available. However, while current approaches to generative modeling achieve promising results, two aspects limit their expressiveness: (i) some of the most successful approaches to modeling image data are no longer trained using optimization algorithms, but instead employ algorithms whose dynamics are not well understood, and (ii) generative models are often limited by the memory requirements of the output representation. We address both problems in this thesis: in the first part we introduce a theory which enables us to better understand the training dynamics of Generative Adversarial Networks (GANs), one of the most promising approaches to generative modeling.
We tackle this problem by introducing minimal example problems of GAN training which can be understood analytically. Subsequently, we gradually increase the complexity of these examples. By doing so, we gain new insights into the training dynamics of GANs and derive new regularizers that also work well for general GANs. Our new regularizers enable us, for the first time, to train a GAN at one megapixel resolution without having to gradually increase the resolution of the training distribution. In the second part of this thesis we consider output representations in 3D for generative models and 3D reconstruction techniques. By introducing implicit representations to deep learning, we are able to extend many techniques that work in 2D to the 3D domain without sacrificing their expressiveness.
Distribution-Dissimilarities in Machine Learning
Any binary classifier (or score-function) can be used to define a dissimilarity
between two distributions. Many well-known distribution-dissimilarities are
actually classifier-based: total variation, KL- or JS-divergence, Hellinger
distance, etc. And many recent popular generative modeling algorithms compute
or approximate these distribution-dissimilarities by explicitly training a
classifier: e.g. generative adversarial networks (GAN) and their variants.
This thesis introduces and studies such classifier-based
distribution-dissimilarities. After a general introduction, the first part
analyzes the influence of the classifiers' capacity on the dissimilarity's
strength for the special case of maximum mean discrepancies (MMD) and provides
applications. The second part studies applications of classifier-based
distribution-dissimilarities in the context of generative modeling and presents
two new algorithms: Wasserstein Auto-Encoders (WAE) and AdaGAN. The third and
final part focuses on adversarial examples, i.e. targeted but imperceptible
input-perturbations that lead to drastically different predictions of an
artificial classifier. It shows that adversarial vulnerability of neural network
based classifiers typically increases with the input-dimension, independently
of the network topology.
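For readers unfamiliar with the maximum mean discrepancy mentioned in this abstract, here is a minimal sketch of the standard biased (V-statistic) estimate of squared MMD between two samples. It assumes one-dimensional samples and a Gaussian kernel with a hypothetical `bandwidth` parameter. It is not taken from the thesis and says nothing about the capacity analysis it describes.

```python
import math

def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel on real numbers; the bandwidth is an assumed default."""
    return math.exp(-((x - y) ** 2) / (2 * bandwidth**2))

def mmd_squared(xs, ys, kernel=gaussian_kernel):
    """Biased estimate of MMD^2: E k(x, x') + E k(y, y') - 2 E k(x, y)."""
    def mean_kernel(a, b):
        return sum(kernel(u, v) for u in a for v in b) / (len(a) * len(b))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2 * mean_kernel(xs, ys)

same = mmd_squared([0.0, 1.0, 2.0], [0.0, 1.0, 2.0])
apart = mmd_squared([0.0, 0.1], [5.0, 5.1])
print(same, apart)
```

Identical samples give an estimate of zero up to rounding, while well-separated samples give a value close to 2 for this kernel.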
Information theoretic refinement criteria for image synthesis
This work is framed within the context of computer graphics, starting out from the intersection of three fields: rendering, information theory, and complexity. Initially, the concept of scene complexity is analysed considering three perspectives from a geometric visibility point of view: complexity at an interior point, complexity of an animation, and complexity of a region. The main focus of this dissertation is the exploration and development of new refinement criteria for the global illumination problem. Information-theoretic measures based on Shannon entropy and Havrda-Charvát-Tsallis generalised entropy, together with f-divergences, are analysed as kernels of refinement. We show how they give us a rich variety of efficient and highly discriminative measures which are applicable to rendering in its pixel-driven (ray-tracing) and object-space (hierarchical radiosity) approaches. Firstly, based on Shannon entropy, a set of pixel quality and pixel contrast measures is defined. They are applied to supersampling in ray-tracing as refinement criteria, obtaining a new entropy-based adaptive sampling algorithm with a high quality-to-cost ratio. Secondly, based on Havrda-Charvát-Tsallis generalised entropy and generalised mutual information, three new refinement criteria are defined for hierarchical radiosity. In correspondence with three classic approaches, oracles based on transported information, information smoothness, and mutual information are presented, with very significant results for the last of these.
Finally, three members of the family of Csiszár's f-divergences (Kullback-Leibler, chi-square, and Hellinger divergences) are analysed as refinement criteria, showing good results for both ray-tracing and hierarchical radiosity.
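The three f-divergences named above share one template, D_f(p || q) = sum_i q_i f(p_i / q_i), and differ only in the generator f. The sketch below illustrates that template for discrete distributions. It assumes every q_i is positive and uses the unnormalised squared Hellinger convention (some authors include a factor of 1/2). It does not reproduce the refinement oracles of the thesis, and the function names are hypothetical.

```python
import math

def f_divergence(f, p, q):
    """Csiszar f-divergence between discrete distributions p and q (all q_i > 0 assumed)."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

def kl_generator(t):
    """f(t) = t log t gives the Kullback-Leibler divergence; f(0) is taken as 0."""
    return t * math.log(t) if t > 0 else 0.0

def chi_square_generator(t):
    """f(t) = (t - 1)^2 gives the chi-square divergence."""
    return (t - 1) ** 2

def hellinger_generator(t):
    """f(t) = (sqrt(t) - 1)^2 gives the squared Hellinger distance (no 1/2 factor)."""
    return (math.sqrt(t) - 1) ** 2

p = [0.5, 0.5]
q = [0.25, 0.75]
for name, f in [("KL", kl_generator), ("chi-square", chi_square_generator),
                ("Hellinger", hellinger_generator)]:
    print(name, f_divergence(f, p, q))
```

Each divergence is zero when the two distributions coincide and positive otherwise, which is the property that makes it usable as a contrast measure.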
Robust Procedures for Estimating and Testing in the Framework of Divergence Measures
The contributions to this book present new and original research based on MPHIE, MHD, and MDPDE, as well as on test statistics based on these estimators, from both theoretical and applied points of view, in different statistical problems with special emphasis on robustness. Manuscripts giving solutions to statistical problems such as model selection criteria based on divergence measures, or statistics for high-dimensional data with divergence measures as loss functions, are considered. Reviews emphasizing the most recent state of the art in solving statistical problems based on divergence measures are also presented.