
    Gradual Domain Adaptation: Theory and Algorithms

    Full text link
    Unsupervised domain adaptation (UDA) adapts a model from a labeled source domain to an unlabeled target domain in a one-off way. Though widely applied, UDA faces a great challenge whenever the distribution shift between the source and the target is large. Gradual domain adaptation (GDA) mitigates this limitation by using intermediate domains to gradually adapt from the source to the target domain. In this work, we first theoretically analyze gradual self-training, a popular GDA algorithm, and provide a significantly improved generalization bound compared with Kumar et al. (2020). Our theoretical analysis leads to an interesting insight: to minimize the generalization error on the target domain, the sequence of intermediate domains should be placed uniformly along the Wasserstein geodesic between the source and target domains. The insight is particularly useful in situations where intermediate domains are missing or scarce, which is often the case in real-world applications. Based on this insight, we propose Generative Gradual Domain Adaptation with Optimal Transport (GOAT), an algorithmic framework that can generate intermediate domains in a data-dependent way. More concretely, we first generate intermediate domains along the Wasserstein geodesic between two given consecutive domains in a feature space, then apply gradual self-training to adapt the source-trained classifier to the target along the sequence of intermediate domains. Empirically, we demonstrate that our GOAT framework can improve the performance of standard GDA when the given intermediate domains are scarce, significantly broadening the real-world application scenarios of GDA. Our code is available at https://github.com/yifei-he/GOAT.
    Comment: arXiv admin note: substantial text overlap with arXiv:2204.0820
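    The pipeline described above can be sketched roughly as follows. This is not the authors' implementation: GOAT interpolates in a learned feature space, while this toy version assumes two equal-sized, uniformly weighted samples so that the optimal coupling is a permutation and the geodesic reduces to linear (displacement) interpolation between matched points; the classifier is assumed to follow the scikit-learn fit/predict_proba interface.

        # Hedged sketch: intermediate domains via displacement interpolation,
        # followed by gradual self-training (not the authors' code).
        import numpy as np
        from scipy.optimize import linear_sum_assignment
        from scipy.spatial.distance import cdist

        def wasserstein_interpolants(X_src, X_tgt, num_steps):
            """Equal-sized empirical measures with uniform weights: the optimal
            coupling is a permutation, so points on the Wasserstein geodesic are
            obtained by linearly interpolating matched pairs."""
            cost = cdist(X_src, X_tgt, metric="sqeuclidean")
            rows, cols = linear_sum_assignment(cost)          # exact OT matching
            ts = np.linspace(0.0, 1.0, num_steps + 2)[1:-1]   # interior points only
            return [(1 - t) * X_src[rows] + t * X_tgt[cols] for t in ts]

        def gradual_self_train(clf, domains, conf=0.8):
            """Pseudo-label each successive domain with the current classifier and
            retrain on the confident predictions (gradual self-training)."""
            for X in domains:
                proba = clf.predict_proba(X)
                keep = proba.max(axis=1) >= conf
                pseudo = clf.classes_[proba.argmax(axis=1)]
                clf.fit(X[keep], pseudo[keep])
            return clf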

    OT-Net: A Reusable Neural Optimal Transport Solver

    Full text link
    With the widespread application of optimal transport (OT), its calculation becomes essential, and various algorithms have emerged. However, existing methods either have low efficiency or cannot represent discontinuous maps. We thus present OT-Net, a novel reusable neural OT solver, which first learns Brenier's height representation via a neural network to obtain the potential, and then obtains the OT map by computing the gradient of the potential. The algorithm has two merits: 1) it can easily represent discontinuous maps, which allows it to match any target distribution with discontinuous support and achieve sharp boundaries, effectively eliminating mode collapse in generative models; 2) the OT map can be computed directly by the proposed algorithm when new target samples are added, which greatly improves the efficiency and reusability of the map. Moreover, we analyze the theoretical error bound of the algorithm and demonstrate the empirical success of our approach in image generation, color transfer, and domain adaptation.
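    The core idea, recovering the OT map as the gradient of a learned potential, can be illustrated with the minimal PyTorch sketch below. It is not the OT-Net architecture: OT-Net learns Brenier's height representation, whereas the plain MLP here is only a stand-in (a genuinely convex parameterization, e.g. an input-convex network, would be required for a true Brenier potential).

        # Hedged illustration: parameterize a scalar potential and evaluate the
        # transport map T(x) = grad_x u(x) by automatic differentiation.
        import torch
        import torch.nn as nn

        class Potential(nn.Module):
            """Scalar potential u(x); illustrative MLP, not Brenier's height
            representation used by OT-Net."""
            def __init__(self, dim, hidden=128):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Linear(dim, hidden), nn.Softplus(),
                    nn.Linear(hidden, hidden), nn.Softplus(),
                    nn.Linear(hidden, 1),
                )

            def forward(self, x):
                return self.net(x).squeeze(-1)

        def ot_map(potential, x):
            """Once the potential is trained, the map is just its gradient and can
            be evaluated directly on new samples, which makes the solver reusable."""
            x = x.clone().requires_grad_(True)
            (grad,) = torch.autograd.grad(potential(x).sum(), x)
            return grad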

    Deep generative models via explicit Wasserstein minimization

    Get PDF
    This thesis provides a procedure to fit generative networks to target distributions, with the goal of achieving a small Wasserstein distance (or other optimal transport cost). The approach is based on two principles: (a) if the source randomness of the network is a continuous distribution (the “semi-discrete” setting), then the Wasserstein distance is realized by a deterministic optimal transport mapping; (b) given an optimal transport mapping between a generator network and a target distribution, the Wasserstein distance may be decreased via a regression between the generated data and the mapped target points. The procedure therefore alternates these two steps, forming an optimal transport mapping and regressing against it, gradually adjusting the generator network towards the target distribution. Mathematically, this approach is shown to minimize the Wasserstein distance to both the empirical target distribution and its underlying population counterpart. Empirically, good performance is demonstrated on the training and testing sets of the MNIST and Thin-8 datasets. As a side product, the thesis proposes several effective metrics for measuring the performance of deep generative models. The thesis closes with a discussion of the unsuitability of the Wasserstein distance for certain tasks, as has been identified in prior work.
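    The alternation the abstract describes can be caricatured in a few lines. This is only a mini-batch sketch under assumed names, not the thesis's procedure: the thesis works in the semi-discrete setting against the full empirical target, whereas here two equal-sized batches are matched with an exact assignment before the regression step.

        # Hedged sketch of one alternation: (a) OT matching between generated and
        # target samples, (b) regression of the generator toward the matched targets.
        import torch
        from scipy.optimize import linear_sum_assignment

        def ot_regression_step(generator, optimizer, target_batch, latent_dim):
            z = torch.randn(target_batch.size(0), latent_dim)
            fake = generator(z)
            # (a) exact optimal matching between equal-sized, uniformly weighted batches
            cost = torch.cdist(fake.detach(), target_batch) ** 2
            rows, cols = linear_sum_assignment(cost.cpu().numpy())
            matched = target_batch[torch.as_tensor(cols)]
            # (b) squared-error regression toward the transport targets
            loss = ((fake[torch.as_tensor(rows)] - matched) ** 2).sum(dim=1).mean()
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            return float(loss)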

    Constrained Deep Networks: Lagrangian Optimization via Log-Barrier Extensions

    Full text link
    This study investigates the optimization aspects of imposing hard inequality constraints on the outputs of CNNs. In the context of deep networks, constraints are commonly handled with penalties because of their simplicity, despite their well-known limitations. Lagrangian-dual optimization has been largely avoided, except for a few recent works, mainly due to the computational complexity and stability/convergence issues caused by alternating explicit dual updates/projections with stochastic optimization. Several studies have shown that, surprisingly, for deep CNNs the theoretical and practical advantages of Lagrangian optimization over penalties do not materialize in practice. We propose log-barrier extensions, which approximate Lagrangian optimization of constrained-CNN problems with a sequence of unconstrained losses. Unlike standard interior-point and log-barrier methods, our formulation does not need an initial feasible solution. Furthermore, we provide a new technical result showing that the proposed extensions yield an upper bound on the duality gap. This generalizes the duality-gap result of standard log-barriers, yielding sub-optimality certificates for feasible solutions. While sub-optimality is not guaranteed for non-convex problems, our result shows that log-barrier extensions are a principled way to approximate Lagrangian optimization for constrained CNNs via implicit dual variables. We report comprehensive weakly supervised segmentation experiments with various constraints, showing that our formulation substantially outperforms existing constrained-CNN methods in accuracy, constraint satisfaction, and training stability, all the more so when dealing with a large number of constraints.
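    For intuition, one common way to extend the standard log-barrier -(1/t)*log(-z) for a constraint z = f(x) <= 0 beyond the feasible region is to continue it with its tangent line, so the penalty stays finite and differentiable at infeasible points; the exact form used in the paper may differ from this sketch.

        # Hedged illustration of a log-barrier extension (illustrative, not
        # necessarily the paper's exact formulation).
        import math
        import torch

        def log_barrier_extension(z, t=5.0):
            """Extends -(1/t)*log(-z) past z0 = -1/t**2 by its tangent line, so the
            penalty is defined (and smooth) even when the constraint is violated."""
            z0 = -1.0 / t**2
            barrier = -torch.log(-torch.clamp(z, max=z0)) / t       # feasible side
            linear = t * z - math.log(1.0 / t**2) / t + 1.0 / t     # tangent continuation
            return torch.where(z <= z0, barrier, linear)

    In use, the training loss would be the task loss plus the sum of log_barrier_extension(f_i(outputs), t) over the constraints f_i <= 0, with t typically increased over training so the approximation tightens.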

    Generative models : a critical review

    Full text link
    In this thesis, we introduce and motivate generative modeling as a central task for machine learning and provide a critical view of the algorithms that have been proposed to solve it. We show how generative modeling can be defined mathematically as trying to make an estimating distribution identical to an unknown ground-truth distribution; this can then be quantified as the value of a statistical divergence between the two distributions. We outline the maximum likelihood approach and how it can be interpreted as minimizing the KL divergence. We explore a number of approaches in the maximum likelihood family, while discussing their limitations. Finally, we explore the alternative adversarial approach, which involves studying the differences between an estimating distribution and a real data distribution. We discuss how this approach can give rise to new divergences and methods that are necessary to make adversarial learning successful, as well as the new evaluation metrics it requires. Chapter 2 shows that by learning generative models of the hidden layers of a deep network, one can identify when the network is being run on data differing from the data seen during training. This allows us to study differences between free-running and teacher-forcing modes in recurrent networks, and it also leads to improved robustness to adversarial attacks. Chapter 3 explores an iterative procedure for generation and inference in deep networks, inspired by the blocked Gibbs MCMC procedure for sampling from energy-based models. It achieves improved inpainting, generation, and inference by removing the requirement that the prior over the latent variables have a known distribution. Chapter 4 studies whether generative models could be improved by exploiting the knowledge learned by discriminative classification models. We study this by augmenting autoencoders with additional losses defined on the hidden states of a fixed classifier. In practice, we show that this leads to generative models that focus better on salient aspects of the data, and we also discuss the limitations of this approach.
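    The maximum-likelihood/KL connection mentioned in the abstract is the standard identity (notation ours):

        \arg\max_{\theta} \; \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log p_\theta(x)\right]
        \;=\; \arg\min_{\theta} \; \mathrm{KL}\!\left(p_{\mathrm{data}} \,\|\, p_\theta\right),

    since \mathrm{KL}(p_{\mathrm{data}} \| p_\theta) = \mathbb{E}_{x \sim p_{\mathrm{data}}}[\log p_{\mathrm{data}}(x)] - \mathbb{E}_{x \sim p_{\mathrm{data}}}[\log p_\theta(x)] and the first term does not depend on \theta.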

    Domain Generalization for Medical Image Analysis: A Survey

    Full text link
    Medical Image Analysis (MedIA) has become an essential tool in medicine and healthcare, aiding in disease diagnosis, prognosis, and treatment planning, and recent successes in deep learning (DL) have contributed significantly to its advances. However, DL models for MedIA remain challenging to deploy in real-world situations, as they fail to generalize under the distributional gap between training and testing samples, a problem known as distribution shift. Researchers have dedicated their efforts to developing various DL methods that adapt to and perform robustly on unknown and out-of-distribution data. This paper comprehensively reviews domain generalization studies specifically tailored to MedIA. We provide a holistic view of how domain generalization techniques interact within the broader MedIA system, going beyond methodologies to consider the operational implications for the entire MedIA workflow. Specifically, we categorize domain generalization methods into data-level, feature-level, model-level, and analysis-level methods. We show how these methods can be used at various stages of the DL-equipped MedIA workflow, from data acquisition to model prediction and analysis. Furthermore, we include benchmark datasets and applications used to evaluate these approaches, and we analyze the strengths and weaknesses of various methods, unveiling future research opportunities.