Gradual Domain Adaptation: Theory and Algorithms
Unsupervised domain adaptation (UDA) adapts a model from a labeled source
domain to an unlabeled target domain in a one-off way. Though widely applied,
UDA faces a great challenge whenever the distribution shift between the source
and the target is large. Gradual domain adaptation (GDA) mitigates this
limitation by using intermediate domains to gradually adapt from the source to
the target domain. In this work, we first theoretically analyze gradual
self-training, a popular GDA algorithm, and provide a significantly improved
generalization bound compared with Kumar et al. (2020). Our theoretical
analysis leads to an interesting insight: to minimize the generalization error
on the target domain, the sequence of intermediate domains should be placed
uniformly along the Wasserstein geodesic between the source and target domains.
The insight is particularly useful when intermediate domains are missing or
scarce, which is often the case in real-world applications. Based on the
insight, we propose Generative Gradual Domain Adaptation with Optimal Transport
(GOAT), an algorithmic framework that can generate intermediate domains in a
data-dependent way. More concretely, we first generate intermediate domains
along the Wasserstein geodesic between two given consecutive domains in a
feature space, then apply gradual self-training to adapt the source-trained
classifier to the target along the sequence of intermediate domains.
Empirically, we demonstrate that our GOAT framework can improve the performance
of standard GDA when the given intermediate domains are scarce, significantly
broadening the real-world application scenarios of GDA. Our code is available
at https://github.com/yifei-he/GOAT.
OT-Net: A Reusable Neural Optimal Transport Solver
With the widespread application of optimal transport (OT), its computation
becomes essential, and various algorithms have emerged. However, existing
methods are either inefficient or unable to represent discontinuous maps. We
therefore present OT-Net, a novel reusable neural OT solver, which first
learns Brenier's height representation via a neural network to obtain the
potential, and then obtains the OT map by computing the gradient of that
potential. The algorithm has two merits: 1) it can easily represent
discontinuous maps, which allows it to match any target distribution with
discontinuous support and achieve sharp boundaries, largely eliminating mode
collapse in generative models; 2) the OT map can be computed directly when new
target samples are added, which greatly improves the efficiency and
reusability of the map. Moreover, we analyze the theoretical error bound of
the algorithm and demonstrate the empirical success of our approach in image
generation, color transfer, and domain adaptation.
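The "height representation" behind Brenier-potential solvers can be illustrated with the classical semi-discrete special case, where the potential is phi(x) = max_j(<x, y_j> + h_j) and the OT map sends x to the maximizing y_j. The numpy sketch below is only this discrete core, with an explicit height vector tuned by dual ascent; OT-Net's neural parameterization and exact training objective are not reproduced here.

```python
# Hedged sketch of the classical semi-discrete core behind Brenier-potential
# solvers: phi(x) = max_j (<x, y_j> + h_j), and the OT map sends x to the
# maximizing y_j. OT-Net replaces the explicit height vector h with a neural
# network; that parameterization is not reproduced here.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(5000, 2))                       # source: standard Gaussian
Y = np.array([[-2.0, 0.0], [2.0, 0.0], [0.0, 2.0]])  # discrete target support
nu = np.array([0.5, 0.3, 0.2])                       # prescribed target masses

h = np.zeros(3)
for _ in range(500):
    cells = np.argmax(X @ Y.T + h, axis=1)           # piecewise-constant OT map
    mass = np.bincount(cells, minlength=3) / len(X)
    h += 0.5 * (nu - mass)                           # ascend the concave dual

cells = np.argmax(X @ Y.T + h, axis=1)
mass = np.bincount(cells, minlength=3) / len(X)
print(mass)
```

The induced map x -> y_argmax is discontinuous across cell boundaries, which is the behavior a gradient-of-potential representation can capture and a continuous pushforward network cannot.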
Deep generative models via explicit Wasserstein minimization
This thesis provides a procedure to fit generative networks to target distributions, with the goal of a small Wasserstein distance (or other optimal transport cost). The approach is based on two principles: (a) if the source randomness of the network is a continuous distribution (the “semi-discrete” setting), then the Wasserstein distance is realized by a deterministic optimal transport mapping; (b) given an optimal transport mapping between a generator network and a target distribution, the Wasserstein distance may be decreased via a regression between the generated data and the mapped target points. The procedure therefore alternates these two steps: forming an optimal transport map, and regressing against it, gradually adjusting the generator network towards the target distribution. Mathematically, this approach is shown to minimize the Wasserstein distance to both the empirical target distribution and its underlying population counterpart. Empirically, good performance is demonstrated on the training and testing sets of the MNIST and Thin-8 data. As a side product, the thesis proposes several effective metrics for measuring the performance of deep generative models. The thesis closes with a discussion of the unsuitability of the Wasserstein distance for certain tasks, as has been identified in prior work.
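The two alternating steps can be sketched with a deliberately simple linear "generator" in place of a network: step (a) forms a discrete optimal transport matching between generated and target samples; step (b) regresses the generator onto the matched targets, which cannot increase the matching cost (the refit generator achieves at most the old cost under the old matching). The sample sizes, the linear model, and least squares as the regression are all illustrative assumptions, not the thesis's architecture.

```python
# Deliberately simple sketch of the alternation with a linear "generator"
# g(z) = zW + b in place of a network; all specifics here are illustrative.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_and_cost(G, T):
    """Step (a): discrete OT matching between generated and target samples."""
    cost = ((G[:, None] - T[None]) ** 2).sum(-1)
    r, c = linear_sum_assignment(cost)        # row indices come back sorted
    return T[c], float(np.sqrt(cost[r, c].mean()))

rng = np.random.default_rng(0)
Z = rng.normal(size=(300, 2))                 # continuous source randomness
target = rng.normal(size=(300, 2)) @ np.array([[2.0, 0.0], [1.0, 0.5]]) + 3.0
W, b = np.eye(2), np.zeros(2)

_, before = match_and_cost(Z @ W + b, target)
for _ in range(10):
    matched, _ = match_and_cost(Z @ W + b, target)
    # Step (b): regress the generator onto the mapped target points; the new
    # matching cost is at most the regression objective, which is at most the
    # old matching cost, so the Wasserstein estimate is non-increasing.
    A = np.c_[Z, np.ones(len(Z))]
    sol, *_ = np.linalg.lstsq(A, matched, rcond=None)
    W, b = sol[:2], sol[2]
_, after = match_and_cost(Z @ W + b, target)
print(before, after)
```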
Constrained Deep Networks: Lagrangian Optimization via Log-Barrier Extensions
This study investigates the optimization aspects of imposing hard inequality
constraints on the outputs of CNNs. In the context of deep networks,
constraints are commonly handled with penalties, owing to their simplicity and
despite their well-known limitations. Lagrangian-dual optimization has been
largely avoided, except for a few recent works, mainly due to the computational
complexity and stability/convergence issues caused by alternating explicit dual
updates/projections and stochastic optimization. Several studies showed that,
surprisingly, for deep CNNs the theoretical and practical advantages of
Lagrangian optimization over penalties do not materialize in practice. We
propose log-barrier extensions, which approximate Lagrangian optimization of
constrained-CNN problems with a sequence of unconstrained losses. Unlike
standard interior-point and log-barrier methods, our formulation does not need
an initial feasible solution. Furthermore, we provide a new technical result,
which shows that the proposed extensions yield an upper bound on the duality
gap. This generalizes the duality-gap result of standard log-barriers, yielding
sub-optimality certificates for feasible solutions. While sub-optimality is not
guaranteed for non-convex problems, our result shows that log-barrier
extensions are a principled way to approximate Lagrangian optimization for
constrained CNNs via implicit dual variables. We report comprehensive weakly
supervised segmentation experiments with various constraints, showing that our
formulation substantially outperforms existing constrained-CNN methods in
terms of accuracy, constraint satisfaction, and training stability, all the
more so when dealing with a large number of constraints.
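The log-barrier extension itself is simple to state: keep the standard interior barrier -(1/t)log(-z) on the strictly feasible side, and continue it beyond z = -1/t^2 with an affine piece whose value and slope match, so the penalty stays finite and differentiable even for infeasible predictions. A minimal sketch for a scalar constraint value z = f(x) <= 0 follows; the schedule of t used during training is omitted.

```python
# Minimal sketch of a log-barrier extension for a scalar constraint value
# z = f(x) <= 0: the standard interior barrier for z <= -1/t**2, continued by
# a value- and slope-matched affine piece so the penalty stays finite and
# differentiable for infeasible z.
import math

def log_barrier_extension(z, t):
    if z <= -1.0 / t**2:
        return -math.log(-z) / t              # standard interior log-barrier
    # affine continuation, matching value and slope t at z = -1/t**2
    return t * z - math.log(1.0 / t**2) / t + 1.0 / t

# Infeasible points get a finite penalty, so no initially feasible solution
# is required, unlike standard interior-point methods.
vals = [log_barrier_extension(z, 5.0) for z in (-2.0, -0.01, 0.5)]
print(vals)
```

As t grows, the affine piece steepens and the extension approaches a hard indicator of the constraint, which is what makes the sequence of unconstrained losses approximate the constrained problem.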
Generative models: a critical review
In this thesis we introduce and motivate generative modeling as a central task
for machine learning and provide a critical view of the algorithms which have
been proposed for solving this task. We describe how generative modeling can
be defined mathematically as trying to make an estimating distribution
identical to an unknown ground-truth distribution. This can then be quantified
in terms of the value of a statistical divergence between the two
distributions. We outline the maximum likelihood approach and how it can be
interpreted as minimizing the KL-divergence. We explore a number of approaches
in the maximum likelihood family, while discussing their limitations. Finally,
we explore the alternative adversarial approach, which involves studying the
differences between an estimating distribution and a real data distribution.
We discuss how this approach can give rise to new divergences and methods that
are necessary to make adversarial learning successful. We also discuss new
evaluation metrics which are required by the adversarial approach.
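The stated link between maximum likelihood and KL minimization is a one-line identity worth writing out, since the rest of the abstract leans on it:

```latex
\mathrm{KL}\!\left(p_{\mathrm{data}} \,\middle\|\, p_\theta\right)
  = \underbrace{\mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log p_{\mathrm{data}}(x)\right]}_{\text{independent of } \theta}
  \;-\; \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log p_\theta(x)\right]
```

Since the first term does not depend on the model parameters, maximizing the expected log-likelihood under the data distribution is exactly minimizing the KL-divergence from the data distribution to the model.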
Chapter 2 shows that, by learning generative models of the hidden layers of a
deep network, one can identify when the network is being run on data differing
from the data seen during training. This allows us to study differences
between free-running and teacher-forcing modes in recurrent networks. It also
leads to improved robustness to adversarial attacks.
Chapter 3 explored an iterative procedure for generation and inference in deep
networks, inspired by the blocked Gibbs MCMC procedure for sampling from
energy-based models. This achieves improved inpainting, generation, and
inference by removing the requirement that the prior over the latent variables
have a known distribution.
Chapter 4 studied whether generative models could be improved by exploiting
the knowledge learned by discriminative classification models. We studied this
by augmenting autoencoders with additional losses defined in the hidden states
of a fixed classifier. In practice we showed that this led to generative
models with better focus on salient aspects of the data, and also discussed
limitations of this approach.
Domain Generalization for Medical Image Analysis: A Survey
Medical Image Analysis (MedIA) has become an essential tool in medicine and
healthcare, aiding in disease diagnosis, prognosis, and treatment planning, and
recent successes in deep learning (DL) have made significant contributions to
its advances. However, DL models for MedIA remain challenging to deploy in
real-world situations, failing to generalize under the distributional gap
between training and testing samples, a problem known as distribution shift.
Researchers have dedicated their efforts to developing various DL methods to
adapt and perform robustly on unknown and out-of-distribution data
distributions. This paper comprehensively reviews domain generalization studies
specifically tailored for MedIA. We provide a holistic view of how domain
generalization techniques interact within the broader MedIA system, going
beyond methodologies to consider the operational implications on the entire
MedIA workflow. Specifically, we categorize domain generalization methods into
data-level, feature-level, model-level, and analysis-level methods. We show
how these methods can be used at various stages of the DL-equipped MedIA
workflow, from data acquisition to model prediction and analysis. Furthermore,
we include benchmark datasets and applications used to evaluate these
approaches and analyze the strengths and weaknesses of various methods,
unveiling future research opportunities.