Hybrid Energy-based Models for Image Generation and Classification
In recent years, deep neural networks (DNNs) have achieved state-of-the-art performance on a wide range of learning tasks, among which two fundamental ones are discriminative modeling and generative modeling. These two tasks are largely treated separately, although prior work has shown that generative training benefits classifiers by alleviating several notorious issues. Energy-based Models (EBMs), and in particular the Joint Energy-based Model (JEM), need only train a single network with shared features for both discriminative and generative tasks. However, EBMs are expensive to train and very unstable, so it is crucial to understand the behavior of EBM training in order to improve stability, speed, accuracy, and generative quality together. This dissertation summarizes my research on EBMs for hybrid image discriminative-generative models. We first propose GMMC, which models the joint density p(x, y). As an alternative to the softmax classifier used in JEM, GMMC has a well-formulated latent feature distribution that fits naturally with the generative process of image synthesis. We then introduce a variety of new training techniques that improve JEM's accuracy, training stability, and speed together, yielding JEM++. Building on JEM++, we analyze and improve it from three different aspects: 1) the manifold, 2) data augmentation, and 3) the energy landscape. From these, we propose Manifold-Aware EBM/JEM and Sharpness-Aware JEM, which further improve speed, generation quality, stability, and classification significantly. Beyond MCMC-based EBMs, we find that two recent emergent approaches, the Vision Transformer (ViT) and the Denoising Diffusion Probabilistic Model (DDPM), can be combined into a simple but powerful model for image classification and generation. This new direction avoids most disadvantages of EBMs, such as expensive MCMC sampling and instability.
Finally, we discuss future research topics, including the speed, generation quality, and applications of hybrid models.
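The MCMC sampling that makes EBM training expensive is typically Stochastic Gradient Langevin Dynamics (SGLD). A minimal sketch of SGLD follows, using a toy quadratic energy in place of a learned network (a simplification for illustration; in JEM the energy of x would be derived from classifier logits):

```python
import numpy as np

def energy(x):
    # Toy quadratic energy with its minimum at the origin. In JEM the
    # energy would instead be E(x) = -logsumexp_y f(x)[y] for logits f.
    return 0.5 * np.sum(x ** 2)

def grad_energy(x):
    # Gradient of the toy quadratic energy above.
    return x

def sgld_sample(x0, n_steps=100, step_size=0.01, rng=None):
    """SGLD: draw an approximate sample from p(x) proportional to
    exp(-E(x)) by following the negative energy gradient plus noise."""
    rng = np.random.default_rng(rng)
    x = x0.copy()
    for _ in range(n_steps):
        noise = rng.normal(size=x.shape)
        x = x - step_size * grad_energy(x) + np.sqrt(2 * step_size) * noise
    return x

# Start far from the mode; the chain drifts toward low-energy regions.
x = sgld_sample(np.ones(2) * 5.0, n_steps=2000, step_size=0.01, rng=0)
```

The need to run many such gradient steps per training iteration is precisely the cost that the JEM++ and diffusion-based directions above aim to reduce or remove.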
Recent Deep Semi-supervised Learning Approaches and Related Works
The author of this work presents an overview of the recent semi-supervised
learning approaches and related works. Despite the remarkable success of neural
networks in various applications, there exist a few formidable constraints,
including the need for a large amount of labeled data. Therefore,
semi-supervised learning, a learning scheme in which scarce labels and a
larger amount of unlabeled data are utilized to train models (e.g., deep
neural networks), is becoming increasingly important. Based on the key assumptions of
semi-supervised learning, which are the manifold assumption, cluster
assumption, and continuity assumption, the work reviews the recent
semi-supervised learning approaches. In particular, the methods in regard to
using deep neural networks in a semi-supervised learning setting are primarily
discussed. In addition, the existing works are first classified based on the
underlying idea and explained, and then the holistic approaches that unify the
aforementioned ideas are detailed.
When and How Mixup Improves Calibration
In many machine learning applications, it is important for the model to
provide confidence scores that accurately capture its prediction uncertainty.
Although modern learning methods have achieved great success in predictive
accuracy, generating calibrated confidence scores remains a major challenge.
Mixup, a popular yet simple data augmentation technique based on taking convex
combinations of pairs of training examples, has been empirically found to
significantly improve confidence calibration across diverse applications.
However, when and how Mixup helps calibration is still a mystery. In this
paper, we theoretically prove that Mixup improves calibration in
\textit{high-dimensional} settings by investigating natural statistical models.
Interestingly, the calibration benefit of Mixup increases as the model capacity
increases. We support our theories with experiments on common architectures and
datasets. In addition, we study how Mixup improves calibration in
semi-supervised learning. While incorporating unlabeled data can sometimes make
the model less calibrated, adding Mixup training mitigates this issue and
provably improves calibration. Our analysis provides new insights and a
framework to understand Mixup and calibration.
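The convex-combination step at the heart of Mixup is short to write down. A minimal sketch (function and parameter names are illustrative, not from the paper):

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=1.0, rng=None):
    """Mixup: form a convex combination of two training examples and
    their one-hot labels, with weight lam drawn from Beta(alpha, alpha)."""
    rng = np.random.default_rng(rng)
    lam = rng.beta(alpha, alpha)
    x = lam * x1 + (1 - lam) * x2
    y = lam * y1 + (1 - lam) * y2
    return x, y

# Mix an example of class 0 with an example of class 1.
x, y = mixup(np.zeros(4), np.array([1.0, 0.0]),
             np.ones(4), np.array([0.0, 1.0]), rng=0)
```

The soft label y now carries the mixing weight itself, which is one intuition for why Mixup-trained models can produce better-calibrated confidence scores.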
Infinite Class Mixup
Mixup is a widely adopted strategy for training deep networks, where
additional samples are augmented by interpolating inputs and labels of training
pairs. Mixup has been shown to improve classification performance, network
calibration, and out-of-distribution generalisation. While effective, a
cornerstone of Mixup, namely that networks learn linear behaviour patterns
between classes, is only indirectly enforced since the output interpolation is
performed at the probability level. This paper seeks to address this limitation
by mixing the classifiers directly instead of mixing the labels for each mixed
pair. We propose to define the target of each augmented sample as a uniquely
new classifier, whose parameters are a linear interpolation of the classifier
vectors of the input pair. The space of all possible classifiers is continuous
and spans all interpolations between classifier pairs. To make optimisation
tractable, we propose a dual-contrastive Infinite Class Mixup loss, where we
contrast the classifier of a mixed pair to both the classifiers and the
predicted outputs of other mixed pairs in a batch. Infinite Class Mixup is
generic in nature and applies to many variants of Mixup. Empirically, we show
that it outperforms standard Mixup and variants such as RegMixup and Remix on
balanced, long-tailed, and data-constrained benchmarks, highlighting its broad
applicability.
Comment: BMVC 202
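The key change relative to standard Mixup, mixing classifier vectors rather than label probabilities, can be sketched as follows (a simplification assuming a linear classifier head; names are illustrative):

```python
import numpy as np

def infinite_class_target(W, i, j, lam):
    """Target for a mixed pair in Infinite Class Mixup: a brand-new
    classifier whose weight vector is a linear interpolation of the
    classifier vectors of classes i and j, rather than an interpolation
    of the output probabilities."""
    return lam * W[i] + (1 - lam) * W[j]

# Two classes with 3-dimensional classifier (weight) vectors.
W = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
w_mix = infinite_class_target(W, 0, 1, lam=0.3)

# Score a correspondingly mixed input against its interpolated classifier.
x_mix = 0.3 * np.array([2.0, 0.0, 0.0]) + 0.7 * np.array([0.0, 2.0, 0.0])
logit = float(w_mix @ x_mix)
```

Because lam is continuous, the set of such interpolated classifiers is infinite, which is what makes a contrastive loss over mixed pairs necessary for tractable optimisation.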
On the benefits of defining vicinal distributions in latent space
The vicinal risk minimization (VRM) principle is an empirical risk
minimization (ERM) variant that replaces Dirac masses with vicinal functions.
There is strong numerical and theoretical evidence showing that VRM outperforms
ERM in terms of generalization if appropriate vicinal functions are chosen.
Mixup Training (MT), a popular choice of vicinal distribution, improves the
generalization performance of models by introducing globally linear behavior in
between training examples. Apart from generalization, recent works have shown
that mixup trained models are relatively robust to input
perturbations/corruptions and at the same time are calibrated better than their
non-mixup counterparts. In this work, we investigate the benefits of defining
these vicinal distributions like mixup in latent space of generative models
rather than in input space itself. We propose a new approach - \textit{VarMixup
(Variational Mixup)} - to better sample mixup images by using the latent
manifold underlying the data. Our empirical studies on CIFAR-10, CIFAR-100, and
Tiny-ImageNet demonstrate that models trained by performing mixup in the latent
manifold learned by VAEs are inherently more robust to various input
corruptions/perturbations, are significantly better calibrated, and exhibit
more local-linear loss landscapes.
Comment: Accepted at Elsevier Pattern Recognition Letters (2021); Best Paper Award at CVPR 2021 Workshop on Adversarial Machine Learning in Real-World Computer Vision (AML-CV); also accepted at ICLR 2021 Workshops on Robust-Reliable Machine Learning (Oral) and Generalization beyond the training distribution (Abstract).
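The core idea of VarMixup, interpolating in a VAE's latent space rather than in input space, can be sketched as follows. Random linear maps stand in for a trained VAE encoder and decoder here (an assumption for illustration only; the paper uses learned, nonlinear VAEs):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for a trained VAE (illustrative only):
# encoder maps an 8-dim input to a 4-dim latent; decoder maps back.
A = rng.normal(size=(4, 8))
B = rng.normal(size=(8, 4))

def encode(x):
    return A @ x

def decode(z):
    return B @ z

def varmixup(x1, x2, lam):
    """Mix in the latent manifold: encode both inputs, take a convex
    combination of the latents, and decode the mixture back to input
    space, instead of mixing raw inputs directly."""
    z = lam * encode(x1) + (1 - lam) * encode(x2)
    return decode(z)

x_mix = varmixup(np.zeros(8), np.ones(8), lam=0.5)
```

With a real VAE the decoder is nonlinear, so the decoded mixture stays close to the data manifold, which is the claimed source of the robustness and calibration gains over input-space mixup.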