
    Hybrid Energy-based Models for Image Generation and Classification

    In recent years, deep neural networks (DNNs) have achieved state-of-the-art performance on a wide range of learning tasks. Two of the most fundamental are discriminative and generative modeling. However, the two are largely studied separately, even though prior work has shown that generative training helps classifiers alleviate several notorious issues. The Energy-based Model (EBM), and especially the Joint Energy-based Model (JEM), needs to train only a single network with shared features for both discriminative and generative tasks. However, EBMs are expensive to train and very unstable, so it is crucial to understand the behavior of EBM training and thereby improve stability, speed, accuracy, and generative quality altogether. This dissertation summarizes my research on EBMs for hybrid image discriminative-generative models. We first proposed GMMC, which models the joint density p(x, y). As an alternative to the softmax classifier used in JEM, GMMC has a well-formulated latent feature distribution, which fits well with the generative process of image synthesis. We then developed a variety of new training techniques to improve JEM's accuracy, training stability, and speed altogether, and named the result JEM++. Building on JEM++, we analyzed and improved it from three different aspects: 1) the manifold, 2) the data augmentation, and 3) the energy landscape. From these, we propose Manifold-Aware EBM/JEM and Sharpness-Aware JEM to further improve speed, generation quality, stability, and classification significantly. Beyond MCMC-based EBMs, we found that two recently emerged approaches, the Vision Transformer (ViT) and the Denoising Diffusion Probabilistic Model (DDPM), can be combined to learn a simple but powerful model for image classification and generation. This new direction avoids most disadvantages of EBMs, such as expensive MCMC sampling and instability. Finally, we discuss future research topics, including the speed, generation quality, and applications of hybrid models.
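
    As context for the JEM-style reuse of a classifier described above, the following is a minimal, illustrative sketch (not the dissertation's code): a standard classifier's logits define p(y|x) via softmax, while their negative LogSumExp serves as an energy defining an unnormalized p(x), which can be sampled with stochastic gradient Langevin dynamics (SGLD). The model `f`, step size, and noise scale are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def jem_quantities(f, x):
    """JEM-style reuse of classifier logits.

    p(y|x) is the usual softmax over logits; the negative LogSumExp of the
    logits serves as an energy E(x) defining an unnormalized p(x).
    """
    logits = f(x)                               # [B, num_classes]
    log_p_y_given_x = F.log_softmax(logits, dim=1)
    energy = -torch.logsumexp(logits, dim=1)    # E(x) = -log sum_y exp(f(x)[y])
    return log_p_y_given_x, energy

def sgld_step(f, x, step_size=1.0, noise_std=0.01):
    """One Langevin update toward lower energy (illustrative hyperparameters)."""
    x = x.clone().detach().requires_grad_(True)
    energy = -torch.logsumexp(f(x), dim=1).sum()
    grad = torch.autograd.grad(energy, x)[0]
    return (x - step_size * grad + noise_std * torch.randn_like(x)).detach()
```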

    Recent Deep Semi-supervised Learning Approaches and Related Works

    This work presents an overview of recent semi-supervised learning approaches and related work. Despite the remarkable success of neural networks in various applications, there remain formidable constraints, including the need for a large amount of labeled data. Therefore, semi-supervised learning, a learning scheme in which scarce labels and a larger amount of unlabeled data are used to train models (e.g., deep neural networks), is becoming increasingly important. Based on the key assumptions of semi-supervised learning, namely the manifold assumption, the cluster assumption, and the continuity assumption, the work reviews recent semi-supervised learning approaches. In particular, methods that use deep neural networks in a semi-supervised setting are the primary focus. Existing works are first classified by their underlying idea and explained, and then the holistic approaches that unify the aforementioned ideas are detailed.
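
    To make the semi-supervised scheme above concrete, here is a minimal sketch (illustrative, not taken from the survey) of a common objective that combines supervised cross-entropy on the scarce labels with a confidence-thresholded pseudo-label consistency term on unlabeled data; the model interface, the weight `lambda_u`, and the threshold are assumptions.

```python
import torch
import torch.nn.functional as F

def semi_supervised_loss(model, x_lab, y_lab, x_unlab, x_unlab_aug,
                         lambda_u=1.0, threshold=0.95):
    """Supervised CE on labeled data + pseudo-label consistency on unlabeled data."""
    # Supervised term on the scarce labeled batch.
    loss_sup = F.cross_entropy(model(x_lab), y_lab)

    # Pseudo-labels from the unlabeled inputs (no gradient through this pass).
    with torch.no_grad():
        probs = F.softmax(model(x_unlab), dim=1)
        conf, pseudo = probs.max(dim=1)
        mask = (conf >= threshold).float()   # keep only confident predictions

    # Consistency term: an augmented view should match the pseudo-label.
    loss_unsup = (F.cross_entropy(model(x_unlab_aug), pseudo,
                                  reduction="none") * mask).mean()
    return loss_sup + lambda_u * loss_unsup
```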

    When and How Mixup Improves Calibration

    In many machine learning applications, it is important for the model to provide confidence scores that accurately capture its prediction uncertainty. Although modern learning methods have achieved great success in predictive accuracy, generating calibrated confidence scores remains a major challenge. Mixup, a popular yet simple data augmentation technique based on taking convex combinations of pairs of training examples, has been empirically found to significantly improve confidence calibration across diverse applications. However, when and how Mixup helps calibration is still a mystery. In this paper, we theoretically prove that Mixup improves calibration in high-dimensional settings by investigating natural statistical models. Interestingly, the calibration benefit of Mixup increases as the model capacity increases. We support our theories with experiments on common architectures and datasets. In addition, we study how Mixup improves calibration in semi-supervised learning. While incorporating unlabeled data can sometimes make the model less calibrated, adding Mixup training mitigates this issue and provably improves calibration. Our analysis provides new insights and a framework to understand Mixup and calibration.
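
    For reference, the following is a minimal sketch of standard Mixup as described above (an illustrative implementation, not the paper's code): each batch is mixed with a shuffled copy of itself using a Beta-distributed coefficient, and the loss is the matching convex combination of the two label terms; the hyperparameter `alpha` is an assumed default.

```python
import numpy as np
import torch
import torch.nn.functional as F

def mixup_batch(x, y, alpha=0.2):
    """Return convex combinations of a batch with a shuffled copy of itself."""
    lam = np.random.beta(alpha, alpha)
    perm = torch.randperm(x.size(0))
    x_mix = lam * x + (1 - lam) * x[perm]
    return x_mix, y, y[perm], lam

def mixup_loss(model, x, y, alpha=0.2):
    """Cross-entropy interpolated with the same coefficient as the inputs."""
    x_mix, y_a, y_b, lam = mixup_batch(x, y, alpha)
    logits = model(x_mix)
    return lam * F.cross_entropy(logits, y_a) + (1 - lam) * F.cross_entropy(logits, y_b)
```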

    Infinite Class Mixup

    Mixup is a widely adopted strategy for training deep networks, where additional samples are augmented by interpolating the inputs and labels of training pairs. Mixup has been shown to improve classification performance, network calibration, and out-of-distribution generalisation. While effective, a cornerstone of Mixup, namely that networks learn linear behaviour patterns between classes, is only indirectly enforced, since the output interpolation is performed at the probability level. This paper seeks to address this limitation by mixing the classifiers directly instead of mixing the labels for each mixed pair. We propose to define the target of each augmented sample as a uniquely new classifier, whose parameters are a linear interpolation of the classifier vectors of the input pair. The space of all possible classifiers is continuous and spans all interpolations between classifier pairs. To make optimisation tractable, we propose a dual-contrastive Infinite Class Mixup loss, in which we contrast the classifier of a mixed pair to both the classifiers and the predicted outputs of other mixed pairs in a batch. Infinite Class Mixup is generic in nature and applies to many variants of Mixup. Empirically, we show that it outperforms standard Mixup and variants such as RegMixup and Remix on balanced, long-tailed, and data-constrained benchmarks, highlighting its broad applicability. Comment: BMVC 202
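
    As a rough illustration of the classifier-mixing idea (a sketch of the core interpolation plus a simple pair-level contrastive term only, not the paper's full dual-contrastive loss), the target classifier of each mixed sample can be built by linearly interpolating the weight vectors of its two source classes; the names and tensor shapes below are assumptions.

```python
import torch
import torch.nn.functional as F

def infinite_class_mixup_scores(features, classifier_weights, y_a, y_b, lam):
    """Score mixed samples against per-pair interpolated classifiers.

    features:           [B, D] embeddings of the mixed inputs
    classifier_weights: [C, D] one weight vector per class
    y_a, y_b:           [B]    class indices of the two mixed sources
    lam:                mixing coefficient(s), scalar or [B, 1]
    """
    # Each mixed pair defines its own "new class": a convex combination
    # of the weight vectors of its two source classes.
    w_mix = lam * classifier_weights[y_a] + (1 - lam) * classifier_weights[y_b]  # [B, D]
    # [B, B] scores: sample i scored against the interpolated classifier of pair j.
    return features @ w_mix.t()

def pair_contrastive_loss(scores):
    """Each mixed sample should score highest under its own pair's classifier."""
    targets = torch.arange(scores.size(0), device=scores.device)
    return F.cross_entropy(scores, targets)
```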

    On the benefits of defining vicinal distributions in latent space

    The vicinal risk minimization (VRM) principle is a variant of empirical risk minimization (ERM) that replaces Dirac masses with vicinal functions. There is strong numerical and theoretical evidence showing that VRM outperforms ERM in terms of generalization if appropriate vicinal functions are chosen. Mixup Training (MT), a popular choice of vicinal distribution, improves the generalization performance of models by introducing globally linear behavior between training examples. Apart from generalization, recent works have shown that mixup-trained models are relatively robust to input perturbations/corruptions and are at the same time better calibrated than their non-mixup counterparts. In this work, we investigate the benefits of defining such vicinal distributions, like mixup, in the latent space of generative models rather than in the input space itself. We propose a new approach, VarMixup (Variational Mixup), to better sample mixup images by using the latent manifold underlying the data. Our empirical studies on CIFAR-10, CIFAR-100, and Tiny-ImageNet demonstrate that models trained by performing mixup in the latent manifold learned by VAEs are inherently more robust to various input corruptions/perturbations, are significantly better calibrated, and exhibit more locally linear loss landscapes. Comment: Accepted at Elsevier Pattern Recognition Letters (2021); Best Paper Award at the CVPR 2021 Workshop on Adversarial Machine Learning in Real-World Computer Vision (AML-CV); also accepted at the ICLR 2021 Workshops on Robust-Reliable Machine Learning (Oral) and Generalization beyond the training distribution (Abstract)
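
    As an illustration of the latent-space idea above (a minimal sketch under assumed VAE interfaces `encode`/`decode`, not the paper's implementation), mixing is performed between latent codes and the mixed latent is decoded back to image space before being fed to the classifier:

```python
import numpy as np
import torch

def latent_mixup(vae, x, alpha=0.2):
    """Mix in the VAE's latent space instead of pixel space (illustrative).

    Assumes `vae.encode(x)` returns (mu, logvar) and `vae.decode(z)` maps
    latents back to image space; these interfaces are assumptions.
    """
    lam = np.random.beta(alpha, alpha)
    perm = torch.randperm(x.size(0))
    with torch.no_grad():
        mu, logvar = vae.encode(x)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterize
        z_mix = lam * z + (1 - lam) * z[perm]                    # mix on the latent manifold
        x_mix = vae.decode(z_mix)                                # back to image space
    return x_mix, perm, lam
```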