On the benefits of defining vicinal distributions in latent space
The vicinal risk minimization (VRM) principle is an empirical risk
minimization (ERM) variant that replaces Dirac masses with vicinal functions.
There is strong numerical and theoretical evidence showing that VRM outperforms
ERM in terms of generalization if appropriate vicinal functions are chosen.
Mixup Training (MT), a popular choice of vicinal distribution, improves the
generalization performance of models by introducing globally linear behavior in
between training examples. Apart from generalization, recent works have shown
that mixup trained models are relatively robust to input
perturbations/corruptions and at the same time are calibrated better than their
non-mixup counterparts. In this work, we investigate the benefits of defining
vicinal distributions such as mixup in the latent space of generative models
rather than in the input space itself. We propose a new approach, VarMixup
(Variational Mixup), to better sample mixup images by using the latent
manifold underlying the data. Our empirical studies on CIFAR-10, CIFAR-100, and
Tiny-ImageNet demonstrate that models trained by performing mixup in the latent
manifold learned by VAEs are inherently more robust to various input
corruptions/perturbations, are significantly better calibrated, and exhibit
more locally linear loss landscapes. Comment: Accepted at Elsevier Pattern Recognition Letters (2021);
Best Paper Award at the CVPR 2021 Workshop on Adversarial Machine Learning in Real-World
Computer Vision (AML-CV); also accepted at the ICLR 2021 Workshops on
Robust-Reliable Machine Learning (Oral) and Generalization beyond the
training distribution (Abstract)
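As a rough illustration of the idea (not the authors' implementation), the sketch below mixes the latent codes of a pair of examples and decodes the mixture back to input space; the encoder and decoder here are hypothetical linear stand-ins for a trained VAE, and all shapes are made up.

import numpy as np

rng = np.random.default_rng(0)
dim, latent_dim = 32, 8

# Hypothetical stand-ins for a trained VAE's encoder/decoder; in practice these
# would be learned networks mapping images to and from the latent manifold.
W_enc = rng.normal(size=(dim, latent_dim))
W_dec = rng.normal(size=(latent_dim, dim))

def encode(x):          # (batch, dim) -> latent codes (batch, latent_dim)
    return x @ W_enc

def decode(z):          # (batch, latent_dim) -> reconstructed inputs
    return z @ W_dec

def latent_mixup(x1, y1, x2, y2, alpha=0.2):
    """Mix in latent space instead of input space."""
    lam = rng.beta(alpha, alpha)
    z_mix = lam * encode(x1) + (1 - lam) * encode(x2)   # interpolate latent codes
    x_mix = decode(z_mix)                               # map the mixture back to input space
    y_mix = lam * y1 + (1 - lam) * y2                   # mix soft labels as in standard mixup
    return x_mix, y_mix

x1, x2 = rng.normal(size=(4, dim)), rng.normal(size=(4, dim))
y1 = np.eye(10)[rng.integers(10, size=4)]
y2 = np.eye(10)[rng.integers(10, size=4)]
x_mix, y_mix = latent_mixup(x1, y1, x2, y2)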
AdaER: An Adaptive Experience Replay Approach for Continual Lifelong Learning
Continual lifelong learning is a machine learning framework inspired by
human learning, where learners are trained to continuously acquire new
knowledge in a sequential manner. However, the non-stationary nature of
streaming training data poses a significant challenge known as catastrophic
forgetting, which refers to the rapid forgetting of previously learned
knowledge when new tasks are introduced. While some approaches, such as
experience replay (ER), have been proposed to mitigate this issue, their
performance remains limited, particularly in the class-incremental scenario,
which is considered both natural and highly challenging. In this paper, we present a
novel algorithm, called adaptive-experience replay (AdaER), to address the
challenge of continual lifelong learning. AdaER consists of two stages: memory
replay and memory update. In the memory replay stage, AdaER introduces a
contextually-cued memory recall (C-CMR) strategy, which selectively replays the
memories that conflict most with the current input, in terms of both
data and task. Additionally, AdaER incorporates an entropy-balanced reservoir
sampling (E-BRS) strategy to enhance the performance of the memory buffer by
maximizing information entropy. To evaluate the effectiveness of AdaER, we
conduct experiments on established supervised continual lifelong learning
benchmarks, specifically focusing on class-incremental learning scenarios. The
results demonstrate that AdaER outperforms existing continual lifelong learning
baselines, highlighting its efficacy in mitigating catastrophic forgetting and
improving learning performance. Comment: 18 pages, 26 figures
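For context, the snippet below sketches a plain reservoir-sampling replay buffer, the baseline that AdaER's entropy-balanced reservoir sampling (E-BRS) and contextually-cued recall (C-CMR) refine; the class and its fields are illustrative, not the paper's code.

import random

class ReservoirBuffer:
    """Plain reservoir sampling buffer; E-BRS additionally balances class
    entropy when choosing which slot to overwrite (not shown here)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = []          # stored (example, label, task_id) triples
        self.n_seen = 0         # total examples observed in the stream

    def add(self, item):
        self.n_seen += 1
        if len(self.data) < self.capacity:
            self.data.append(item)
        else:
            # each streamed item is kept with probability capacity / n_seen
            j = random.randrange(self.n_seen)
            if j < self.capacity:
                self.data[j] = item

    def sample(self, k):
        return random.sample(self.data, min(k, len(self.data)))

buffer = ReservoirBuffer(capacity=200)
for i in range(1000):
    buffer.add((f"x_{i}", i % 10, i // 100))   # (example, label, task_id)
replay_batch = buffer.sample(32)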
G-Mix: A Generalized Mixup Learning Framework Towards Flat Minima
Deep neural networks (DNNs) have demonstrated promising results in various
complex tasks. However, current DNNs encounter challenges with
over-parameterization, especially when there is limited training data
available. To enhance the generalization capability of DNNs, the Mixup
technique has gained popularity. Nevertheless, it still produces suboptimal
outcomes. Inspired by the successful Sharpness-Aware Minimization (SAM)
approach, which establishes a connection between the sharpness of the training
loss landscape and model generalization, we propose a new learning framework
called Generalized-Mixup (G-Mix), which combines the strengths of Mixup and SAM for
training DNN models. The theoretical analysis provided demonstrates how the
developed G-Mix framework enhances generalization. Additionally, to further
optimize DNN performance with the G-Mix framework, we introduce two novel
algorithms: Binary G-Mix and Decomposed G-Mix. These algorithms partition the
training data into two subsets based on the sharpness-sensitivity of each
example to address the issue of "manifold intrusion" in Mixup. Both theoretical
explanations and experimental results reveal that the proposed BG-Mix and
DG-Mix algorithms further enhance model generalization across multiple datasets
and models, achieving state-of-the-art performance. Comment: 19 pages, 23 figures
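A minimal sketch of combining mixup with a SAM-style sharpness step, in the spirit of the G-Mix idea described above; the model, data, and hyperparameters are toy placeholders, and the BG-Mix/DG-Mix partitioning by sharpness-sensitivity is not shown.

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
model = nn.Linear(32, 10)                 # toy classifier
opt = torch.optim.SGD(model.parameters(), lr=0.1)
rho, alpha = 0.05, 0.2                    # SAM radius and mixup Beta parameter

def mixup(x, y, alpha):
    lam = torch.distributions.Beta(alpha, alpha).sample()
    idx = torch.randperm(x.size(0))
    return lam * x + (1 - lam) * x[idx], y, y[idx], lam

def mixed_loss(x, ya, yb, lam):
    logits = model(x)
    return lam * F.cross_entropy(logits, ya) + (1 - lam) * F.cross_entropy(logits, yb)

x = torch.randn(16, 32)
y = torch.randint(0, 10, (16,))
x_mix, ya, yb, lam = mixup(x, y, alpha)

# SAM-style step on the mixed batch: ascend to a nearby worst-case weight
# perturbation, compute the gradient there, then descend from the original weights.
mixed_loss(x_mix, ya, yb, lam).backward()
grad_norm = torch.sqrt(sum((p.grad ** 2).sum() for p in model.parameters()))
eps = []
with torch.no_grad():
    for p in model.parameters():
        e = rho * p.grad / (grad_norm + 1e-12)
        p.add_(e)
        eps.append(e)
model.zero_grad()
mixed_loss(x_mix, ya, yb, lam).backward()     # gradient at the perturbed weights
with torch.no_grad():
    for p, e in zip(model.parameters(), eps):
        p.sub_(e)                             # restore the original weights
opt.step()                                    # descend using the sharpness-aware gradient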
Reweighted Mixup for Subpopulation Shift
Subpopulation shift, in which the training and test distributions contain the
same subpopulation groups but in different proportions, arises widely in
real-world applications. Ignoring
subpopulation shifts may lead to significant performance degradation and
fairness concerns. Importance reweighting is a classical and effective way to
handle the subpopulation shift. However, recent studies have recognized that
most of these approaches fail to improve the performance especially when
applied to over-parameterized neural networks which are capable of fitting any
training samples. In this work, we propose a simple yet practical framework,
called reweighted mixup (RMIX), to mitigate the overfitting issue in
over-parameterized models by conducting importance weighting on the ''mixed''
samples. By applying reweighting to mixup, RMIX allows the model
to explore the vicinal space of minority samples more thoroughly, yielding a model
that is more robust to subpopulation shift. When the subpopulation memberships
are unknown, RMIX uses training-trajectory-based uncertainty estimation
to flexibly characterize the subpopulation distribution.
We also provide insightful theoretical analysis to verify that RMIX achieves
better generalization bounds than prior work. Further, we conduct extensive
empirical studies across a wide range of tasks to validate the effectiveness of
the proposed method. Comment: Journal version of arXiv:2209.0892
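As a simplified illustration, assuming subpopulation (group) memberships are known, a mixed sample can inherit a convex combination of the two source examples' inverse-frequency importance weights; the helper names below are hypothetical, and the unknown-membership case handled by RMIX via trajectory-based uncertainty is not shown.

import numpy as np

rng = np.random.default_rng(0)

group = rng.integers(0, 2, size=256)          # e.g. majority/minority subpopulation labels
counts = np.bincount(group)
w = (1.0 / counts)[group]                     # inverse-frequency importance weights per example
w = w / w.mean()                              # normalize so the average weight is 1

def rmix_weight(i, j, alpha=0.2):
    """Mixing coefficient and importance weight for a sample mixed from examples i and j."""
    lam = rng.beta(alpha, alpha)
    return lam, lam * w[i] + (1 - lam) * w[j]

lam, w_mix = rmix_weight(3, 200)
# the per-sample training loss would then be scaled by w_mix, e.g.
# loss = w_mix * (lam * ce(f(x_mix), y_i) + (1 - lam) * ce(f(x_mix), y_j))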
MixupE: Understanding and Improving Mixup from Directional Derivative Perspective
Mixup is a popular data augmentation technique for training deep neural
networks where additional samples are generated by linearly interpolating pairs
of inputs and their labels. This technique is known to improve the
generalization performance in many learning paradigms and applications. In this
work, we first analyze Mixup and show that it implicitly regularizes infinitely
many directional derivatives of all orders. Based on this new insight, we
propose an improved version of Mixup, theoretically justified to deliver better
generalization performance than the vanilla Mixup. To demonstrate the
effectiveness of the proposed method, we conduct experiments across various
domains such as images, tabular data, speech, and graphs. Our results show that
the proposed method improves Mixup across multiple datasets using a variety of
architectures, for instance, exhibiting an improvement over Mixup by 0.8% in
ImageNet top-1 accuracy. Comment: 16 pages, Best Student Paper Award at UAI 202
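For reference, vanilla mixup, which MixupE builds on and regularizes further, can be sketched as follows; the array shapes and Beta parameter are illustrative only.

import numpy as np

rng = np.random.default_rng(0)

def mixup_batch(x, y_onehot, alpha=0.2):
    """Vanilla mixup: convex combinations of input pairs and their one-hot labels.
    (MixupE adds further regularization on top of this; not shown here.)"""
    lam = rng.beta(alpha, alpha)
    idx = rng.permutation(len(x))
    x_mix = lam * x + (1 - lam) * x[idx]
    y_mix = lam * y_onehot + (1 - lam) * y_onehot[idx]
    return x_mix, y_mix

x = rng.normal(size=(8, 3, 32, 32))           # e.g. a small image batch
y = np.eye(10)[rng.integers(10, size=8)]      # one-hot labels for 10 classes
x_mix, y_mix = mixup_batch(x, y)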
Automatic Data Augmentation via Invariance-Constrained Learning
Underlying data structures, such as symmetries or invariances to
transformations, are often exploited to improve the solution of learning tasks.
However, embedding these properties in models or learning algorithms can be
challenging and computationally intensive. Data augmentation, on the other
hand, induces these symmetries during training by applying multiple
transformations to the input data. Despite its ubiquity, its effectiveness
depends on the choices of which transformations to apply, when to do so, and
how often. In fact, there is both empirical and theoretical evidence that the
indiscriminate use of data augmentation can introduce biases that outweigh its
benefits. This work tackles these issues by automatically adapting the data
augmentation while solving the learning task. To do so, it formulates data
augmentation as an invariance-constrained learning problem and leverages Markov
chain Monte Carlo (MCMC) sampling to solve it. The result is a practical
algorithm that not only does away with a priori searches for augmentation
distributions, but also dynamically controls if and when data augmentation is
applied. Our experiments illustrate the performance of this method, which
achieves state-of-the-art results in automatic data augmentation benchmarks for
CIFAR datasets. Furthermore, this approach can be used to gather insights on
the actual symmetries underlying a learning task.
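As a loose, hypothetical illustration of sampling augmentations with an MCMC-style rule rather than a fixed a priori schedule, the sketch below runs a Metropolis-type chain over a small transformation set, favouring transformations on which a stand-in invariance gap is large; it is not the paper's constrained-learning algorithm, and all names and numbers are made up.

import numpy as np

rng = np.random.default_rng(0)
transforms = ["identity", "h_flip", "small_rotation", "color_jitter"]

def invariance_gap(t_idx):
    # stand-in for loss(f(T(x)), y) - loss(f(x), y) averaged over a batch
    return float(rng.normal(loc=[0.0, 0.3, 0.5, 0.2][t_idx], scale=0.05))

def mcmc_sample_transform(n_steps=100, temperature=0.1):
    cur = 0
    cur_gap = invariance_gap(cur)
    for _ in range(n_steps):
        prop = int(rng.integers(len(transforms)))
        prop_gap = invariance_gap(prop)
        # accept proposals that expose larger invariance violations more often
        if np.log(rng.uniform()) < (prop_gap - cur_gap) / temperature:
            cur, cur_gap = prop, prop_gap
    return transforms[cur]

print(mcmc_sample_transform())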
LongReMix: Robust Learning with High Confidence Samples in a Noisy Label Environment
Deep neural network models are robust to a limited amount of label noise, but
their ability to memorise noisy labels in high noise rate problems is still an
open issue. The most competitive noisy-label learning algorithms rely on a
2-stage process comprising an unsupervised learning stage that classifies training
samples as clean or noisy, followed by a semi-supervised learning stage that
minimises the empirical vicinal risk (EVR) using a labelled set formed by
samples classified as clean, and an unlabelled set with samples classified as
noisy. In this paper, we hypothesise that the generalisation of such 2-stage
noisy-label learning methods depends on the precision of the unsupervised
classifier and the size of the training set to minimise the EVR. We empirically
validate these two hypotheses and propose the new 2-stage noisy-label training
algorithm LongReMix. We test LongReMix on the noisy-label benchmarks CIFAR-10,
CIFAR-100, WebVision, Clothing1M, and Food101-N. The results show that our
LongReMix generalises better than competing approaches, particularly in high
label noise problems. Furthermore, our approach achieves state-of-the-art
performance on most datasets. The code will be available upon paper acceptance.
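For orientation, the first (unsupervised) stage in this family of methods is often realised by fitting a two-component Gaussian mixture to per-sample training losses and treating the low-loss component as clean; the sketch below uses synthetic losses and is not LongReMix's exact procedure.

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic per-sample training losses: most samples have small loss ("clean"),
# a minority have large loss ("noisy").
losses = np.concatenate([rng.gamma(2.0, 0.1, size=800),
                         rng.gamma(6.0, 0.4, size=200)])

gmm = GaussianMixture(n_components=2, random_state=0).fit(losses.reshape(-1, 1))
clean_component = np.argmin(gmm.means_.ravel())          # the low-loss component
p_clean = gmm.predict_proba(losses.reshape(-1, 1))[:, clean_component]

labelled_idx = np.where(p_clean > 0.5)[0]     # treated as the clean, labelled set
unlabelled_idx = np.where(p_clean <= 0.5)[0]  # treated as the noisy, unlabelled set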
Estimating Input Coefficients for Regional Input-Output Tables Using Deep Learning with Mixup
An input-output table is an important source of data for analyzing the economic
situation of a region. However, the input-output table for each region in Japan
(the regional input-output table) is not always publicly available, so it
must be estimated. In particular, various methods have been
developed for estimating input coefficients, which are an important part of the
input-output table. Currently, non-survey methods are often used to estimate
input coefficients because they require less data and computation, but these
methods have some problems, such as discarding information and requiring
additional data for estimation.
In this study, the input coefficients are estimated by approximating the
generation process with an artificial neural network (ANN) to mitigate the
problems of the non-survey methods and to estimate the input coefficients with
higher precision. To avoid over-fitting on the small amount of data available, a data
augmentation technique called mixup is introduced to increase the data size by
generating virtual regions through region composition and scaling.
By comparing the estimated input coefficients with those of Japan as a
whole, it is shown that the method of this research is more accurate
and more stable than the conventional non-survey methods. In addition,
the estimated input coefficients for the three cities in Japan are generally
close to the published values for each city. Comment: 24 pages, 7 postscript figures
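A hedged sketch of how mixup-style "virtual regions" might be generated by composing and scaling two regions' transaction tables and recomputing the input coefficients a_ij = z_ij / x_j; the sector count and all values below are made up, and this is not the paper's exact procedure.

import numpy as np

rng = np.random.default_rng(0)
n_sectors = 5

def random_region(scale):
    """Toy region: an inter-sector transaction table z and gross output per sector."""
    z = rng.uniform(1.0, 10.0, size=(n_sectors, n_sectors)) * scale
    output = z.sum(axis=0) * rng.uniform(1.5, 2.5, size=n_sectors)
    return z, output

def mix_regions(region_a, region_b, alpha=0.2, scale=1.0):
    """Compose two regions convexly, rescale, and recompute input coefficients."""
    lam = rng.beta(alpha, alpha)
    z = scale * (lam * region_a[0] + (1 - lam) * region_b[0])
    output = scale * (lam * region_a[1] + (1 - lam) * region_b[1])
    coeffs = z / output                     # a_ij = z_ij / x_j (divide each column by output)
    return z, output, coeffs

z_mix, out_mix, a_mix = mix_regions(random_region(1.0), random_region(3.0))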