On the Generalization Effects of Linear Transformations in Data Augmentation
Data augmentation is a powerful technique to improve performance in
applications such as image and text classification tasks. Yet, there is little
rigorous understanding of why and how various augmentations work. In this work,
we consider a family of linear transformations and study their effects on the
ridge estimator in an over-parametrized linear regression setting. First, we
show that transformations which preserve the labels of the data can improve
estimation by enlarging the span of the training data. Second, we show that
transformations which mix data can improve estimation by playing a
regularization effect. Finally, we validate our theoretical insights on MNIST.
Based on the insights, we propose an augmentation scheme that searches over the
space of transformations by how uncertain the model is about the transformed
data. We validate our proposed scheme on image and text datasets. For example,
our method outperforms RandAugment by 1.24% on CIFAR-100 using
Wide-ResNet-28-10. Furthermore, we achieve comparable accuracy to the SoTA
Adversarial AutoAugment on CIFAR datasets.
Comment: International Conference on Machine Learning (ICML) 2020. Added experimental results on ImageNet.
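The uncertainty-driven search over transformations described above can be sketched roughly as follows. This is a toy illustration under assumptions of ours, not the paper's actual setup: the "model" is a hand-written softmax classifier, uncertainty is predictive entropy, and the candidate transformations are two made-up linear maps.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def entropy(probs):
    """Predictive entropy: higher means the model is less certain."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def predict_scores(weights, x):
    """One score per class: dot product of that class's weights with x."""
    return [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]

def most_uncertain_transform(weights, x, transforms):
    """Pick the candidate transformation whose output the model is least
    sure about, measured by the entropy of the softmax prediction."""
    return max(transforms,
               key=lambda t: entropy(softmax(predict_scores(weights, t(x)))))

# Toy example: two classes, 2-D features, two candidate linear transforms.
def flip(v):
    return [v[1], v[0]]           # swap the two coordinates

def scale(v):
    return [2 * v[0], 2 * v[1]]   # double both coordinates

weights = [[1.0, 0.0], [0.0, 1.0]]
x = [2.0, 0.1]
best = most_uncertain_transform(weights, x, [flip, scale])
```

Here `flip` pushes the point toward the decision boundary while `scale` pushes it away, so the entropy criterion selects `flip` as the more informative augmentation.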
Kernel-convoluted Deep Neural Networks with Data Augmentation
The Mixup method (Zhang et al. 2018), which uses linearly interpolated data,
has emerged as an effective data augmentation tool to improve generalization
performance and the robustness to adversarial examples. The motivation is to
curtail undesirable oscillations by its implicit model constraint to behave
linearly at in-between observed data points and promote smoothness. In this
work, we formally investigate this premise, propose a way to explicitly impose
smoothness constraints, and extend it to incorporate with implicit model
constraints. First, we derive a new function class composed of
kernel-convoluted models (KCM) where the smoothness constraint is directly
imposed by locally averaging the original functions with a kernel function.
Second, we propose to incorporate the Mixup method into KCM to expand the
domains of smoothness. In both cases of KCM and the KCM adapted with the Mixup,
we provide risk analysis, respectively, under some conditions for kernels. We
show that the upper bound of the excess risk is not slower than that of the
original function class. The upper bound of the KCM with the Mixup remains
dominated by that of the KCM if the perturbation of the Mixup vanishes quickly
enough as the sample size grows. Using CIFAR-10 and CIFAR-100
datasets, our experiments demonstrate that the KCM with the Mixup outperforms
the Mixup method in terms of generalization and robustness to adversarial
examples.
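For reference, the Mixup interpolation (Zhang et al. 2018) that KCM builds on can be sketched as follows; this is a minimal standalone illustration of the standard method, not the authors' KCM implementation.

```python
import random

def mixup(x1, y1, x2, y2, alpha=1.0):
    """Mixup (Zhang et al. 2018): linearly interpolate a pair of examples.
    The mixing weight lam is drawn from Beta(alpha, alpha); labels are
    one-hot vectors, so the mixed label is a convex combination."""
    lam = random.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y

# Interpolate between a class-0 and a class-1 example.
x_mix, y_mix = mixup([0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
```

Training on such in-between points is what implicitly constrains the model to behave linearly between observed examples, the premise the KCM analysis makes explicit.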
SkillBot: Towards Data Augmentation using Transformer language model and linguistic evaluation
Creating accurate, closed-domain, machine learning-based chatbots that perform language understanding (intent prediction/detection) and language generation (response generation) requires significant datasets derived from specific knowledge domains. The common challenge in developing a closed-domain chatbot application is the lack of a comprehensive dataset. Such scarcity can be addressed by augmenting the dataset with state-of-the-art technologies from the field of Natural Language Processing, called ‘Transformer Models’. Our applied computing project experimented with a ‘Generative Pre-trained Transformer’ model, a unidirectional transformer decoder model, for augmenting an original dataset that was limited in size and manually authored. This model uses a unidirectional contextual representation, i.e., text input is processed from left to right while computing embeddings corresponding to the input sentences. The primary goal of the project was to leverage the potential of a pre-trained transformer-based language model to augment an existing but limited dataset. Additionally, using the model for text generation and appending the generated embedding to the supplied input embedding was intended to preserve the intent of the augmented utterances while finding different forms of expression for the same intent that potential users might use in the future. Our experiment showed improved language understanding and generation performance for the chatbot model trained on the augmented dataset, indicating that a pre-trained language model can be beneficial for the effective working of natural language-based applications such as chatbots.
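The intent-preserving augmentation loop described above can be sketched as follows. This is a hedged outline of the pipeline structure only: the real project used a pretrained GPT-style decoder, whereas `generate_continuation` here is a hypothetical stand-in so the sketch stays self-contained.

```python
def generate_continuation(prompt):
    """Placeholder for a pretrained GPT-style decoder's generate() call.
    A real implementation would sample a continuation from the language
    model conditioned on the seed utterance."""
    return prompt + ", please"

def augment_utterances(dataset, n_variants=1):
    """Expand an intent-labelled utterance dataset: prompt the language
    model with each seed utterance and keep each generated variant under
    the same intent label (preserving intent, varying surface form)."""
    augmented = {}
    for intent, utterances in dataset.items():
        variants = [generate_continuation(u)
                    for u in utterances
                    for _ in range(n_variants)]
        augmented[intent] = utterances + variants
    return augmented

# Example: one seed utterance per intent, doubled by augmentation.
seed = {"book_flight": ["I want to fly to Paris"]}
bigger = augment_utterances(seed)
```

Keeping the generated variants under the original intent label is what lets the augmented dataset train intent prediction without any relabelling.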