14 research outputs found

    On the Generalization Effects of Linear Transformations in Data Augmentation

    Data augmentation is a powerful technique to improve performance in applications such as image and text classification tasks. Yet, there is little rigorous understanding of why and how various augmentations work. In this work, we consider a family of linear transformations and study their effects on the ridge estimator in an over-parametrized linear regression setting. First, we show that transformations which preserve the labels of the data can improve estimation by enlarging the span of the training data. Second, we show that transformations which mix data can improve estimation through a regularization effect. Finally, we validate our theoretical insights on MNIST. Based on these insights, we propose an augmentation scheme that searches over the space of transformations according to how uncertain the model is about the transformed data. We validate the proposed scheme on image and text datasets. For example, our method outperforms RandAugment by 1.24% on CIFAR-100 using Wide-ResNet-28-10. Furthermore, we achieve comparable accuracy to the SoTA Adversarial AutoAugment on CIFAR datasets.
    Comment: International Conference on Machine Learning (ICML) 2020. Added experimental results on ImageNet.
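
    The uncertainty-guided search over transformations can be sketched in a few lines. The snippet below is a minimal illustration under assumed PyTorch conventions, not the authors' released code: the model, the list of candidate transforms, and the use of predictive entropy as the uncertainty score are all placeholders chosen for the sketch.

import torch
import torch.nn.functional as F

def select_uncertain_augmentation(model, x, transforms):
    """Among candidate transforms, keep the output the model is most uncertain about."""
    model.eval()
    best_entropy, best_x = float("-inf"), x
    with torch.no_grad():
        for t in transforms:
            x_aug = t(x)                             # apply one candidate linear transform
            probs = F.softmax(model(x_aug), dim=-1)  # predictive distribution
            entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1).mean()
            if entropy.item() > best_entropy:
                best_entropy, best_x = entropy.item(), x_aug
    return best_x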

    Kernel-convoluted Deep Neural Networks with Data Augmentation

    The Mixup method (Zhang et al. 2018), which uses linearly interpolated data, has emerged as an effective data augmentation tool to improve generalization performance and robustness to adversarial examples. The motivation is to curtail undesirable oscillations through its implicit constraint that the model behave linearly between observed data points, thereby promoting smoothness. In this work, we formally investigate this premise, propose a way to explicitly impose smoothness constraints, and extend it to incorporate implicit model constraints. First, we derive a new function class composed of kernel-convoluted models (KCM), in which the smoothness constraint is imposed directly by locally averaging the original functions with a kernel function. Second, we propose to incorporate the Mixup method into KCM to expand the domains of smoothness. For both KCM and KCM adapted with Mixup, we provide risk analyses under some conditions on the kernels. We show that the upper bound on the excess risk is not slower than that of the original function class. The upper bound for KCM with Mixup remains dominated by that of KCM if the perturbation of Mixup vanishes faster than O(n^{-1/2}), where n is the sample size. Using the CIFAR-10 and CIFAR-100 datasets, our experiments demonstrate that KCM with Mixup outperforms the Mixup method in terms of generalization and robustness to adversarial examples.
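
    A minimal sketch of the idea, under assumed PyTorch conventions and not the paper's implementation: the kernel convolution is approximated by Monte Carlo averaging of the network over small Gaussian perturbations, and a standard Mixup loss is then computed on top of the smoothed model. The bandwidth and the number of samples are illustrative hyperparameters.

import torch
import torch.nn.functional as F

def kernel_convolved_forward(model, x, bandwidth=0.1, n_samples=8):
    # Locally average the network: f_K(x) ~ mean over u ~ K of f(x + u),
    # here with a Gaussian kernel approximated by Monte Carlo sampling.
    outputs = [model(x + bandwidth * torch.randn_like(x)) for _ in range(n_samples)]
    return torch.stack(outputs).mean(dim=0)

def mixup_kcm_loss(model, x, y, alpha=0.2, bandwidth=0.1):
    # Standard Mixup interpolation of inputs and labels, applied to the
    # kernel-convoluted (smoothed) model rather than the raw network.
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    x_mix = lam * x + (1 - lam) * x[perm]
    logits = kernel_convolved_forward(model, x_mix, bandwidth)
    return lam * F.cross_entropy(logits, y) + (1 - lam) * F.cross_entropy(logits, y[perm])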

    SkillBot: Towards Data Augmentation using Transformer language model and linguistic evaluation

    Creating accurate, closed-domain, machine learning-based chatbots that perform language understanding (intent prediction/detection) and language generation (response generation) requires significant datasets derived from specific knowledge domains. The common challenge in developing a closed-domain chatbot application is the lack of a comprehensive dataset. Such scarcity can be mitigated by augmenting the dataset with state-of-the-art Natural Language Processing technologies known as Transformer models. Our applied computing project experimented with a Generative Pre-trained Transformer model, a unidirectional transformer decoder, for augmenting an original dataset that was limited in size and manually authored. This model uses unidirectional contextual representation, i.e., text input is processed from left to right while computing embeddings corresponding to the input sentences. The primary goal of the project was to leverage the potential of a pre-trained transformer-based language model in augmenting an existing but limited dataset. Additionally, generating text with the model and appending the generated embeddings to the supplied input embeddings was intended to preserve the intent of the augmented utterances and to find different forms of expression for the same intent that potential users might produce in the future. Our experiment showed improved language understanding and generation performance for the chatbot model trained on the augmented dataset, indicating that a pre-trained language model can be beneficial for the effective working of natural language-based applications such as a chatbot.
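
    The augmentation step can be sketched as follows. This is an illustrative example using the Hugging Face Transformers text-generation pipeline with an assumed "gpt2" checkpoint, not the project's actual implementation: each seed utterance serves as a left-to-right prompt, and the generated continuations are added to the dataset under the same intent label.

from transformers import pipeline

# "gpt2" is an assumed stand-in checkpoint; the project's actual model may differ.
generator = pipeline("text-generation", model="gpt2")

def augment_intent(utterance, intent, n_variants=3):
    # Use the seed utterance as a prompt and keep each generated
    # continuation paired with the original intent label.
    outputs = generator(
        utterance,
        max_new_tokens=20,
        num_return_sequences=n_variants,
        do_sample=True,
        top_p=0.95,
    )
    return [(out["generated_text"], intent) for out in outputs]

augmented = augment_intent("How do I reset my password", "account_reset")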