
    Empirical Evaluation of Variational Autoencoders for Data Augmentation

    Since the beginning of Neural Networks, different mechanisms have been required to provide a sufficient number of examples to avoid overfitting. Data augmentation, the most common one, is focused on the generation of new instances performing different distortions in the real samples. Usually, these transformations are problem-dependent, and they result in a synthetic set of, likely, unseen examples. In this work, we have studied a generative model, based on the paradigm of encoder-decoder, that works directly in the data space, that is, with images. This model encodes the input in a latent space where different transformations will be applied. After completing this, we can reconstruct the latent vectors to get new samples. We have analysed various procedures according to the distortions that we could carry out, as well as the effectiveness of this process to improve the accuracy of different classification systems. To do this, we could use both the latent space and the original space after reconstructing the altered version of these vectors. Our results have shown that using this pipeline (encoding-altering-decoding) helps the generalisation of the classifiers that have been selected. This work was developed in the framework of the PROMETEOII/2014/030 research project "Adaptive learning and multimodality in machine translation and text transcription", funded by the Generalitat Valenciana. The work of the first author is financed by Grant FPU14/03981, from the Spanish Ministry of Education, Culture and Sport. Jorge-Cano, J.; Vieco Pérez, J.; Paredes Palacios, R.; Sánchez Peiró, JA.; Benedí Ruiz, JM. (2018). Empirical Evaluation of Variational Autoencoders for Data Augmentation. SciTePress. 96-104. https://doi.org/10.5220/0006618600960104
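
    The encoding-altering-decoding pipeline described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' model: the linear `encode`/`decode` pair stands in for a trained variational autoencoder, and the names `W`, `sigma`, and `n_new` are assumptions introduced for the example. The "alteration" shown is additive Gaussian noise in the latent space, which is only one of the distortions such a pipeline could apply.

```python
import random

def encode(x, W):
    """Project an input vector into the latent space (toy linear encoder)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def decode(z, W):
    """Map a latent vector back to data space using the transpose of W."""
    n_features = len(W[0])
    return [sum(W[k][j] * z[k] for k in range(len(W))) for j in range(n_features)]

def augment(x, W, sigma=0.1, n_new=3, rng=None):
    """Encode x, perturb the latent vector, and decode each perturbed copy."""
    rng = rng or random.Random(0)
    z = encode(x, W)
    samples = []
    for _ in range(n_new):
        # Alter: add Gaussian noise to every latent coordinate.
        z_altered = [zi + rng.gauss(0.0, sigma) for zi in z]
        samples.append(decode(z_altered, W))
    return samples

# Hypothetical 4-dimensional input with a 2-dimensional latent space.
W = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0]]
x = [0.5, -0.2, 0.0, 0.0]
new_samples = augment(x, W, sigma=0.05, n_new=3)
```

    The synthetic vectors in `new_samples` would then be added to the training set of a classifier, either in data space as here or, by skipping `decode`, directly in the latent space.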

    One-Shot Learning using Mixture of Variational Autoencoders: a Generalization Learning approach

    Deep learning, even if it is very successful nowadays, traditionally needs very large amounts of labeled data to perform well on classification tasks. In an attempt to solve this problem, the one-shot learning paradigm, which makes use of just one labeled sample per class and prior knowledge, is becoming increasingly important. In this paper, we propose a new one-shot learning method, dubbed MoVAE (Mixture of Variational AutoEncoders), to perform classification. Complementary to prior studies, MoVAE represents a paradigm shift in comparison with the usual one-shot learning methods, as it does not use any prior knowledge. Instead, it starts from zero knowledge and one labeled sample per class. Afterward, by using unlabeled data and the generalization learning concept (in a way, more as humans do), it is capable of gradually improving its own performance. Moreover, if no unlabeled data are available, MoVAE can still perform well in one-shot learning classification. We demonstrate empirically the efficiency of our proposed approach on three datasets, i.e. the handwritten digits (MNIST), fashion products (Fashion-MNIST), and handwritten characters (Omniglot), showing that MoVAE outperforms state-of-the-art one-shot learning algorithms.

    Data Augmentation for Spoken Language Understanding via Joint Variational Generation

    Data scarcity is one of the main obstacles to domain adaptation in spoken language understanding (SLU) due to the high cost of creating manually tagged SLU datasets. Recent works in neural text generative models, particularly latent variable models such as the variational autoencoder (VAE), have shown promising results with regard to generating plausible and natural sentences. In this paper, we propose a novel generative architecture which leverages the generative power of latent variable models to jointly synthesize fully annotated utterances. Our experiments show that existing SLU models trained on the additional synthetic examples achieve performance gains. Our approach not only helps alleviate the data scarcity issue in the SLU task for many datasets but also indiscriminately improves language understanding performance for various SLU models, supported by extensive experiments and rigorous statistical testing. Comment: 8 pages, 3 figures, 4 tables, Accepted in AAAI201

    DOPING: Generative Data Augmentation for Unsupervised Anomaly Detection with GAN

    Recently, the introduction of the generative adversarial network (GAN) and its variants has enabled the generation of realistic synthetic samples, which has been used for enlarging training sets. Previous work primarily focused on data augmentation for semi-supervised and supervised tasks. In this paper, we instead focus on unsupervised anomaly detection and propose a novel generative data augmentation framework optimized for this task. In particular, we propose to oversample infrequent normal samples - normal samples that occur with small probability, e.g., rare normal events. We show that these samples are responsible for false positives in anomaly detection. However, oversampling of infrequent normal samples is challenging for real-world high-dimensional data with multimodal distributions. To address this challenge, we propose to use a GAN variant known as the adversarial autoencoder (AAE) to transform the high-dimensional multimodal data distributions into low-dimensional unimodal latent distributions with well-defined tail probability. Then, we systematically oversample at the 'edge' of the latent distributions to increase the density of infrequent normal samples. We show that our oversampling pipeline is a unified one: it is generally applicable to datasets with different complex data distributions. To the best of our knowledge, our method is the first data augmentation technique focused on improving performance in unsupervised anomaly detection. We validate our method by demonstrating consistent improvements across several real-world datasets. Comment: Published as a conference paper at ICDM 2018 (IEEE International Conference on Data Mining
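
    The idea of oversampling at the edge of a unimodal latent distribution can be illustrated with a small sketch. It is not the DOPING implementation: it assumes the latent codes have already been produced by some encoder, treats "edge" as "Euclidean norm above a chosen quantile", and creates new latent points by jittering the selected codes. The function names and the parameters `quantile`, `copies`, and `jitter` are hypothetical. In a full pipeline the new latent points would still have to be passed through a decoder to obtain data-space samples.

```python
import math
import random

def latent_norm(z):
    """Euclidean distance of a latent code from the origin."""
    return math.sqrt(sum(v * v for v in z))

def oversample_tail(latents, quantile=0.9, copies=2, jitter=0.05, rng=None):
    """Return jittered copies of the latent codes whose norm lies in the tail."""
    rng = rng or random.Random(0)
    norms = sorted(latent_norm(z) for z in latents)
    # Norm value at the requested quantile; codes at or beyond it count as the edge.
    cutoff = norms[min(int(quantile * len(norms)), len(norms) - 1)]
    extra = []
    for z in latents:
        if latent_norm(z) >= cutoff:
            for _ in range(copies):
                extra.append([v + rng.gauss(0.0, jitter) for v in z])
    return extra

# Stand-in for encoder output: 100 two-dimensional codes from a standard normal.
source_rng = random.Random(1)
latents = [[source_rng.gauss(0.0, 1.0), source_rng.gauss(0.0, 1.0)] for _ in range(100)]
extra = oversample_tail(latents, quantile=0.9, copies=2)
```

    With 100 codes and `quantile=0.9`, the ten codes farthest from the origin are selected and each contributes two new latent points.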

    A New Approach to Synthetic Image Evaluation

    This study is dedicated to enhancing the effectiveness of Optical Character Recognition (OCR) systems, with a special emphasis on Arabic handwritten digit recognition. The choice to focus on Arabic handwritten digits is twofold: first, there has been relatively less research conducted in this area compared to its English counterparts; second, the recognition of Arabic handwritten digits presents more challenges due to the inherent similarities between different Arabic digits. OCR systems, engineered to decipher both printed and handwritten text, often face difficulties in accurately identifying low-quality or distorted handwritten text. The quality of the input image and the complexity of the text significantly influence their performance. However, data augmentation strategies can notably improve these systems' performance. These strategies generate new images that closely resemble the original ones, albeit with minor variations, thereby enriching the model's learning and enhancing its adaptability. The research found Conditional Variational Autoencoders (C-VAE) and Conditional Generative Adversarial Networks (C-GAN) to be particularly effective in this context. These two generative models stand out due to their superior image generation and feature extraction capabilities. A significant contribution of the study has been the formulation of the Synthetic Image Evaluation Procedure, a systematic approach designed to evaluate and amplify the generative models' image generation abilities. This procedure facilitates the extraction of meaningful features, computation of the Fréchet Inception Distance (LFID) score, and supports hyper-parameter optimization and model modifications.
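
    For reference, the Fréchet Inception Distance in its standard form compares Gaussian fits to the feature embeddings of real and generated images. The symbols below are assumptions introduced here, not notation from the abstract: $\mu_r, \Sigma_r$ are the mean and covariance of the real-image features, and $\mu_g, \Sigma_g$ those of the generated-image features. The abstract does not say whether its score is computed exactly this way.

```latex
\[
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
\]
```

    Lower values indicate that the generated features are statistically closer to the real ones; the score is zero when both means and covariances coincide.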

    Examining the Size of the Latent Space of Convolutional Variational Autoencoders Trained With Spectral Topographic Maps of EEG Frequency Bands

    Electroencephalography (EEG) is a technique of recording brain electrical potentials using electrodes placed on the scalp [1]. It is well known that EEG signals contain essential information in the frequency, temporal and spatial domains. For example, some studies have converted EEG signals into topographic power head maps to preserve spatial information [2]. Others have produced spectral topographic head maps of different EEG bands to both preserve information in the spatial domain and take advantage of the information in the frequency domain [3]. However, topographic maps contain highly interpolated data in between electrode locations and are often redundant. For this reason, convolutional neural networks are often used to reduce their dimensionality and learn relevant features automatically [4]. The associate editor coordinating the review of this manuscript and approving it for publication was Ludovico Minati.