
    Implementation of Synthesize GAN Model to Detect Outlier in National Stock Exchange Time Series Multivariate Data

    This research work explores a novel approach for identifying outliers in stock-related multivariate time-series datasets using Generative Adversarial Networks (GANs). The proposed framework harnesses GANs to create synthetic data points that replicate the statistical characteristics of genuine stock-related time series. The use of GANs to generate tabular data has become increasingly important in a number of industries, including banking, healthcare, and data privacy. This paper also describes the process of synthesizing tabular data with GANs, which involves several critical steps: data collection, preprocessing, and exploration, followed by the design and training of the generator and discriminator networks. The generator is in charge of producing synthetic data, while the discriminator separates genuine samples from synthetic ones. Generating high-quality tabular data with GANs is a complex task, but it has the potential to facilitate data generation in various domains while preserving data privacy and integrity. The experimental results confirm that the GAN framework is effective for detecting outliers, and the model demonstrates its proficiency in identifying outliers within stock-related time-series data. For comparison, our proposed work also examines statistical and machine-learning models in related application fields.
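    The abstract describes the standard adversarial loop: a generator produces synthetic samples while a discriminator tries to separate them from real ones. A minimal NumPy sketch of that loop on a toy one-dimensional "daily return" feature (standing in for the paper's NSE data; all hyperparameters and the linear generator/logistic discriminator are illustrative assumptions, not the authors' architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "real" tabular feature: daily returns drawn from N(0.05, 0.2)
real = rng.normal(0.05, 0.2, size=(1000, 1))

# Generator: linear map from 1-D noise z to a synthetic sample
g_w, g_b = 1.0, 0.0
# Discriminator: logistic regression on the sample value
d_w, d_b = 0.0, 0.0

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

lr = 0.01
for step in range(2000):
    z = rng.normal(size=(64, 1))
    fake = g_w * z + g_b
    x_real = real[rng.integers(0, len(real), 64)]

    # Discriminator step: push D(real) -> 1 and D(fake) -> 0
    for x, label in ((x_real, 1.0), (fake, 0.0)):
        p = sigmoid(d_w * x + d_b)
        grad = p - label              # dBCE/dlogit
        d_w -= lr * np.mean(grad * x)
        d_b -= lr * np.mean(grad)

    # Generator step: push D(fake) -> 1 (non-saturating loss)
    p = sigmoid(d_w * fake + d_b)
    grad_logit = (p - 1.0) * d_w      # chain rule through D
    g_w -= lr * np.mean(grad_logit * z)
    g_b -= lr * np.mean(grad_logit)

# Draw synthetic samples from the trained generator
synthetic = g_w * rng.normal(size=(1000, 1)) + g_b
```

    In the outlier-detection setting, points the trained discriminator scores as very unlikely to be real (or points far from the synthetic distribution) become outlier candidates.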

    Biosignal Generation and Latent Variable Analysis with Recurrent Generative Adversarial Networks

    The effectiveness of biosignal generation and data augmentation with biosignal generative models based on generative adversarial networks (GANs), which are a type of deep learning technique, was demonstrated in our previous paper. GAN-based generative models only learn the projection between a random distribution as input data and the distribution of training data. Therefore, the relationship between input and generated data is unclear, and the characteristics of the data generated from this model cannot be controlled. This study proposes a method for generating time-series data based on GANs and explores its ability to generate biosignals with certain classes and characteristics. Moreover, in the proposed method, latent variables are analyzed using canonical correlation analysis (CCA) to represent the relationship between input and generated data as canonical loadings. Using these loadings, we can control the characteristics of the data generated by the proposed method. The influence of class labels on generated data is analyzed by feeding data interpolated between two class labels into the generator of the proposed GANs. The CCA of the latent variables is shown to be an effective method of controlling the generated data characteristics. Using the proposed method, we are able to model the distribution of the time-series data without requiring domain-dependent knowledge. Furthermore, it is possible to control the characteristics of these data by analyzing the model trained using the proposed method. To the best of our knowledge, this work is the first to generate biosignals using GANs while controlling the characteristics of the generated data.
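    The key analysis step above is relating the generator's latent inputs to features of the generated signals via CCA and reading off the canonical loadings. A small NumPy sketch, with a toy latent matrix and generated-feature matrix standing in for the paper's GAN (the linear relation between them is an assumption so that the first canonical correlation is high):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins: Z = latent noise fed to a generator, X = features of the
# generated signals. Here X is close to a linear function of Z.
n = 500
Z = rng.normal(size=(n, 4))
X = (Z @ rng.normal(size=(4, 6))) * 0.9 + 0.1 * rng.normal(size=(n, 6))

def cca(A, B, k=2):
    """Top-k canonical correlations and variates of data blocks A and B."""
    A = A - A.mean(0)
    B = B - B.mean(0)
    # Whiten each block via SVD, then SVD the whitened cross-covariance
    Ua, _, _ = np.linalg.svd(A, full_matrices=False)
    Ub, _, _ = np.linalg.svd(B, full_matrices=False)
    U, S, Vt = np.linalg.svd(Ua.T @ Ub)
    return S[:k], Ua @ U[:, :k], Ub @ Vt[:k].T

corr, Za, Xb = cca(Z, X)

# Canonical loadings: correlation of each latent dimension with the first
# canonical variate -- these indicate which latent directions to move to
# steer the characteristics of the generated data.
loadings = np.array([np.corrcoef(Z[:, j], Za[:, 0])[0, 1]
                     for j in range(Z.shape[1])])
```

    Latent dimensions with large-magnitude loadings are the ones whose manipulation most strongly changes the corresponding characteristic of the generated signals.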

    Assessing the Potential of Data Augmentation in EEG Functional Connectivity for Early Detection of Alzheimer’s Disease

    Electroencephalographic (EEG) signals are acquired non-invasively from electrodes placed on the scalp. Experts in the field can use EEG signals to distinguish between patients with Alzheimer’s disease (AD) and normal control (NC) subjects using classification models. However, the training of deep learning or machine learning models requires a large number of trials. Datasets related to Alzheimer’s disease are typically small due to the lack of AD patient samples, and this shortage of training data limits the use of deep learning techniques for further development in clinical settings. We propose to increase the number of trials in the training set by means of a decomposition–recombination system consisting of three steps. Firstly, the original signals from the training set are decomposed into multiple intrinsic mode functions via multivariate empirical mode decomposition. Next, these intrinsic mode functions are randomly recombined across trials. Finally, the recombined intrinsic mode functions are added together as artificial trials, which are used for training the models. We evaluated the decomposition–recombination system on a small dataset using each subject’s functional connectivity matrices as inputs. Three different neural networks were used: ResNet, BrainNet CNN, and EEGNet. Overall, the system helped improve ResNet training on both the mild AD dataset, with an increase of 5.24%, and the mild cognitive impairment dataset, with an increase of 4.50%. The evaluation of the proposed data augmentation system shows that the performance of neural networks can be improved by enhancing the training set with data augmentation. This work shows the need for data augmentation in the training of neural networks in the case of small-size AD datasets.
    Fil: Jia, Hao. Universitat de Vic; España. Nankai University; China
    Fil: Huang, Zihao. Nankai University; China
    Fil: Caiafa, César Federico. Provincia de Buenos Aires. Gobernación. Comisión de Investigaciones Científicas. Instituto Argentino de Radioastronomía. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto Argentino de Radioastronomía; Argentina
    Fil: Duan, Feng. Nankai University; China
    Fil: Zhang, Yu. Lehigh University; Estados Unidos
    Fil: Sun, Zhe. Juntendo University; China
    Fil: Solé Casals, Jordi. Universitat de Vic; España
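    The recombination step of the decomposition–recombination system described above can be sketched in a few lines of NumPy. Here the intrinsic mode functions are assumed to have already been computed by multivariate empirical mode decomposition (the paper's first step); the random arrays below merely stand in for real IMFs:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for MEMD output: each trial decomposed into n_imfs intrinsic
# mode functions that sum back to the original signal.
# Shape: (n_trials, n_imfs, n_samples).
n_trials, n_imfs, n_samples = 20, 4, 256
imfs = rng.normal(size=(n_trials, n_imfs, n_samples))

def recombine(imfs, n_artificial, rng):
    """Build artificial trials: for each IMF index, pick that IMF from a
    randomly chosen real trial (all of the same class), then sum them."""
    n_trials, n_imfs, n_samples = imfs.shape
    out = np.empty((n_artificial, n_samples))
    for i in range(n_artificial):
        donors = rng.integers(0, n_trials, size=n_imfs)  # one donor per IMF
        out[i] = imfs[donors, np.arange(n_imfs)].sum(axis=0)
    return out

augmented = recombine(imfs, n_artificial=50, rng=rng)
```

    Because each artificial trial mixes oscillatory components from several real trials of the same class, the augmented set stays within the class distribution while adding variability for training.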

    Cross-modal generative models for multi-modal plastic sorting

    Automated sorting through chemometric analysis of plastic spectral data could be a key strategy towards improving plastic waste management. Deep learning is a promising chemometric tool, but further development through multi-modal deep learning has been limited by a lack of data availability. A new Multi-modal Plastic Spectral Database (MMPSD), consisting of Fourier Transform Infrared (FTIR), Raman, and Laser-Induced Breakdown Spectroscopy (LIBS) data for each sample in the database, is introduced in this work. MMPSD serves as the basis for a novel cross-modal generative modeling technique termed Spectral Conversion Autoencoders (SCAE), which generates synthetic data of one modality from data of another. SCAE is advantageous over traditional generative models such as Variational Autoencoders (VAE), as it can generate class-specific synthetic data without the need to train a separate model for each data class. MMPSD also facilitated the exploration of multi-modal deep learning, which improved the classification accuracy from 0.933 (uni-modal) to 0.970. SCAE can further be combined with multi-modal methods to achieve a higher accuracy of 0.963 while still using a single sensor to reduce costs, which can be applied for multi-modal augmentation from the FTIR sensors used in industrial sorting.
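    At its core, spectral conversion learns a mapping from one modality's spectrum to another's from paired samples. A deliberately simplified NumPy sketch, reducing the autoencoder to a ridge-regularized linear map from (synthetic stand-in) FTIR spectra to Raman spectra; the dimensions, noise levels, and shared-composition model are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy paired spectra: 200 samples, 64 FTIR bands, 48 Raman bands, with a
# shared low-dimensional "composition" driving both modalities.
n, d_ftir, d_raman, d_latent = 200, 64, 48, 5
comp = rng.normal(size=(n, d_latent))
ftir = comp @ rng.normal(size=(d_latent, d_ftir)) \
       + 0.05 * rng.normal(size=(n, d_ftir))
raman = comp @ rng.normal(size=(d_latent, d_raman)) \
        + 0.05 * rng.normal(size=(n, d_raman))

# Cross-modal conversion reduced to its simplest form: fit a linear
# FTIR -> Raman map by ridge-regularized least squares.
lam = 1e-3
W = np.linalg.solve(ftir.T @ ftir + lam * np.eye(d_ftir), ftir.T @ raman)

raman_synth = ftir @ W                      # "converted" Raman spectra
rel_err = np.linalg.norm(raman_synth - raman) / np.linalg.norm(raman)
```

    A real SCAE would replace the linear map with an encoder–decoder network, but the training objective is analogous: reconstruct the target modality from the source modality for paired samples.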

    Generation of synthetic EEG data for training algorithms supporting the diagnosis of major depressive disorder

    Introduction: Major depressive disorder (MDD) is the most common mental disorder worldwide, leading to impairment in quality and independence of life. Electroencephalography (EEG) biomarkers processed with machine learning (ML) algorithms have been explored for objective diagnoses with promising results. However, the generalizability of those models, a prerequisite for clinical application, is restricted by small datasets. One approach to training ML models with good generalizability is complementing the original data with synthetic data produced by generative algorithms. Another advantage of synthetic data is the possibility of publishing the data for other researchers without risking patient data privacy. Synthetic EEG time series have not yet been generated for two clinical populations such as MDD patients and healthy controls.
    Methods: We first reviewed 27 studies presenting EEG data augmentation with generative algorithms for classification tasks, such as diagnosis, to survey the possibilities and shortcomings of recent methods. The subsequent empirical study generated EEG time series based on two public datasets with 30/28 and 24/29 subjects (MDD/controls). To obtain baseline diagnostic accuracies, convolutional neural networks (CNN) were trained with time series from each dataset. The data were synthesized with generative adversarial networks (GAN) consisting of CNNs. We evaluated the synthetic data qualitatively and quantitatively and finally used it for re-training the diagnostic model.
    Results: The reviewed studies improved their classification accuracies by between 1% and 40% with the synthetic data. Our own diagnostic accuracy improved by up to 10% for one dataset but not significantly for the other. We found a rich repertoire of generative models in the reviewed literature, solving various technical issues. A major shortcoming in the field is the lack of meaningful evaluation metrics for synthetic data. The few studies analyzing the data in the frequency domain, including our own, show that only some features can be produced truthfully.
    Discussion: The systematic review combined with our own investigation provides an overview of the available methods for generating EEG data for a classification task, their possibilities, and their shortcomings. The approach is promising and the technical basis is set. For a broad application of these techniques in neuroscience research or clinical practice, the methods need fine-tuning facilitated by domain expertise in (clinical) EEG research.
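    A frequency-domain check of synthetic EEG, as the abstract describes, typically compares band power between real and generated epochs. A NumPy sketch with two toy epochs standing in for a real and a GAN-generated signal (the 250 Hz sampling rate and the weaker synthetic alpha rhythm are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 250  # sampling rate in Hz (a typical clinical EEG value, assumed here)

def band_power(x, fs, lo, hi):
    """Average power of signal x in the [lo, hi) Hz band via the periodogram."""
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    psd = np.abs(np.fft.rfft(x)) ** 2 / len(x)
    mask = (freqs >= lo) & (freqs < hi)
    return psd[mask].mean()

# Stand-ins for one real and one generated 10 s EEG epoch: a 10 Hz alpha
# rhythm plus noise, vs. noise with a weaker alpha peak.
t = np.arange(10 * fs) / fs
real = np.sin(2 * np.pi * 10 * t) + 0.5 * rng.normal(size=t.size)
synth = 0.4 * np.sin(2 * np.pi * 10 * t) + 0.5 * rng.normal(size=t.size)

bands = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}
ratios = {name: band_power(synth, fs, lo, hi) / band_power(real, fs, lo, hi)
          for name, (lo, hi) in bands.items()}
```

    Band-power ratios far from 1, as in the alpha band here, flag spectral features that the generative model has not reproduced truthfully.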

    Generative Adversarial Network (GAN) for Medical Image Synthesis and Augmentation

    Medical image processing aided by artificial intelligence (AI) and machine learning (ML) significantly improves medical diagnosis and decision making. However, the difficulty of accessing well-annotated medical images has become one of the main constraints on further improving this technology. The generative adversarial network (GAN) is a deep neural network framework for data synthesis, which provides a practical solution for medical image augmentation and translation. In this study, we first perform a quantitative survey of the published studies on GANs for medical image processing since 2017. Then a novel adaptive cycle-consistent adversarial network (Ad CycleGAN) is proposed. We use a malaria blood cell dataset (19,578 images) and a COVID-19 chest X-ray dataset (2,347 images) to test the new Ad CycleGAN. The quantitative metrics include mean squared error (MSE), root mean squared error (RMSE), peak signal-to-noise ratio (PSNR), universal image quality index (UIQI), spatial correlation coefficient (SCC), spectral angle mapper (SAM), visual information fidelity (VIF), Fréchet inception distance (FID), and the classification accuracy of the synthetic images. The CycleGAN and a variational autoencoder (VAE) are also implemented and evaluated for comparison. The experimental results on malaria blood cell images indicate that the Ad CycleGAN generates more valid images than CycleGAN or VAE, and the synthetic images produced by Ad CycleGAN or CycleGAN have better quality than those produced by the VAE. The synthetic images by Ad CycleGAN achieve the highest classification accuracy, 99.61%. In the experiment on COVID-19 chest X-rays, the synthetic images by Ad CycleGAN or CycleGAN likewise have higher quality than those generated by the VAE. However, the synthetic images generated through the homogeneous image augmentation process have better quality than those synthesized through the image translation process.
    The synthetic images by Ad CycleGAN achieve a higher accuracy of 95.31%, compared with 93.75% for the images by CycleGAN. In conclusion, the proposed Ad CycleGAN provides a new path to synthesize medical images with desired diagnostic or pathological patterns. It can be considered a new form of conditional GAN with effective control over the synthetic image domain. The findings offer a new path to improve deep neural network performance in medical image processing.
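    The simplest of the image-quality metrics listed above (MSE, RMSE, PSNR) are easy to state precisely in code. A minimal NumPy implementation, applied to a toy pair of "real" and "synthetic" 8-bit grayscale images (the 64×64 size and noise level are arbitrary illustrations):

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two images of equal shape."""
    return float(np.mean((a - b) ** 2))

def rmse(a, b):
    """Root mean squared error: RMSE = sqrt(MSE)."""
    return mse(a, b) ** 0.5

def psnr(a, b, max_val=255.0):
    """Peak signal-to-noise ratio in dB; higher means closer images."""
    m = mse(a, b)
    return float("inf") if m == 0 else 10 * np.log10(max_val ** 2 / m)

rng = np.random.default_rng(0)
real = rng.integers(0, 256, size=(64, 64)).astype(float)
# A "synthetic" image: the real one corrupted by mild Gaussian noise
synth = np.clip(real + rng.normal(0, 5, size=real.shape), 0, 255)

scores = {"rmse": rmse(real, synth), "psnr": psnr(real, synth)}
```

    The perceptual and distributional metrics in the study (UIQI, SCC, SAM, VIF, FID) build on the same pairwise-comparison idea but require windowed statistics or a pretrained feature extractor, so they are not reproduced here.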