80 research outputs found

    The Generation of Synthetic Healthcare Data Using Deep Neural Networks

    High-quality tabular data is a crucial requirement for developing data-driven applications, especially in healthcare, because most of the data collected in this context is in tabular form. However, strict data protection laws such as the Health Insurance Portability and Accountability Act (HIPAA) and the General Data Protection Regulation (GDPR), which protect patients’ privacy and confidentiality, present many obstacles to accessing healthcare datasets and conducting scientific research on them. Synthetic data has therefore become an attractive alternative for data scientists and healthcare professionals to circumvent such hurdles. Although many healthcare data providers still use classical de-identification and anonymization techniques for generating synthetic data, deep learning-based generative models such as Generative Adversarial Networks (GANs) have shown remarkable performance in generating tabular datasets with complex structures. This thesis therefore examines the potential and applicability of GANs within the healthcare industry, which often faces serious challenges with insufficient training data and the sensitivity of patient records. We investigate several state-of-the-art GAN-based models proposed for tabular synthetic data generation. Specifically, we assess the performance of the TGAN, CTGAN, CTABGAN and WGAN-GP models on healthcare datasets with different sizes, numbers of variables, column data types, feature distributions, and inter-variable correlations. Moreover, we define a comprehensive evaluation framework to assess the quality of the synthetic records and the ability of each model to preserve patients’ privacy. After training the selected models and generating synthetic datasets, we evaluate the strengths and weaknesses of each model based on statistical similarity metrics, machine learning-based evaluation scores, and distance-based privacy metrics.
The results indicate that the proposed models can generate datasets that maintain the statistical characteristics, model compatibility, and privacy of the original ones, and that synthetic tabular healthcare datasets can be a viable option in many data-driven applications. However, there is still room for improvement in designing an ideal architecture for generating synthetic tabular data.
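The evaluation framework above combines statistical similarity metrics and distance-based privacy metrics. As an illustrative sketch (not the thesis's actual framework), two such metrics can be computed as follows: a per-column Kolmogorov-Smirnov statistic for statistical similarity, and a distance-to-closest-record score for privacy. The function names and the random stand-in data are assumptions for demonstration; in practice the synthetic array would come from a trained generator such as CTGAN.

```python
import numpy as np
from scipy import stats

def column_similarity(real, synth):
    """Per-column Kolmogorov-Smirnov statistic (0 means identical marginals)."""
    return [stats.ks_2samp(real[:, j], synth[:, j]).statistic
            for j in range(real.shape[1])]

def distance_to_closest_record(real, synth):
    """For each synthetic row, Euclidean distance to its nearest real row.
    Very small distances suggest the model memorized training records."""
    d = np.linalg.norm(synth[:, None, :] - real[None, :, :], axis=2)
    return d.min(axis=1)

rng = np.random.default_rng(0)
real = rng.normal(size=(500, 3))
synth = rng.normal(size=(500, 3))   # stand-in for generator output
ks = column_similarity(real, synth)
dcr = distance_to_closest_record(real, synth)
```

A model scores well when the KS statistics are low (distributions match) while the distance-to-closest-record values stay bounded away from zero (no memorized patients).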

    GAN Hyperparameters search through Genetic Algorithm

    Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science. Recent developments in Deep Learning are remarkable when it comes to generative models, largely because of Generative Adversarial Networks (GANs) [1]. Introduced by Ian Goodfellow in 2014, GANs are machine learning models made of two neural networks, a Generator and a Discriminator, which compete against each other to generate new, synthetic instances of data that resemble the real data. Despite their great potential, their training presents challenges, including training instability, mode collapse, and vanishing gradients. Much research has addressed how to overcome these challenges; however, there is no conclusive evidence that modern techniques consistently outperform the vanilla GAN. The performance of GANs is also highly dependent on the dataset they are trained on. One of the main challenges is the search for hyperparameters. In this thesis, we address this challenge by applying an evolutionary algorithm to search for the best hyperparameters for a WGAN. We use the Kullback-Leibler divergence to compute the fitness of the individuals and, in the end, select the best set of parameters produced by the evolutionary algorithm. The parameters of the best-selected individuals are maintained throughout the generations. We compare our approach with the standard hyperparameters reported in the state of the art.
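The search described above can be sketched as a small genetic algorithm whose fitness is the Kullback-Leibler divergence between histograms of real and generated samples. In a real run, evaluating an individual would require training a WGAN with those hyperparameters; here, as an assumed stand-in to keep the loop runnable, each individual directly parameterizes a Gaussian "generator". All names, constants, and the GA operators (elitism plus Gaussian mutation) are illustrative, not the thesis's exact setup.

```python
import numpy as np

rng = np.random.default_rng(1)
TARGET = rng.normal(loc=2.0, scale=0.5, size=5000)   # stands in for real data
BINS = np.linspace(-2, 6, 41)

def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) between two histograms, smoothed to avoid log(0)."""
    p, q = p + eps, q + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def fitness(ind):
    """Lower is better. A real implementation would train a WGAN with the
    hyperparameters in `ind` and compare its samples to the real data."""
    mu, log_sigma = ind
    samples = rng.normal(mu, np.exp(log_sigma), size=5000)
    p, _ = np.histogram(TARGET, bins=BINS)
    q, _ = np.histogram(samples, bins=BINS)
    return kl_divergence(p, q)

def evolve(pop_size=20, generations=30, elite=5, mut_sigma=0.2):
    pop = rng.normal(0.0, 1.0, size=(pop_size, 2))
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        parents = pop[np.argsort(scores)[:elite]]          # keep the best
        children = parents[rng.integers(0, elite, pop_size - elite)]
        children = children + rng.normal(0, mut_sigma, children.shape)
        pop = np.vstack([parents, children])
    scores = np.array([fitness(ind) for ind in pop])
    return pop[scores.argmin()], scores.min()

best, best_kl = evolve()
```

Elitism here mirrors the abstract's remark that the parameters of the best-selected individuals are maintained across generations.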

    Adversarial games in machine learning: challenges and applications

    Many machine learning problems rest on the minimization of a cost function, drawing on the vast optimization literature that provides algorithms and convergence guarantees for this type of problem. Recently, however, several machine learning models that cannot be formulated as the minimization of a single cost have been proposed; instead, they require defining a game between several players, each with their own objective. One such model is the generative adversarial network (GAN). This generative model formulates a game between two neural networks, a generator and a discriminator: as the generator tries to fool the discriminator, which tries to distinguish real images from fake ones, both improve, resulting in a Nash equilibrium where the images produced by the generator are indistinguishable from real images. Despite their success, GANs remain difficult to train because of the adversarial nature of the game, requiring careful choice of hyperparameters and often resulting in unstable training dynamics. Several regularization techniques have been proposed to stabilize training; in this thesis, we approach these instabilities from the angle of an optimization problem. We begin by bridging the gap between the optimization literature and GANs by formulating GANs as a variational inequality problem, and draw on the literature on this subject to propose algorithms that converge faster. To better understand the challenges of game optimization, we propose several tools for analyzing the optimization landscape of GANs.
Using these tools, we show that rotational components are present in the neighborhood of equilibria; we also observe that GANs rarely converge to a Nash equilibrium but instead converge to locally stable stationary points (LSSP). Finally, inspired by the success of GANs, we propose a new family of games that we call adversarial example games, which consist of simultaneously training a generator and a critic, the generator seeking to perturb examples so as to mislead the critic, and the critic seeking to be robust to the perturbations. We show that, at the equilibrium of this game, the generator is able to generate perturbations that transfer to an entire family of models.
Many machine learning (ML) problems can be formulated as minimization problems, and a large optimization literature provides algorithms and guarantees to solve this type of problem. However, some ML problems have recently been proposed that cannot be formulated as minimization problems and instead require defining a game between several players, each with a different objective. A successful application of such games in ML is generative adversarial networks (GANs), where generative modeling is formulated as a game between a generator and a discriminator: the goal of the generator is to fool the discriminator, while the discriminator tries to distinguish between fake and real samples. Due to the adversarial nature of the game, however, GANs are notoriously hard to train, requiring careful fine-tuning of the hyper-parameters and leading to unstable training. While regularization techniques have been proposed to stabilize training, we propose in this thesis to look at these instabilities from an optimization perspective.
We start by bridging the gap between the machine learning and optimization literature by casting GANs as an instance of the Variational Inequality Problem (VIP), and leverage the large literature on VIPs to derive more efficient and stable algorithms to train GANs. To better understand the challenges of training GANs, we then propose tools to study their optimization landscape. Using these tools, we show that GANs do suffer from rotation around their equilibrium and that they do not converge to Nash equilibria. Finally, inspired by the success of GANs at generating images, we propose a new type of game, called Adversarial Example Games, able to generate adversarial examples that transfer across different models and architectures.
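The rotation around equilibria, and the benefit of algorithms drawn from the variational inequality literature, can be seen on the classic bilinear game min_x max_y xy. As an illustrative sketch (using the standard extragradient method as an example of a VIP algorithm, not the thesis's exact algorithms), simultaneous gradient descent-ascent spirals away from the equilibrium at the origin, while extragradient spirals in:

```python
import numpy as np

def gda_step(x, y, lr):
    """Simultaneous gradient descent-ascent on f(x, y) = x * y."""
    return x - lr * y, y + lr * x

def extragradient_step(x, y, lr):
    """Extrapolate to a midpoint, then update using the midpoint's gradient."""
    xh, yh = x - lr * y, y + lr * x
    return x - lr * yh, y + lr * xh

def run(step, iters=200, lr=0.1):
    x, y = 1.0, 1.0
    for _ in range(iters):
        x, y = step(x, y, lr)
    return np.hypot(x, y)   # distance to the equilibrium (0, 0)

gda_dist = run(gda_step)            # grows: the iterates spiral outward
eg_dist = run(extragradient_step)   # shrinks: the iterates spiral inward
```

Each GDA step multiplies the distance to the origin by sqrt(1 + lr^2) > 1, which is exactly the rotational divergence described above, while the extragradient update contracts it.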