28 research outputs found

    Deep learning applied to computational mechanics: A comprehensive review, state of the art, and the classics

    Three recent breakthroughs due to AI in arts and science serve as motivation: an award-winning digital image, protein folding, and fast matrix multiplication. Many recent developments in artificial neural networks, particularly deep learning (DL), applied and relevant to computational mechanics (solids, fluids, finite-element technology) are reviewed in detail. Both hybrid and pure machine learning (ML) methods are discussed. Hybrid methods combine traditional PDE discretizations with ML methods either (1) to help model complex nonlinear constitutive relations, (2) to nonlinearly reduce the model order for efficient simulation (turbulence), or (3) to accelerate the simulation by predicting certain components in the traditional integration methods. Here, methods (1) and (2) rely on the Long Short-Term Memory (LSTM) architecture, while method (3) relies on convolutional neural networks. Pure ML methods to solve (nonlinear) PDEs are represented by Physics-Informed Neural Network (PINN) methods, which can be combined with an attention mechanism to address discontinuous solutions. Both LSTM and attention architectures, together with modern optimizers and classic optimizers generalized to include stochasticity for DL networks, are extensively reviewed. Kernel machines, including Gaussian processes, are covered in sufficient depth for more advanced work such as shallow networks with infinite width. The review does not address only experts: readers are assumed to be familiar with computational mechanics, but not with DL, whose concepts and applications are built up from the basics, with the aim of bringing first-time learners quickly to the forefront of research. The history and limitations of AI are recounted and discussed, with particular attention to pointing out misstatements or misconceptions of the classics, even in well-known references. Positioning and pointing control of a large-deformable beam is given as an example.
    Comment: 275 pages, 158 figures. Appeared online on 2023.03.01 in CMES - Computer Modeling in Engineering & Sciences.
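
    For illustration of the pure-ML (PINN) approach mentioned in this abstract, the following is a minimal sketch of a physics-informed neural network in PyTorch for a one-dimensional Poisson problem; the PDE, network size, and training settings are illustrative assumptions, not the review's own setup.

```python
# Minimal PINN sketch (illustrative): fit u(x) such that u''(x) = f(x) on [0, 1], u(0) = u(1) = 0.
# With f(x) = -pi^2 sin(pi x), the exact solution is u(x) = sin(pi x).
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)

def f(x):  # manufactured source term
    return -(torch.pi ** 2) * torch.sin(torch.pi * x)

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(2000):
    x = torch.rand(128, 1, requires_grad=True)                    # collocation points
    u = net(x)
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]    # u'(x)
    d2u = torch.autograd.grad(du.sum(), x, create_graph=True)[0]  # u''(x)
    pde_loss = ((d2u - f(x)) ** 2).mean()                         # PDE residual
    bc_loss = (net(torch.tensor([[0.0], [1.0]])) ** 2).mean()     # boundary conditions
    loss = pde_loss + bc_loss
    opt.zero_grad()
    loss.backward()
    opt.step()
```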

    Optimal uncertainty quantification of a risk measurement from a computer code

    Uncertainty quantification in a safety analysis study can be conducted by considering the uncertain inputs of a physical system as a vector of random variables. The most widespread approach consists in running a computer model reproducing the physical phenomenon with different combinations of inputs in accordance with their probability distribution. One can then study the related uncertainty on the output or estimate a specific quantity of interest (QoI). Because the computer model is assumed to be a deterministic black-box function, the QoI only depends on the choice of the input probability measure. It is formally represented as a scalar function defined on a measure space. We propose to gain robustness on the quantification of this QoI. Indeed, the probability distributions characterizing the uncertain inputs may themselves be uncertain. For instance, contradictory expert opinion may make it difficult to select a single probability distribution, and the lack of information on the input variables inevitably affects the choice of the distribution. As the uncertainty on the input distributions propagates to the QoI, an important consequence is that different choices of input distributions will lead to different values of the QoI. The purpose of this thesis is to account for this second-level uncertainty. We propose to evaluate the maximum of the QoI over a space of probability measures, in an approach known as optimal uncertainty quantification (OUQ). Therefore, we do not specify a single precise input distribution, but rather a set of admissible probability measures defined through moment constraints. The QoI is then optimized over this measure space. After presenting theoretical results showing that the optimization domain of the QoI can be reduced to the extreme points of the measure space, we present several quantities of interest satisfying the assumptions of the problem. The thesis illustrates the methodology in several application cases, one of them being a real nuclear engineering case studying the evolution of the peak cladding temperature of fuel rods in the case of an intermediate-break loss-of-coolant accident.
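
    In symbols, the OUQ formulation described above takes the following form (notation assumed for illustration, not quoted from the thesis): the upper bound on the QoI is

    \[ \overline{Q} \;=\; \sup_{\mu \in \mathcal{A}} Q(\mu), \qquad \mathcal{A} \;=\; \{\, \mu \in \mathcal{P}(\mathcal{X}) \;:\; \mathbb{E}_{\mu}[\varphi_i(X)] \le c_i, \; i = 1, \dots, n \,\}, \]

    and the reduction to extreme points states that the supremum is attained on finite convex combinations of Dirac measures, \( \mu = \sum_{k=0}^{n} w_k \, \delta_{x_k} \), so the optimization over an infinite-dimensional space of measures becomes a finite-dimensional search over the support points \( x_k \) and weights \( w_k \).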

    Reconhecimento de padrões em expressões faciais: algoritmos e aplicações (Pattern recognition in facial expressions: algorithms and applications)

    Advisor: Hélio Pedrini. Doctoral thesis, Universidade Estadual de Campinas, Instituto de Computação. Abstract: Emotion recognition has become a relevant research topic for the scientific community, since it plays an essential role in the continuous improvement of human-computer interaction systems. It can be applied in various areas, such as medicine, entertainment, surveillance, biometrics, education, social networks, and affective computing. There are some open challenges related to the development of emotion systems based on facial expressions, such as data that reflect more spontaneous emotions and real scenarios. In this doctoral dissertation, we propose different methodologies for the development of emotion recognition systems based on facial expressions, as well as their applicability to other similar problems. The first is an emotion recognition methodology for occluded facial expressions based on the Census Transform Histogram (CENTRIST). Occluded facial expressions are reconstructed using an algorithm based on Robust Principal Component Analysis (RPCA). Extraction of facial expression features is then performed by CENTRIST, as well as by Local Binary Patterns (LBP), Local Gradient Coding (LGC), and an LGC extension. The generated feature space is reduced by applying Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA). K-Nearest Neighbor (KNN) and Support Vector Machine (SVM) algorithms are used for classification. This method reached competitive accuracy rates for occluded and non-occluded facial expressions. The second is a dynamic facial expression recognition method based on Visual Rhythms (VR) and Motion History Images (MHI), such that a fusion of the two descriptors encodes appearance, shape, and motion information of the video sequences. For feature extraction, the Weber Local Descriptor (WLD), CENTRIST, the Histogram of Oriented Gradients (HOG), and the Gray-Level Co-occurrence Matrix (GLCM) are employed. This approach offers a new direction for dynamic facial expression recognition, together with an analysis of the relevance of facial parts. The third is an effective method for audio-visual emotion recognition based on speech and facial expressions. The methodology involves a hybrid neural network to extract audio and visual features from videos. For audio extraction, a Convolutional Neural Network (CNN) based on the log Mel-spectrogram is used, whereas a CNN built on the Census Transform is employed for visual extraction. The audio and visual features are reduced by PCA and LDA, and classified through KNN, SVM, Logistic Regression (LR), and Gaussian Naïve Bayes (GNB). This approach achieves competitive recognition rates, especially on a spontaneous data set. The penultimate methodology investigates the problem of detecting Down syndrome from photographs. A geometric descriptor is proposed to extract facial features. Experiments performed on a public data set show the effectiveness of the developed methodology. The last methodology addresses the recognition of genetic syndromes in photographs. The method extracts facial attributes using features from a deep neural network together with anthropometric measurements. Experiments are conducted on a public data set, achieving competitive recognition rates. (Doctorate in Computer Science; grant 140532/2019-6, CNPq; CAPES.)
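
    For readers unfamiliar with the classical pipeline used throughout this abstract (hand-crafted features, PCA/LDA reduction, then KNN/SVM classification), the sketch below shows a minimal scikit-learn version; the feature matrix, label set, and hyperparameters are placeholders rather than the thesis's actual configuration.

```python
# Hedged sketch of a feature -> PCA -> LDA -> classifier pipeline (all settings illustrative).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Placeholder data: each row stands in for a per-image descriptor (e.g. a CENTRIST/LBP histogram).
rng = np.random.default_rng(0)
X = rng.random((600, 256))
y = rng.integers(0, 7, size=600)  # 7 emotion classes, assumed for illustration

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

svm_pipe = make_pipeline(PCA(n_components=50), LinearDiscriminantAnalysis(), SVC(kernel="rbf", C=1.0))
knn_pipe = make_pipeline(PCA(n_components=50), LinearDiscriminantAnalysis(), KNeighborsClassifier(n_neighbors=5))

for name, pipe in [("SVM", svm_pipe), ("KNN", knn_pipe)]:
    pipe.fit(X_tr, y_tr)
    print(name, "accuracy:", pipe.score(X_te, y_te))
```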

    Techniques in Ordinal Classification and Image-to-Image Translation

    In this thesis we explore two research topics within the realm of deep learning and medical imaging. The first is ordinal classification, in which the classes to be predicted are discrete but have an ordering relation. Probability distributions over ordinal classes can possess undesired properties, such as non-unimodality. We propose a straightforward technique to constrain discrete ordinal probability distributions to be unimodal via the use of the Poisson and binomial probability distributions. We evaluate this approach on an age estimation dataset and a Kaggle diabetic retinopathy dataset and obtain competitive results. We conjecture that the unimodality constraint, in addition to making the probability distributions more interpretable, acts as a regulariser which can mitigate overfitting, especially in a low-data regime. In the second topic, we explore adversarial image-to-image translation and motivate its utility within the framework of semi-supervised learning. We evaluate an existing method and propose a new one, which we evaluate on several datasets, including the ones employed in our work on ordinal classification. For the latter, we want to map from the domain of symptomatic patient scans to that of non-symptomatic patient scans. This effectively trains a model which can disentangle the underlying factors of variation and learn to detect and remove symptomatic regions in the image, which could be leveraged in several ways, such as aiding a network which relies on rich labels, or generating synthetic examples. We present some interesting qualitative results and motivate several promising avenues for future work.
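
    To make the unimodality idea above concrete, the sketch below parameterizes a K-class ordinal output with a Binomial(K-1, p) distribution whose single success probability comes from the network, which guarantees a unimodal probability vector over the classes; the head, loss, and data are assumptions for illustration, not the thesis's exact formulation.

```python
# Hedged sketch: ordinal predictions constrained to be unimodal via a Binomial(K-1, p) output head.
import torch
from torch.distributions import Binomial

K = 5  # number of ordinal classes, e.g. disease grades 0..4 (assumed)

class UnimodalOrdinalHead(torch.nn.Module):
    def __init__(self, in_features):
        super().__init__()
        self.fc = torch.nn.Linear(in_features, 1)

    def forward(self, h):
        p = torch.sigmoid(self.fc(h))                # one scalar in (0, 1) per example
        dist = Binomial(total_count=K - 1, probs=p)  # binomial distributions are unimodal by construction
        k = torch.arange(K, dtype=h.dtype).expand(h.shape[0], K)
        return dist.log_prob(k)                      # (batch, K) log-probabilities over the ordinal classes

head = UnimodalOrdinalHead(in_features=16)
h = torch.randn(8, 16)                               # stand-in for backbone features
log_probs = head(h)
targets = torch.randint(0, K, (8,))
loss = torch.nn.functional.nll_loss(log_probs, targets)  # cross-entropy on the unimodal distribution
loss.backward()
```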

    Challenges in nonlinear structural dynamics: New optimisation perspectives

    Analysis of structural dynamics is of fundamental importance to countless engineering applications. Analyses in both research and industrial settings have traditionally relied on linear or nearly linear approximations of the underlying physics. Modal analysis, perhaps the most pervasive framework, has become the default approach for the consideration of linear dynamics, and modern hardware and software solutions have placed linear analysis of structural dynamics squarely in the mainstream. However, as demands for stronger and lighter structures increase, and as advanced manufacturing enables more and more intricate geometries, the assumption of linearity is becoming less and less realistic. This thesis envisages three grand challenges for the treatment of nonlinearity in structural dynamics: nonlinear system identification, exact solutions to nonlinear differential equations, and a nonlinear extension to linear modal analysis. Of these challenges, this thesis presents results pertaining to the latter two. The first component of this thesis is the consideration of methods that may yield exact solutions to nonlinear differential equations. Here, the task of finding an exact solution is cast as a heuristic search problem. The structure of the search problem is analysed with a view to motivating methods that are predisposed to finding exact solutions. To this end, a novel methodology, the affine regression tree, is proposed. The novel approach is compared against alternatives from the literature in an expansive benchmark study. Also considered are nonlinear extensions to linear modal analysis. Historically, several frameworks have been proposed, each of which is able to retain only a subset of the properties of the linear case. It is argued here that retention of the utilities of linear modal analysis should be viewed as the criterion for a practical nonlinear modal decomposition. A promising direction is the recently proposed framework of Worden and Green, which takes a machine-learning viewpoint that requires statistical independence between the modal coordinates. In this thesis, a robust consideration of the method from several directions is attempted. Results from several analyses demonstrate that statistical independence and other inductive biases can be sufficient for a meaningful nonlinear modal decomposition, opening the door to a practical nonlinear extension to modal analysis. The results in this thesis take small but positive steps towards two pressing challenges facing nonlinear structural dynamics. It is hoped that further work will be able to build upon the results presented here to develop a greater understanding and treatment of nonlinearity in structural dynamics and elsewhere.

    Uncertainty Quantification in Biophotonic Imaging using Invertible Neural Networks

    Owing to the high stakes in the field of healthcare, medical machine learning (ML) applications have to adhere to strict safety standards. In particular, their performance needs to be robust toward volatile clinical inputs. The aim of the work presented in this thesis was to develop a framework for uncertainty handling in medical ML applications as a way to increase their robustness and trustworthiness. In particular, it addresses three root causes for lack of robustness that can be deemed central to the successful clinical translation of ML methods. First, many tasks in medical imaging can be phrased in the language of inverse problems. Most common ML methods aimed at solving such inverse problems implicitly assume that they are well-posed, in particular that the problem has a unique solution. However, the solution might be ambiguous. In this thesis, we introduce a data-driven method for analyzing the well-posedness of inverse problems. In addition, we propose a framework to validate the suggested method in a problem-aware manner. Second, simulation is an important tool for the development of medical ML systems due to small in vivo data sets and/or a lack of annotated references (e.g. spatially resolved blood oxygenation, sO2). However, simulation introduces a new uncertainty to the ML pipeline, as ML performance guarantees generally rely on the testing data being sufficiently similar to the training data. This thesis addresses this uncertainty by quantifying the domain gap between training and testing data via an out-of-distribution (OoD) detection approach. Third, we introduce a new paradigm for medical ML based on personalized models. In a data-scarce regime with high inter-patient variability, classical ML models cannot be assumed to generalize well to new patients. To overcome this problem, we propose to train ML models on a per-patient basis. This approach circumvents the inter-patient variability, but it requires training without a supervision signal. We address this issue via OoD detection, where the current status quo is encoded as in-distribution (ID) using a personalized ML model. Changes to the status quo are then detected as OoD. While these three facets might seem distinct, the suggested framework provides a unified view of them. The enabling technology is the so-called invertible neural network (INN), which can be used as a flexible and expressive (conditional) density estimator. In this way, INNs can encode solutions to inverse problems as probability distributions as well as tackle OoD detection tasks via density-based scores, such as the widely applicable information criterion (WAIC). The present work validates our framework on the example of biophotonic imaging. Biophotonic imaging promises the estimation of tissue parameters such as sO2 in a non-invasive way by evaluating the “fingerprint” of the tissue in the light spectrum. We apply our framework to analyze the well-posedness of the tissue parameter estimation problem at varying spectral and spatial resolutions. We find that with sufficient spectral and/or spatial context, the sO2 estimation problem is well-posed. Furthermore, we examine the realism of simulated biophotonic data using the proposed OoD approach to gauge the generalization capabilities of our ML models to in vivo data. Our analysis shows a considerable remaining domain gap between the in silico and in vivo spectra.
Lastly, we validate the personalized ML approach on the example of non-invasive ischemia monitoring in minimally invasive kidney surgery, for which we developed the first-in-human laparoscopic multispectral imaging system. In our study, we find a strong OoD signal between perfused and ischemic kidney spectra. Furthermore, the proposed approach is video-rate capable. In conclusion, we successfully developed a framework for uncertainty handling in medical ML and validated it on a diverse set of medical ML tasks, highlighting the flexibility and potential impact of our approach. The framework opens the door to robust solutions for applications such as (recording) device design, quality control for simulation pipelines, and personalized video-rate tissue parameter monitoring. In this way, this thesis facilitates the development of the next generation of trustworthy ML systems in medicine.
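
    For reference, the WAIC-style out-of-distribution score mentioned above is commonly computed from an ensemble of density estimators as (a standard form, assumed rather than quoted from the thesis)

    \[ \mathrm{WAIC}(x) \;=\; \mathbb{E}_{\theta}\big[\log p_{\theta}(x)\big] \;-\; \mathrm{Var}_{\theta}\big[\log p_{\theta}(x)\big], \]

    where the expectation and variance are taken over the ensemble of model parameters \( \theta \) (e.g. independently trained INNs); inputs with low scores are flagged as out of distribution.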

    Advances in deep learning with limited supervision and computational resources

    Deep neural networks are the cornerstone of state-of-the-art systems for a wide range of tasks, including object recognition, language modelling and machine translation. In the last decade, research in the field of deep learning has led to numerous key advances in designing novel architectures and training algorithms for neural networks. However, most success stories in deep learning have relied heavily on two main factors: the availability of large amounts of labelled data and massive computational resources. This thesis by articles makes several contributions to advancing deep learning, specifically in problems with limited or no labelled data, or with constrained computational resources. The first article addresses the sparsity of labelled data that arises in recommender systems. We propose a multi-task learning framework that leverages natural language reviews to improve recommendation. Specifically, we apply neural-network-based methods for learning representations of products from review text, while also learning from rating data. We demonstrate that the proposed method can achieve state-of-the-art performance on the Amazon Reviews dataset. The second article tackles computational challenges in training large-scale deep neural networks. We propose a conditional computation network architecture which can adaptively assign its capacity, and hence its computations, across different regions of the input. We demonstrate the effectiveness of our model on visual recognition tasks where objects are spatially localized within the input, while maintaining much lower computational overhead than standard network architectures. The third article contributes to the domain of unsupervised learning through the generative adversarial networks paradigm. We introduce a flexible adversarial training framework in which not only does the generator converge to the true data distribution, but the discriminator also recovers the relative density of the data at the optimum. We validate our framework empirically by showing that the discriminator is able to accurately estimate the true energy of the data while obtaining state-of-the-art sample quality. Finally, in the fourth article, we address the problem of unsupervised domain translation. We propose a model which can learn flexible, many-to-many mappings across domains from unpaired data. We validate our approach on several image datasets, and we show that it can be effectively applied in semi-supervised learning settings.
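
    To see why the third contribution above is non-trivial, recall the standard GAN background result (standard theory, not the article's own formulation): for the original GAN objective the optimal discriminator is

    \[ D^{*}(x) \;=\; \frac{p_{\mathrm{data}}(x)}{p_{\mathrm{data}}(x) + p_{G}(x)}, \]

    which collapses to the constant 1/2 once the generator matches the data distribution, so no density information survives at the global optimum; the framework proposed in the third article modifies training so that the discriminator instead retains the relative density (energy) of the data at the optimum.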

    Auto-Encoders, Distributed Training and Information Representation in Deep Neural Networks

    The goal of this thesis is to present a body of work that serves as my modest contribution to humanity's quest to understand intelligence and to implement intelligent systems. This is a thesis by articles, containing five articles, not all of equal impact, but all representing a very meaningful personal endeavor. The articles are presented in chronological order, and they cluster around two general topics: representation learning and optimization. The articles from chapters 3, 5, and 9 are in the former category, whereas those from chapters 7 and 11 are in the latter. In the first article, we start with the idea of manifold learning through training a denoising auto-encoder to locally reconstruct data after perturbations. We establish a connection between contractive auto-encoders and denoising auto-encoders. More importantly, we prove mathematically a very interesting property of the optimal solution to denoising auto-encoders with additive Gaussian noise: namely, that they learn exactly the score of the probability density function of the training distribution. We present a collection of ways in which this allows us to turn an auto-encoder into a generative model, and we provide experiments all related to the goal of local manifold learning. In the second article, we continue with the idea of building a generative model by learning conditional distributions. We do this in a more general setting and focus on the properties of the Markov chain obtained by Gibbs sampling. With a small modification in the construction of the Markov chain, we obtain the more general "Generative Stochastic Networks", which can then be stacked into a structure that more accurately represents the different levels of abstraction of the modeled data. We present experiments involving the generation of MNIST digits and image inpainting. In the third article, we present a novel idea for distributed optimization. Our proposal uses a collection of worker nodes to compute the importance weights to be used by one master node to perform importance sampling. This paradigm has much in common with the idea of curriculum learning, whereby the order of training examples is taken to have a significant impact on training performance. We compare the potential reduction in variance of the gradient estimates with the reduction in variance observed in practice. In the fourth article, we return to the concept of representation learning by asking whether there is any measurable quantity in a neural network layer that corresponds intuitively to its "information contents". This is particularly interesting because there is a kind of paradox in deterministic neural networks: deeper layers encode better representations of the input signal, yet they carry less (or equal) information than the raw inputs in terms of entropy. By training a linear classifier on every layer of a neural network (with frozen parameters), we are able to measure the linear separability of the representations at every layer. We call these "linear classifier probes", and we show how they can be used to better understand the dynamics of training a neural network. We present experiments with large models (Inception v3 and ResNet-50) and uncover a surprising property: linear separability increases in a strictly monotonic relationship with layer depth. In the fifth article, we revisit optimization, this time studying the negative curvature of the loss function. We look at the most dominant eigenvalues and eigenvectors of the Hessian matrix, and we explore the gains to be made by modifying the model parameters along those directions with an optimal step size. We are mainly interested in the potential gains for directions of negative curvature, because those are ignored by the very popular convex optimization methods used by the deep learning community. Due to the large computational cost of anything dealing with the Hessian matrix, we run a small model on MNIST. We find that large gains can be made in directions of negative curvature, and that the optimal step sizes involved are larger than the current literature would recommend.
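
    The property of Gaussian-noise denoising auto-encoders highlighted in the first article can be stated compactly (a standard statement of the result; the thesis's exact notation may differ): for corruption noise of variance \( \sigma^{2} \), the optimal reconstruction function \( r^{*} \) satisfies

    \[ r^{*}(x) - x \;\approx\; \sigma^{2} \, \nabla_{x} \log p(x) \qquad \text{as } \sigma \to 0, \]

    i.e. the auto-encoder's reconstruction displacement estimates the score of the data density, which is what allows the trained model to be turned into a generative sampling procedure.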