59 research outputs found

    Sparse feature learning for image analysis in segmentation, classification, and disease diagnosis.

    The success of machine learning algorithms generally depends on intermediate data representations, called features, that disentangle the hidden factors of variation in the data. Machine learning models are also required to generalize, reducing specificity or bias toward the training dataset. Unsupervised feature learning is useful for exploiting the large amounts of unlabeled data available to capture these variations. The learned features, however, must capture the variational patterns in the data space. In this dissertation, unsupervised feature learning with sparsity is investigated for sparse and local feature extraction, with applications to lung segmentation, interpretable deep models, and Alzheimer's disease classification. Nonnegative Matrix Factorization, autoencoders, and 3D convolutional autoencoders are used as architectures for unsupervised feature learning, and are investigated together with nonnegativity, sparsity, and part-based representation constraints for generalized and transferable feature extraction.
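As a rough illustration of the sparse, part-based factorization described above, the sketch below implements plain multiplicative-update NMF with an L1 penalty on the encoding matrix. The data, rank, and penalty weight are illustrative only; the dissertation's actual models and constraints may differ.

```python
import numpy as np

def sparse_nmf(V, rank, n_iter=200, sparsity=0.1, seed=0):
    """Factor V ~= W @ H with nonnegative factors.

    Multiplicative updates (Lee & Seung style) with an L1 penalty on H
    to encourage sparse, part-based encodings.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + 1e-4
    H = rng.random((rank, m)) + 1e-4
    for _ in range(n_iter):
        # Update H; the L1 sparsity term enters the denominator.
        H *= (W.T @ V) / (W.T @ W @ H + sparsity + 1e-9)
        # Update W (no sparsity penalty on the basis columns here).
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)
    return W, H

# Toy nonnegative "image" data: 50 samples of 64 features each.
rng = np.random.default_rng(1)
V = rng.random((64, 50))
W, H = sparse_nmf(V, rank=8)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

Because the updates are multiplicative and the data are nonnegative, W and H stay nonnegative throughout, which is what yields the additive, part-based decomposition.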

    Machine learning approaches to model cardiac shape in large-scale imaging studies

    Recent improvements in non-invasive imaging, together with the introduction of fully-automated segmentation algorithms and big data analytics, have paved the way for large-scale population-based imaging studies. These studies promise to increase our understanding of a large number of medical conditions, including cardiovascular diseases. However, analysis of cardiac shape in such studies is often limited to simple morphometric indices, ignoring a large part of the information available in medical images. Discovery of new biomarkers by machine learning has recently gained traction, but often lacks interpretability. The research presented in this thesis aimed at developing novel explainable machine learning and computational methods capable of better summarizing shape variability, to better inform association and predictive clinical models in large-scale imaging studies. A powerful and flexible framework to model the relationship between three-dimensional (3D) cardiac atlases, encoding multiple phenotypic traits, and genetic variables is first presented. The proposed approach enables the detection of regional phenotype-genotype associations that would otherwise be neglected by conventional association analysis. Three learning-based systems based on deep generative models are then proposed. In the first, I propose a classifier of cardiac shapes which exploits task-specific generative shape features and is designed to enable visualisation of the anatomical effect these features encode in 3D, making the classification task transparent. The second approach models a database of anatomical shapes via a hierarchy of conditional latent variables and is capable of detecting, quantifying, and visualising onto a template shape the most discriminative anatomical features that characterize distinct clinical conditions.
    Finally, a preliminary analysis of a deep learning system capable of reconstructing 3D high-resolution cardiac segmentations from a sparse set of 2D view segmentations is reported. This thesis demonstrates that machine learning approaches can facilitate high-throughput analysis of normal and pathological anatomy, and of its determinants, without losing clinical interpretability.
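The idea of summarizing shape variability with a small number of latent modes can be sketched with a classical point-distribution model (PCA over concatenated landmark coordinates). This is a simplified stand-in for intuition, not the deep generative models the thesis actually proposes, and the data below are synthetic.

```python
import numpy as np

def fit_shape_model(shapes, n_modes=3):
    """Fit a linear statistical shape model.

    Each training shape is a flat vector of (x, y, z) landmark
    coordinates; PCA of the centred data gives the principal modes
    of shape variation.
    """
    mean = shapes.mean(axis=0)
    centered = shapes - mean
    # SVD of the data matrix yields orthonormal modes of variation.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    modes = vt[:n_modes]                        # (n_modes, 3*n_landmarks)
    variances = (s[:n_modes] ** 2) / (len(shapes) - 1)
    return mean, modes, variances

def synthesize(mean, modes, variances, b):
    # New shape = mean + sum_i b_i * sqrt(var_i) * mode_i
    return mean + (b * np.sqrt(variances)) @ modes

rng = np.random.default_rng(0)
shapes = rng.normal(size=(40, 3 * 20))          # 40 shapes, 20 landmarks
mean, modes, var = fit_shape_model(shapes)
new_shape = synthesize(mean, modes, var, b=np.array([1.0, -0.5, 0.0]))
```

Varying the coefficients `b` sweeps the model along its main axes of variability, which is the basic mechanism that richer latent-variable shape models generalize.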

    Engineering analytics through explainable deep learning

    Pattern recognition has its origins in engineering, while machine learning developed from computer science. Today, artificial intelligence (AI) is a booming field, with many practical applications and active research topics, that deals with both pattern recognition and machine learning. We now use software and applications to automate routine labor, understand speech (using Natural Language Processing) or images (extracting hierarchical features and patterns for object detection and pattern recognition), make diagnoses in medicine, assist in intricate surgical procedures, and support basic scientific research. This thesis explores the application of a specific branch of AI, Deep Learning (DL), to real-world engineering problems which had otherwise been difficult to solve with existing methods. Here we focus on different Deep Learning based methods to address two such engineering problems. We also explore the inner workings of these models through an explanation stage for each of the applied DL strategies, giving a sense of how such typically black-box models work. This explanation framework is an important step: previously, Deep Learning based models were regarded as frameworks that produce good results (classification, object detection, and object recognition, to name a few) but offer no explanation or immediately visible cause for the results they achieve, which made them hard to trust in the scientific community. In this thesis, we aim to achieve just that by deploying two such explanation frameworks, one for a 2D image study case and another for a 3D image voxel study case, which are discussed in the subsequent chapters.
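One simple, model-agnostic way to build the kind of explanation stage described above for a 2D image case is an occlusion-sensitivity map: slide a masking patch over the input and record how much the model's score drops at each location. The toy model, image, and patch size below are invented for illustration, not taken from the thesis.

```python
import numpy as np

def occlusion_map(model, image, patch=4, baseline=0.0):
    """Model-agnostic explanation: occlude each patch of the image
    and record the resulting drop in the model's score."""
    h, w = image.shape
    ref = model(image)
    heat = np.zeros((h // patch, w // patch))
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            occluded = image.copy()
            occluded[i:i+patch, j:j+patch] = baseline
            heat[i // patch, j // patch] = ref - model(occluded)
    return heat  # high values mark regions the score depends on

# Toy "classifier": scores the mean brightness of the centre region.
def toy_model(img):
    return img[6:10, 6:10].mean()

img = np.zeros((16, 16))
img[6:10, 6:10] = 1.0
heat = occlusion_map(toy_model, img)
```

Here the heat map is nonzero only for patches overlapping the bright centre, which is exactly the region the toy model attends to; the same probing logic applies unchanged to a real network's class score.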

    Adversarial content manipulation for analyzing and improving model robustness

    The recent rapid progress in machine learning systems has opened up many real-world applications --- from recommendation engines on web platforms to safety-critical systems like autonomous vehicles. A model deployed in the real world will often encounter inputs far from its training distribution. For example, a self-driving car might come across a black stop sign in the wild. To ensure safe operation, it is vital to quantify the robustness of machine learning models to such out-of-distribution data before releasing them into the real world. However, the standard paradigm of benchmarking machine learning models with fixed-size test sets drawn from the same distribution as the training data is insufficient to identify these corner cases efficiently. In principle, if we could generate all valid variations of an input and measure the model response, we could quantify and guarantee model robustness locally. Yet doing this with real-world data is not scalable. In this thesis, we propose an alternative: using generative models to create synthetic data variations at scale and testing the robustness of target models to these variations. We explore methods to generate semantic data variations in a controlled fashion across visual and text modalities. We build generative models capable of performing controlled manipulation of data, such as changing visual context, editing the appearance of an object in an image, or changing the writing style of text. Leveraging these generative models, we propose tools to study the robustness of computer vision systems to input variations and systematically identify failure modes. In the text domain, we deploy these generative models to improve the diversity of image captioning systems and perform writing-style manipulation to obfuscate private attributes of the user. Our studies quantifying model robustness explore two kinds of input manipulations: model-agnostic and model-targeted.
    The model-agnostic manipulations leverage human knowledge to choose the kinds of changes without considering the target model being tested. This includes automatically editing images to remove objects not directly relevant to the task and creating variations in visual context. Alternatively, in the model-targeted approach the input variations are directly adversarially guided by the target model. For example, we adversarially manipulate the appearance of an object in the image to fool an object detector, guided by the gradients of the detector. Using these methods, we measure and improve the robustness of various computer vision systems -- specifically image classification, segmentation, object detection and visual question answering systems -- to semantic input variations.
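A minimal example of a model-targeted, gradient-guided input manipulation is the fast-gradient-sign perturbation of a logistic classifier. This is a generic numpy sketch of the principle (perturb the input in the direction that increases the model's loss), not the object-detector attack the thesis describes; the weights and inputs are random toy data.

```python
import numpy as np

def fgsm(x, w, b, y, eps=0.1):
    """Fast-gradient-sign perturbation of input x against a logistic
    classifier p(y=1|x) = sigmoid(w.x + b). Model-targeted: the change
    is guided by the gradient of the loss with respect to the input."""
    z = w @ x + b
    p = 1.0 / (1.0 + np.exp(-z))
    # d(cross-entropy)/dx = (p - y) * w
    grad = (p - y) * w
    return x + eps * np.sign(grad)

rng = np.random.default_rng(0)
w = rng.normal(size=8)
b = 0.0
x = rng.normal(size=8)
y = 1.0 if w @ x + b > 0 else 0.0        # the model's current decision
x_adv = fgsm(x, w, b, y, eps=0.5)
```

Each coordinate of the input moves by exactly `eps` in the direction that pushes the score across the decision boundary; for a detector, the same gradient signal would instead come from backpropagating its detection loss to the object's appearance parameters.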

    Analysis of Sub-Cortical Morphology in Benign Epilepsy with Centrotemporal Spikes

    In Canada, epilepsy affects approximately 5 to 8 children per 3222 aged from 2 to 37 years in the overall population. Fifteen to 47% of these children have benign epilepsy with centrotemporal spikes (BECTS), making BECTS the most common benign childhood focal epileptic syndrome. Initially, BECTS was considered benign among other epilepsies, since it was generally reported that cognitive abilities were preserved or brought back to normal during remission. However, some studies have found cognitive and behavioral deficits, which may well persist even after remission. Given the neurocognitive differences between children with BECTS and normal controls, the question is whether subtle morphometric variations in brain structures are also present in these patients, and whether they explain variations in cognitive indices. In fact, despite the accumulating evidence of a neurodevelopmental etiology in BECTS, little is known about the underlying structural alterations. In this respect, proposing advanced neuroimaging methods will allow for quantitative assessment of variations in brain morphology associated with this neurological disorder. In addition, studying brain morphological development and its relationship with cognition may help elucidate the neuroanatomical basis of cognitive deficits. The focus of this thesis is therefore to provide a set of tools for analyzing subtle sub-cortical morphological alterations in different diseases, such as benign epilepsy with centrotemporal spikes. The methodology adopted in this thesis addresses three specific research objectives. The first step develops a new automated framework for segmenting sub-cortical structures on MR images.
    The second step designs a new approach based on spectral correspondence to precisely capture shape variability in epileptic individuals. The third step establishes the association between brain morphological changes and cognitive indices. The first contribution aims more specifically at automatic segmentation of sub-cortical structures in a groupwise multi-atlas coregistration and cosegmentation process. Contrary to standard multi-atlas segmentation approaches, the proposed method obtains the final segmentation using a population-wise registration, while Convolutional Neural Network (CNN)-based priors are incorporated into the energy formulation as a discriminative image representation. The method thus exploits more sophisticated learned representations to drive the coregistration process. Furthermore, given a set of target volumes, the developed method computes the segmentation probabilities individually and then segments all the volumes simultaneously, removing the burden of providing an appropriate ground-truth subset to perform multi-atlas segmentation. Promising results demonstrate the potential of our method on two different datasets containing annotations of sub-cortical structures. The importance of reliable label estimations is also highlighted, motivating the use of deep neural nets to replace ground-truth annotations in coregistration with minimal loss in performance. The second contribution captures shape variability between two populations of surfaces using groupwise morphological analysis. The proposed method exploits spectral representation for establishing surface correspondences, since matching is simpler in the spectral domain than in the conventional Euclidean space. The designed framework integrates mean-curvature-based spectral matching into a groupwise subcortical shape analysis pipeline.
    Experimental analysis on a real clinical dataset showed that the extracted group differences were consistent with the findings of other clinical studies, while the shape analysis outputs were produced in a computationally efficient manner. Finally, the third contribution establishes the association between sub-cortical morphological alterations in children with benign epilepsy and cognitive indices. This study detects putamen and caudate changes in children with left, right, or bilateral BECTS relative to age- and gender-matched healthy individuals. In addition, the association of structural volumetric and shape differences with cognition is investigated. The findings confirm putamen and caudate shape alterations in children with BECTS, and our results suggest that variation in sub-cortical shape affects cognitive functions. More importantly, this study demonstrates that shape alterations and their relation with cognition depend on the side of the epilepsy focus. This project enabled us to investigate whether new methods could automatically process neuroimaging information from children afflicted with BECTS and detect subtle morphological variations in their sub-cortical structures. The results obtained in this thesis allowed us to conclude that an association exists between morphological variations and cognition with respect to the side of seizure focus.
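The spectral-correspondence idea (matching surfaces in the graph-Laplacian eigenspace rather than in Euclidean space) can be sketched on toy graph "meshes". Real pipelines additionally handle eigenvector sign flips, eigenvalue ordering, and the mean-curvature weighting mentioned above, all of which are omitted in this simplified sketch.

```python
import numpy as np

def spectral_embedding(adj, k=3):
    """Embed mesh vertices using the first k nonconstant eigenvectors
    of the graph Laplacian; matching is then done in this domain."""
    deg = np.diag(adj.sum(axis=1))
    lap = deg - adj
    _, vecs = np.linalg.eigh(lap)
    return vecs[:, 1:k+1]            # skip the constant eigenvector

def match(adj_a, adj_b, k=3):
    ea = spectral_embedding(adj_a, k)
    eb = spectral_embedding(adj_b, k)
    # Nearest neighbour in spectral coordinates (sign-flip handling,
    # needed for two different meshes, is omitted here).
    d = ((ea[:, None, :] - eb[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)

# Two identical ring "meshes": the correspondence should be the identity.
n = 10
adj = np.zeros((n, n))
for i in range(n):
    adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = 1
corr = match(adj, adj)
```

Matching in the low-dimensional eigenspace is what makes groupwise correspondence tractable: the embedding is intrinsic to the surface, so two anatomically similar shapes land close together regardless of their pose in Euclidean space.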

    AI-generated Content for Various Data Modalities: A Survey

    AI-generated content (AIGC) methods aim to produce text, images, videos, 3D assets, and other media using AI algorithms. Due to its wide range of applications and the demonstrated potential of recent works, AIGC has been attracting considerable attention, and AIGC methods have been developed for various data modalities, such as image, video, text, 3D shape (as voxels, point clouds, meshes, and neural implicit fields), 3D scene, 3D human avatar (body and head), 3D motion, and audio -- each presenting different characteristics and challenges. Furthermore, there have also been many significant developments in cross-modality AIGC methods, where generative methods can receive conditioning input in one modality and produce outputs in another. Examples include going from various modalities to image, video, 3D shape, 3D scene, 3D avatar (body and head), 3D motion (skeleton and avatar), and audio modalities. In this paper, we provide a comprehensive review of AIGC methods across different data modalities, including both single-modality and cross-modality methods, highlighting the various challenges, representative works, and recent technical directions in each setting. We also survey the representative datasets throughout the modalities, present comparative results for various modalities, and discuss challenges and potential future research directions.

    Semantic Analysis for Human Motion Generation

    Ph.D. dissertation, Department of Electrical and Computer Engineering, Seoul National University Graduate School, February 2017. Advisor: Jehee Lee. One of the main goals of computer-generated character animation is to reduce the cost of creating animated scenes. Using captured human motion makes it easier to animate characters, so motion capture is used as a standard technique. However, obtaining the desired motion is difficult, because it requires a large space, high-performance cameras, actors, and a significant amount of post-processing work. Data-driven character animation comprises a set of techniques that make effective use of captured motion data. In this thesis, I introduce methods that analyze the semantics of motion data to enhance the utilization of the data. To accomplish this, various techniques from other fields are integrated so that we can understand the semantics of a unit motion clip, the implicit structure of a motion sequence, and a natural description of movements. Based upon that understanding, we can generate new animation systems. The first animation system in this thesis allows the user to generate an animation of basketball play from the tactics board. To handle the complex basketball rules that players must follow, we use context-free grammars for motion representation. Our motion grammar enables the user to define implicit and explicit rules of human behavior and generates valid movement of basketball players. Interactions between players, or between players and the environment, are represented with semantic rules, which results in plausible animation. When we compose motion sequences, we rely on a motion corpus storing the prepared motion clips and the transitions between them. Constructing a good motion corpus is important for creating natural and rich animations, but it requires the efforts of experts. We introduce a semi-supervised learning technique for automatic generation of a motion corpus.
    Stacked autoencoders are used to find latent features in large amounts of motion capture data, and the features are used to effectively discover worthwhile motion clips. The other animation system uses natural language processing technology to understand the meaning of the animated scene that the user wants to make. Specifically, the script of an animated scene is used to synthesize the movements of characters. Like a sketch interface, scripts are a very sparse input source. Understanding motion allows the system to interpret abstract user input and generate scenes that meet the user's needs.
    Contents: 1 Introduction; 2 Background (2.1 Representation of Human Movements, 2.2 Motion Annotation, 2.3 Motion Grammars, 2.4 Natural Language Processing); 3 Motion Grammar (3.1 Overview, 3.2 Motion Grammar, 3.2.1 Instantiation, Semantics, and Plausibility, 3.2.2 A Simple Example, 3.3 Basketball Tactics Board, 3.4 Motion Synthesis, 3.5 Results, 3.6 Discussion); 4 Motion Embedding (4.1 Overview, 4.2 Motion Data, 4.3 Autoencoders, 4.3.1 Stacked Autoencoders, 4.4 Motion Corpus, 4.4.1 Training, 4.4.2 Finding Motion Clips, 4.5 Results, 4.6 Discussion); 5 Text to Animation (5.1 Overview, 5.2 Understanding Semantics, 5.3 Action Chains, 5.3.1 Word Embedding, 5.3.2 Motion Plausibility, 5.4 Scene Generation, 5.5 Results, 5.6 Discussion); 6 Conclusion; Bibliography; Abstract (in Korean)
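The motion-grammar idea described above, in which production rules constrain which motion clips may follow one another so that any derived sequence is valid by construction, can be sketched with a toy context-free grammar. The symbols and rules below are invented for illustration and are not the grammar used in the thesis.

```python
import random

# Toy motion grammar: nonterminals expand via production rules;
# terminals stand for unit motion clips. Rules are illustrative only.
GRAMMAR = {
    "PLAY":     [["BRING_UP", "OFFENSE", "shoot"]],
    "BRING_UP": [["dribble"], ["dribble", "BRING_UP"]],
    "OFFENSE":  [["pass", "OFFENSE"], ["screen", "cut"], ["drive"]],
}

def derive(symbol, rng):
    """Recursively expand a symbol into a sequence of motion clips."""
    if symbol not in GRAMMAR:            # terminal: a unit motion clip
        return [symbol]
    rule = rng.choice(GRAMMAR[symbol])   # pick one production rule
    out = []
    for s in rule:
        out.extend(derive(s, rng))
    return out

rng = random.Random(0)
sequence = derive("PLAY", rng)
```

Every derivation necessarily starts by bringing the ball up and ends with a shot, so the grammar enforces the "implicit/explicit rules of human behavior" at the structural level; a real system would additionally attach semantic rules governing interactions between players.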