
    Layer-Wise Relevance Propagation for Explaining Deep Neural Network Decisions in MRI-Based Alzheimer's Disease Classification

    Deep neural networks have led to state-of-the-art results in many medical imaging tasks, including Alzheimer’s disease (AD) detection based on structural magnetic resonance imaging (MRI) data. However, the network decisions are often perceived as highly non-transparent, making it difficult to apply these algorithms in clinical routine. In this study, we propose using layer-wise relevance propagation (LRP) to visualize convolutional neural network decisions for AD based on MRI data. Similar to other visualization methods, LRP produces a heatmap in the input space indicating the importance/relevance of each voxel contributing to the final classification outcome. In contrast to susceptibility maps produced by guided backpropagation (“Which change in voxels would change the outcome most?”), the LRP method is able to directly highlight positive contributions to the network classification in the input space. In particular, we show that (1) the LRP method is very specific for individuals (“Why does this person have AD?”) with high inter-patient variability, (2) there is very little relevance for AD in healthy controls, and (3) areas that exhibit a lot of relevance correlate well with what is known from the literature. To quantify the latter, we compute size-corrected metrics of the summed relevance per brain area, e.g., relevance density or relevance gain. Although these metrics produce very individual “fingerprints” of relevance patterns for AD patients, a lot of importance is put on areas in the temporal lobe, including the hippocampus. After discussing several limitations, such as sensitivity toward the underlying model and computation parameters, we conclude that LRP might have high potential to assist clinicians in explaining neural network decisions for diagnosing AD (and potentially other diseases) based on structural MRI data.
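The size-corrected metric described above (summed relevance per brain area, normalized by area size) can be sketched as follows. This is a minimal illustration, assuming a voxel-wise LRP heatmap and an integer-labeled brain atlas as numpy arrays of the same shape; the paper's exact definitions of relevance density and relevance gain may differ.

```python
import numpy as np

def relevance_density(relevance_map, atlas):
    """Size-corrected relevance per brain area: summed relevance inside
    a region divided by the region's size in voxels (a sketch of the
    'relevance density' idea; label 0 is assumed to be background)."""
    densities = {}
    for label in np.unique(atlas):
        if label == 0:
            continue
        mask = atlas == label
        densities[int(label)] = relevance_map[mask].sum() / mask.sum()
    return densities
```

Because the sum is divided by region size, a small structure such as the hippocampus can rank highly even if a larger region accumulates more total relevance.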

    Modeling functional brain activity of human working memory using deep recurrent neural networks

    In cognitive systems, working memory is crucial for visual reasoning and decision making. Tremendous progress has been made in understanding the mechanisms of human and animal working memory, as well as in formulating different frameworks of memory-augmented artificial neural networks. The overall objective of our project is to train artificial neural network models that are capable of consolidating memory over a short period of time to solve a memory task, and to relate them to the brain activity of humans who solved the same task. The project is interdisciplinary in nature, bridging aspects of artificial intelligence (deep learning) and neuroscience. The cognitive task used is the N-back task, very popular in cognitive neuroscience, in which subjects are presented with a sequence of images, each of which must be identified as already seen or not. The functional imaging (fMRI) dataset used was collected as part of the Courtois NeuroMod project. We study multiple variants of recurrent neural network models that learn to remember input images across timesteps. These trained neural networks, optimized for the memory task, are ultimately used to generate feature representations for the stimulus images seen by the human subjects during their recordings while solving the task. The representations derived from these neural networks are then used to create an encoding model that predicts the subjects' fMRI BOLD activity. We then examine the relationship between the neural network model and brain activity by analyzing the model's predictive ability in different brain areas involved in working memory. This work presents a way of using artificial neural networks to model the behavior and information processing of the brain's working memory, and of using brain imaging data captured from human subjects during the N-back task to potentially understand some memory mechanisms of the brain in relation to these artificial neural network models.
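The encoding-model step described above, mapping network-derived stimulus features to voxel-wise BOLD responses, is commonly implemented as ridge regression. The sketch below is illustrative only: all names are hypothetical, and the thesis's actual pipeline (regularization choice, cross-validation, hemodynamic-response modeling) is certainly more involved.

```python
import numpy as np

def fit_ridge_encoding(features, bold, alpha=1.0):
    """Closed-form ridge regression from stimulus features
    (n_timepoints x n_features) to BOLD responses
    (n_timepoints x n_voxels). Returns the weight matrix
    (n_features x n_voxels)."""
    n_feat = features.shape[1]
    gram = features.T @ features + alpha * np.eye(n_feat)
    return np.linalg.solve(gram, features.T @ bold)

def predict_bold(features, weights):
    """Predicted BOLD time series for held-out stimuli."""
    return features @ weights
```

Predictive accuracy per voxel (e.g., correlation between predicted and measured BOLD on held-out runs) is then what localizes where in the brain the network's representations carry explanatory power.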

    The emergence of number and syntax units in LSTM language models

    Recent work has shown that LSTMs trained on a generic language modeling objective capture syntax-sensitive generalizations such as long-distance number agreement. However, we have no mechanistic understanding of how they accomplish this remarkable feat. Some have conjectured it depends on heuristics that do not truly take hierarchical structure into account. We present here a detailed study of the inner mechanics of number tracking in LSTMs at the single-neuron level. We discover that long-distance number information is largely managed by two 'number units'. Importantly, the behaviour of these units is partially controlled by other units independently shown to track syntactic structure. We conclude that LSTMs are, to some extent, implementing genuinely syntactic processing mechanisms, paving the way to a more general understanding of grammatical encoding in LSTMs. (To appear in Proceedings of NAACL, Minneapolis, MN, 2019.)
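A first-pass, single-neuron analysis in the spirit of the study above can be sketched as correlating each hidden unit's activation with the grammatical number of the subject across sentences; units with high absolute correlation are candidate 'number units'. This is purely illustrative, with hypothetical inputs, and is not the paper's actual diagnostic procedure (which involves ablations and connection analyses).

```python
import numpy as np

def number_unit_scores(activations, number_labels):
    """Absolute Pearson correlation between each hidden unit's
    activation (n_sentences x n_units, e.g., taken at the verb)
    and the subject's grammatical number (+1 singular, -1 plural)."""
    acts = activations - activations.mean(axis=0)
    labels = number_labels - number_labels.mean()
    cov = acts.T @ labels
    denom = np.sqrt((acts ** 2).sum(axis=0) * (labels ** 2).sum())
    return np.abs(cov / denom)
```

A finding like the paper's would correspond to this score distribution being sharply peaked at very few units rather than spread diffusely.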

    Uncovering convolutional neural network decisions for diagnosing multiple sclerosis on conventional MRI using layer-wise relevance propagation

    Machine learning-based imaging diagnostics has recently reached or even surpassed the level of clinical experts in several clinical domains. However, classification decisions of a trained machine learning system are typically non-transparent, a major hindrance for clinical integration, error tracking, or knowledge discovery. In this study, we present a transparent deep learning framework relying on convolutional neural networks (CNNs) and layer-wise relevance propagation (LRP) for diagnosing multiple sclerosis (MS). MS is commonly diagnosed utilizing a combination of clinical presentation and conventional magnetic resonance imaging (MRI), specifically the occurrence and presentation of white matter lesions in T2-weighted images. We hypothesized that using LRP in a naive predictive model would enable us to uncover relevant image features that a trained CNN uses for decision-making. Since imaging markers in MS are well established, this would enable us to validate the respective CNN model. First, we pre-trained a CNN on MRI data from the Alzheimer's Disease Neuroimaging Initiative (n = 921), afterwards specializing the CNN to discriminate between MS patients and healthy controls (n = 147). Using LRP, we then produced a heatmap for each subject in the holdout set depicting the voxel-wise relevance for a particular classification decision. The resulting CNN model achieved a balanced accuracy of 87.04% and an area under the receiver operating characteristic curve of 96.08%. The subsequent LRP visualization revealed that the CNN model indeed focuses on individual lesions, but also incorporates additional information such as lesion location, non-lesional white matter, or gray matter areas such as the thalamus, which are established conventional and advanced MRI markers in MS. We conclude that LRP and the proposed framework have the capability to make diagnostic decisions of…

    Neural and Computational Principles of Real-World Sequence Processing

    We are constantly processing sequential information in our day-to-day life, from listening to a piece of music (processing a stream of notes) and watching a movie (processing a series of scenes), to having conversations with people around us (processing a stream of syllables, words, and sentences). What are the neural and computational principles underlying this ubiquitous cognitive process? In this thesis, I first review the background and prior studies regarding the neural and computational mechanisms of real-life sequence processing and present our research questions. I then present four research projects to answer those questions. By combining neuroimaging data analysis and computational modeling, I discovered the neural phenomena of integrating and forgetting temporal information during naturalistic sequence processing in the human cerebral cortex. Furthermore, I identified computational principles (e.g., hierarchical architecture) and processes (e.g., dynamical context gating) which can help to explain the neural state changes observed during naturalistic processing. These neural and computational findings not only validate the existing components of hierarchical temporal integration theory, but also rule out alternative models and propose important new elements of the theory, including context gating at event boundaries. I next explored the computations for natural language processing in brains and machines, by (1) applying our neuroscience-inspired methods to examine the timescale and functional organization of neural network language models, thereby revealing their own architecture for processing information over multiple timescales; and by (2) investigating the context and entity representations in two neural networks with brain-inspired architectures, thereby revealing a gap between brain-inspired and performance-optimized architectures. Finally, I discuss the positions and contributions of our findings in the field and some future directions.
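One simple way to quantify a unit's or region's integration timescale, in the spirit of the timescale analyses described above, is the lag at which its autocorrelation decays below 1/e. This is a common heuristic offered here as a simplified sketch, not the thesis's actual analysis method.

```python
import numpy as np

def acf_timescale(x, max_lag=100):
    """Estimate an integration timescale for a 1-D signal as the first
    lag at which its sample autocorrelation drops below 1/e."""
    x = x - x.mean()
    var = (x ** 2).mean()
    for lag in range(1, max_lag + 1):
        r = (x[:-lag] * x[lag:]).mean() / var
        if r < 1.0 / np.e:
            return lag
    return max_lag
```

Applied across a hierarchy (e.g., sensory regions vs. higher-order cortex, or early vs. deep layers of a language model), longer estimated timescales indicate slower forgetting of past context.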

    Spatio-temporal Deep Learning Architectures for Data-Driven Learning of Brain’s Network Connectivity

    Brain disorders are often linked to disruptions in the dynamics of the brain's intrinsic functional networks. It is crucial to identify these networks and determine disruptions in their interactions to classify, understand, and possibly cure brain disorders. The brain's network interactions are commonly assessed via functional (network) connectivity, captured as an undirected matrix of Pearson correlation coefficients. Functional connectivity can represent static and dynamic relations; however, these are often modeled using a fixed choice for the data window. Alternatively, deep learning models may flexibly learn various representations from the same data based on the model architecture and the training task. The representations produced by deep learning models are often difficult to interpret and require additional post hoc methods, e.g., saliency maps. Also, deep learning models typically require many input samples to learn features and perform the downstream task well. This dissertation introduces deep learning architectures that work on functional MRI data to estimate disorder-specific brain network connectivity and provide high classification accuracy in discriminating controls and patients. To handle the relatively low number of labeled subjects in the field of neuroimaging, this research proposes deep learning architectures that leverage self-supervised pre-training to improve downstream classification. To increase interpretability and avoid post hoc methods, deep learning architectures are proposed that expose a directed graph layer representing the model's learning about relevant brain connectivity. The proposed models estimate task-specific directed connectivity matrices for each subject from the same data by training separate models, each on its own discriminative task.
    The proposed architectures are tested with multiple neuroimaging datasets to discriminate controls and patients with schizophrenia, autism, and dementia, as well as for age and gender prediction. The proposed approach reveals that differences in connectivity among sensorimotor networks relative to default-mode networks are an essential indicator of dementia and gender. Dysconnectivity between networks, especially sensorimotor and visual, is linked with schizophrenia; however, schizophrenic patients show increased intra-network default-mode connectivity compared to healthy controls. Sensorimotor connectivity is vital for both dementia and schizophrenia prediction, but the differences lie in inter- and intra-network connectivity.
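The conventional baseline the dissertation contrasts its learned architectures against, an undirected matrix of Pearson correlations between regional time series, plus its sliding-window dynamic variant with a fixed window choice, can be sketched as follows (illustrative only; region extraction and preprocessing are omitted).

```python
import numpy as np

def functional_connectivity(timeseries):
    """Static functional connectivity: Pearson correlation between
    every pair of regional fMRI time series
    (n_timepoints x n_regions) -> symmetric (n_regions x n_regions)."""
    return np.corrcoef(timeseries, rowvar=False)

def dynamic_connectivity(timeseries, window=30, step=5):
    """Sliding-window dynamic connectivity: one correlation matrix per
    window. The fixed window length here is exactly the modeling
    choice the dissertation's learned architectures aim to avoid."""
    mats = []
    for start in range(0, timeseries.shape[0] - window + 1, step):
        mats.append(np.corrcoef(timeseries[start:start + window],
                                rowvar=False))
    return np.stack(mats)
```

The proposed models replace this fixed, undirected summary with directed connectivity matrices learned end-to-end from the discriminative task.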