418 research outputs found

    Large-scale extraction of brain connectivity from the neuroscientific literature

    Motivation: In neuroscience, as in many other scientific domains, the primary form of knowledge dissemination is through published articles. One challenge for modern neuroinformatics is finding methods to make the knowledge from the tremendous backlog of publications accessible for search, analysis and the integration of such data into computational models. A key example of this is metascale brain connectivity, where results are not reported in a normalized repository. Instead, these experimental results are published in natural language, scattered among individual scientific publications. This lack of normalization and centralization hinders the large-scale integration of brain connectivity results. In this article, we present text-mining models to extract and aggregate brain connectivity results from 13.2 million PubMed abstracts and 630 216 full-text publications related to neuroscience. The brain regions are identified with three different named entity recognizers (NERs) and then normalized against two atlases: the Allen Brain Atlas (ABA) and the atlas from the Brain Architecture Management System (BAMS). We then use three different extractors to assess inter-region connectivity. Results: NERs and connectivity extractors are evaluated against a manually annotated corpus. The complete in litero extraction models are also evaluated against in vivo connectivity data from ABA, with an estimated precision of 78%. The resulting database contains over 4 million brain region mentions and over 100 000 (ABA) and 122 000 (BAMS) potential brain region connections. This database drastically accelerates connectivity literature review by providing a centralized repository of connectivity data to neuroscientists. Availability and implementation: The resulting models are publicly available at github.com/BlueBrain/bluima. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
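The abstract above describes a pipeline of named entity recognition followed by normalization against atlas nomenclatures. As a rough illustration of the normalization step only, here is a dictionary-based sketch; the synonym table and canonical names are invented placeholders, not actual ABA or BAMS entries.

```python
# Minimal sketch of dictionary-based brain-region NER with atlas normalization.
# The synonyms and canonical names below are illustrative placeholders.
import re

ATLAS = {  # hypothetical synonym -> canonical atlas name
    "nucleus accumbens": "Nucleus accumbens (ABA)",
    "acb": "Nucleus accumbens (ABA)",
    "hippocampus": "Hippocampal region (ABA)",
}

def extract_regions(text):
    """Return (mention, normalized_name) pairs found in `text`."""
    hits = []
    lowered = text.lower()
    for synonym, canonical in ATLAS.items():
        for m in re.finditer(r"\b" + re.escape(synonym) + r"\b", lowered):
            # Slice the original text so the mention keeps its original casing.
            hits.append((text[m.start():m.end()], canonical))
    return hits

matches = extract_regions("Projections from the hippocampus reach the nucleus accumbens.")
```

Running `extract_regions` on a sentence returns each matched mention with its normalized atlas name; a real NER additionally handles abbreviation disambiguation and overlapping mentions.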

    Agile in-litero experiments: how can semi-automated information extraction from neuroscientific literature help neuroscience model building?

    In neuroscience, as in many other scientific domains, the primary form of knowledge dissemination is through published articles in peer-reviewed journals. One challenge for modern neuroinformatics is to design methods to make the knowledge from the tremendous backlog of publications accessible for search, analysis and its integration into computational models. In this thesis, we introduce novel natural language processing (NLP) models and systems to mine the neuroscientific literature. In addition to in vivo, in vitro or in silico experiments, we coin the NLP methods developed in this thesis as in litero experiments, aiming at analyzing and making accessible the extended body of neuroscientific literature. In particular, we focus on two important neuroscientific entities: brain regions and neural cells. An integrated NLP model is designed to automatically extract brain region connectivity statements from very large corpora. This system is applied to a large corpus of 25M PubMed abstracts and 600K full-text articles. Central to this system is the creation of a searchable database of brain region connectivity statements, allowing neuroscientists to gain an overview of all brain regions connected to a given region of interest. More importantly, the database enables researchers to provide feedback on connectivity results and links back to the original article sentence to provide the relevant context. The database was evaluated by neuroanatomists on real connectomics tasks (targets of the nucleus accumbens) and resulted in a significant effort reduction in comparison to previous manual methods (from 1 week to 2 h). Subsequently, we introduce neuroNER to identify, normalize and compare mentions of neurons in the scientific literature. Our method relies on identifying and analyzing each of the domain features used to annotate a specific neuron mention, such as the morphological term 'basket' or the brain region 'hippocampus'.
We apply our method to the same corpus of 25M PubMed abstracts and 600K full-text articles and find over 500K unique neuron type mentions. To demonstrate the utility of our approach, we also apply our method towards cross-comparing the NeuroLex and Human Brain Project (HBP) cell type ontologies. By decoupling a neuron mention's identity into its specific compositional features, our method can successfully identify specific neuron types even if they are not explicitly listed within a predefined neuron type lexicon, thus greatly facilitating cross-laboratory studies. In order to build such large databases, several large-scale NLP tools and infrastructures were developed: a robust pipeline to preprocess full-text PDF articles, as well as bluima, an NLP processing pipeline specialized in neuroscience to perform text-mining at PubMed scale. During the development of those two NLP systems, we recognized the need for novel NLP approaches to rapidly develop custom text-mining solutions. This led to the formalization of the agile text-mining methodology to improve the communication and collaboration between subject matter experts and text miners. Agile text mining is characterized by short development cycles, frequent task redefinition and continuous performance monitoring through integration tests. To support our approach, we developed Sherlok, an NLP framework designed for the development of agile text-mining applications.
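The compositional idea behind neuroNER, identifying a neuron mention through its domain features rather than through a fixed lexicon entry, can be sketched as follows. The feature lexicons here are illustrative stand-ins, not the actual neuroNER resources.

```python
# Sketch of compositional neuron-mention analysis: decompose a mention into
# domain features (morphology, brain region, neurotransmitter) and compare
# mentions by feature overlap. Lexicons are hypothetical placeholders.
MORPHOLOGY = {"basket", "pyramidal", "chandelier"}
REGIONS = {"hippocampus", "cortex", "striatum"}
TRANSMITTERS = {"gabaergic", "glutamatergic", "cholinergic"}

def decompose(mention):
    """Map a neuron mention to its set of (feature_type, value) pairs."""
    features = set()
    for token in mention.lower().replace("-", " ").split():
        if token in MORPHOLOGY:
            features.add(("morphology", token))
        elif token in REGIONS:
            features.add(("region", token))
        elif token in TRANSMITTERS:
            features.add(("transmitter", token))
    return features

def similarity(a, b):
    """Jaccard overlap between two mentions' feature sets."""
    fa, fb = decompose(a), decompose(b)
    return len(fa & fb) / len(fa | fb) if fa | fb else 0.0
```

Because comparison works on feature sets, two differently worded mentions of the same cell type score as identical even when neither appears verbatim in a lexicon, which is the property the thesis exploits for cross-ontology comparison.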

    Automatic target validation based on neuroscientific literature mining for tractography.

    Target identification for tractography studies requires solid anatomical knowledge, validated by an extensive literature review across species for each seed structure to be studied. Manual literature review to identify targets for a given seed region is tedious and potentially subjective. Therefore, complementary approaches would be useful. We propose to use text-mining models to automatically suggest potential targets from the neuroscientific literature (full-text articles and abstracts), so that they can be used for anatomical connection studies and more specifically for tractography. We applied text-mining models to three structures: two well-studied structures and validated deep brain stimulation targets, the internal globus pallidus and the subthalamic nucleus, and the nucleus accumbens, an exploratory target for treating psychiatric disorders. We performed a systematic review of the literature to document the projections of the three selected structures and compared it with the targets proposed by the text-mining models, both in rat and primate (including human). We ran probabilistic tractography on the nucleus accumbens and compared the output with the results of the text-mining models and the literature review. Overall, text-mining the literature found three times as many targets as two man-weeks of manual curation. The overall efficiency of text-mining against literature review in our study was 98% recall (at 36% precision), meaning that across all targets for the three selected seeds, only one target was missed by text-mining. We demonstrate that connectivity for a structure of interest can be extracted from a very large number of publications and abstracts. We believe this tool will help the neuroscience community facilitate connectivity studies of particular brain regions. The text-mining tools used for the study are part of the HBP Neuroinformatics Platform, publicly available at http://connectivity-brainer.rhcloud.com/
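The reported figures (98% recall at 36% precision) follow from the standard definitions, comparing mined target suggestions against a curated reference list. A minimal sketch, with hypothetical target names:

```python
# Precision/recall of text-mined target suggestions against a curated list.
# The region names are hypothetical placeholders, not the study's actual data.
def precision_recall(mined, curated):
    """Return (precision, recall) of `mined` suggestions vs. `curated` truth."""
    mined, curated = set(mined), set(curated)
    tp = len(mined & curated)  # true positives: suggestions confirmed by curation
    precision = tp / len(mined) if mined else 0.0
    recall = tp / len(curated) if curated else 0.0
    return precision, recall

mined = {"thalamus", "amygdala", "ventral pallidum", "cerebellum"}
curated = {"thalamus", "amygdala", "ventral pallidum"}
p, r = precision_recall(mined, curated)
```

With these placeholder sets, one spurious suggestion lowers precision while every curated target is recovered, which mirrors the study's trade-off: modest precision is acceptable because a curator can quickly discard false positives, whereas a missed target is costly.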

    A Knowledge-based Integrative Modeling Approach for In-Silico Identification of Mechanistic Targets in Neurodegeneration with Focus on Alzheimer’s Disease

    Dementia is the progressive decline in cognitive function due to damage or disease in the brain beyond what might be expected from normal aging. Based on neuropathological and clinical criteria, dementia includes a spectrum of diseases, namely Alzheimer's dementia, Parkinson's dementia, Lewy body disease, Alzheimer's dementia with Parkinson's, Pick's disease, semantic dementia, and large and small vessel disease. It is thought that these disorders result from a combination of genetic and environmental risk factors. Despite the accumulating knowledge gained about pathophysiological and clinical characteristics of the disease, no coherent and integrative picture of the molecular mechanisms underlying neurodegeneration in Alzheimer's disease is available. Existing drugs only offer symptomatic relief to patients and lack any efficient disease-modifying effects. The present research proposes a knowledge-based rationale towards integrative modeling of disease mechanisms for identifying potential candidate targets and biomarkers in Alzheimer's disease. Integrative disease modeling is an emerging knowledge-based paradigm in translational research that exploits the power of computational methods to collect, store, integrate, model and interpret accumulated disease information across different biological scales, from molecules to phenotypes. It prepares the ground for transitioning from a 'descriptive' to a 'mechanistic' representation of disease processes. The proposed approach was used to build an integrative framework that integrates, on the one hand, knowledge extracted from the literature using semantically supported text-mining technologies and, on the other hand, primary experimental data such as gene/protein expression or imaging readouts.
The aim of such a hybrid integrative modeling approach was not only to provide a consolidated systems view of the disease mechanism as a whole, but also to increase the specificity and sensitivity of the mechanistic model by providing disease-specific context. This approach was successfully used for correlating clinical manifestations of the disease to their corresponding molecular events, and led to the identification and modeling of three important mechanistic components underlying Alzheimer's dementia, namely the CNS, the immune system and the endocrine components. These models were validated using a novel in-silico validation method, namely biomarker-guided pathway analysis, and a pathway-based target identification approach was introduced, which resulted in the identification of the MAPK signaling pathway as a potential candidate target at the crossroads of the three components underlying the disease mechanism in Alzheimer's dementia.

    Promises and pitfalls of deep neural networks in neuroimaging-based psychiatric research

    By promising more accurate diagnostics and individual treatment recommendations, deep neural networks, and in particular convolutional neural networks, have become a powerful tool in medical imaging. Here, we first give an introduction to key methodological concepts and the resulting methodological promises, including representation and transfer learning, as well as modelling domain-specific priors. After reviewing recent applications within neuroimaging-based psychiatric research, such as the diagnosis of psychiatric diseases, delineation of disease subtypes, normative modeling, and the development of neuroimaging biomarkers, we discuss current challenges. These include, for example, the difficulty of training models on small, heterogeneous and biased data sets, the lack of validity of clinical labels, algorithmic bias, and the influence of confounding variables.

    The effect of using multiple connectivity metrics in brain Functional Connectivity studies

    Integrated master's thesis, Biomedical Engineering and Biophysics (Medical Signals and Images), Universidade de Lisboa, Faculdade de Ciências, 2022.
    Resting-state functional magnetic resonance imaging (rs-fMRI) has the potential to assist as a diagnostic or prognostic tool for a diverse set of neurological and neuropsychiatric disorders, which are often difficult to differentiate. fMRI focuses on the study of the brain's functional connectome, which is characterized by the functional connections and neuronal activity among different brain regions, also interpreted as communications between pairs of regions. This Functional Connectivity (FC) is quantified through statistical dependences between brain regions' blood-oxygen-level-dependent (BOLD) signal time series, traditionally evaluated with correlation coefficient metrics and represented as FC matrices. However, several studies have underlined limitations in the ability of correlation metrics to fully capture the information in these signals, leading investigators towards different statistical metrics that could fill those gaps. Recently, investigators have turned their attention to Deep Learning (DL) models, which outperform traditional Machine Learning (ML) techniques due to their ability to automatically extract relevant information from high-dimensional data such as FC data; these models are used with rs-fMRI data to improve diagnostic predictions, as well as to understand pathological patterns in the functional connectome that can lead to the discovery of new biomarkers. In spite of very encouraging performances, the black-box nature of DL algorithms makes it difficult to know which input information led the model to a certain prediction, restricting their use in clinical settings.
The objective of this dissertation is to exploit the power of DL models, understanding how FC matrices created from different statistical metrics can provide information about brain FC beyond the conventionally used correlation family. Two publicly available datasets were studied: the ABIDE I dataset, composed of healthy individuals and individuals with autism spectrum disorder (ASD), and the ADHD-200 dataset, with typically developed controls and individuals with attention-deficit/hyperactivity disorder (ADHD). The FC matrices of both datasets were computed in MATLAB using MULAN toolbox functions for different statistical metrics: the correlation coefficient, the non-linear correlation coefficient, mutual information, coherence and transfer entropy. The classification of the FC data was performed using two DL models, the improved ConnectomeCNN model and the novel ConnectomeCNN-Autoencoder model. A further goal was to study the effect of a multi-metric approach on classification performance, combining the FC matrices computed from the different statistical metrics, as well as to study the use of Explainable Artificial Intelligence (XAI) techniques, namely the Layer-wise Relevance Propagation (LRP) method, to overcome the black-box problem of the DL models and reveal the most important brain regions in ADHD. The results show that using other statistical metrics to compute FC matrices can be a useful complement to the traditional correlation methods for the classification between healthy subjects and subjects diagnosed with ADHD and ASD. Notably, non-linear metrics such as h2 and mutual information achieved similar and, in some cases, even slightly better performances than correlation methods.
The multi-metric FC approach, despite not improving classification performance over the best individual metric, showed promising results, namely the ability to select the best features from all the combined FC matrices and to match the best individual metric on each of the model's evaluation measures, leading to a more complete classification. The LRP analysis applied to the ADHD-200 dataset proved promising, identifying brain regions related to the pathophysiology of ADHD that are in broad accordance with findings from FC and structural studies.

Resting-state functional magnetic resonance imaging (rs-fMRI) has the potential to serve as an auxiliary diagnostic or prognostic tool for a diverse set of neurological and neuropsychiatric disorders, which are often difficult to differentiate. The analysis of rs-fMRI data often relies on the concept of the brain's functional connectome, characterized by the functional connections between different brain regions, these connections being interpreted as communications between pairs of brain regions. This functional connectivity is quantified through statistical dependences between the fMRI signals of brain regions, traditionally computed with the correlation coefficient metric and represented as functional connectivity matrices. However, several studies have demonstrated limitations of correlation metrics, which cannot fully capture all the information present in these signals, leading investigators to search for different statistical metrics that could fill those gaps and obtain more complete information from these signals.

The study of these neurological and neuropsychiatric disorders initially relied on techniques such as statistical parametric mapping, in the context of task-based fMRI studies. These techniques have certain limitations, notably the assumption that each brain region acts independently, which does not match current knowledge of how the brain works. The emergence of rs-fMRI provided a more global perspective and gave rise to an extensive literature on the effect of pathologies on resting-state connectivity patterns, including attempts at automated diagnosis based on biomarkers extracted from connectomes. In recent years, investigators have turned their attention to techniques from different branches of Artificial Intelligence, more specifically to Deep Learning (DL) algorithms, since these can outperform the traditional Machine Learning (ML) algorithms applied in earlier studies, thanks to their ability to automatically extract relevant information from high-dimensional data such as functional connectivity data. These models use rs-fMRI data to improve diagnostic predictions, in accuracy and speed, over currently used techniques, and to better understand the pathological patterns in the functional connections of these disorders, potentially leading to the discovery of new biomarkers. Despite the remarkable performance of these models, the black-box nature of DL algorithms makes it difficult to know which input information led the model to a given prediction; a model may use the wrong information to reach an inference, restricting its use in clinical settings.

The objective of this dissertation, developed at the Instituto de Biofísica e Engenharia Biomédica, is to exploit the power of DL models in order to assess the extent to which functional connectivity matrices created from different statistical metrics can provide more information about the brain's functional connectivity, beyond the correlation metrics conventionally used in this type of study. Two publicly available datasets widely used in neuroscience studies were examined: the ABIDE-I dataset, composed of healthy individuals and individuals with autism spectrum disorder (ASD), and the ADHD-200 dataset, with typically developed controls and individuals with attention-deficit/hyperactivity disorder (ADHD). In a first phase, the functional connectivity matrices of both datasets were computed using the different statistical metrics. For this, MATLAB code was developed that uses the BOLD signal time series from the two datasets to build these matrices, incorporating functions for the different statistical metrics from the MULAN toolbox: the correlation coefficient, the non-linear correlation coefficient, mutual information, coherence and transfer entropy. Next, the classification of the functional connectivity data, to evaluate the effect of the different statistical metrics on discriminating healthy from pathological subjects, was performed using two DL models. The improved ConnectomeCNN model and the novel ConnectomeCNN-Autoencoder model were developed with the Keras neural network library and its Tensorflow backend, both in Python.

These models, previously developed at the Instituto de Biofísica e Engenharia Biomédica, had to be optimized for best performance, with several model and training parameters tested on the data under study. A further aim was to study the effect of a multi-metric approach on the classification tasks of both datasets: the matrices computed from the different statistical metrics were combined and fed to the same models applied to the individual-metric functional connectivity matrices. In the multi-metric approach, the model and training parameters were likewise optimized to achieve the best performance on these data. Beyond these two objectives, the use of Explainable Artificial Intelligence (XAI) techniques was studied, more specifically the Layer-wise Relevance Propagation (LRP) method, with a view to overcoming the black-box problem of DL models and explaining how the models use the input data to make a given prediction. The LRP method was applied to the two models used previously, using the ADHD-200 dataset as input, thereby revealing which brain regions are most important for an ADHD-related diagnosis. The results show that using other statistical metrics to create the functional connectivity matrices can be a very useful complement to the traditionally used statistical metrics for the classification between healthy individuals and individuals with ASD and ADHD.

In particular, non-linear statistical metrics such as h2 and mutual information achieved similar and, in some cases, slightly better performances than the correlation methods conventionally used in these functional connectivity studies. The multi-metric functional connectivity approach, although it did not improve overall classification performance over the best individual-metric matrices from the set of statistical metrics considered, produced results that justify deeper exploration of this type of approach, in order to better understand the complementarity of the metrics and the best way to use them. The LRP method applied to the ADHD-200 dataset demonstrated its applicability to this type of study and to DL models, identifying the brain regions most related to the pathophysiology of ADHD, which are compatible with what is reported by several functional connectivity studies and studies of structural alterations associated with this disorder. The fact that these XAI techniques show how DL models use the input data to make their predictions may lead to a faster and better-accepted adoption of these algorithms in clinical settings. These techniques can assist the diagnosis and prognosis of neurological and neuropsychiatric disorders, which are often difficult to differentiate, allowing clinicians to understand the prediction made and explain it to their patients.
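The thesis builds its FC matrices in MATLAB with the MULAN toolbox; as a loose, simplified illustration of the multi-metric idea, the sketch below computes a Pearson correlation FC matrix and a histogram-based plug-in mutual information FC matrix in Python and stacks them, the way a multi-metric classifier input might be formed. It is not the MULAN implementation, and the bin count and data shapes are arbitrary choices.

```python
# Sketch: two FC metrics over the same BOLD time series, stacked as a
# multi-metric input. Simplified stand-in for the MULAN toolbox functions.
import numpy as np

def correlation_fc(ts):
    """ts: (n_regions, n_timepoints) -> (n_regions, n_regions) Pearson FC matrix."""
    return np.corrcoef(ts)

def mutual_info(x, y, bins=8):
    """Plug-in mutual information estimate from a joint histogram (in nats)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p = joint / joint.sum()
    px = p.sum(axis=1, keepdims=True)  # marginal of x
    py = p.sum(axis=0, keepdims=True)  # marginal of y
    nz = p > 0
    return float((p[nz] * np.log(p[nz] / (px @ py)[nz])).sum())

def mi_fc(ts, bins=8):
    """Pairwise mutual-information FC matrix."""
    n = ts.shape[0]
    fc = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            fc[i, j] = mutual_info(ts[i], ts[j], bins)
    return fc

rng = np.random.default_rng(0)
ts = rng.standard_normal((4, 200))  # toy data: 4 regions, 200 timepoints
fc_stack = np.stack([correlation_fc(ts), mi_fc(ts)])  # shape (2, 4, 4)
```

Stacking the per-metric matrices along a leading axis yields a multi-channel input, analogous to how the multi-metric experiments combine matrices from several statistical metrics before classification.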

    Conceptualization of Computational Modeling Approaches and Interpretation of the Role of Neuroimaging Indices in Pathomechanisms for Pre-Clinical Detection of Alzheimer Disease

    With swift advancements in next-generation sequencing technologies alongside the voluminous growth of biological data, a variety of data resources such as databases and web services have been created to facilitate data management, accessibility, and analysis. However, the burden of interoperability between dynamically growing data resources is an increasingly rate-limiting step in biomedicine, specifically concerning neurodegeneration. Over the years, massive investments and technological advancements in dementia research have resulted in large proportions of unmined data. Accordingly, there is an essential need for intelligent as well as integrative approaches to mine available data and substantiate novel research outcomes. Semantic frameworks provide a unique possibility to integrate multiple heterogeneous, high-resolution data resources with semantic integrity, using standardized ontologies and vocabularies for context-specific domains. In the current work, (i) the functionality of a semantically structured terminology for mining pathway-relevant knowledge from the literature, called the Pathway Terminology System, is demonstrated, and (ii) a context-specific, high-granularity semantic framework for neurodegenerative diseases, known as NeuroRDF, is presented. Neurodegenerative disorders are especially complex, as they are characterized by widespread manifestations and the potential for dramatic alterations in disease progression over time. Early detection and prediction strategies through clinical pointers can provide promising solutions for effective treatment of Alzheimer's disease (AD). In the current work, we present the importance of bridging the gap between clinical and molecular biomarkers to effectively contribute to dementia research. Moreover, we address the need for a formalized framework, called NIFT, to automatically mine relevant clinical knowledge from the literature for substantiating high-resolution cause-and-effect models.
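Semantic frameworks such as NeuroRDF rest on representing knowledge as subject-predicate-object triples that can be queried by pattern. A minimal sketch of that idea in plain Python; the entities and relations shown are illustrative placeholders, not actual NeuroRDF or NIFT content.

```python
# Toy triple store: knowledge as (subject, predicate, object) statements,
# queried by pattern matching. Entities and relations are hypothetical.
TRIPLES = {
    ("APP", "participates_in", "amyloid processing"),
    ("amyloid processing", "associated_with", "Alzheimer disease"),
    ("MAPT", "biomarker_for", "Alzheimer disease"),
}

def query(subject=None, predicate=None, obj=None):
    """Return all triples matching the given pattern; None acts as a wildcard."""
    return {
        (s, p, o) for (s, p, o) in TRIPLES
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    }
```

A query such as `query(obj="Alzheimer disease")` retrieves every statement about the disease regardless of its source, which is the integration property that standardized ontologies provide at scale.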