Large-scale extraction of brain connectivity from the neuroscientific literature
Motivation: In neuroscience, as in many other scientific domains, the primary form of knowledge dissemination is through published articles. One challenge for modern neuroinformatics is finding methods to make the knowledge from the tremendous backlog of publications accessible for search, analysis and the integration of such data into computational models. A key example of this is mesoscale brain connectivity, where results are not reported in a normalized repository. Instead, these experimental results are published in natural language, scattered among individual scientific publications. This lack of normalization and centralization hinders the large-scale integration of brain connectivity results. In this article, we present text-mining models to extract and aggregate brain connectivity results from 13.2 million PubMed abstracts and 630,216 full-text publications related to neuroscience. The brain regions are identified with three different named entity recognizers (NERs) and then normalized against two atlases: the Allen Brain Atlas (ABA) and the atlas from the Brain Architecture Management System (BAMS). We then use three different extractors to assess inter-region connectivity. Results: NERs and connectivity extractors are evaluated against a manually annotated corpus. The complete in litero extraction models are also evaluated against in vivo connectivity data from ABA with an estimated precision of 78%. The resulting database contains over 4 million brain region mentions and over 100,000 (ABA) and 122,000 (BAMS) potential brain region connections. This database drastically accelerates connectivity literature review by providing a centralized repository of connectivity data to neuroscientists. Availability and implementation: The resulting models are publicly available at github.com/BlueBrain/bluima. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
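As a rough illustration of the pipeline this abstract describes (recognize region mentions, normalize them against an atlas, aggregate connectivity evidence), the sketch below counts sentence-level co-occurrences of normalized region mentions. The lexicon entries, region IDs and sentences are hypothetical toy data; the paper's actual extractors go well beyond plain co-occurrence.

```python
from collections import Counter
from itertools import combinations

# Toy atlas lexicon mapping surface forms to canonical region IDs
# (hypothetical entries; the real systems normalize against ABA/BAMS).
ATLAS = {
    "nucleus accumbens": "NAc",
    "acb": "NAc",
    "hippocampus": "HIP",
    "prefrontal cortex": "PFC",
}

def normalize(mention):
    """Map a raw region mention to a canonical atlas ID, if known."""
    return ATLAS.get(mention.lower().strip())

def cooccurrence_connectivity(sentences):
    """Count how often two normalized regions co-occur in one sentence,
    a crude proxy for a reported inter-region connection."""
    counts = Counter()
    for mentions in sentences:
        regions = sorted({r for m in mentions if (r := normalize(m)) is not None})
        for pair in combinations(regions, 2):
            counts[pair] += 1
    return counts

# Region mentions as an NER might emit them, one list per sentence.
sentences = [
    ["hippocampus", "prefrontal cortex"],
    ["ACB", "hippocampus"],
    ["nucleus accumbens", "hippocampus"],
]
print(cooccurrence_connectivity(sentences))
# ('HIP', 'NAc') is supported by two sentences, ('HIP', 'PFC') by one
```

Aggregating such counts over millions of sentences is what allows a centralized, searchable connectivity repository to be built from free text.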
Agile in-litero experiments: how can semi-automated information extraction from neuroscientific literature help neuroscience model building?
In neuroscience, as in many other scientific domains, the primary form of knowledge dissemination is through published articles in peer-reviewed journals. One challenge for modern neuroinformatics is to design methods to make the knowledge from the tremendous backlog of publications accessible for search, analysis and its integration into computational models. In this thesis, we introduce novel natural language processing (NLP) models and systems to mine the neuroscientific literature. By analogy with in vivo, in vitro or in silico experiments, we coin the NLP methods developed in this thesis 'in litero' experiments, aiming at analyzing and making accessible the extended body of neuroscientific literature. In particular, we focus on two important neuroscientific entities: brain regions and neural cells. An integrated NLP model is designed to automatically extract brain region connectivity statements from very large corpora. This system is applied to a large corpus of 25M PubMed abstracts and 600K full-text articles. Central to this system is the creation of a searchable database of brain region connectivity statements, allowing neuroscientists to gain an overview of all brain regions connected to a given region of interest. More importantly, the database enables researchers to provide feedback on connectivity results and links back to the original article sentence to provide the relevant context. The database is evaluated by neuroanatomists on real connectomics tasks (targets of the Nucleus Accumbens) and results in a significant effort reduction in comparison to previous manual methods (from 1 week to 2 h). Subsequently, we introduce neuroNER to identify, normalize and compare instances of neurons in the scientific literature. Our method relies on identifying and analyzing each of the domain features used to annotate a specific neuron mention, like the morphological term 'basket' or the brain region 'hippocampus'.
We apply our method to the same corpus of 25M PubMed abstracts and 600K full-text articles and find over 500K unique neuron type mentions. To demonstrate the utility of our approach, we also apply our method to cross-comparing the NeuroLex and Human Brain Project (HBP) cell type ontologies. By decoupling a neuron mention's identity into its specific compositional features, our method can successfully identify specific neuron types even if they are not explicitly listed within a predefined neuron type lexicon, thus greatly facilitating cross-laboratory studies. In order to build such large databases, several large-scale NLP tools and infrastructures were developed: a robust pipeline to preprocess full-text PDF articles, as well as bluima, an NLP processing pipeline specialized in neuroscience to perform text mining at PubMed scale. During the development of those two NLP systems, we acknowledged the need for novel NLP approaches to rapidly develop custom text-mining solutions. This led to the formalization of the agile text-mining methodology to improve the communication and collaboration between subject matter experts and text miners. Agile text mining is characterized by short development cycles, frequent task redefinition and continuous performance monitoring through integration tests. To support our approach, we developed Sherlok, an NLP framework designed for the development of agile text-mining applications.
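The compositional idea behind neuroNER (decomposing a neuron mention into domain features such as morphology and brain region, then comparing mentions feature by feature) can be sketched as follows. The lexicons, the variant-to-canonical region map and both example mentions are hypothetical illustrations, not neuroNER's actual resources.

```python
# Hypothetical feature lexicons; neuroNER's real resources are far larger.
MORPHOLOGY = {"basket", "pyramidal", "chandelier"}
# Adjectival variants map to one canonical region feature.
REGION = {"hippocampus": "hippocampus", "hippocampal": "hippocampus",
          "cortex": "cortex", "cortical": "cortex"}
TRANSMITTER = {"gabaergic", "glutamatergic", "cholinergic"}

def decompose(mention):
    """Split a neuron mention into its compositional domain features."""
    feats = {"morphology": set(), "region": set(), "transmitter": set()}
    for tok in mention.lower().replace("-", " ").split():
        if tok in MORPHOLOGY:
            feats["morphology"].add(tok)
        elif tok in REGION:
            feats["region"].add(REGION[tok])
        elif tok in TRANSMITTER:
            feats["transmitter"].add(tok)
    return feats

a = decompose("hippocampal GABAergic basket cell")
b = decompose("basket cell of the hippocampus")
# Two different surface forms share morphology and region features, so they
# can be matched even if neither appears verbatim in a neuron-type lexicon.
shared = {k: a[k] & b[k] for k in a}
print(shared)  # morphology {'basket'}, region {'hippocampus'}
```

Feature-level comparison of this kind is also what makes cross-comparison of cell type ontologies tractable: two ontologies need only agree on features, not on exact neuron-type labels.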
Automatic target validation based on neuroscientific literature mining for tractography.
Target identification for tractography studies requires solid anatomical knowledge validated by an extensive literature review across species for each seed structure to be studied. Manual literature review to identify targets for a given seed region is tedious and potentially subjective. Therefore, complementary approaches would be useful. We propose to use text-mining models to automatically suggest potential targets from the neuroscientific literature, full-text articles and abstracts, so that they can be used for anatomical connection studies and more specifically for tractography. We applied text-mining models to three structures: two well-studied structures that are validated deep brain stimulation targets, the internal globus pallidus and the subthalamic nucleus, and the nucleus accumbens, an exploratory target for treating psychiatric disorders. We performed a systematic review of the literature to document the projections of the three selected structures and compared it with the targets proposed by the text-mining models, both in rat and primate (including human). We ran probabilistic tractography on the nucleus accumbens and compared the output with the results of the text-mining models and the literature review. Overall, text-mining the literature could find three times as many targets as two man-weeks of curation could. The overall efficiency of text-mining against literature review in our study was 98% recall (at 36% precision), meaning that over all the targets for the three selected seeds, only one target was missed by text-mining. We demonstrate that connectivity for a structure of interest can be extracted from a very large number of publications and abstracts. We believe this tool will help the neuroscience community to facilitate connectivity studies of particular brain regions. The text-mining tools used for the study are part of the HBP Neuroinformatics Platform, publicly available at http://connectivity-brainer.rhcloud.com/
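Precision and recall figures like those reported here (98% recall at 36% precision) can be computed mechanically once the mined targets and a curated gold set are in hand. A minimal sketch, with hypothetical target sets for a single seed region:

```python
def precision_recall(predicted, gold):
    """Precision and recall of mined targets against a curated gold set."""
    tp = len(predicted & gold)  # true positives: mined targets also curated
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical target sets for one seed structure.
mined = {"thalamus", "amygdala", "insula", "putamen", "pallidum"}
curated = {"thalamus", "amygdala"}
p, r = precision_recall(mined, curated)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.40 recall=1.00
```

The asymmetry in the toy numbers mirrors the study's finding: for screening candidate targets, high recall at modest precision is acceptable, because spurious suggestions are cheap to discard while missed targets are costly.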
A Knowledge-based Integrative Modeling Approach for In-Silico Identification of Mechanistic Targets in Neurodegeneration with Focus on Alzheimer's Disease
Dementia is the progressive decline in cognitive function due to damage or disease in the body beyond what might be expected from normal aging. Based on neuropathological and clinical criteria, dementia includes a spectrum of diseases, namely Alzheimer's dementia, Parkinson's dementia, Lewy Body disease, Alzheimer's dementia with Parkinson's, Pick's disease, Semantic dementia, and large and small vessel disease. It is thought that these disorders result from a combination of genetic and environmental risk factors. Despite accumulating knowledge that has been gained about pathophysiological and clinical characteristics of the disease, no coherent and integrative picture of molecular mechanisms underlying neurodegeneration in Alzheimer's disease is available. Existing drugs only offer symptomatic relief to the patients and lack any efficient disease-modifying effects. The present research proposes a knowledge-based rationale towards integrative modeling of disease mechanism for identifying potential candidate targets and biomarkers in Alzheimer's disease. Integrative disease modeling is an emerging knowledge-based paradigm in translational research that exploits the power of computational methods to collect, store, integrate, model and interpret accumulated disease information across different biological scales from molecules to phenotypes. It prepares the ground for transitioning from 'descriptive' to 'mechanistic' representation of disease processes. The proposed approach was used to introduce an integrative framework, which integrates, on one hand, extracted knowledge from the literature using semantically supported text-mining technologies and, on the other hand, primary experimental data such as gene/protein expression or imaging readouts.
The aim of such a hybrid integrative modeling approach was not only to provide a consolidated systems view on the disease mechanism as a whole but also to increase specificity and sensitivity of the mechanistic model by providing disease-specific context. This approach was successfully used for correlating clinical manifestations of the disease to their corresponding molecular events and led to the identification and modeling of three important mechanistic components underlying Alzheimer's dementia, namely the CNS, the immune system and the endocrine components. These models were validated using a novel in-silico validation method, namely biomarker-guided pathway analysis, and a pathway-based target identification approach was introduced, which resulted in the identification of the MAPK signaling pathway as a potential candidate target at the crossroad of the triad components underlying disease mechanism in Alzheimer's dementia.
Promises and pitfalls of deep neural networks in neuroimaging-based psychiatric research
By promising more accurate diagnostics and individual treatment
recommendations, deep neural networks and in particular convolutional neural
networks have advanced to a powerful tool in medical imaging. Here, we first
give an introduction into methodological key concepts and resulting
methodological promises including representation and transfer learning, as well
as modelling domain-specific priors. After reviewing recent applications within
neuroimaging-based psychiatric research, such as the diagnosis of psychiatric
diseases, delineation of disease subtypes, normative modeling, and the
development of neuroimaging biomarkers, we discuss current challenges. These
include, for example, the difficulty of training models on small, heterogeneous
and biased data sets, the lack of validity of clinical labels, algorithmic
bias, and the influence of confounding variables.
The effect of using multiple connectivity metrics in brain Functional Connectivity studies
Integrated master's thesis, Biomedical Engineering and Biophysics (Medical Signals and Images), Universidade de Lisboa, Faculdade de Ciências, 2022.
Resting-state functional magnetic resonance imaging (rs-fMRI) has the potential to assist as a
diagnostic or prognostic tool for a diverse set of neurological and neuropsychiatric disorders, which are
often difficult to differentiate. fMRI focuses on the study of the brain functional Connectome, which is
characterized by the functional connections and neuronal activity among different brain regions, also
interpreted as communications between pairs of regions. This Functional Connectivity (FC) is quantified
through the statistical dependences between brain regions' blood-oxygen-level-dependent (BOLD)
signal time series, traditionally evaluated by correlation-coefficient metrics and represented as
FC matrices. However, several studies underlined limitations regarding the use of correlation metrics to
fully capture information from these signals, leading investigators towards different statistical metrics
that would fill those shortcomings. Recently, investigators have turned their attention to Deep Learning
(DL) models, which outperform traditional Machine Learning (ML) techniques thanks to their ability to
automatically extract relevant information from high-dimensional data, like FC data. These models have
been used with rs-fMRI data to improve diagnostic predictions, as well as to understand pathological
patterns in the functional Connectome that can lead to the discovery of new biomarkers. In spite of very
encouraging performances, the black-box nature of DL algorithms makes it difficult to know which input
information led the model to a certain prediction, restricting their use in clinical settings.
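The contrast drawn above between correlation and non-linear FC metrics can be illustrated with a small sketch that builds FC matrices from toy "BOLD" time series, using Pearson correlation and a histogram-based mutual-information estimate. The thesis computes these metrics with the MULAN toolbox in MATLAB; the Python version below is only an assumed, simplified analogue.

```python
import numpy as np

rng = np.random.default_rng(0)

def fc_correlation(ts):
    """Pearson-correlation FC matrix from a (regions x time) array."""
    return np.corrcoef(ts)

def mutual_information(x, y, bins=8):
    """Histogram-based mutual information between two time series (nats)."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz])).sum())

def fc_mutual_information(ts, bins=8):
    """Symmetric mutual-information FC matrix."""
    n = ts.shape[0]
    fc = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            fc[i, j] = fc[j, i] = mutual_information(ts[i], ts[j], bins)
    return fc

# Toy signals: region 1 depends non-linearly (quadratically) on region 0,
# region 2 is independent noise.
t = rng.standard_normal(500)
ts = np.vstack([t,
                t ** 2 + 0.1 * rng.standard_normal(500),
                rng.standard_normal(500)])
# Correlation barely sees the quadratic coupling; mutual information does.
print(np.round(fc_correlation(ts), 2))
print(np.round(fc_mutual_information(ts), 2))
```

A quadratic coupling is nearly invisible to Pearson correlation yet yields a clearly larger mutual-information value than an independent pair, which is exactly why complementing the correlation family with non-linear metrics is worth investigating.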
The objective of this dissertation is to exploit the power of DL models to understand how FC
matrices created from different statistical metrics can provide information about brain FC beyond
the conventionally used correlation family. Two publicly available datasets were studied: the ABIDE I
dataset, composed of healthy individuals and individuals with autism spectrum disorder (ASD), and the
ADHD-200 dataset, with typically developing controls and individuals with attention-deficit/hyperactivity
disorder
(ADHD). The computation of the FC matrices of both datasets, using different statistical metrics, was
performed in MATLAB using the MULAN toolbox functions, encompassing the correlation coefficient,
non-linear correlation coefficient, mutual information, coherence and transfer entropy. The
classification of FC data was performed using two DL models, the improved ConnectomeCNN model
and the innovative ConnectomeCNN-Autoencoder model. Another goal was to study the effect
of a multi-metric approach on classification performance, combining the multiple FC matrices computed
from the different statistical metrics, as well as to study the use of Explainable Artificial
Intelligence (XAI) techniques, namely the Layer-wise Relevance Propagation (LRP) method, to overcome
the black-box problem of the DL models and reveal the most important brain regions in ADHD.
The results show that the use of other statistical metrics to compute FC matrices can be a useful
complement to the traditional correlation-based methods for the classification of healthy subjects
versus subjects diagnosed with ADHD and ASD. Notably, non-linear metrics such as h2 and mutual
information achieved similar and, in some cases, even slightly better performances than the correlation
methods. The multi-metric FC approach, despite not improving classification performance over the best
individual method, showed promising results, namely the ability to select the best features from all the
combined FC matrices, matching the best individual metric on each of the model's evaluation measures and
leading to a more complete classification. The LRP analysis applied to the ADHD-200 dataset proved
promising, identifying brain regions related to the pathophysiology of ADHD that are in broad accordance
with the findings of FC and structural studies.

Resting-state functional magnetic resonance imaging (rs-fMRI) has the potential to serve as an auxiliary
diagnostic or prognostic tool for a diverse set of neurological and neuropsychiatric disorders that are
often difficult to differentiate. The analysis of rs-fMRI data often relies on the concept of the brain's
functional connectome, characterized by the functional connections between different brain regions,
interpreted as communications between pairs of brain regions. This functional connectivity is quantified
through statistical dependences between the fMRI signals of brain regions, traditionally computed with
the correlation coefficient and represented as functional connectivity matrices. However, several studies
have shown limitations of correlation metrics, which cannot fully capture all the information present in
these signals, leading investigators to search for different statistical metrics that could fill those
gaps and extract more complete information from the signals.
The study of these neurological and neuropsychiatric disorders initially relied on techniques such as
statistical parametric mapping, in the context of task-based fMRI studies. These techniques have certain
limitations, notably the assumption that each brain region acts independently, which does not match
current knowledge about brain function. The emergence of rs-fMRI provided a more global perspective and
gave rise to a vast literature on the effect of pathologies on resting-state connectivity patterns,
including attempts at automated diagnosis based on biomarkers extracted from connectomes. In recent
years, investigators have turned their attention to different branches of Artificial Intelligence, more
specifically to Deep Learning (DL) algorithms, which outperform the traditional Machine Learning (ML)
algorithms applied in earlier studies thanks to their ability to automatically extract relevant
information from high-dimensional data, such as functional connectivity data. These models use rs-fMRI
data to improve diagnostic predictions over current techniques in terms of both accuracy and speed, and
to better understand the pathological patterns in the functional connections of these disorders,
potentially leading to the discovery of new biomarkers. Despite the remarkable performance of these
models, the black-box nature of DL algorithms makes it difficult to know which input information led the
model to a given prediction; a model may even rely on the wrong information to reach an inference, which
restricts its use in clinical settings.
The objective of this dissertation, developed at the Instituto de Biofísica e Engenharia Biomédica, is
to exploit the power of DL models in order to assess to what extent functional connectivity matrices
created from different statistical metrics can provide more information about the brain's functional
connectivity, beyond the correlation metrics conventionally used in this type of study. Two publicly
available datasets widely used in neuroscience were studied: the ABIDE-I dataset, composed of healthy
individuals and individuals with autism spectrum disorder (ASD), and the ADHD-200 dataset, with typically
developing controls and individuals with attention-deficit/hyperactivity disorder (ADHD).
In a first phase, the functional connectivity matrices of both datasets were computed using the
different statistical metrics. For this, MATLAB code was developed that takes the BOLD signal time series
from the two datasets and builds the functional connectivity matrices, incorporating functions from the
MULAN toolbox for the different statistical metrics: the correlation coefficient, the non-linear
correlation coefficient, mutual information, coherence and transfer entropy. Next, to evaluate the effect
of the different statistical metrics on the discrimination between healthy and pathological subjects, the
functional connectivity data were classified with two DL models. The improved ConnectomeCNN model and the
novel ConnectomeCNN-Autoencoder model were implemented with the Keras neural network library and its
TensorFlow backend, both in Python. These models, previously developed at the Instituto de Biofísica e
Engenharia Biomédica, were optimized for the data under study by testing various model and training
parameters. The effect of a multi-metric approach on the classification tasks of both datasets was also
studied: the matrices computed from the different statistical metrics were combined and fed to the same
models that had been applied to the single-metric functional connectivity matrices, again with
optimization of the model and training parameters to obtain the best performance on these data. Beyond
these two objectives, the use of Explainable Artificial Intelligence (XAI) techniques was studied, more
specifically the Layer-wise Relevance Propagation (LRP) method, to overcome the black-box problem of DL
models by explaining how the models use the input data to make a given prediction. The LRP method was
applied to the two previously used models with the ADHD-200 dataset as input, revealing the brain
regions most important for an ADHD-related diagnosis.
The results obtained show that the use of other statistical metrics to create the functional
connectivity matrices can be a very useful complement to the metrics traditionally used for the
classification of healthy individuals versus individuals with ASD and ADHD. In particular, non-linear
statistical metrics such as h2 and mutual information achieved performances similar to, and in some
cases slightly better than, those obtained with the correlation methods conventionally used in these
functional connectivity studies. The multi-metric functional connectivity approach, despite not improving
overall classification performance over the best individual metric, produced results that justify deeper
exploration of this type of approach, in order to better understand the complementarity of the metrics
and the best way to use them. The LRP method applied to the ADHD-200 dataset demonstrated its
applicability to this type of study and to DL models, identifying the brain regions most related to the
pathophysiology of ADHD, which are compatible with what is reported by several studies of functional
connectivity and of structural alterations associated with this disorder. By demonstrating how DL models
use the input data to make their predictions, these XAI techniques may lead to a faster and more widely
accepted adoption of these algorithms in clinical settings. Such techniques can assist the diagnosis and
prognosis of neurological and neuropsychiatric disorders that are usually difficult to differentiate,
allowing clinicians to understand the prediction made and to explain it to their patients.
Conceptualization of Computational Modeling Approaches and Interpretation of the Role of Neuroimaging Indices in Pathomechanisms for Pre-Clinical Detection of Alzheimer Disease
With swift advancements in next-generation sequencing technologies alongside the voluminous growth of biological data, a diversity of various data resources such as databases and web services have been created to facilitate data management, accessibility, and analysis. However, the burden of interoperability between dynamically growing data resources is an increasingly rate-limiting step in biomedicine, specifically concerning neurodegeneration. Over the years, massive investments and technological advancements for dementia research have resulted in large proportions of unmined data. Accordingly, there is an essential need for intelligent as well as integrative approaches to mine available data and substantiate novel research outcomes. Semantic frameworks provide a unique possibility to integrate multiple heterogeneous, high-resolution data resources with semantic integrity using standardized ontologies and vocabularies for context-specific domains. In this current work, (i) the functionality of a semantically structured terminology for mining pathway-relevant knowledge from the literature, called Pathway Terminology System, is demonstrated and (ii) a context-specific high-granularity semantic framework for neurodegenerative diseases, known as NeuroRDF, is presented. Neurodegenerative disorders are especially complex as they are characterized by widespread manifestations and the potential for dramatic alterations in disease progression over time. Early detection and prediction strategies through clinical pointers can provide promising solutions for effective treatment of AD. In the current work, we have presented the importance of bridging the gap between clinical and molecular biomarkers to effectively contribute to dementia research. Moreover, we address the need for a formalized framework called NIFT to automatically mine relevant clinical knowledge from the literature for substantiating high-resolution cause-and-effect models.