28 research outputs found

    Recuperação de informação baseada em frases para textos biomédicos (Sentence-based information retrieval for biomedical texts)

    Master's in Computer and Telematics Engineering. Modern advances in experimental methods and high-throughput technology in the biomedical domain are driving rapid growth in the volume of published scientific literature in the field. While a myriad of structured data repositories for biological knowledge have sprung up over the last decades, users are increasingly turning to Information Retrieval (IR) systems, or search engines, instead. IR systems are easier to use due to their flexibility and ability to interpret user needs in the form of queries, typically formed by a few words. Traditional document retrieval systems return entire documents, which may require a lot of subsequent reading to find the specific information sought, frequently contained in a small passage of only a few sentences. Additionally, IR often fails to find what is wanted because the words used in the query, although semantically aligned with the words used in relevant sources, are lexically different from them. This thesis focuses on the development of sentence-based information retrieval approaches that, for a given user query, find relevant sentences in the scientific literature that answer the user's information need. The presented work is two-fold. First, exploratory research experiments were conducted to identify features of informative sentences in biomedical texts. A supervised machine learning method was used for this purpose. Second, an information retrieval system for informative sentences was developed. It supports free-text and concept-based queries; search results are enriched with relevant concept annotations, and sentences can be ranked using multiple configurable strategies.

    Agile in-litero experiments: how can semi-automated information extraction from neuroscientific literature help neuroscience model building?

    In neuroscience, as in many other scientific domains, the primary form of knowledge dissemination is through published articles in peer-reviewed journals. One challenge for modern neuroinformatics is to design methods that make the knowledge from the tremendous backlog of publications accessible for search, analysis and integration into computational models. In this thesis, we introduce novel natural language processing (NLP) models and systems to mine the neuroscientific literature. In addition to in vivo, in vitro or in silico experiments, we term the NLP methods developed in this thesis in litero experiments, aiming at analyzing and making accessible the extended body of neuroscientific literature. In particular, we focus on two important neuroscientific entities: brain regions and neural cells. An integrated NLP model is designed to automatically extract brain region connectivity statements from very large corpora. This system is applied to a large corpus of 25M PubMed abstracts and 600K full-text articles. Central to this system is the creation of a searchable database of brain region connectivity statements, allowing neuroscientists to gain an overview of all brain regions connected to a given region of interest. More importantly, the database enables researchers to provide feedback on connectivity results and links back to the original article sentence to provide the relevant context. The database is evaluated by neuroanatomists on real connectomics tasks (targets of Nucleus Accumbens) and results in significant effort reduction in comparison to previous manual methods (from 1 week to 2h). Subsequently, we introduce neuroNER to identify, normalize and compare instances of neurons in the scientific literature. Our method relies on identifying and analyzing each of the domain features used to annotate a specific neuron mention, like the morphological term 'basket' or brain region 'hippocampus'. 
We apply our method to the same corpus of 25M PubMed abstracts and 600K full-text articles and find over 500K unique neuron type mentions. To demonstrate the utility of our approach, we also apply our method to cross-comparing the NeuroLex and Human Brain Project (HBP) cell type ontologies. By decoupling a neuron mention's identity into its specific compositional features, our method can successfully identify specific neuron types even if they are not explicitly listed within a predefined neuron type lexicon, thus greatly facilitating cross-laboratory studies. In order to build such large databases, several tools and infrastructures for large-scale NLP were developed: a robust pipeline to preprocess full-text PDF articles, as well as bluima, an NLP processing pipeline specialized for neuroscience to perform text mining at PubMed scale. During the development of those two NLP systems, we recognized the need for novel NLP approaches to rapidly develop custom text mining solutions. This led to the formalization of the agile text mining methodology to improve the communication and collaboration between subject matter experts and text miners. Agile text mining is characterized by short development cycles, frequent task redefinition and continuous performance monitoring through integration tests. To support our approach, we developed Sherlok, an NLP framework designed for the development of agile text mining applications.

    The BioLexicon: a Large-Scale Domain-Specific Lexical Resource for Biomedical Text Mining

    The talk will focus on building a biolexicon by leveraging existing bio-resources, combining them within a common, standardized lexical and terminological framework, and employing advanced NL technologies to discover new terms, concepts, relations and linguistic lexical information from text.

    Clinical Natural Language Processing in languages other than English: opportunities and challenges

    Background: Natural language processing applied to clinical text or aimed at a clinical outcome has been thriving in recent years. This paper offers the first broad overview of clinical Natural Language Processing (NLP) for languages other than English. Recent studies are summarized to offer insights and outline opportunities in this area. Main Body: We envision three groups of intended readers: (1) NLP researchers leveraging experience gained in other languages, (2) NLP researchers faced with establishing clinical text processing in a language other than English, and (3) clinical informatics researchers and practitioners looking for resources in their languages in order to apply NLP techniques and tools to clinical practice and/or investigation. We review work in clinical NLP in languages other than English. We classify these studies into three groups: (i) studies describing the development of new NLP systems or components de novo, (ii) studies describing the adaptation of NLP architectures developed for English to another language, and (iii) studies focusing on a particular clinical application. Conclusion: We show the advantages and drawbacks of each method, and highlight the appropriate application context. Finally, we identify major challenges and opportunities that will affect the impact of NLP on clinical practice and public health studies in a context that encompasses English as well as other languages.

    Modelling characters’ mental depth in stories told by children aged 4-10

    From age 3-4, children are generally capable of telling stories about a topic of their own choice. Over the years their stories become more sophisticated in content and structure, reflecting various aspects of cognitive development. Here we focus on children’s ability to construe characters with increasing levels of mental depth, arguably reflecting socio-cognitive capacities including Theory of Mind. Within our sample of 51 stories told by children aged 4-10, characters range from flat “actors” performing simple actions, to “agents” having basic perceptive, emotional, and intentional capacities, to full-blown “persons” with complex inner lives. We argue for the underexplored potential of computationally extracted story-internal factors (e.g. lexical/syntactic complexity) in explaining variance in character depth, as opposed to story-external factors (e.g. age, socioeconomic status) on which existing work has focused. We show that lexical richness in particular explains variance in character depth, and that this effect is larger than that of age and not moderated by it. NWO VI.Veni.191C.051. Computer Systems, Imagery and Medi

    Cross-Platform Text Mining and Natural Language Processing Interoperability - Proceedings of the LREC2016 conference

    No abstract available