38 research outputs found

    Linguistische und semantische Annotation eines Zeitungskorpus [Linguistic and semantic annotation of a newspaper corpus]

    This article describes the procedure for automatically and incrementally enriching a raw text corpus with linguistic and semantic information. It shows how recognizing proper names helps to improve part-of-speech tagging and partial syntactic analysis. An evaluation on about 1,000 sentences reveals the strengths and weaknesses of the various recognizers.

    Ranking Interactions for a Curation Task

    Interactions among different types of biomedical entities (proteins, genes, diseases, drugs, etc.) are among the key pieces of information that biomedical text mining systems are expected to extract from the literature. Different types of entities might be considered; for example, protein-protein interactions have been extensively studied as part of the BioCreative competitive evaluations. However, more complex interactions, such as those among genes, drugs, and diseases, are increasingly of interest. Different databases have been used as a reference for the evaluation of extraction and ranking techniques. The aim of this paper is to describe a machine-learning-based reranking approach for candidate interactions extracted from the literature. The results are evaluated using data derived from the PharmGKB database. The importance of a good ranking is particularly evident when the results are applied to support human curators.

    OntoGene in BioCreative II

    BACKGROUND: Research scientists and companies working in the domains of biomedicine and genomics are increasingly faced with the problem of efficiently locating, within the vast body of published scientific findings, the critical pieces of information that are needed to direct current and future research investment. RESULTS: In this report we describe approaches taken within the scope of the second BioCreative competition in order to solve two aspects of this problem: detection of novel protein interactions reported in scientific articles, and detection of the experimental method that was used to confirm the interaction. Our approach to the former problem is based on a high-recall protein annotation step, followed by two strict disambiguation steps. The remaining proteins are then combined according to a number of lexico-syntactic filters, which deliver high-precision results while maintaining reasonable recall. The detection of the experimental methods is tackled by a pattern-matching approach, which delivered the best results in the official BioCreative evaluation. CONCLUSION: Although the results of BioCreative clearly show that no tool is sufficiently reliable for fully automated annotation, a few of the proposed approaches (including our own) already perform at a competitive level. This makes them interesting either as standalone tools for preliminary document inspection, or as modules within an environment aimed at supporting the curation of biomedical literature.

    BioCreative III interactive task: an overview

    The BioCreative challenge evaluation is a community-wide effort for evaluating text mining and information extraction systems applied to the biological domain. The biocurator community, as an active user of biomedical literature, provides a diverse and engaged end-user group for text mining tools. Earlier BioCreative challenges involved many text mining teams in developing basic capabilities relevant to biological curation, but they did not address the issues of system usage, insertion into the workflow, and adoption by curators. Thus, in BioCreative III (BC-III), the InterActive Task (IAT) was introduced to address the utility and usability of text mining tools for real-life biocuration tasks. To support the aims of the IAT in BC-III, involvement of both developers and end users was solicited, and the development of a user interface to address the tasks interactively was requested.

    A morpho-syntactic generation service for German glossary entries


    An OLIF-based open inflectional resource and yet another morphological system for German


    Selektive Evaluation von robusten Parsern [Selective evaluation of robust parsers]

    The availability of parsers with broad coverage creates a need for systematic and wide-ranging evaluation of their performance. To obtain linguistically interpretable measures, we propose a selective method that projects relevant analysis features from the parse tree and measures them, thereby pairing robust results with a robust measurement procedure. This makes it possible, with reasonable effort, to evaluate two parsers for German against a syntactically annotated corpus, even though all three components are based on different grammar models.