    ODIN: An Advanced Interface for the Curation of Biomedical Literature

    The gene normalization task in BioCreative III

    BACKGROUND: We report the Gene Normalization (GN) challenge in BioCreative III where participating teams were asked to return a ranked list of identifiers of the genes detected in full-text articles. For training, 32 fully and 500 partially annotated articles were prepared. A total of 507 articles were selected as the test set. Due to the high annotation cost, it was not feasible to obtain gold-standard human annotations for all test articles. Instead, we developed an Expectation Maximization (EM) algorithm approach for choosing a small number of test articles for manual annotation that were most capable of differentiating team performance. Moreover, the same algorithm was subsequently used for inferring ground truth based solely on team submissions. We report team performance on both gold standard and inferred ground truth using a newly proposed metric called Threshold Average Precision (TAP-k).
    RESULTS: We received a total of 37 runs from 14 different teams for the task. When evaluated using the gold-standard annotations of the 50 articles, the highest TAP-k scores were 0.3297 (k=5), 0.3538 (k=10), and 0.3535 (k=20), respectively. Higher TAP-k scores of 0.4916 (k=5, 10, 20) were observed when evaluated using the inferred ground truth over the full test set. When combining team results using machine learning, the best composite system achieved TAP-k scores of 0.3707 (k=5), 0.4311 (k=10), and 0.4477 (k=20) on the gold standard, representing improvements of 12.4%, 21.8%, and 26.6% over the best team results, respectively.
    CONCLUSIONS: By using full text and being species non-specific, the GN task in BioCreative III has moved closer to a real literature curation task than similar tasks in the past and presents additional challenges for the text mining community, as revealed in the overall team results. By evaluating teams using the gold standard, we show that the EM algorithm allows team submissions to be differentiated while keeping the manual annotation effort feasible. Using the inferred ground truth we show measures of comparative performance between teams. Finally, by comparing team rankings on gold standard vs. inferred ground truth, we further demonstrate that the inferred ground truth is as effective as the gold standard for detecting good team performance.
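
The relative improvements quoted in the RESULTS can be re-derived from the TAP-k scores given in the same abstract. The following is a minimal sketch of that arithmetic only; the variable and function names are illustrative assumptions, and it does not compute TAP-k itself.

```python
# Gold-standard TAP-k scores as quoted in the abstract (k = 5, 10, 20).
# Dictionary and function names are illustrative, not from the paper.
best_team = {5: 0.3297, 10: 0.3538, 20: 0.3535}
composite = {5: 0.3707, 10: 0.4311, 20: 0.4477}


def relative_improvement(baseline: float, improved: float) -> float:
    """Return the gain of `improved` over `baseline`, in percent."""
    return (improved - baseline) / baseline * 100.0


# Relative gain of the composite system over the best single team, per k,
# rounded to one decimal place as in the abstract.
improvements = {
    k: round(relative_improvement(best_team[k], composite[k]), 1)
    for k in best_team
}

for k, pct in improvements.items():
    print(f"k={k}: {pct}%")
```

The rounded values agree with the 12.4%, 21.8%, and 26.6% reported for k=5, 10, and 20.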

    Text Structures in Medical Text Processing: Empirical Evidence and a Text Understanding Prototype

    In this paper, we shall challenge this view. We stipulate that medical texts, like any other text type, exhibit textual structures and that disregarding these structural relations will lead to underdetermined or even invalid content representations. To support our argument, we conducted an empirical investigation of medical findings reports from a large clinical text database in order to assess whether this issue is really relevant. First, we will describe the experimental setting and elaborate on the quantitative distribution of various text phenomena in the sample. We will then turn to the consequences of not taking textual structures into account and show how referentially incoherent and referentially invalid text knowledge representation structures are likely to emerge. This is illustrated by a small text fragment as analyzed by our system prototype, the Medical Knowledge SYNDIKATE. We shall focus on those aspects of the system design which account for the proper analysis of text phenomena. Up to now, such functionality has, to the best of our knowledge, not been provided by any other system for medical text processing (for a survey, cf. [4]).

    Why Discourse Structures in Medical Reports Matter for the Validity of Automatically Generated Text Knowledge Bases

    The automatic analysis of medical full-texts currently suffers from neglecting text coherence phenomena such as reference relations between discourse units. This has unwarranted effects on the descriptional adequacy of medical knowledge bases automatically generated from texts. The resulting representation bias can be characterized in terms of artificially fragmented, incomplete and invalid knowledge structures. We discuss three types of textual phenomena (pronominal and nominal anaphora, as well as textual ellipsis) and outline basic methodologies for dealing with them.
    Keywords: natural language processing, pathology
    Introduction
    With the overall diffusion of electronic text processing technology in clinical offices and at the physician's workplace, and, more recently, the unlimited access to text resources on the Internet, a vast potential for medical information supply arises. The natural language processing community has responded to the urgent needs of real-world text process..

    An Ontological Engineering Methodology for Part-Whole Reasoning in Medicine

    Part-whole relationships are fundamental ontological categories for medical reasoning. Part-whole modeling, however, still provides no conclusive methodology for adequate representations. We propose a new representation construct for part-whole reasoning based on the formal framework of description logics, thereby overcoming problems that arise in the context of previous formal approaches to part-whole modeling, as well as in widespread comprehensive medical terminologies.
    Introduction
    In medical informatics research, knowledge representation issues have been emphasized in recent years. It is becoming obvious that efficient classification, processing of structured data and free texts, as well as a broad variety of sophisticated information retrieval services (e.g., fact retrieval, text passage retrieval) and knowledge-based decision support require a common conceptual framework to facilitate semantic interoperability (Evans et al., 1994; Friedman et al., 1995). Concept systems routi..