Semantic annotation and summarization of biomedical text

Abstract

Advancements in the biomedical community are largely documented and published in text format in scientific forums such as conference papers and journals. To address the scalability of utilizing the large volume of text-based information generated by continuing advances in the biomedical field, two complementary areas are studied. The first area is Semantic Annotation, which is a method for providing machineunderstandable information based on domain-specific resources. A novel semantic annotator, CONANN, is implemented for online matching of concepts defined by a biomedical metathesaurus. CONANN uses a multi-level filter based on both information retrieval and shallow natural language processing techniques. CONANN is evaluated against a state-of-the-art biomedical annotator using the performance measures of time (e.g. number of milliseconds per noun phrase) and precision/recall of the resulting concept matches. CONANN shows that annotation can be performed online, rather than offline, without a significant loss of precision and recall as compared to current offline systems. The second area of study is Text Summarization which is used as a way to perform data reduction of clinical trial texts while still describing the main themes of a biomedical document. The text summarization work is unique in that it focuses exclusively on summarizing biomedical full-text sources as opposed to abstracts, and also exclusively uses domain-specific concepts, rather than terms, to identify important information within a biomedical text. Two novel text summarization algorithms are implemented: one using a concept chaining method based on existing work in lexical chaining (BioChain), and the other using concept distribution to match important sentences between a source text and a generated summary (FreqDist). The BioChain and FreqDist summarizers are evaluated using the publicly-available ROUGE summary evaluation tool. ROUGE compares n-gram co-occurrences between a system summary and one or more model summaries. The text summarization evaluation shows that the two approaches outperform nearly all of the existing term-based approaches.Ph.D., Information Science and Technology -- Drexel University, 200

    Similar works