808 research outputs found

    Using distributional similarity to organise biomedical terminology

    Get PDF
    We investigate an application of distributional similarity techniques to the problem of structural organisation of biomedical terminology. Our application domain is the relatively small GENIA corpus. Using terms that have been accurately marked-up by hand within the corpus, we consider the problem of automatically determining semantic proximity. Terminological units are dened for our purposes as normalised classes of individual terms. Syntactic analysis of the corpus data is carried out using the Pro3Gres parser and provides the data required to calculate distributional similarity using a variety of dierent measures. Evaluation is performed against a hand-crafted gold standard for this domain in the form of the GENIA ontology. We show that distributional similarity can be used to predict semantic type with a good degree of accuracy

    Ontologies and Information Extraction

    Full text link
    This report argues that, even in the simplest cases, IE is an ontology-driven process. It is not a mere text filtering method based on simple pattern matching and keywords, because the extracted pieces of texts are interpreted with respect to a predefined partial domain model. This report shows that depending on the nature and the depth of the interpretation to be done for extracting the information, more or less knowledge must be involved. This report is mainly illustrated in biology, a domain in which there are critical needs for content-based exploration of the scientific literature and which becomes a major application domain for IE

    CREATING A BIOMEDICAL ONTOLOGY INDEXED SEARCH ENGINE TO IMPROVE THE SEMANTIC RELEVANCE OF RETREIVED MEDICAL TEXT

    Get PDF
    Medical Subject Headings (MeSH) is a controlled vocabulary used by the National Library of Medicine to index medical articles, abstracts, and journals contained within the MEDLINE database. Although MeSH imposes uniformity and consistency in the indexing process, it has been proven that using MeSH indices only result in a small increase in precision over free-text indexing. Moreover, studies have shown that the use of controlled vocabularies in the indexing process is not an effective method to increase semantic relevance in information retrieval. To address the need for semantic relevance, we present an ontology-based information retrieval system for the MEDLINE collection that result in a 37.5% increase in precision when compared to free-text indexing systems. The presented system focuses on the ontology to: provide an alternative to text-representation for medical articles, finding relationships among co-occurring terms in abstracts, and to index terms that appear in text as well as discovered relationships. The presented system is then compared to existing MeSH and Free-Text information retrieval systems. This dissertation provides a proof-of-concept for an online retrieval system capable of providing increased semantic relevance when searching through medical abstracts in MEDLINE

    NEMO: Extraction and normalization of organization names from PubMed affiliations

    Get PDF
    Background: We are witnessing an exponential increase in biomedical research citations in PubMed. However, translating biomedical discoveries into practical treatments is estimated to take around 17 years, according to the 2000 Yearbook of Medical Informatics, and much information is lost during this transition. Pharmaceutical companies spend huge sums to identify opinion leaders and centers of excellence. Conventional methods such as literature search, survey, observation, self‐identification, expert opinion, and sociometry not only need much human effort, but are also non‐comprehensive. Such huge delays and costs can be reduced by “connecting those who produce the knowledge with those who apply it”. A humble step in this direction is large‐scale discovery of persons and organizations involved in specific areas of research. This can be achieved by automatically extracting and disambiguating author names and affiliation strings retrieved through Medical Subject Heading (MeSH) terms and other keywords associated with articles in PubMed. In this study, we propose NEMO (Normalization Engine for Matching Organizations), a system for extracting organization names from the affiliation strings provided in PubMed abstracts, building a thesaurus (list of synonyms) of organization names, and subsequently normalizing them to a canonical organization name using the thesaurus. Results: We used a parsing process that involves multi‐layered rule matching with multiple dictionaries. The normalization process involves clustering based on weighted local sequence alignment metrics to address synonymy at word level, and local learning based on finding connected components to address synonymy. The graphical user interface and java client library of NEMO are available at http://lnxnemo.sourceforge.net. Conclusion: NEMO associates each biomedical paper and its authors with a unique organization name and the geopolitical location of that organization. This system provides more accurate information about organizations than the raw affiliation strings provided in PubMed abstracts. It can be used for : a) bimodal social network analysis that evaluates the research relationships between individual researchers and their institutions; b) improving author name disambiguation; c) augmenting National Library of Medicine (NLM)’s Medical Articles Record System (MARS) system for correcting errors due to OCR on affiliation strings that are in small fonts; and d) improving PubMed citation indexing strategies (authority control) based on normalized organization name and country

    Ontology-assisted database integration to support natural language processing and biomedical data-mining

    Get PDF
    Successful biomedical data mining and information extraction require a complete picture of biological phenomena such as genes, biological processes, and diseases; as these exist on different levels of granularity. To realize this goal, several freely available heterogeneous databases as well as proprietary structured datasets have to be integrated into a single global customizable scheme. We will present a tool to integrate different biological data sources by mapping them to a proprietary biomedical ontology that has been developed for the purposes of making computers understand medical natural language

    A framework for the development of biomedical text mining software tools

    Get PDF
    Over the last few years, a growing number of techniques has been successfully proposed to tackle diverse challenges in the Biomedical Text Mining (BioTM) arena. However, the set of available software tools to researchers has not grown in a similar way. This work makes a contribution to close this gap, proposing a framework to ease the development of user-friendly and interoperable applications in this field, based on a set of available modular components. These modules can be connected in diverse ways to create applications that fit distinct user roles. Also, developers of new algorithms have a framework that allows them to easily integrate their implementations with state-of-the-art BioTM software for related tasks.This work was supported in part by the research projects recSysBio (ref. POCI/BIO/60139/2004) and MOBioPro (ref. POSC/EW59899/2004) of the University of Minho, financed by the Portuguese Fundaao para a Ciencia e Tecnologia. The work of SC is supported by a PhD grant from the same institution (ref. SFRH/BD/22863/2005)
    • 

    corecore