808 research outputs found
Using distributional similarity to organise biomedical terminology
We investigate an application of distributional similarity techniques to the problem of structural organisation of biomedical terminology. Our application domain is the relatively small GENIA corpus. Using terms that have been accurately marked-up by hand within the corpus, we consider the problem of automatically determining semantic proximity. Terminological units are dened for our purposes as normalised classes of individual terms. Syntactic analysis of the corpus data is carried out using the Pro3Gres parser and provides the data required to calculate distributional similarity using a variety of dierent measures. Evaluation is performed against a hand-crafted gold standard for this domain in the form of the GENIA ontology. We show that distributional similarity can be used to predict semantic type with a good degree of accuracy
Ontologies and Information Extraction
This report argues that, even in the simplest cases, IE is an ontology-driven
process. It is not a mere text filtering method based on simple pattern
matching and keywords, because the extracted pieces of texts are interpreted
with respect to a predefined partial domain model. This report shows that
depending on the nature and the depth of the interpretation to be done for
extracting the information, more or less knowledge must be involved. This
report is mainly illustrated in biology, a domain in which there are critical
needs for content-based exploration of the scientific literature and which
becomes a major application domain for IE
CREATING A BIOMEDICAL ONTOLOGY INDEXED SEARCH ENGINE TO IMPROVE THE SEMANTIC RELEVANCE OF RETREIVED MEDICAL TEXT
Medical Subject Headings (MeSH) is a controlled vocabulary used by the National Library of Medicine to index medical articles, abstracts, and journals contained within the MEDLINE database. Although MeSH imposes uniformity and consistency in the indexing process, it has been proven that using MeSH indices only result in a small increase in precision over free-text indexing. Moreover, studies have shown that the use of controlled vocabularies in the indexing process is not an effective method to increase semantic relevance in information retrieval. To address the need for semantic relevance, we present an ontology-based information retrieval system for the MEDLINE collection that result in a 37.5% increase in precision when compared to free-text indexing systems. The presented system focuses on the ontology to: provide an alternative to text-representation for medical articles, finding relationships among co-occurring terms in abstracts, and to index terms that appear in text as well as discovered relationships. The presented system is then compared to existing MeSH and Free-Text information retrieval systems. This dissertation provides a proof-of-concept for an online retrieval system capable of providing increased semantic relevance when searching through medical abstracts in MEDLINE
NEMO: Extraction and normalization of organization names from PubMed affiliations
Background: We are witnessing an exponential increase in biomedical research citations in PubMed. However, translating biomedical discoveries into practical treatments is estimated to
take around 17 years, according to the 2000 Yearbook of Medical Informatics, and much information is lost during this transition. Pharmaceutical companies spend huge sums to identify opinion leaders and centers of excellence. Conventional methods such as literature search, survey, observation, selfâidentification, expert opinion, and sociometry not only need much human effort, but are also nonâcomprehensive. Such huge delays and costs can be reduced by âconnecting those who produce the knowledge with those who apply itâ. A humble step in this direction is largeâscale discovery of persons and organizations involved in specific areas of research. This can be achieved by automatically extracting and disambiguating author names and affiliation strings retrieved through Medical Subject Heading (MeSH) terms and other keywords associated with articles in PubMed. In this study, we propose NEMO (Normalization Engine for Matching Organizations), a system for extracting organization names from the affiliation strings provided in PubMed abstracts, building a thesaurus (list of synonyms) of organization names, and subsequently normalizing them to a canonical organization name using
the thesaurus.
Results: We used a parsing process that involves multiâlayered rule matching with multiple dictionaries. The normalization process involves clustering based on weighted local sequence
alignment metrics to address synonymy at word level, and local learning based on finding connected components to address synonymy. The graphical user interface and java client library
of NEMO are available at http://lnxnemo.sourceforge.net.
Conclusion: NEMO associates each biomedical paper and its authors with a unique organization name and the geopolitical location of that organization. This system provides more accurate
information about organizations than the raw affiliation strings provided in PubMed abstracts. It can be used for : a) bimodal social network analysis that evaluates the research relationships
between individual researchers and their institutions; b) improving author name disambiguation; c) augmenting National Library of Medicine (NLM)âs Medical Articles Record
System (MARS) system for correcting errors due to OCR on affiliation strings that are in small fonts; and d) improving PubMed citation indexing strategies (authority control) based on
normalized organization name and country
Ontology-assisted database integration to support natural language processing and biomedical data-mining
Successful biomedical data mining and information extraction require a complete picture of biological phenomena such as genes, biological processes, and diseases; as these exist on different levels of granularity. To realize this goal, several freely available heterogeneous databases as well as proprietary structured datasets have to be integrated into a single global customizable scheme. We will present a tool to integrate different biological data sources by mapping them to a proprietary biomedical ontology that has been developed for the purposes of making computers understand medical natural language
A framework for the development of biomedical text mining software tools
Over the last few years, a growing number of
techniques has been successfully proposed to tackle diverse
challenges in the Biomedical Text Mining (BioTM) arena.
However, the set of available software tools to researchers has
not grown in a similar way. This work makes a contribution to
close this gap, proposing a framework to ease the development
of user-friendly and interoperable applications in this field,
based on a set of available modular components. These modules
can be connected in diverse ways to create applications that fit
distinct user roles. Also, developers of new algorithms have a
framework that allows them to easily integrate their
implementations with state-of-the-art BioTM software for
related tasks.This work was supported in part by the research projects recSysBio (ref. POCI/BIO/60139/2004) and MOBioPro (ref. POSC/EW59899/2004) of the University of Minho, financed by the Portuguese Fundaao para a Ciencia e Tecnologia. The work of SC is supported by a PhD grant from the same institution (ref. SFRH/BD/22863/2005)
- âŠ