Biomedical word sense disambiguation with ontologies and metadata: automation meets accuracy

Author: Alexopoulou Dimitra
Andreopoulos Bill
Dietze Heiko
Doms Andreas
Gandon Fabien
Hakenberg Jörg
Khelif Khaled
Schroeder Michael
Wächter Thomas
Publication venue: BioMed Central
Publication date: 01/01/2009

Abstract. Background: Ontology term labels can be ambiguous and have multiple senses. While this is no problem for human annotators, it is a challenge for automated methods that identify ontology terms in text. Classical approaches to word sense disambiguation use co-occurring words or terms. However, most treat ontologies as simple terminologies, without making use of the ontology structure or the semantic similarity between terms. Metadata are another useful source of information for disambiguation. Here, we systematically compare three approaches to word sense disambiguation, which use ontologies and metadata, respectively. Results: The 'Closest Sense' method assumes that the ontology defines multiple senses of the term. It computes the shortest path of co-occurring terms in the document to one of these senses. The 'Term Cooc' method defines a log-odds ratio for co-occurring terms, including co-occurrences inferred from the ontology structure. The 'MetaData' approach trains a classifier on metadata. It does not require any ontology, but requires training data, which the other methods do not. To evaluate these approaches we defined a manually curated training corpus of 2600 documents for seven ambiguous terms from the Gene Ontology and MeSH. Across all conditions, the approaches achieve an 80% success rate on average. The 'MetaData' approach performed best, with 96%, when trained on high-quality data. Its performance deteriorates as the quality of the training data decreases. The 'Term Cooc' approach performs better on the Gene Ontology (92% success) than on MeSH (73% success), as MeSH is not a strict is-a/part-of hierarchy, but rather a loose is-related-to hierarchy. The 'Closest Sense' approach achieves on average an 80% success rate. Conclusion: Metadata is valuable for disambiguation, but requires high-quality training data. Closest Sense requires no training, but needs a large, consistently modelled ontology, and these are two opposing conditions. Term Cooc achieves greater than 90% success given a consistently modelled ontology. Overall, the results show that well-structured ontologies can play a very important role in improving disambiguation. Availability: The three benchmark datasets created for the purpose of disambiguation are available in Additional file 1 (1471-2105-10-28-S1.txt). Additional file 1: Benchmark datasets used in the experiments. The three corpora (High quality/Low quantity corpus; Medium quality/Medium quantity corpus; Low quality/High quantity corpus) are given in the form of PubMed identifiers (PMID) for True/False cases for the 7 ambiguous terms examined (GO/MeSH/UMLS identifiers are also given).
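
The 'Closest Sense' idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the toy graph, the term names, the function names and the choice to average path lengths over the context terms are all assumptions made for illustration. It only shows the general technique of picking the sense that lies closest, by shortest path in the ontology graph, to the terms co-occurring in a document.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from (term, term) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path_length(graph, start, goal):
    """Breadth-first search; returns the number of edges, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def closest_sense(graph, senses, context_terms):
    """Return the sense with the smallest mean path length to the context terms.

    Averaging is an assumption of this sketch; other aggregations are possible.
    """
    best_sense, best_score = None, float("inf")
    for sense in senses:
        distances = []
        for term in context_terms:
            d = shortest_path_length(graph, sense, term)
            if d is not None:
                distances.append(d)
        if not distances:
            continue
        score = sum(distances) / len(distances)
        if score < best_score:
            best_sense, best_score = sense, score
    return best_sense


# Hypothetical toy ontology with two senses of the ambiguous label "development".
toy_graph = build_graph([
    ("root", "biological_process"),
    ("biological_process", "developmental_process"),
    ("developmental_process", "embryo_development"),
    ("embryo_development", "gastrulation"),
    ("root", "information_science"),
    ("information_science", "software_development"),
    ("software_development", "programming"),
])

chosen = closest_sense(
    toy_graph,
    senses=["developmental_process", "software_development"],
    context_terms=["gastrulation", "embryo_development"],
)
print(chosen)
```

In this toy example the document mentions two embryology terms, so the biological sense is nearer in the graph than the software sense. A real ontology would need directed is-a/part-of edges and a far larger graph, which matches the abstract's remark that the method depends on a large, consistently modelled ontology.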

Directory of Open Access Journals

INRIA a CCSD electronic archive server

Dagstuhl Research Online Publication Server

SJSU ScholarWorks

HAL-Rennes 1

GoPubMed: Exploring Pubmed with Ontological Background Knowledge

Author: Alexopoulou Dimitra
Alvers Michael R.
Barrio-Alvers Bill
Dietze Heiko
Doms Andreas
Plake Conrad
Reischuck Andreas
Royer Loic
Schroeder Michael
Zschunke Matthias
Publication venue: Dagstuhl Seminar Proceedings. 08131 - Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives
Publication date: 01/01/2008

With the ever-increasing size of the scientific literature, finding relevant documents and answering questions has become even more of a challenge. Recently, ontologies - hierarchical, controlled vocabularies - have been introduced to annotate genomic data. They can also improve question answering and the selection of relevant documents in literature search. Search engines such as GoPubMed.org use ontological background knowledge to give an overview of large query results and to help answer questions. We review the problems and solutions underlying these next-generation intelligent search engines and give examples of the power of this new search paradigm.

Dovetailing biology and chemistry: integrating the Gene Ontology with the ChEBI chemical ontology.

Author: Adams Nico
Bada Mike
Batchelor Colin
Berardini Tanya Z
de Matos Paula
Dietze Heiko
Drabkin Harold J
Ennis Marcus
Foulger Rebecca E
Harris Midori A
Hastings Janna
Hill David P
Kale Namrata S
Lomax Jane
Mungall Christopher J
Owen Gareth
Roncaglia Paola
Steinbeck Christoph
Turner Steve
Publication venue: BMC Genomics
Publication date: 01/01/2013

BACKGROUND: The Gene Ontology (GO) facilitates the description of the action of gene products in a biological context. Many GO terms refer to chemical entities that participate in biological processes. To facilitate accurate and consistent systems-wide biological representation, it is necessary to integrate the chemical view of these entities with the biological view of GO functions and processes. We describe a collaborative effort between the GO and the Chemical Entities of Biological Interest (ChEBI) ontology developers to ensure that the representation of chemicals in the GO is both internally consistent and in alignment with the chemical expertise captured in ChEBI. RESULTS: We have examined and integrated the ChEBI structural hierarchy into the GO resource through computationally-assisted manual curation of both GO and ChEBI. Our work has resulted in the creation of computable definitions of GO terms that contain fully defined semantic relationships to corresponding chemical terms in ChEBI. CONCLUSIONS: The set of logical definitions using both the GO and ChEBI has already been used to automate aspects of GO development and has the potential to allow the integration of data across the domains of biology and chemistry. These logical definitions are available as an extended version of the ontology from http://purl.obolibrary.org/obo/go/extensions/go-plus.owl

The Jackson Laboratory: The Mouseion at the JAXlibrary

Apollo (Cambridge)

A method for increasing expressivity of Gene Ontology annotations using a compositional approach.

Author: Alam-Faruque Yasmin
Blake Judith A
Carbon Seth
Dietze Heiko
Dimmer Emily C
Foulger Rebecca E
Harris Midori A
Hill David P
Huntley Rachael P
Khodiyar Varsha K
Lock Antonia
Lomax Jane
Lovering Ruth C
Mungall Christopher J
Mutowo-Meullenet Prudence
Sawford Tony
Van Auken Kimberly
Wood Valerie
Publication venue: BMC Bioinformatics
Publication date: 21/05/2014

BACKGROUND: The Gene Ontology project integrates data about the function of gene products across a diverse range of organisms, allowing the transfer of knowledge from model organisms to humans, and enabling computational analyses for interpretation of high-throughput experimental and clinical data. The core data structure is the annotation, an association between a gene product and a term from one of the three ontologies comprising the GO. Historically, it has not been possible to provide additional information about the context of a GO term, such as the target gene or the location of a molecular function. This has limited the specificity of knowledge that can be expressed by GO annotations. RESULTS: The GO Consortium has introduced annotation extensions that enable manually curated GO annotations to capture additional contextual details. Extensions represent effector-target relationships, such as localization dependencies, substrates of protein modifiers, and regulation targets of signaling pathways and transcription factors, as well as spatial and temporal aspects of processes, such as cell or tissue type or developmental stage. We describe the content and structure of annotation extensions, provide examples, and summarize the current usage of annotation extensions. CONCLUSIONS: The additional contextual information captured by annotation extensions improves the utility of functional annotation by representing dependencies between annotations to terms in the different ontologies of GO, external ontologies, or an organism's gene products. These enhanced annotations can also support sophisticated queries and reasoning, and will provide curated, directional links between many gene products to support pathway and network reconstruction.
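
To make the data structure in the abstract concrete, here is a minimal sketch of an annotation (a gene product associated with a GO term) that carries optional extensions. It is an illustration only, not the GO Consortium's data model: the class names, the `relation(target)` rendering, the relation names and every identifier prefixed with `EXAMPLE:` are assumptions or placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Extension:
    """One contextual detail: a relation pointing at a target entity."""
    relation: str  # illustrative relation name, e.g. "occurs_in"
    target: str    # placeholder identifier of a term or gene product


@dataclass
class Annotation:
    """Association between a gene product and a GO term, plus extensions."""
    gene_product: str
    go_term: str
    extensions: List[Extension] = field(default_factory=list)

    def render(self) -> str:
        """Render as tab-separated text with extensions as relation(target)."""
        ext = ",".join(f"{e.relation}({e.target})" for e in self.extensions)
        return f"{self.gene_product}\t{self.go_term}\t{ext}"


# Placeholder identifiers; none of these refer to real database entries.
annotation = Annotation(
    gene_product="EXAMPLE:gene1",
    go_term="EXAMPLE:GO_term",
    extensions=[
        Extension("occurs_in", "EXAMPLE:cell_type"),
        Extension("has_regulation_target", "EXAMPLE:gene2"),
    ],
)
print(annotation.render())
```

Keeping extensions as a list of relation and target pairs mirrors the abstract's description: the base annotation stays unchanged, while each extension adds one piece of context such as a cell type or a regulation target.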

The Jackson Laboratory: The Mouseion at the JAXlibrary

GoWeb: a semantic search engine for the life science web

Author: A Doms
A Grigoris
A Harth
A Miles
B Adida
B Green
B Katz
B Twisselmann
BM Good
C Blaschke
C Perez-Iratxeta
ClearForrest
D Berrueta
D Brickley
D Rebholz-Schuhmann
G Cheng
H Chen
H Tang
Heiko Dietze
HM Müller
J Ely
J Gobeill
J Hakenberg
K Bollacker
L Carr
L Ding
LA Granka
M Ashburner
M d'Aquin
M Kaisser
M Taubert
M Völkel
Michael Schroeder
N Tomuro
R Dieng-Kuntz
R Wentz
S Bechhofer
T Berners-Lee
T Yang
WR Hersh
Z Zheng
Publication venue: BioMed Central
Publication date: 01/01/2009