Search CORE

1,790 research outputs found

NCBO Ontology Recommender 2.0: An Enhanced Approach for Biomedical Ontology Recommendation

Author: Graybeal John
Jonquet Clement
Martinez-Romero Marcos
Musen Mark A.
O'Connor Martin J.
Pazos Alejandro
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

Biomedical researchers use ontologies to annotate their data with ontology terms, enabling better data integration and interoperability. However, the number, variety and complexity of current biomedical ontologies make it cumbersome for researchers to determine which ones to reuse for their specific needs. To overcome this problem, in 2010 the National Center for Biomedical Ontology (NCBO) released the Ontology Recommender, a service that receives a biomedical text corpus or a list of keywords and suggests ontologies appropriate for referencing the indicated terms. We developed a new version of the NCBO Ontology Recommender. Called Ontology Recommender 2.0, it uses a new recommendation approach that evaluates the relevance of an ontology to biomedical text data according to four criteria: (1) the extent to which the ontology covers the input data; (2) the acceptance of the ontology in the biomedical community; (3) the level of detail of the ontology classes that cover the input data; and (4) the specialization of the ontology to the domain of the input data. Our evaluation shows that the enhanced recommender provides higher-quality suggestions than the original approach, providing better coverage of the input data, more detailed information about their concepts, increased specialization for the domain of the input data, and greater acceptance and use in the community. In addition, it provides users with more explanatory information, along with suggestions of not only individual ontologies but also groups of ontologies. It can also be customized to fit the needs of different scenarios. Ontology Recommender 2.0 combines the strengths of its predecessor with a range of adjustments and new features that improve its reliability and usefulness. Ontology Recommender 2.0 draws its recommendations from over 500 biomedical ontologies on the NCBO BioPortal platform, where the service is openly available. Comment: 29 pages, 8 figures, 11 tables

arXiv.org e-Print Archive

Repositorio da Universidade da Coruña

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

ZENODO

Directory of Open Access Journals

FigShare
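The Ontology Recommender 2.0 abstract names four evaluation criteria but not how they are combined into a single ranking. The sketch below assumes, purely for illustration, that each criterion has already been scored in [0, 1] for a candidate ontology and that the scores are merged by a normalized weighted sum. The weights, the `CriterionScores` type, and the `relevance`/`rank` helpers are hypothetical; they are not the NCBO API or the authors' published scoring. Passing different weights is one plausible way to model the "customized to fit the needs of different scenarios" behavior the abstract mentions.

```python
from dataclasses import dataclass

# Hypothetical placeholder weights: the abstract does not say how the four
# criteria are weighted, so these values are illustrative only.
DEFAULT_WEIGHTS = {
    "coverage": 0.4,
    "acceptance": 0.2,
    "detail": 0.2,
    "specialization": 0.2,
}


@dataclass
class CriterionScores:
    """Per-ontology criterion scores, each assumed to be normalized to [0, 1]."""
    coverage: float
    acceptance: float
    detail: float
    specialization: float


def relevance(scores: CriterionScores, weights: dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Normalized weighted sum of the four criterion scores (an assumed aggregation rule)."""
    total = sum(weights.values())
    return sum(w * getattr(scores, name) for name, w in weights.items()) / total


def rank(candidates: dict[str, CriterionScores],
         weights: dict[str, float] = DEFAULT_WEIGHTS) -> list[tuple[str, float]]:
    """Return (ontology, score) pairs sorted from most to least relevant."""
    scored = [(name, relevance(s, weights)) for name, s in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up scores for two hypothetical ontologies, for illustration only.
    candidates = {
        "ONTOLOGY_A": CriterionScores(coverage=0.9, acceptance=0.5, detail=0.6, specialization=0.3),
        "ONTOLOGY_B": CriterionScores(coverage=0.6, acceptance=0.9, detail=0.7, specialization=0.8),
    }
    for name, score in rank(candidates):
        print(f"{name}: {score:.3f}")
```

With these invented scores, ONTOLOGY_B ranks first (about 0.72 versus 0.64): under the assumed weights, its stronger acceptance, detail and specialization outweigh ONTOLOGY_A's higher coverage.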

Semi-automated curation of protein subcellular localization: a text mining-based approach to Gene Ontology (GO) Cellular Component curation

Author: Chan Juancarlos
Jaffery Joshua
Müller Hans-Michael
Sternberg Paul W.
Van Auken Kimberly
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

Background: Manual curation of experimental data from the biomedical literature is an expensive and time-consuming endeavor. Nevertheless, most biological knowledge bases still rely heavily on manual curation for data extraction and entry. Text-mining software that can semi- or fully automate information retrieval from the literature would thus provide a significant boost to manual curation efforts. Results: We employ the Textpresso category-based information retrieval and extraction system (http://www.textpresso.org), developed by WormBase, to explore how Textpresso might improve the efficiency with which we manually curate C. elegans proteins to the Gene Ontology's Cellular Component Ontology. Using a training set of sentences that describe results of localization experiments in the published literature, we generated three new curation task-specific categories (Cellular Components, Assay Terms, and Verbs) containing words and phrases associated with reports of experimentally determined subcellular localization. We compared the results of manual curation to those of Textpresso queries that searched the full text of articles for sentences containing terms from each of the three new categories plus the name of a previously uncurated C. elegans protein, and found that Textpresso searches identified curatable papers with recall and precision rates of 79.1% and 61.8%, respectively (F-score of 69.5%), when compared to manual curation. Within those documents, Textpresso identified relevant sentences with recall and precision rates of 30.3% and 80.1%, respectively (F-score of 44.0%). From the returned sentences, curators were able to make 66.2% of all possible experimentally supported GO Cellular Component annotations with 97.3% precision (F-score of 78.8%).
Measuring the relative efficiencies of Textpresso-based versus manual curation, we find that Textpresso has the potential to increase curation efficiency by at least 8-fold, and perhaps as much as 15-fold, given differences in individual curatorial speed. Conclusion: Textpresso is an effective tool for improving the efficiency of manual, experimentally based curation. Incorporating a Textpresso-based Cellular Component curation pipeline at WormBase has allowed us to transition from strictly manual curation of this data type to a more efficient pipeline of computer-assisted validation. Continued development of curation task-specific Textpresso categories will provide an invaluable resource for genomics databases that rely heavily on manual curation.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Caltech Authors
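The Textpresso query described in the abstract (a sentence must contain a term from each of the three new categories plus the name of the protein) and the F-scores it reports can be sketched as follows. This is a naive, case-insensitive whole-word matcher written for illustration, not Textpresso's implementation; the category contents, the `PROT-1` protein name, and the helper names are hypothetical. `f_score` is the standard balanced F-measure, the harmonic mean of precision and recall.

```python
import re

# Hypothetical stand-ins for the three curation categories; the real
# Textpresso categories were derived from a training set of sentences.
CATEGORIES = {
    "cellular_components": {"nucleus", "plasma membrane", "mitochondria"},
    "assay_terms": {"gfp", "antibody", "immunostaining"},
    "verbs": {"localized", "localizes", "expressed"},
}


def contains_term(text: str, term: str) -> bool:
    """True if `term` occurs in `text` as a whole word or phrase (case-insensitive)."""
    pattern = r"\b" + re.escape(term.lower()) + r"\b"
    return re.search(pattern, text.lower()) is not None


def is_candidate_sentence(sentence: str, protein: str,
                          categories: dict[str, set[str]] = CATEGORIES) -> bool:
    """Require the protein name plus at least one term from every category."""
    if not contains_term(sentence, protein):
        return False
    return all(
        any(contains_term(sentence, term) for term in terms)
        for terms in categories.values()
    )


def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical sentences about a hypothetical protein.
    print(is_candidate_sentence("PROT-1::GFP is localized to the plasma membrane.", "PROT-1"))
    print(is_candidate_sentence("PROT-1 was studied in a membrane fraction.", "PROT-1"))
    # Sentence-level and annotation-level rates reported in the abstract.
    print(round(100 * f_score(0.801, 0.303), 1))
    print(round(100 * f_score(0.973, 0.662), 1))
```

The first sentence matches all three categories plus the protein name; the second lacks an assay term and a verb, so it is rejected. Recomputing from the reported rates gives F-scores of 44.0% and 78.8%, consistent with the sentence-level and annotation-level figures in the abstract.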

Annotation analysis for testing drug safety signals using unstructured clinical notes

Author: A Bate
C Friedman
D Classen
D Dore
D Graham
DW Bates
G Alterovitz
GK Savova
H Cao
KD Shetty
L Ohno-Machado
L Tari
MJ Goldacre
N Tatonetti
NF Noy
NH Shah
O Bodenreider
P Khatri
P LePendu
P Stang
PM Coloma
PM Nadkarni
R Harpaz
RP Radecki
S Paumier
S Schneeweiss
S Weiss-Smith
SJ Reisinger
W Chapman
WW Chapman
X Wang
Y Liu
Publication venue: BioMed Central
Publication date: 01/01/2012

Background: Electronic surveillance for adverse drug events is largely based on the analysis of coded data from reporting systems. Yet the vast majority of electronic health data lies embedded within the free text of clinical notes and is not gathered into centralized repositories. With increasing access to large volumes of electronic medical data, in particular clinical notes, it may be possible to computationally encode and test drug safety signals in an active manner. Results: We describe the application of simple annotation tools to clinical text and the mining of the resulting annotations to compute the risk of myocardial infarction for patients with rheumatoid arthritis who take Vioxx. Our analysis clearly reveals elevated risks for myocardial infarction in rheumatoid arthritis patients taking Vioxx (odds ratio 2.06) before 2005. Conclusions: Our results show that it is possible to apply annotation analysis methods for testing hypotheses about drug safety using electronic medical records.

Crossref

Springer - Publisher Connector

PubMed Central

eScholarship - University of California

DIAL UCLouvain
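The abstract reports an odds ratio of 2.06 for myocardial infarction among rheumatoid arthritis patients taking Vioxx, but does not give the underlying counts. For reference, the standard odds ratio for a 2x2 exposure/outcome table (here, exposure would be Vioxx use and the outcome a myocardial infarction annotation) is computed below. The function name is hypothetical, and the counts in the example are invented for illustration; they do not reproduce the paper's estimate.

```python
def odds_ratio(exposed_with_outcome: int, exposed_without_outcome: int,
               unexposed_with_outcome: int, unexposed_without_outcome: int) -> float:
    """Odds ratio (a * d) / (b * c) for a 2x2 table.

    a = exposed with outcome,   b = exposed without outcome,
    c = unexposed with outcome, d = unexposed without outcome.
    """
    if exposed_without_outcome == 0 or unexposed_with_outcome == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    return (exposed_with_outcome * unexposed_without_outcome) / (
        exposed_without_outcome * unexposed_with_outcome
    )


if __name__ == "__main__":
    # Invented counts: 20 of 100 exposed and 10 of 100 unexposed patients
    # have the outcome annotation.
    print(odds_ratio(20, 80, 10, 90))
```

With these invented counts the odds of the outcome are 20/80 among exposed and 10/90 among unexposed patients, so the odds ratio is (20 × 90) / (80 × 10) = 2.25; a value above 1 indicates higher odds of the outcome in the exposed group.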

Analysis of the human diseasome reveals phenotype modules across common, genetic, and infectious diseases

Author: Gkoutos Georgios V
Hoehndorf Robert
Schofield Paul N
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 26/11/2014

Phenotypes are the observable characteristics of an organism arising from its response to the environment. Phenotypes associated with engineered and natural genetic variation are widely recorded using phenotype ontologies in model organisms, as are signs and symptoms of human Mendelian diseases in databases such as OMIM and Orphanet. Exploiting these resources, several computational methods have been developed for integration and analysis of phenotype data to identify the genetic etiology of diseases or suggest plausible interventions. A similar resource would be highly useful not only for rare and Mendelian diseases, but also for common, complex and infectious diseases. We apply a semantic text-mining approach to identify the phenotypes (signs and symptoms) associated with over 8,000 diseases. We demonstrate that our method generates phenotypes that correctly identify known disease-associated genes in mice and humans with high accuracy. Using a phenotypic similarity measure, we generate a human disease network in which diseases that share signs and symptoms cluster together, and we use this network to identify phenotypic disease modules.

arXiv.org e-Print Archive

University of Birmingham Research Portal

PubMed Central
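The diseasome abstract does not specify its phenotypic similarity measure or how modules are extracted from the network. As a minimal stand-in, the sketch below uses Jaccard similarity between sets of phenotype terms, links diseases whose similarity meets a threshold, and reports connected components as crude "modules". Jaccard similarity, the 0.4 threshold, connected components, and the disease and phenotype data are all assumptions chosen for illustration, not the authors' method.

```python
from itertools import combinations

# Hypothetical toy data: disease -> set of associated phenotype terms.
TOY_PHENOTYPES = {
    "DISEASE_A": {"fever", "cough", "fatigue"},
    "DISEASE_B": {"fever", "cough", "dyspnea"},
    "DISEASE_C": {"rash", "pruritus"},
    "DISEASE_D": {"rash", "pruritus", "fever"},
}


def jaccard(a: set[str], b: set[str]) -> float:
    """|A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_network(phenotypes: dict[str, set[str]], threshold: float) -> dict[str, set[str]]:
    """Undirected adjacency sets linking diseases whose similarity is >= threshold."""
    graph: dict[str, set[str]] = {disease: set() for disease in phenotypes}
    for d1, d2 in combinations(phenotypes, 2):
        if jaccard(phenotypes[d1], phenotypes[d2]) >= threshold:
            graph[d1].add(d2)
            graph[d2].add(d1)
    return graph


def connected_modules(graph: dict[str, set[str]]) -> list[set[str]]:
    """Connected components of the network, found by iterative depth-first search."""
    seen: set[str] = set()
    modules: list[set[str]] = []
    for start in graph:
        if start in seen:
            continue
        component: set[str] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        modules.append(component)
    return modules


if __name__ == "__main__":
    network = build_network(TOY_PHENOTYPES, threshold=0.4)
    for module in connected_modules(network):
        print(sorted(module))
```

With the toy data, DISEASE_A and DISEASE_B (Jaccard 0.5) form one module and DISEASE_C and DISEASE_D (about 0.67) form another, while the cross pairs (at most 0.2) stay unlinked. A real analysis would more likely use an ontology-aware similarity and a community-detection algorithm rather than plain connected components.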

Construction and accessibility of a cross-species phenotype ontology along with gene annotations for biomedical research

Author: Barbara J Ruef
Christopher J Mungall
Damian Smedley
George Gkoutos
Monte Westerfield
Nicole Washington
Paul Schofield
Peter N Robinson
Sandra C Doelken
Sebastian Bauer
Sebastian Köhler
Suzanna E Lewis
Publication venue: 'Faculty Opinions Ltd'
Publication date: 01/01/2013

Crossref

Aberystwyth Research Portal