NCBO Ontology Recommender 2.0: An Enhanced Approach for Biomedical Ontology Recommendation
Biomedical researchers use ontologies to annotate their data with ontology
terms, enabling better data integration and interoperability. However, the
number, variety and complexity of current biomedical ontologies make it
cumbersome for researchers to determine which ones to reuse for their specific
needs. To overcome this problem, in 2010 the National Center for Biomedical
Ontology (NCBO) released the Ontology Recommender, which is a service that
receives a biomedical text corpus or a list of keywords and suggests ontologies
appropriate for referencing the indicated terms. We developed a new version of
the NCBO Ontology Recommender. Called Ontology Recommender 2.0, it uses a new
recommendation approach that evaluates the relevance of an ontology to
biomedical text data according to four criteria: (1) the extent to which the
ontology covers the input data; (2) the acceptance of the ontology in the
biomedical community; (3) the level of detail of the ontology classes that
cover the input data; and (4) the specialization of the ontology to the domain
of the input data. Our evaluation shows that the enhanced recommender provides
higher quality suggestions than the original approach, providing better
coverage of the input data, more detailed information about their concepts,
increased specialization for the domain of the input data, and greater
acceptance and use in the community. In addition, it provides users with more
explanatory information, along with suggestions of not only individual
ontologies but also groups of ontologies. It also can be customized to fit the
needs of different scenarios. Ontology Recommender 2.0 combines the strengths
of its predecessor with a range of adjustments and new features that improve
its reliability and usefulness. Ontology Recommender 2.0 recommends over 500
biomedical ontologies from the NCBO BioPortal platform, where it is openly
available.
Comment: 29 pages, 8 figures, 11 tables
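As a rough illustration of how four per-criterion scores could be combined into a single ranking, the sketch below uses a normalized weighted sum. The abstract does not state the aggregation formula, so the weighted sum, the weight values, the score ranges, and the ontology names (ONT_A, ONT_B) are all assumptions made for illustration, not the service's actual method.

```python
from dataclasses import dataclass


@dataclass
class CriterionScores:
    """Per-ontology scores in [0, 1] for the four criteria named in the abstract."""
    coverage: float        # how much of the input data the ontology covers
    acceptance: float      # acceptance of the ontology in the community
    detail: float          # level of detail of the covering classes
    specialization: float  # specialization to the input's domain


# Hypothetical weights, chosen only for this example.
WEIGHTS = {
    "coverage": 0.55,
    "acceptance": 0.15,
    "detail": 0.15,
    "specialization": 0.15,
}


def combined_score(scores: CriterionScores, weights: dict = WEIGHTS) -> float:
    """Weighted average of the four criterion scores (assumed aggregation)."""
    total_weight = sum(weights.values())
    weighted = sum(w * getattr(scores, name) for name, w in weights.items())
    return weighted / total_weight


def rank(candidates: dict) -> list:
    """Return ontology names ordered from highest to lowest combined score."""
    return sorted(candidates, key=lambda name: combined_score(candidates[name]), reverse=True)


# Made-up candidate ontologies and scores.
candidates = {
    "ONT_A": CriterionScores(coverage=0.9, acceptance=0.5, detail=0.4, specialization=0.3),
    "ONT_B": CriterionScores(coverage=0.4, acceptance=0.9, detail=0.8, specialization=0.7),
}

for name in rank(candidates):
    print(name, round(combined_score(candidates[name]), 3))
```

Changing the weights is one plausible way such a recommender could be customized for different scenarios, for example by raising the acceptance weight when community adoption matters more than coverage.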
Semi-automated curation of protein subcellular localization: a text mining-based approach to Gene Ontology (GO) Cellular Component curation
Background: Manual curation of experimental data from the biomedical literature is an expensive and time-consuming endeavor. Nevertheless, most biological knowledge bases still rely heavily on manual curation for data extraction and entry. Text mining software that can semi- or fully automate information retrieval from the literature would thus provide a significant boost to manual curation efforts.
Results: We employ the Textpresso category-based information retrieval and extraction system (http://www.textpresso.org), developed by WormBase, to explore how Textpresso might improve the efficiency with which we manually curate C. elegans proteins to the Gene Ontology's Cellular Component Ontology. Using a training set of sentences that describe results of localization experiments in the published literature, we generated three new curation task-specific categories (Cellular Components, Assay Terms, and Verbs) containing words and phrases associated with reports of experimentally determined subcellular localization. We compared the results of manual curation with those of Textpresso queries that searched the full text of articles for sentences containing terms from each of the three new categories plus the name of a previously uncurated C. elegans protein, and found that Textpresso searches identified curatable papers with recall and precision rates of 79.1% and 61.8%, respectively (F-score of 69.5%), when compared to manual curation. Within those documents, Textpresso identified relevant sentences with recall and precision rates of 30.3% and 80.1% (F-score of 44.0%). From returned sentences, curators were able to make 66.2% of all possible experimentally supported GO Cellular Component annotations with 97.3% precision (F-score of 78.8%). Measuring the relative efficiencies of Textpresso-based versus manual curation, we find that Textpresso has the potential to increase curation efficiency by at least 8-fold, and perhaps as much as 15-fold, given differences in individual curatorial speed.
Conclusion: Textpresso is an effective tool for improving the efficiency of manual, experimentally based curation. Incorporating a Textpresso-based Cellular Component curation pipeline at WormBase has allowed us to transition from strictly manual curation of this data type to a more efficient pipeline of computer-assisted validation. Continued development of curation task-specific Textpresso categories will provide an invaluable resource for genomics databases that rely heavily on manual curation.
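The F-scores quoted in the Results can be related to the recall and precision figures through the standard balanced F-measure, the harmonic mean of the two. The sketch below assumes that definition; the abstract does not say how its F-scores were computed, and values recomputed from the rounded percentages may differ from the reported ones in the last digit.

```python
def f_score(recall_pct: float, precision_pct: float) -> float:
    """Balanced F-measure (harmonic mean of precision and recall), in percent."""
    if recall_pct + precision_pct == 0:
        return 0.0
    return 2 * precision_pct * recall_pct / (precision_pct + recall_pct)


# (recall %, precision %) pairs taken from the abstract.
reported = {
    "curatable papers": (79.1, 61.8),
    "relevant sentences": (30.3, 80.1),
    "GO annotations": (66.2, 97.3),
}

for task, (recall, precision) in reported.items():
    print(f"{task}: F = {f_score(recall, precision):.1f}%")
```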
Annotation analysis for testing drug safety signals using unstructured clinical notes
Background: The electronic surveillance for adverse drug events is largely based upon the analysis of coded data from reporting systems. Yet, the vast majority of electronic health data lies embedded within the free text of clinical notes and is not gathered into centralized repositories. With the increasing access to large volumes of electronic medical data, in particular the clinical notes, it may be possible to computationally encode and to test drug safety signals in an active manner.
Results: We describe the application of simple annotation tools on clinical text and the mining of the resulting annotations to compute the risk of getting a myocardial infarction for patients with rheumatoid arthritis that take Vioxx. Our analysis clearly reveals elevated risks for myocardial infarction in rheumatoid arthritis patients taking Vioxx (odds ratio 2.06) before 2005.
Conclusions: Our results show that it is possible to apply annotation analysis methods for testing hypotheses about drug safety using electronic medical records.
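For readers unfamiliar with the statistic, an odds ratio can be computed from a 2x2 table of exposure against outcome. The sketch below shows the textbook calculation only; the patient counts are invented for illustration and are not the study's data, and the abstract does not describe how its annotation counts were tabulated.

```python
def odds_ratio(exposed_cases: int, exposed_noncases: int,
               unexposed_cases: int, unexposed_noncases: int) -> float:
    """Odds ratio from a 2x2 contingency table: (a * d) / (b * c)."""
    if exposed_noncases == 0 or unexposed_cases == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    return (exposed_cases * unexposed_noncases) / (exposed_noncases * unexposed_cases)


# Hypothetical counts: patients whose notes mention the drug (exposed) or not,
# split by whether a later note mentions the adverse event (case) or not.
example = odds_ratio(exposed_cases=20, exposed_noncases=80,
                     unexposed_cases=10, unexposed_noncases=90)
print(example)
```

A value above 1 indicates that the event is more common among exposed patients in the table; a confidence interval would be needed before reading it as a safety signal.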
Analysis of the human diseasome reveals phenotype modules across common, genetic, and infectious diseases
Phenotypes are the observable characteristics of an organism arising from its
response to the environment. Phenotypes associated with engineered and natural
genetic variation are widely recorded using phenotype ontologies in model
organisms, as are signs and symptoms of human Mendelian diseases in databases
such as OMIM and Orphanet. Exploiting these resources, several computational
methods have been developed for integration and analysis of phenotype data to
identify the genetic etiology of diseases or suggest plausible interventions. A
similar resource would be highly useful not only for rare and Mendelian
diseases, but also for common, complex and infectious diseases. We apply a
semantic text-mining approach to identify the phenotypes (signs and symptoms)
associated with over 8,000 diseases. We demonstrate that our method generates
phenotypes that correctly identify known disease-associated genes in mice and
humans with high accuracy. Using a phenotypic similarity measure, we generate a
human disease network in which diseases that share signs and symptoms cluster
together, and we use this network to identify phenotypic disease modules
…
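To make the idea of a similarity-based disease network concrete, the sketch below links diseases whose phenotype sets overlap. The abstract does not name its similarity measure, so Jaccard similarity is used here as a simple stand-in; the disease names, phenotype labels, and the 0.3 threshold are all hypothetical.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def disease_network(phenotypes: dict, threshold: float) -> list:
    """Return (disease_1, disease_2, similarity) edges at or above the threshold."""
    edges = []
    for d1, d2 in combinations(sorted(phenotypes), 2):
        similarity = jaccard(phenotypes[d1], phenotypes[d2])
        if similarity >= threshold:
            edges.append((d1, d2, similarity))
    return edges


# Made-up diseases and phenotype annotations.
phenotypes = {
    "disease_A": {"fever", "cough", "fatigue"},
    "disease_B": {"fever", "cough", "headache"},
    "disease_C": {"rash", "joint pain"},
}

edges = disease_network(phenotypes, threshold=0.3)
print(edges)
```

Groups of diseases connected by such edges are one simple way to picture the phenotype-sharing clusters the abstract describes; an ontology-aware measure would also credit related but non-identical phenotype terms.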