Ontology-based Assisted Curation of Biomedical Data

Author: Conrad Plake
Publication date: 22/04/2009

Manual curation of biomedical data is highly accurate but time-consuming, and it does not scale with the ever-increasing growth of biomedical literature. Text mining, as a high-throughput computational technique, scales well but requires human expertise to produce highly accurate results. Ontologies can help organize large quantities of unstructured information. Here we present three systems, namely GoGene, GoPubMed and GoWeb, that employ biomedical ontologies, and show how they can assist manual curation of biomedical data.

GoGene associates all genes from different model organisms to concepts of the Gene Ontology (GO) and the Medical Subject Headings (MeSH). The hierarchical structures of both terminologies support clustering and summarizing long lists of genes. Through the integration of known gene annotations from UniProt and EntrezGene with text-mined annotations from all abstracts in PubMed, GoGene currently contains up to 4,000,000 associations between genes and concepts from GO and MeSH for ten model organisms. The quality of all associations can be verified by following the links to their origin, that is, literature or database entries.
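The way a concept hierarchy can summarize a long gene list may be easier to see in a small sketch. The following is not GoGene's implementation. It assumes a toy hierarchy and toy annotations (the names `PARENTS`, `ANNOTATIONS`, `geneA` and so on are hypothetical, and the concept labels are not real GO or MeSH identifiers) and simply propagates each gene up to all ancestor concepts so that broader concepts cluster more genes.

```python
from collections import defaultdict

# Hypothetical toy hierarchy: child concept -> parent concepts.
# Labels are illustrative, not real GO or MeSH identifiers.
PARENTS = {
    "apoptosis": ["cell death"],
    "necrosis": ["cell death"],
    "cell death": ["biological process"],
    "dna repair": ["biological process"],
    "biological process": [],
}

# Hypothetical direct gene-to-concept associations.
ANNOTATIONS = {
    "geneA": ["apoptosis"],
    "geneB": ["necrosis"],
    "geneC": ["dna repair", "apoptosis"],
}


def ancestors(concept, parents):
    """Return the concept together with all of its ancestors."""
    seen = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def summarize(genes, annotations, parents):
    """Map each concept to the genes annotated with it or with a descendant."""
    genes_by_concept = defaultdict(set)
    for gene in genes:
        for concept in annotations.get(gene, []):
            for node in ancestors(concept, parents):
                genes_by_concept[node].add(gene)
    return dict(genes_by_concept)


summary = summarize(["geneA", "geneB", "geneC"], ANNOTATIONS, PARENTS)
# Print broad concepts (many genes) first, then narrower ones.
for concept, genes in sorted(summary.items(), key=lambda kv: (-len(kv[1]), kv[0])):
    print(concept, sorted(genes))
```

In this toy setting the intermediate concept "cell death" groups all three genes even though none of them is annotated with it directly, which is the kind of summarizing effect the hierarchy provides.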

GoPubMed aims to reduce the limitations of classical keyword search. It handles inconsistent vocabulary such as synonyms and specialized terminology. It shows the most relevant GO and MeSH concepts for a search and thus reveals information that would otherwise remain buried in masses of text. This feature, together with the complete bibliography of every author in PubMed, facilitates comprehensive literature search. GoWeb translates these ideas to the World Wide Web and is thus not limited to PubMed abstracts. GoWeb uses a standard web-search service and organizes search results based on GO, MeSH, and other concepts such as companies and institutions.
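A minimal sketch of synonym-aware concept ranking follows. It is only an illustration of the idea and not GoPubMed's actual method: the dictionary `SYNONYMS`, the sample documents, and the function names are all assumptions, and matching is done by plain whole-phrase lookup.

```python
import re
from collections import Counter

# Hypothetical concept dictionary: concept label -> lowercase surface forms.
SYNONYMS = {
    "apoptosis": ["apoptosis", "programmed cell death"],
    "neoplasms": ["neoplasm", "neoplasms", "tumor", "tumour", "cancer"],
}


def tag_concepts(text, synonyms):
    """Return the set of concepts for which any surface form occurs in text."""
    lowered = text.lower()
    hits = set()
    for concept, forms in synonyms.items():
        for form in forms:
            if re.search(rf"\b{re.escape(form)}\b", lowered):
                hits.add(concept)
                break
    return hits


def rank_concepts(documents, synonyms):
    """Count the documents mentioning each concept, most frequent first."""
    counts = Counter()
    for doc in documents:
        counts.update(tag_concepts(doc, synonyms))
    return counts.most_common()


# Hypothetical search results standing in for retrieved abstracts.
documents = [
    "Programmed cell death in tumour cells.",
    "Cancer therapy resistance.",
    "Apoptosis is triggered by DNA damage.",
    "A benign tumor was removed.",
]
ranking = rank_concepts(documents, SYNONYMS)
print(ranking)
```

Because "tumour", "tumor" and "cancer" all map to one concept, documents that share no keyword still end up under the same heading, which a plain keyword search would miss.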

Crossref

Nature Precedings

Improved mutation tagging with gene identifiers applied to membrane protein stability prediction

Authors: Conrad Plake, Michael Schroeder, Rainer Winnenburg
Publication venue: Springer Nature
Publication date: 01/01/2009

Background: The automated retrieval and integration of information about protein point mutations, in combination with structure, domain and interaction data from literature and databases, promises to be a valuable approach to studying structure-function relationships in biomedical data sets. Results: We developed a rule- and regular expression-based protein point mutation retrieval pipeline for PubMed abstracts, which achieves an F-measure of 87% for the mutation retrieval task on a benchmark dataset. To link mutations to their proteins, we use a named entity recognition algorithm to identify gene names co-occurring in the abstract, and establish links based on sequence checks. Conversely, we show that gene recognition improved from 77% to 91% F-measure when mutation information given in the text was taken into account. To demonstrate practical relevance, we use mutation information from text to evaluate a novel solvation energy based model for the prediction of stabilizing regions in membrane proteins. For five G protein-coupled receptors we identified 35 relevant single mutations and associated phenotypes, none of which had been annotated in the UniProt or PDB databases. In 71% of cases, the reported phenotypes agreed with the model predictions, supporting a relation between mutations and stability issues in membrane proteins. Conclusion: We present a reliable approach for the retrieval of protein mutations from PubMed abstracts for any set of genes or proteins of interest. We further demonstrate how amino acid substitution information from text can be used for protein structure stability studies on the basis of a novel energy model.
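The abstract does not give the pipeline's actual rules or patterns, so the following is only a minimal sketch of the general idea: find point mutations with regular expressions, then link each one to a candidate gene by checking that the wild-type residue really occurs at the stated position in that gene's sequence. The two patterns, the gene identifiers `geneX` and `geneY`, and their sequences are hypothetical.

```python
import re

# Three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE = "ACDEFGHIKLMNPQRSTVWY"
THREE = "|".join(THREE_TO_ONE)

# Two illustrative patterns only: "A123T" and "Ala123Thr".
PATTERNS = [
    re.compile(rf"\b([{ONE}])([1-9]\d*)([{ONE}])\b"),
    re.compile(rf"\b({THREE})([1-9]\d*)({THREE})\b"),
]


def find_mutations(text):
    """Return (wild_type, position, mutant) tuples, sorted by position."""
    found = set()
    for pattern in PATTERNS:
        for wt, pos, mut in pattern.findall(text):
            # Normalize three-letter codes to one-letter codes.
            wt = THREE_TO_ONE.get(wt, wt)
            mut = THREE_TO_ONE.get(mut, mut)
            found.add((wt, int(pos), mut))
    return sorted(found, key=lambda m: (m[1], m[0], m[2]))


def consistent_with(sequence, mutation):
    """Sequence check: is the wild-type residue at the 1-based position?"""
    wt, pos, _ = mutation
    return pos <= len(sequence) and sequence[pos - 1] == wt


def link_to_genes(mutations, candidate_sequences):
    """Map each mutation to the candidate genes that pass the sequence check."""
    return {
        m: sorted(g for g, seq in candidate_sequences.items() if consistent_with(seq, m))
        for m in mutations
    }


# Hypothetical abstract text and candidate gene sequences.
text = "The A3T and Gly5Asp substitutions reduced stability; R9K did not."
candidates = {"geneX": "MKAVGLTPK", "geneY": "MSTVCLTPR"}

mutations = find_mutations(text)
links = link_to_genes(mutations, candidates)
for mutation, genes in links.items():
    print(mutation, genes)
```

A short pattern such as `A3T` can also match unrelated tokens, which is one reason a sequence check against the co-occurring genes is useful as a filter.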

Qucosa

HSSS - Hochschulschriftenserver der SLUB

Crossref

Springer - Publisher Connector

PubMed Central

Technische Universität Dresden: Qucosa

Gene mention normalization and interaction extraction with context models and sentence motifs

Authors: Jörg Hakenberg, Ulf Leser, Conrad Plake, Loic Royer, Michael Schroeder, Hendrik Strobelt
Publication venue: BioMed Central
Publication date: 01/01/2008

Crossref

Springer - Publisher Connector

PubMed Central

Systematic feature evaluation for gene name recognition

Authors: Steffen Bickel, Ulf Brefeld, Lukas Faulstich, Jörg Hakenberg, Ulf Leser, Conrad Plake, Tobias Scheffer, Hagen Zahn
Publication venue: BioMed Central
Publication date: 24/05/2005

In task 1A of the BioCreAtIvE evaluation, systems had to be devised that recognize words and phrases forming gene or protein names in natural language sentences. We approach this problem by building a word classification system based on a sliding window approach with a Support Vector Machine (SVM), combined with pattern-based post-processing for the recognition of phrases. The performance of such a system crucially depends on the type of features chosen for consideration by the classification method, such as pre- or postfixes, character n-grams, patterns of capitalization, or the classification of preceding or following words. We present a systematic approach to evaluating the performance of different feature sets based on recursive feature elimination (RFE). By systematically reducing the number of features used by the system, we can quantify the impact of different feature sets on the results of the word classification problem. This helps us to identify descriptive features, to learn about the structure of the problem, and to design systems that are faster and easier to understand. We observe that the SVM is robust to redundant features. RFE improves the performance by 0.7% compared to using the complete set of attributes. Moreover, a performance that is only 2.3% below this maximum can be obtained using fewer than 5% of the features.
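The feature types named in the abstract can be illustrated with a small sliding-window extractor. This is a sketch under assumptions and not the authors' feature set: the function names, the window size, the prefix and suffix length of two, and the trigram size are all illustrative choices, and the SVM training and RFE steps are not shown.

```python
import re


def shape(word):
    """Collapse a word to a capitalization pattern, e.g. 'p53' -> 'a9'."""
    pattern = re.sub(r"[A-Z]", "A", word)
    pattern = re.sub(r"[a-z]", "a", pattern)
    pattern = re.sub(r"[0-9]", "9", pattern)
    # Merge runs of the same symbol so that 'Aaaa' becomes 'Aa'.
    return re.sub(r"(.)\1+", r"\1", pattern)


def char_ngrams(word, n=3):
    """Return the set of character n-grams of the lowercased, padded word."""
    padded = f"^{word.lower()}$"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def token_features(tokens, index, window=1):
    """Build a sparse feature set for the token at index from its window."""
    features = set()
    for offset in range(-window, window + 1):
        position = index + offset
        word = tokens[position] if 0 <= position < len(tokens) else "<pad>"
        features.add(f"word[{offset}]={word.lower()}")
        features.add(f"shape[{offset}]={shape(word)}")
    target = tokens[index]
    features.add(f"prefix={target[:2].lower()}")
    features.add(f"suffix={target[-2:].lower()}")
    features.update(f"ngram={gram}" for gram in char_ngrams(target))
    return features


tokens = ["The", "p53", "protein", "binds", "DNA"]
features = token_features(tokens, 1)
print(sorted(features))
```

Each token becomes a set of named binary features. A linear classifier assigns one weight per feature, and feature elimination in the spirit of RFE would repeatedly drop the features with the smallest weights and retrain.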

TUbiblio

Springer - Publisher Connector

PubMed Central

GoPubMed: Exploring Pubmed with Ontological Background Knowledge

Authors: Dimitra Alexopoulou, Michael R. Alvers, Bill Barrio-Alvers, Heiko Dietze, Andreas Doms, Conrad Plake, Andreas Reischuck, Loic Royer, Michael Schroeder, Matthias Zschunke
Publication venue: Dagstuhl Seminar Proceedings. 08131 - Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives
Publication date: 01/01/2008

With the ever-increasing size of the scientific literature, finding relevant documents and answering questions has become even more of a challenge. Recently, ontologies (hierarchical, controlled vocabularies) have been introduced to annotate genomic data. They can also improve question answering and the selection of relevant documents in literature search. Search engines such as GoPubMed.org use ontological background knowledge to give an overview of large query results and to help answer questions. We review the problems and solutions underlying these next-generation intelligent search engines and give examples of the power of this new search paradigm.

Dagstuhl Research Online Publication Server

Ontology-based Assisted Curation of Biomedical Data

Author: Conrad Plake
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2010

Crossref

Improved mutation tagging with gene identifiers applied to membrane protein stability prediction

Authors: Conrad Plake, Michael Schröder, Rainer Winnenburg
Publication venue: Springer Science and Business Media LLC
Publication date: 27/10/2015

Qucosa

Technische Universität Dresden: Qucosa

Learning Patterns for Information Extraction from Free Text

Authors: Conrad Plake, Jörg Hakenberg, Ulf Leser

We describe a general approach to the task of information extraction from free text and propose methods for learning syntax patterns automatically from annotated corpora. We study the application of our approach to the extraction of protein-protein interactions from scientific texts. Based on this evaluation, we find that learning patterns outperforms techniques based on handcrafted patterns.

CiteSeerX