186 research outputs found

PaperMaker: validation of biomedical scientific publications

Author: D. Rebholz-Schuhmann
Leitner
P. Pezik
Rebholz-Schuhmann
S. Kavaliauskas
Publication venue: Oxford University Press

Motivation: The automatic analysis of scientific literature can support authors in writing their manuscripts

Crossref

PubMed Central

MedEvi: Retrieving textual evidence of relations between biomedical concepts from Medline

Author: D. Rebholz-Schuhmann
Hoffmann
J.-j. Kim
P. Pezik
Rebholz-Schuhmann
Publication venue: Oxford University Press

Summary: Search engines running on MEDLINE abstracts have been widely used by biologists to find publications that are related to their research. The existing search engines such as PubMed, however, have limitations when applied for the task of seeking textual evidence of relations between given concepts. The limitations are mainly due to the problem that the search engines do not effectively deal with multi-term queries which may imply semantic relations between the terms. To address this problem, we present MedEvi, a novel search engine that imposes positional restriction on occurrences matching multi-term queries, based on the observation that terms with semantic relations which are explicitly stated in text are not found too far from each other. MedEvi further identifies additional keywords of biological and statistical significance from local context of matching occurrences in order to help users reformulate their queries for better results
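
The positional restriction described in this summary can be illustrated with a minimal sketch. It is not MedEvi's implementation: the tokenizer, the function names, and the `max_span` parameter are assumptions made for illustration, and only single-token query terms are handled.

```python
import re
from typing import List, Optional, Sequence, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens (an assumed, simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def smallest_window(tokens: Sequence[str], terms: Sequence[str]) -> Optional[Tuple[int, int]]:
    """Return (start, end) token indices of the shortest span containing every term, or None."""
    needed = {t.lower() for t in terms}
    if not needed:
        return None
    counts = {}  # occurrences of each needed term inside the current window
    best = None
    left = 0
    for right, tok in enumerate(tokens):
        if tok in needed:
            counts[tok] = counts.get(tok, 0) + 1
        # Shrink the window from the left while it still covers all terms.
        while len(counts) == len(needed):
            if best is None or right - left < best[1] - best[0]:
                best = (left, right)
            left_tok = tokens[left]
            if left_tok in counts:
                counts[left_tok] -= 1
                if counts[left_tok] == 0:
                    del counts[left_tok]
            left += 1
    return best


def matches_within(text: str, terms: Sequence[str], max_span: int) -> bool:
    """True if all terms occur inside some window of at most max_span tokens."""
    window = smallest_window(tokenize(text), terms)
    return window is not None and (window[1] - window[0] + 1) <= max_span


sentence = "Aspirin strongly inhibits platelet aggregation in patients"
print(matches_within(sentence, ["aspirin", "aggregation"], max_span=5))
print(matches_within(sentence, ["aspirin", "aggregation"], max_span=4))
```

A smaller `max_span` trades recall for a higher chance that the co-occurring terms are actually related in the text.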

Crossref

PubMed Central

Combining Evidence, Specificity, and Proximity towards the Normalization of Gene Ontology Terms in Text

Author: Gaudan S
Jimeno Yepes A
Lee V
Rebholz-Schuhmann D
Publication venue: BioMed Central
Publication date: 01/01/2008

Structured information provided by manual annotation of proteins with Gene Ontology concepts represents a high-quality reliable data source for the research community. However, a limited scope of proteins is annotated due to the amount of human resources required to fully annotate each individual gene product from the literature. We introduce a novel method for automatic identification of GO terms in natural language text. The method takes into consideration several features: (1) the evidence for a GO term given by the words occurring in text, (2) the proximity between the words, and (3) the specificity of the GO terms based on their information content. The method has been evaluated on the BioCreAtIvE corpus and has been compared to current state of the art methods. The precision reached 0.34 at a recall of 0.34 for the identified terms at rank 1. In our analysis, we observe that the identification of GO terms in the “cellular component” subbranch of GO is more accurate than for terms from the other two subbranches. This observation is explained by the average number of words forming the terminology over the different subbranches.
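
The abstract does not say how information content is computed. A common definition, assumed here, is the negative log of a term's relative annotation frequency, so rarely used (more specific) terms score higher. The function name and the counts below are hypothetical, and the sketch does not reproduce the paper's scoring.

```python
import math
from typing import Dict


def information_content(annotation_counts: Dict[str, int]) -> Dict[str, float]:
    """Map each term to -ln(p), where p is its share of all annotations (assumed definition)."""
    total = sum(annotation_counts.values())
    if total <= 0:
        raise ValueError("at least one annotation is required")
    return {
        term: -math.log(count / total)
        for term, count in annotation_counts.items()
        if count > 0  # terms never used have undefined information content
    }


# Hypothetical counts: a broad term used often, a narrow term used rarely.
scores = information_content({"broad_term": 90, "narrow_term": 10})
print(scores["narrow_term"] > scores["broad_term"])
```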

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Towards mature use of semantic resources for biomedical analyses

Author: Collier N
Hahn U
Pyysalo S
Rebholz-Schuhmann D
Rinaldi F
Publication venue: BioMed Central
Publication date: 01/01/2011

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

ZORA

Towards automated metabolome assembly: application of text mining to correlate small molecules, targets and tissues

Author: C Steinbeck
D Rebholz-Schuhmann
D Wishart
KV Jayaseelan
P Moreno
Publication venue: BioMed Central
Publication date: 01/01/2011

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

First steps in the logic-based assessment of post-composed phenotypic descriptions

Author: Berlanga R.
Grau B. C.
Jimenez-Ruiz E.
Rebholz-Schuhmann D.

In this paper we present a preliminary logic-based evaluation of the integration of post-composed phenotypic descriptions with domain ontologies. The evaluation has been performed using a description logic reasoner together with scalable techniques: ontology modularization and approximations of the logical difference between ontologies.

City Research Online

Integrating protein-protein interactions and text mining for protein function prediction

Author: A Yuryev
B Boeckmann
B Titz
D Rebholz-Schuhmann
Dietrich Rebholz-Schuhmann
FM Couto
G Pandey
GD Bader
GO Consortium
H Liu
H Shatkay
L Hirschman
LR Matthews
M Ashburner
MA Huynen
MC Roland
P Pagel
P Ruch
R Malik
R Sharan
S Gaudan
S Jaeger
S Peri
Samira Jaeger
SM Baxter
Sylvain Gaudan
Ulf Leser
V Spirin
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract. Background: Functional annotation of proteins remains a challenging task. Currently the scientific literature serves as the main source for yet uncurated functional annotations, but curation work is slow and expensive. Automatic techniques that support this work are still lacking reliability. We developed a method to identify conserved protein interaction graphs and to predict missing protein functions from orthologs in these graphs. To enhance the precision of the results, we furthermore implemented a procedure that validates all predictions based on findings reported in the literature. Results: Using this procedure, more than 80% of the GO annotations for proteins with highly conserved orthologs that are available in UniProtKb/Swiss-Prot could be verified automatically. For a subset of proteins we predicted new GO annotations that were not available in UniProtKb/Swiss-Prot. All predictions were correct (100% precision) according to the verifications from a trained curator. Conclusion: Our method of integrating CCSs and literature mining is thus a highly reliable approach to predict GO annotations for weakly characterized proteins with orthologs.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Ontology Clustering with OWL2Vec*

Author: Castro L. J.
Chen J.
Jimenez-Ruiz E.
Rebholz-Schuhmann D.
Ritchie A.
Publication venue: CEUR Workshop Proceedings
Publication date: 28/07/2021

In this work we present an exploratory study to apply OWL2Vec* to drive the clustering of ontology entities (i.e., ontology clustering). OWL2Vec* is a state-of-the-art system that creates embeddings, capturing the semantics of both entities and tokens that appear in the ontology.

City Research Online

Annotation of protein residues based on a literature analysis: cross-validation against UniProtKb

Author: A Stark
Antonio Jimeno-Yepes
BJ Polacco
BJ Stapley
C Blaschke
C Friedman
CH Wu
CJO Baker
D Bourigault
D Rebholz-Schuhmann
Dietrich Rebholz-Schuhmann
DL Wheeler
DM Kristensen
EM Marcotte
F Cerbah
F Guenthner
F Horn
G Leroy
JA Barker
JC Nebel
Kevin Nagel
LC Lee
M Ikeda
MM Babu
P Pezik
R Kanagasabai
R Witte
S Gaudan
S Yoon
TJ Oldfield
Y Miyao
Y Tateisi
Y Tsuruoka
YL Yip
Publication venue: BioMed Central
Publication date: 01/01/2009

Abstract. Background: A protein annotation database, such as the Universal Protein Resource knowledge base (UniProtKb), is a valuable resource for the validation and interpretation of predicted 3D structure patterns in proteins. Existing studies have focussed on point mutation extraction methods from biomedical literature which can be used to support the time consuming work of manual database curation. However, these methods were limited to point mutation extraction and do not extract features for the annotation of proteins at the residue level. Results: This work introduces a system that identifies protein residues in MEDLINE abstracts and annotates them with features extracted from the context written in the surrounding text. MEDLINE abstract texts have been processed to identify protein mentions in combination with taxonomic species and protein residues (F1-measure 0.52). The identified protein-species-residue triplets have been validated and benchmarked against reference data resources (UniProtKb, average F1-measure of 0.54). Then, contextual features were extracted through shallow and deep parsing and the features have been classified into predefined categories (F1-measure ranges from 0.15 to 0.67). Furthermore, the feature sets have been aligned with annotation types in UniProtKb to assess the relevance of the annotations for ongoing curation projects. Altogether, the annotations have been assessed automatically and manually against reference data resources. Conclusion: This work proposes a solution for the automatic extraction of functional annotation for protein residues from biomedical articles. The presented approach is an extension to other existing systems in that a wider range of residue entities are considered and that features of residues are extracted as annotations.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

SciMiner: web-based literature mining tool for target identification and functional enrichment analysis

Author: Adam D. Schuyler
Cheng
David J. States
Eva L. Feldman
Fisher
Gao
Hanisch
Junguk Hur
Morgan
Plake
Rebholz-Schuhmann
Shannon
Publication venue: Oxford University Press

Summary: SciMiner is a web-based literature mining and functional analysis tool that identifies genes and proteins using a context specific analysis of MEDLINE abstracts and full texts. SciMiner accepts a free text query (PubMed Entrez search) or a list of PubMed identifiers as input. SciMiner uses both regular expression patterns and dictionaries of gene symbols and names compiled from multiple sources. Ambiguous acronyms are resolved by a scoring scheme based on the co-occurrence of acronyms and corresponding description terms, which incorporates optional user-defined filters. Functional enrichment analyses are used to identify highly relevant targets (genes and proteins), GO (Gene Ontology) terms, MeSH (Medical Subject Headings) terms, pathways and protein–protein interaction networks by comparing identified targets from one search result with those from other searches or to the full HGNC [HUGO (Human Genome Organization) Gene Nomenclature Committee] gene set. The performance of gene/protein name identification was evaluated using the BioCreAtIvE (Critical Assessment of Information Extraction systems in Biology) version 2 (Year 2006) Gene Normalization Task as a gold standard. SciMiner achieved 87.1% recall, 71.3% precision and 75.8% F-measure. SciMiner's literature mining performance coupled with functional enrichment analyses provides an efficient platform for retrieval and summary of rich biological information from corpora of users' interests.

Crossref

PubMed Central