Search CORE

236 research outputs found

Species identification for gene name normalization

Author: C Plake
Domonkos Tikk
H Salgado
Illés Solt
J Hakenberg
M Gerner
Ulf Leser
Publication venue: BioMed Central
Publication date: 01/01/2010

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Cross-species gene normalization by species inference

Author: AA Morgan
B Alex
B Settles
CH Wei
Chih-Hsuan Wei
CN Hsu
D Maglott
F Leitner
HD Carroll
HJ Dai
Hung-Yu Kao
J Hakenberg
J Wermter
JD Kim
JF Heinz
K Verspoor
L Hirschman
M Gerner
O Tuason
P Corbett
R Klinger
R Saetre
T Kappeler
X Wang
Y Chen
Z Lu
Publication venue: BioMed Central
Publication date: 01/01/2011

Crossref

Springer - Publisher Connector

PubMed Central

Development of a Fast miRNA Extraction System for Tumor Analysis Based on a Simple Lab on Chip Approach

Author: Dame G.
Hakenberg S.
Lampe J.
Urban G.
Publication venue: Elsevier Ltd.
Publication date: 31/12/2015

MiRNAs are small (20 to 23 nucleotides in length) noncoding RNAs that regulate numerous essential cell functions. They act by targeting messenger RNAs for cleavage or translational repression, thereby influencing cell development and differentiation. MiRNAs have been identified as playing an important role in human cancers. In gene expression studies for tumor diagnostics, any biomarker identification requires an extraction system with high extraction efficiency from low sample amounts. A fast on-chip RNA extraction module, formerly used in pathogen detection, was modified to extract miRNAs from human cell cultures. This fast method (∼8 min) yields purified and amplifiable miRNAs for subsequent expression analysis. Compared to commercial extraction kits, the on-chip miRNA extraction system shows 100-fold higher extraction efficiencies for cell cultures.

Elsevier - Publisher Connector

The GNAT library for local and remote gene mention normalization

Author: C. M. Bergman
C. Plake
G. Gonzalez
G. Nenadic
Gerner
I. Solt
J. Hakenberg
M. Gerner
M. Haeussler
M. Schroeder
Tamames
Publication venue: Oxford University Press
Publication date: 01/01/2011

Summary: Identifying mentions of named entities, such as genes or diseases, and normalizing them to database identifiers have become an important step in many text and data mining pipelines. Despite this need, very few entity normalization systems are publicly available as source code or web services for biomedical text mining. Here we present the Gnat Java library for text retrieval, named entity recognition, and normalization of gene and protein mentions in biomedical text. The library can be used as a component to be integrated with other text-mining systems, as a framework to add user-specific extensions, and as an efficient stand-alone application for identifying gene and protein names for data analysis. On the BioCreative III test data, the current version of Gnat achieves a TAP-20 score of 0.1987.

CiteSeerX

Crossref

PubMed Central

The University of Manchester - Institutional Repository

MDC Repository

GoGene: gene annotation in the fast lane

Author: Al-Shahrour
C. Plake
Couto
Doms
Grützmann
J. Hakenberg
L. Royer
M. Schroeder
R. Winnenburg
Rebholz-Schuhmann
Publication venue: Oxford University Press

High-throughput screens such as microarrays and RNAi screens produce huge amounts of data. They typically result in hundreds of genes, which are often further explored and clustered via enriched Gene Ontology terms. The strength of such analyses is that they build on high-quality manual annotations provided with the Gene Ontology. Their weakness, however, is that annotations are restricted to process, function and location, and that they do not cover all known genes in model organisms. GoGene addresses this weakness by complementing high-quality manual annotation with high-throughput text mining that extracts co-occurrences of genes and ontology terms from the literature. GoGene contains over 4 000 000 associations between genes and gene-related terms for 10 model organisms, extracted from more than 18 000 000 PubMed entries. It covers not only the process, function and location of genes, but also biomedical categories such as diseases, compounds, techniques and mutations. By bringing it all together, GoGene provides the most recent and most complete facts about genes and can rank them according to novelty and importance. GoGene accepts keywords, gene lists, gene sequences and protein sequences as input and supports search for genes in PubMed, EntrezGene and via BLAST. Since all associations of genes to terms are supported by evidence in the literature, the results are transparent and can be verified by the user. GoGene is available at http://gopubmed.org/gogene
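The basic idea of mining gene-term associations from literature co-occurrence can be illustrated with a minimal document-level counter. This is a sketch under stated assumptions, not GoGene's pipeline: the function names `count_cooccurrences` and `_mentions`, the whole-word case-insensitive dictionary matching, and the toy abstracts are all hypothetical. A real system would use proper gene and term recognition and would rank associations rather than just count them.

```python
import re
from collections import Counter
from itertools import product


def _mentions(names, text):
    """Return the subset of `names` found in `text` as whole words, ignoring case.

    Hypothetical stand-in for real named-entity recognition.
    """
    return {n for n in names
            if re.search(r"\b" + re.escape(n) + r"\b", text, re.IGNORECASE)}


def count_cooccurrences(abstracts, genes, terms):
    """Count, for each (gene, term) pair, how many abstracts mention both."""
    counts = Counter()
    for text in abstracts:
        # Each abstract contributes at most 1 to any given pair.
        for pair in product(_mentions(genes, text), _mentions(terms, text)):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    abstracts = [
        "BRCA1 mutations increase breast cancer risk.",
        "BRCA1 is involved in DNA repair.",
        "TP53 is frequently mutated in breast cancer.",
    ]
    counts = count_cooccurrences(abstracts, ["BRCA1", "TP53"],
                                 ["breast cancer", "DNA repair"])
    for (gene, term), n in counts.most_common():
        print(gene, term, n)
```

Counting abstracts rather than raw mentions means a gene and a term repeated many times within one abstract still count only once for that abstract.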

CiteSeerX

Crossref

PubMed Central

The PPI affix dictionary (PPIAD) and BioMethod Lexicon: importance of affixes and tags for recognition of entity mentions and experimental protein interactions

Author: Alfonso Valencia
Andrew Chatr-aryamontri
Ashish V Tendulkar
Florian Leitner
J Hakenberg
L Smith
M Krallinger
M Narayanaswamy
Martin Krallinger
O Sanchez-Graillet
Publication venue: BioMed Central
Publication date: 01/01/2010

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

SR4GN: A Species Recognition Software Tool for Gene Normalization

Author: AA Morgan
B D
C-H Wei
C-N Hsu
Chih-Hsuan Wei
H Cunningham
HD Carroll
Hung-Yu Kao
J Hakenberg
J William A Baumgartner
Jan Aerts
K Bontcheva
L Hirschman
M Gerner
M Krallinger
N Naderi
T Kappeler
T Mu
X Wang
Y Kano Jr
Z Lu
Zhiyong Lu
Publication venue: Public Library of Science
Publication date: 05/06/2012

As suggested in recent studies, species recognition and disambiguation is one of the most critical and challenging steps in many downstream text-mining applications, such as gene normalization and protein-protein interaction extraction. We report SR4GN, an open-source tool for species recognition and disambiguation in biomedical text. In addition to the species detection function found in existing tools, SR4GN is optimized for the gene normalization task: it links detected species with the corresponding gene mentions in a document. SR4GN achieves an accuracy of 85.42% and compares favorably to other state-of-the-art techniques in benchmark experiments. Finally, SR4GN is implemented as a standalone software tool, making it convenient and robust for use in many text-mining applications. SR4GN can be downloaded at: http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/downloads/SR4G
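The step of linking detected species to gene mentions can be illustrated with a deliberately simple heuristic. This sketch is not SR4GN's algorithm: it assumes species and gene mentions have already been detected and are given as character offsets, links each gene to the closest preceding species mention, and falls back to the document's most frequently mentioned species when no species precedes the gene. The function name `assign_species`, the input format, and the fallback rule are hypothetical; the NCBI Taxonomy IDs in the example (9606 for human, 10090 for mouse) are only illustrative labels.

```python
from bisect import bisect_right
from collections import Counter


def assign_species(species_mentions, gene_mentions):
    """Link each gene mention to a species (hypothetical heuristic).

    species_mentions: list of (char_offset, taxon_id)
    gene_mentions:    list of (char_offset, gene_text)
    Returns {(char_offset, gene_text): taxon_id, or None if no species at all}.
    """
    ordered = sorted(species_mentions)
    offsets = [off for off, _ in ordered]
    # Fallback: most frequently mentioned species (ties go to the earliest one).
    counts = Counter(taxon for _, taxon in ordered)
    fallback = counts.most_common(1)[0][0] if counts else None
    links = {}
    for g_off, g_text in gene_mentions:
        # Number of species mentions starting at or before the gene mention.
        i = bisect_right(offsets, g_off)
        links[(g_off, g_text)] = ordered[i - 1][1] if i > 0 else fallback
    return links


if __name__ == "__main__":
    species = [(3, "9606"), (40, "10090")]  # e.g. "human" at 3, "mouse" at 40
    genes = [(15, "BRCA1"), (46, "Brca1")]
    print(assign_species(species, genes))
    # {(15, 'BRCA1'): '9606', (46, 'Brca1'): '10090'}
```

Working on offsets keeps the sketch short; a fuller heuristic might also respect sentence boundaries or look for species cues inside the gene mention itself.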

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Identifying the needs of penile cancer sufferers: A systematic review of the quality of life, psychosexual and psychosocial literature in penile cancer

Author: C Schairer
CA D'Ancona
DR Camidge
E Solsona
FR Romero
G Gulino
J Dillner
JA Leijte
Maurice M Lau
OW Hakenberg
P Kind
S Opjordsmoen
Satish B Maddineni
T Windahl
V Ficarra
Vijay K Sangar
Publication venue: BioMed Central
Publication date: 01/01/2009

Crossref

Springer - Publisher Connector

PubMed Central

The strength of co-authorship in gene name disambiguation

Author: A Morgan
AL Barabasi
AS Yeh
B Schijvenaars
D Hanisch
DR Maglott
G Savova
H Liu
H Xu
IH Witten
J Hakenberg
JR Quinlan
L Chen
L Hirschman
M Weeber
Richárd Farkas
RM Podowski
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: A biomedical entity mention in articles and other free texts is often ambiguous. For example, 13% of the gene names (aliases) might refer to more than one gene. The task of Gene Symbol Disambiguation (GSD), a special case of Word Sense Disambiguation (WSD), is to assign a unique gene identifier to every gene name alias identified in biology-related articles. Supervised and unsupervised machine learning WSD techniques have been applied in the biomedical field with promising results. Here we examine, through graph-based semi-supervised methods for the GSD task, the utilisation potential of one special feature of biological articles: the authors of the documents are known.
Results: Our key hypothesis is that a biologist refers to each particular gene by a fixed gene alias, and that this holds for the co-authors as well. To make use of the co-authorship information, we built the inverse co-author graph on MEDLINE abstracts. The nodes of the inverse co-author graph are articles, and there is an edge between two nodes if and only if the two articles have an author in common. We introduce two methods that use distances between abstracts (based on the graph) for the GSD task. We found that a disambiguation decision can be made in 85% of cases with an extremely high (99.5%) precision rate just by using information obtained from the inverse co-author graph. We incorporated the co-authorship information into two GSD systems in order to attain full coverage, and in experiments our procedure achieved precision of 94.3%, 98.85%, 96.05% and 99.63% on the human, mouse, fly and yeast GSD evaluation sets, respectively.
Conclusion: Based on the promising results obtained so far, we suggest that the co-authorship information and the circumstances of the articles' release (such as the journal title and the year of publication) can be a crucial building block of any sophisticated similarity measure among biological articles, and hence the methods introduced here should be useful for other biomedical natural language processing tasks (such as organism or target disease detection) as well.
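The inverse co-author graph (articles as nodes, an edge whenever two articles share an author) is straightforward to build from author lists. The sketch below constructs it and then, as one hypothetical way to use graph distance, assigns an ambiguous alias the gene sense held by the nearest labeled articles found by breadth-first search, abstaining on a tie or when no labeled article is reachable. The function names, the input format (article ID mapped to a set of author-name strings), the toy data, and the voting rule are assumptions for illustration; this is not necessarily either of the two distance-based methods the authors introduce.

```python
from collections import Counter, defaultdict


def build_inverse_coauthor_graph(article_authors):
    """article_authors: {article_id: set of author name strings}.

    Returns {article_id: set of article_ids sharing >= 1 author}.
    Identical name strings are assumed to denote the same person.
    """
    by_author = defaultdict(set)
    for article, authors in article_authors.items():
        for author in authors:
            by_author[author].add(article)
    graph = {article: set() for article in article_authors}
    for articles in by_author.values():
        for article in articles:
            graph[article] |= articles - {article}
    return graph


def disambiguate(graph, query, labels):
    """Guess the gene sense of an ambiguous alias in article `query`.

    labels: {article_id: gene_id} for articles whose sense is already known.
    BFS outward from `query`; labeled articles at the smallest distance vote.
    Returns None (abstain) on a tie or if no labeled article is reachable.
    """
    seen = {query}
    frontier = [query]
    while frontier:
        votes = Counter(labels[a] for a in frontier if a != query and a in labels)
        if votes:
            ranked = votes.most_common(2)
            if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
                return None  # tie at the closest distance: abstain
            return ranked[0][0]
        next_frontier = []
        for article in frontier:
            for neighbour in graph[article]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return None  # no labeled article reachable


if __name__ == "__main__":
    papers = {"A1": {"Smith", "Lee"}, "A2": {"Lee", "Kim"},
              "A3": {"Kim"}, "A4": {"Park"}}
    g = build_inverse_coauthor_graph(papers)
    known = {"A2": "gene_X", "A3": "gene_Y"}  # hypothetical gene senses
    print(disambiguate(g, "A1", known))  # gene_X (A2 is one hop away)
    print(disambiguate(g, "A4", known))  # None (A4 shares no author)
```

Abstaining when the graph gives no evidence mirrors the trade-off reported above, where the graph alone decides only a fraction of cases and another GSD system is needed for full coverage.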

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central