Search CORE

999 research outputs found

A graph-search framework for associating gene identifiers with documents

Author: A Yeh
AM Cohen
AM Cohen
AM Cohen
C Zhai
Consortium TGO
D Hanisch
E Hatcher
E Minkov
E Minkov
Einat Minkov
F Sha
J Crim
K Franzén
K Fundel
K Humphreys
L Hirschman
L Hirschman
M Collins
M Craven
R Bunescu
RI Kondor
T Rindflesch
U Leser
William W Cohen
WW Cohen
WW Cohen
WW Cohen
Y Altun
Y Freund
Z Kou
Publication venue: BioMed Central
Publication date: 01/01/2006
Field of study

BACKGROUND: One step in the model organism database curation process is to find, for each article, the identifier of every gene discussed in the article. We consider a relaxation of this problem suitable for semi-automated systems, in which each article is associated with a ranked list of possible gene identifiers, and experimentally compare methods for solving this geneId ranking problem. In addition to baseline approaches based on combining named entity recognition (NER) systems with a "soft dictionary" of gene synonyms, we evaluate a graph-based method which combines the outputs of multiple NER systems, as well as other sources of information, and a learning method for reranking the output of the graph-based method. RESULTS: We show that named entity recognition (NER) systems with similar F-measure performance can have significantly different performance when used with a soft dictionary for geneId-ranking. The graph-based approach can outperform any of its component NER systems, even without learning, and learning can further improve the performance of the graph-based ranking approach. CONCLUSION: The utility of a named entity recognition (NER) system for geneId-finding may not be accurately predicted by its entity-level F1 performance, the most common performance measure. GeneId-ranking systems are best implemented by combining several NER systems. With appropriate combination methods, usefully accurate geneId-ranking systems can be constructed based on easily-available resources, without resorting to problem-specific, engineered components

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Adapting a relation extraction pipeline for the BioCreAtIvE II task

Author: Grover Claire
Haddow Barry
Klein Ewan
Matthews Michael
Nielsen Leif Arda
Tobin Richard
Wang Xinglong
Publication venue
Publication date: 01/01/2007
Field of study

Edinburgh Research Explorer

Species-level functional profiling of metagenomes and metatranscriptomes.

Author: A Sczyrba
A Shafquat
AE Duran-Pinedo
AK Sharma
B Buchfink
B Langmead
BE Suzek
BK Swan
C Burke
C Luo
Curtis Huttenhower
D Medini
DH Huson
DT Truong
DT Truong
E Pasolli
EA Franzosa
EA Franzosa
Eric A. Franzosa
George Weingart
GG Silva
Gholamali Rahnavard
H Hauswedell
J Kim
J Lloyd-Price
J Lloyd-Price
J Ravel
J. Gregory Caporaso
JA Fuhrman
K Huang
Karen Schwarzberg Lipson
Lauren J. McIver
LR Thompson
LR Thompson
Luke R. Thompson
M Hamady
M Kanehisa
M Scholz
Melanie Schirmer
MY Galperin
N Segata
N Segata
Nicola Segata
OU Mason
P Petrenko
PJ Turnbaugh
R Caspi
RC Edgar
RD Finn
Rob Knight
S Abubucker
S Nayfach
S Sunagawa
S Sunagawa
T Bose
UniProt Consortium.
W Huang
Y Ye
Y Zhao
Publication venue: eScholarship, University of California
Publication date: 01/11/2018
Field of study

Functional profiles of microbial communities are typically generated using comprehensive metagenomic or metatranscriptomic sequence read searches, which are time-consuming, prone to spurious mapping, and often limited to community-level quantification. We developed HUMAnN2, a tiered search strategy that enables fast, accurate, and species-resolved functional profiling of host-associated and environmental communities. HUMAnN2 identifies a community's known species, aligns reads to their pangenomes, performs translated search on unclassified reads, and finally quantifies gene families and pathways. Relative to pure translated search, HUMAnN2 is faster and produces more accurate gene family profiles. We applied HUMAnN2 to study clinal variation in marine metabolism, ecological contribution patterns among human microbiome pathways, variation in species' genomic versus transcriptional contributions, and strain profiling. Further, we introduce 'contributional diversity' to explain patterns of ecological assembly across different microbial community types

Crossref

eScholarship - University of California

NeuroRDF: semantic integration of highly curated data to prioritize biomarker candidates in Alzheimer's disease

Author
Publication venue: BioMed Central
Publication date: 08/07/2016
Field of study

Springer - Publisher Connector

Modeling Complex Networks For (Electronic) Commerce

Author: Provost Foster
Sundararajan Arun
Publication venue
Publication date: 01/01/2007
Field of study

NYU, Stern School of Business, IOMS Department, Center for Digital Economy Researc

Crossref

New York University Faculty Digital Archive

Knowledge Integration and Representation for Biomedical Analysis

Author: Alachram Halima
Publication venue
Publication date: 04/02/2021
Field of study

Georg-August-University Göttingen

Improving protein function prediction methods with integrated literature data

Author: A Karimpour-Fard
A Vazquez
A Vinayagam
Aaron P Gabow
AK Ramani
B Schwikowski
BTF Alako
C Brun
C von Mering
Debra S Goldberg
E Nabieva
HW Mewes
I Xenarios
J Rual
K Tsuda
L Hunter
L Hunter
L Tanabe
Lawrence E Hunter
M Ashburner
M Aubry
M Chagoyen
M Huynen
M Krallinger
M Krallinger
M Pelligri
M Yetisgen-Yildiz
OG Troyanskaya
P Srinivasan
PM Bowers
R Cilibrasi
R Hoffmann
S Letovsky
S Raychaudhuri
Sonia M Leach
T Schlitt
T Tanabe
TK Jenssen
U Karaoz
William A Baumgartner
Y Ofran
Publication venue: BioMed Central
Publication date: 01/01/2008
Field of study

Abstract Background Determining the function of uncharacterized proteins is a major challenge in the post-genomic era due to the problem's complexity and scale. Identifying a protein's function contributes to an understanding of its role in the involved pathways, its suitability as a drug target, and its potential for protein modifications. Several graph-theoretic approaches predict unidentified functions of proteins by using the functional annotations of better-characterized proteins in protein-protein interaction networks. We systematically consider the use of literature co-occurrence data, introduce a new method for quantifying the reliability of co-occurrence and test how performance differs across species. We also quantify changes in performance as the prediction algorithms annotate with increased specificity. Results We find that including information on the co-occurrence of proteins within an abstract greatly boosts performance in the Functional Flow graph-theoretic function prediction algorithm in yeast, fly and worm. This increase in performance is not simply due to the presence of additional edges since supplementing protein-protein interactions with co-occurrence data outperforms supplementing with a comparably-sized genetic interaction dataset. Through the combination of protein-protein interactions and co-occurrence data, the neighborhood around unknown proteins is quickly connected to well-characterized nodes which global prediction algorithms can exploit. Our method for quantifying co-occurrence reliability shows superior performance to the other methods, particularly at threshold values around 10% which yield the best trade off between coverage and accuracy. In contrast, the traditional way of asserting co-occurrence when at least one abstract mentions both proteins proves to be the worst method for generating co-occurrence data, introducing too many false positives. Annotating the functions with greater specificity is harder, but co-occurrence data still proves beneficial. Conclusion Co-occurrence data is a valuable supplemental source for graph-theoretic function prediction algorithms. A rapidly growing literature corpus ensures that co-occurrence data is a readily-available resource for nearly every studied organism, particularly those with small protein interaction databases. Though arguably biased toward known genes, co-occurrence data provides critical additional links to well-studied regions in the interaction network that graph-theoretic function prediction algorithms can exploit.</p

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central