93 research outputs found

Integrating phenotype and gene expression data for predicting gene function

Author: A Kahraman
Affymetrix
Andy D Perkins
Brandon M Malone
DW Huang
GC Cawley
J Rodgers
JD Wren
L In-Yee
M Ashburner
M Ashburner
M Steinbach
N Daraselia
NCBI
P Groth
P Groth
Rosetta Biosoftware
Susan M Bridges
U Karaoz
Y Zhao
Publication venue: BioMed Central
Publication date: 01/01/2009

Crossref

Springer - Publisher Connector

PubMed Central

Comparative analysis of five protein-protein interaction corpora

Author: A Rzhetsky
Antti Airola
C Blaschke
C Nédellec
D Klein
DJ Best
Filip Ginter
HL Johnson
J Ding
J Kim
Jari Björne
JD Wren
Juho Heimonen
K Fundel
KB Cohen
L Smith
M Light
N Daraselia
R Bunescu
R Ihaka
S Pyysalo
Sampo Pyysalo
Tapio Salakoski
WJ Wilbur
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract Background Growing interest in the application of natural language processing methods to biomedical text has led to an increasing number of corpora and methods targeting protein-protein interaction (PPI) extraction. However, there is no general consensus regarding PPI annotation and consequently resources are largely incompatible and methods are difficult to evaluate. Results We present the first comparative evaluation of the diverse PPI corpora, performing quantitative evaluation using two separate information extraction methods as well as detailed statistical and qualitative analyses of their properties. For the evaluation, we unify the corpus PPI annotations to a shared level of information, consisting of undirected, untyped binary interactions of non-static types with no identification of the words specifying the interaction, no negations, and no interaction certainty. We find that the F-score performance of a state-of-the-art PPI extraction method varies on average 19 percentage units and in some cases over 30 percentage units between the different evaluated corpora. The differences stemming from the choice of corpus can thus be substantially larger than differences between the performance of PPI extraction methods, which suggests definite limits on the ability to compare methods evaluated on different resources. We analyse a number of potential sources for these differences and identify factors explaining approximately half of the variance. We further suggest ways in which the difficulty of the PPI extraction tasks codified by different corpora can be determined to advance comparability. Our analysis also identifies points of agreement and disagreement in PPI corpus annotation that are rarely explicitly stated by the authors of the corpora. Conclusions Our comparative analysis uncovers key similarities and differences between the diverse PPI corpora, thus taking an important step towards standardization. 
In the course of this study we have also made a major practical contribution by converting the corpora into a shared format. The conversion software is freely available at http://mars.cs.utu.fi/PPICorpora.
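
The shared annotation level described in this abstract (undirected, untyped binary interactions) and an F-score computed over such pairs can be sketched roughly as follows. This is a minimal illustration only, not the authors' conversion software: the `(agent, target, type)` tuple layout, the function names, and the placeholder protein names `P1` to `P4` are all assumptions made for the example.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Pair = FrozenSet[str]

def to_undirected_untyped(annotations: Iterable[Tuple[str, str, str]]) -> Set[Pair]:
    """Drop interaction type and direction, keeping unique unordered pairs.

    The (agent, target, type) tuple layout is hypothetical.
    """
    return {frozenset((agent, target)) for agent, target, _type in annotations}

def f_score(gold: Set[Pair], predicted: Set[Pair]) -> float:
    """Balanced F-score (harmonic mean of precision and recall) over pairs."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)

# Placeholder annotations: two typed, directed mentions of the same pair
# collapse into a single undirected, untyped interaction.
gold = to_undirected_untyped([
    ("P1", "P2", "binds"),
    ("P2", "P1", "phosphorylates"),
    ("P3", "P4", "binds"),
])
predicted = to_undirected_untyped([
    ("P2", "P1", "binds"),
    ("P1", "P3", "binds"),
])
score = f_score(gold, predicted)
```

Collapsing to unordered pairs is what makes corpora with different typing and direction conventions comparable at all, at the cost of discarding that extra information.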

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Mining the Gene Wiki for functional genomic knowledge

Author: A Subramanian
AI Su
Andrew I Su
AR Aronson
AR Pico
B Mons
Benjamin M Good
C Jonquet
D Weekes
Douglas G Howe
DW Huang
E Callaway
E Camon
EB Camon
ES Lander
H Stehr
I Rivals
J Osborne
JC Venter
JW Huss
JW Huss
L Hirschman
LA Flórez
M Ashburner
M Waldrop
N Daraselia
NH Shah
R Hoffmann
R Tirrell
R Winnenburg
Simon M Lin
W Baumgartner
Warren A Kibbe
Z Lu
Publication venue: BioMed Central
Publication date: 01/12/2011

Abstract Background Ontology-based gene annotations are important tools for organizing and analyzing genome-scale biological data. Collecting these annotations is a valuable but costly endeavor. The Gene Wiki makes use of Wikipedia as a low-cost, mass-collaborative platform for assembling text-based gene annotations. The Gene Wiki comprises more than 10,000 review articles, each describing one human gene. The goal of this study is to define and assess a computational strategy for translating the text of Gene Wiki articles into ontology-based gene annotations. We specifically explore the generation of structured annotations using the Gene Ontology and the Human Disease Ontology. Results Our system produced 2,983 candidate gene annotations using the Disease Ontology and 11,022 candidate annotations using the Gene Ontology from the text of the Gene Wiki. Based on manual evaluations and comparisons to reference annotation sets, we estimate a precision of 90-93% for the Disease Ontology annotations and 48-64% for the Gene Ontology annotations. We further demonstrate that this data set can systematically improve the results from gene set enrichment analyses. Conclusions The Gene Wiki is a rapidly growing corpus of text focused on human gene function. Here, we demonstrate that the Gene Wiki can be a powerful resource for generating ontology-based gene annotations. These annotations can be used immediately to improve workflows for building curated gene annotation databases and knowledge-based statistical analyses.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

An environment for relation mining over richly annotated corpora: the case of GENIA

Author: A Koike
A Mikheev
A Ratnaparkhi
A Yakushiji
C Friedman
D Hindle
D Lin
DPA Corney
F Rinaldi
F Rinaldi
Fabio Rinaldi
G Leroy
G Minnen
G Schneider
G Schneider
G Schneider
Gerold Schneider
J Carroll
J Hakenberg
J Kim
J Preiss
J Saric
JC Reynar
K Kaljurand
Kaarel Kaljurand
LJ Jensen
M Collins
M Huang
M Marcus
M Romacker
Martin Romacker
Michael Hess
N Daraselia
S Novichkova
S Riedel
T Rindflesch
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: The biomedical domain is witnessing a rapid growth of the amount of published scientific results, which makes it increasingly difficult to filter the core information. There is a real need for support tools that 'digest' the published results and extract the most important information. RESULTS: We describe and evaluate an environment supporting the extraction of domain-specific relations, such as protein-protein interactions, from a richly-annotated corpus. We use full, deep-linguistic parsing and manually created, versatile patterns, expressing a large set of syntactic alternations, plus semantic ontology information. CONCLUSION: The experiments show that the approach described is capable of delivering high-precision results, while maintaining sufficient levels of recall. The high level of abstraction of the rules used by the system, which are considerably more powerful and versatile than finite-state approaches, allows speedy interactive development and validation.

Crossref

Springer - Publisher Connector

PubMed Central

ZORA

GenCLiP: a software program for clustering gene lists by literature profiling and constructing gene co-occurrence networks related to custom keywords

Author: AA Schaffer
BT Alako
C Plake
C Rodriguez-Penagos
D Chaussabel
D Lee
EG Cerami
G Karakiulakis
H Kim
Hui-Yong Tian
Jin Zhao
K Fundel
Kai-Tai Yao
KJ Bussey
LJ Jensen
M Bundschus
M Suderman
MB Eisen
N Daraselia
P Shannon
R Hammamieh
R Hoffmann
R Rubinstein
RT Tsai
S Li
T Ide
TK Jenssen
VK Gajendran
Yi-Bo Zhou
Z Huang
ZF Hu
Zhen-Fu Hu
Zhong-Xi Huang
Publication venue: BioMed Central
Publication date: 01/07/2008

Abstract Background Biomedical researchers often want to explore pathogenesis and pathways regulated by abnormally expressed genes, such as those identified by microarray analyses. Literature mining is an important way to assist in this task. Many literature mining tools are now available. However, few of them allow the user to make manual adjustments to zero in on what he/she wants to know in particular. Results We present our software program, GenCLiP (Gene Cluster with Literature Profiles), which is based on the methods presented by Chaussabel and Sher (Genome Biol 2002, 3(10):RESEARCH0055) that search gene lists to identify functional clusters of genes based on up-to-date literature profiling. Four features were added to this previously described method: the ability to 1) manually curate keywords extracted from the literature, 2) search genes and gene co-occurrence networks related to custom keywords, 3) compare analyzed gene results with negative and positive controls generated by GenCLiP, and 4) calculate probabilities that the resulting genes and gene networks are randomly related. In this paper, we show, with a set of genes differentially expressed between keloids and normal controls, how the functions implemented in GenCLiP successfully identified keywords related to the pathogenesis of keloids and unknown gene pathways involved in the pathogenesis of keloids. Conclusion With regard to the identification of disease-susceptibility genes, GenCLiP allows one to quickly acquire a primary pathogenesis profile and identify pathways involving abnormally expressed genes not previously associated with the disease.
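
The idea of a gene co-occurrence network restricted to a custom keyword, mentioned in this abstract, can be illustrated with a small sketch. This is not GenCLiP's implementation: the record layout (abstract text plus a list of gene mentions), the case-insensitive substring match, the function name, and the placeholder gene names are all assumptions chosen for the example.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_network(records, keyword):
    """Count gene pairs that are mentioned together in texts containing the keyword.

    `records` is a hypothetical list of (text, gene_list) tuples; matching is a
    plain case-insensitive substring test, which is an assumption of this sketch.
    """
    edges = Counter()
    for text, genes in records:
        if keyword.lower() in text.lower():
            # Sort so each unordered pair has one canonical key.
            for gene_a, gene_b in combinations(sorted(set(genes)), 2):
                edges[(gene_a, gene_b)] += 1
    return edges

# Placeholder records; gene names are illustrative, not real findings.
records = [
    ("Keloid sample mentions GENE_A and GENE_B", ["GENE_A", "GENE_B"]),
    ("A keloid study lists three genes", ["GENE_B", "GENE_A", "GENE_C"]),
    ("Unrelated text about another condition", ["GENE_A", "GENE_D"]),
]
network = cooccurrence_network(records, "keloid")
```

Edge weights here are raw co-mention counts; assessing whether a pair is related by chance, as the abstract describes, would require an additional statistical step that this sketch does not attempt.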

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Large-scale directional relationship extraction and resolution

Author: A Culotta
A Gladki
A Koike
A Yuryev
AB Clegg
C Rodriguez-Penagos
CM Topinka
Cory B Giles
D Zhou
F Rinaldi
F Rinaldi
H Chen
H Jang
H Kim
I Donaldson
IK Ruf
J Ding
J Jiang
JA Mitchell
JC Park
JD Kim
JD Kim
JD Wren
JD Wren
JD Wren
Jonathan D Wren
JP Vaque
K Fundel
K Sagae
LM Juliano
M Bundschus
M Chagoyen
M Huang
M Lease
M Wang
M-C de Marneffe
N Daraselia
P Zweigenbaum
R Bunescu
R Kuffner
RC Bunescu
RT Tsai
S Kim
S Novichkova
TK Jenssen
W Pratt
Publication venue: BioMed Central
Publication date: 01/01/2008

Crossref

Springer - Publisher Connector

PubMed Central

Extracting causal relations on HIV drug resistance from literature

Author: A Koike
AM Cohen
Breanndán Ó Nualláin
C Giles
Charles A Boucher
D Klein
D Zhou
DR Douglas
F Horn
F Leitner
F Rinaldi
G Erkan
H Jang
H Saigo
IH Witten
J Saric
J Vercauteren
JG Liao
JH Kim
K Fundel
LJ Jensen
M Abulaish
M Huang
MY Kim
N Daraselia
O Sanchez-Graillet
P Zweigenbaum
Peter MA Sloot
Quoc-Chinh Bui
R Chowdhary
R Malik
RA Erhardt
S Ananiadou
S Katrenko
S Kim
T Lengauer
VI Torvik
Y Miyao
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract Background In HIV treatment it is critical to have up-to-date resistance data of applicable drugs since HIV has a very high rate of mutation. These data are made available through scientific publications and must be extracted manually by experts in order to be used by virologists and medical doctors. Therefore there is an urgent need for a tool that partially automates this process and is able to retrieve relations between drugs and virus mutations from literature. Results In this work we present a novel method to extract and combine relationships between HIV drugs and mutations in viral genomes. Our extraction method is based on natural language processing (NLP) which produces grammatical relations and applies a set of rules to these relations. We applied our method to a relevant set of PubMed abstracts and obtained 2,434 extracted relations with an estimated F-score of 84%. We then combined the extracted relations using logistic regression to generate resistance values for each <drug, mutation> pair. The results of this relation combination show more than 85% agreement with the Stanford HIVDB for the ten most frequently occurring mutations. The system is used in 5 hospitals from the Virolab project (http://www.virolab.org) to preselect the most relevant novel resistance data from literature and present those to virologists and medical doctors for further evaluation. Conclusions The proposed relation extraction and combination method performs well at extracting HIV drug resistance data. It can be used in large-scale relation extraction experiments. The developed methods can also be applied to extract other types of relations such as gene-protein, gene-disease, and disease-mutation.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

EUR Research Repository

Erasmus University Digital Repository

DR-NTU (Digital Repository of NTU)

UvA-DARE

International Migration, Integration and Social Cohesion online publications

Transboundary determinants of avian zoonotic infectious diseases: challenges for strengthening research capacity and connecting surveillance networks

Author: Andrew W. Bartlow
Annie Elshoff
Denys Muzyka
Fares Khoury
Ivane Daraselia
Jean Tsao
Jeanne M. Fair
Jennifer Owen
Lara Fakhouri
Lela Urushadze
Levan Ninua
Mu’men Alrwashdeh
Nisreen Al-Hmoud
Sopio Balkhamishvili
Zura Javakhishvili
Publication venue: Frontiers Media S.A.
Publication date: 01/02/2024

As the climate changes, global systems have become increasingly unstable and unpredictable. This is particularly true for many disease systems, including subtypes of highly pathogenic avian influenzas (HPAIs) that are circulating the world. Ecological patterns once thought stable are changing, bringing new populations and organisms into contact with one another. Wild birds continue to be hosts and reservoirs for numerous zoonotic pathogens, and strains of HPAI and other pathogens have been introduced into new regions via migrating birds and transboundary trade of wild birds. With these expanding environmental changes, it is even more crucial that regions or counties that previously did not have surveillance programs develop the appropriate skills to sample wild birds and add to the understanding of pathogens in migratory and breeding birds through research. For example, little is known about wild bird infectious diseases and migration along the Mediterranean and Black Sea Flyway (MBSF), which connects Europe, Asia, and Africa. Focusing on avian influenza and the microbiome in migratory wild birds along the MBSF, this project seeks to understand the determinants of transboundary disease propagation and coinfection in regions that are connected by this flyway. Through the creation of a threat reduction network for avian diseases (Avian Zoonotic Disease Network, AZDN) in three countries along the MBSF (Georgia, Ukraine, and Jordan), this project is strengthening capacities for disease diagnostics; microbiomes; ecoimmunology; field biosafety; proper wildlife capture and handling; experimental design; statistical analysis; and vector sampling and biology. 
Here, we cover what is required to build a wild bird infectious disease research and surveillance program, which includes learning skills in proper bird capture and handling; biosafety and biosecurity; permits; next generation sequencing; leading-edge bioinformatics and statistical analyses; and vector and environmental sampling. Creating connected networks for avian influenzas and other pathogen surveillance will increase coordination and strengthen biosurveillance globally in wild birds.

Directory of Open Access Journals

Text-derived concept profiles support assessment of DNA microarray data for acute myeloid leukemia and for androgen receptor stimulation

BACKGROUND: High-throughput experiments, such as with DNA microarrays, typically result in hundreds of genes potentially relevant to the process under study, rendering the interpretation of these experiments problematic. Here, we propose and evaluate an approach to find functional associations between large numbers of genes and other biomedical concepts from free-text literature. For each gene, a profile of related concepts is constructed that summarizes the context in which the gene is mentioned in literature. We assign a weight to each concept in the profile based on a likelihood ratio measure. Gene concept profiles can then be clustered to find related genes and other concepts. RESULTS: The experimental validation was done in two steps. We first applied our method on a controlled test set. After this proved to be successful the datasets from two DNA microarray experiments were analyzed in the same way and the results were evaluated by domain experts. The first dataset was a gene-expression profile that characterizes the cancer cells of a group of acute myeloid leukemia patients. For this group of patients the biological background of the cancer cells is largely unknown. Using our methodology we found an association of these cells to monocytes, which agreed with other experimental evidence. The second data set consisted of differentially expressed genes following androgen receptor stimulation in a prostate cancer cell line. Based on the analysis we put forward a hypothesis about the biological processes induced in these studied cells: secretory lysosomes are involved in the production of prostatic fluid and their development and/or secretion are androgen-regulated processes. CONCLUSION: Our method can be used to analyze DNA microarray datasets based on information explicitly and implicitly available in the literature. We provide a publicly available tool, dubbed Anni, for this purpose

Maastricht University Research Portal

Crossref

Springer - Publisher Connector

PubMed Central

EUR Research Repository

Erasmus University Digital Repository

Re-Annotation Is an Essential Step in Systems Biology Modeling of Functional Genomics Data

Author: A Harel
A Hutloff
AM Schnoes
Bart H. J. van den Berg
BH van den Berg
C Smith
CA Ouzounis
CE Jones
CE Rudd
CH Wu
D Barrell
D Devos
D Kemmer
DA Benson
DP Wall
E Eyras
E Quevillon
F Meurens
Fiona M. McCarthy
FM McCarthy
G Moreno-Hagelsieb
H Zhou
ICGS Consortium
Iddo Friedberg
J Burnside
JC Camus
JR Wortman
K Sellheyer
KM Kim
L Tian
LL Chen
M Andersson
M Andersson
M Ashburner
M Pruess
M Schena
M Vidric
ME van Berkel
MK Richardson
N Daraselia
N Gupta
N Rocques
O Gundogdu
PB Neerincx
PE Neiman
R Apweiler
R Edgar
RA Shilling
S Washietl
SE Brenner
Shane C. Burgess
SL Salzberg
Susan J. Lamont
T Barrett
TJ Buza
TJ Buza
UM Braga-Neto
V Wood
X Wang
YP de Jong
Publication venue: Public Library of Science
Publication date: 01/01/2010

One motivation of systems biology research is to understand gene functions and interactions from functional genomics data such as that derived from microarrays. Up-to-date structural and functional annotations of genes are an essential foundation of systems biology modeling. We propose that the first essential step in any systems biology modeling of functional genomics data, especially for species with recently sequenced genomes, is gene structural and functional re-annotation. To demonstrate the impact of such re-annotation, we structurally and functionally re-annotated a microarray developed, and previously used, as a tool for disease research. We quantified the impact of this re-annotation on the array based on the total numbers of structural- and functional-annotations, the Gene Annotation Quality (GAQ) score, and canonical pathway coverage. We next quantified the impact of re-annotation on systems biology modeling using a previously published experiment that used this microarray. We show that re-annotation improves the quantity and quality of structural- and functional-annotations, allows a more comprehensive Gene Ontology based modeling, and improves pathway coverage for both the whole array and a differentially expressed mRNA subset. Our results also demonstrate that re-annotation can result in a different knowledge outcome derived from previous published research findings. We propose that, because of this, re-annotation should be considered to be an essential first step for deriving value from functional genomics data

Digital Repository @ Iowa State University (ISU)

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Scholars Junction - Mississippi State University Institutional Repository