The University of Manchester - Institutional Repository

eGIFT: Mining Gene Information from the Literature

Author: A Gladki
AS Schwartz
C Blaschke
C Perez-Iratxeta
Carl J Schmidt
Catalina O Tudor
CO Tudor
D Cheng
D Rebholz-Schuhmann
D Yarowsky
H Maier
H Shatkay
J Ding
J McEntyre
J Miller
JJ Kim
K Fundel
K Vijay-Shanker
KB Cohen
LC Tsoi
M Krallinger
MA Andrade
NR Smalheiser
O Gospodnetic
PK Shah
R Bruce
R Jelier
S Gaudan
S Kaczanowski
S Pakhomov
Y Liu
Y Tsuruoka
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract

Background: With the biomedical literature continually expanding, searching PubMed for information about specific genes becomes increasingly difficult. Not only can thousands of results be returned, but gene name ambiguity leads to many irrelevant hits. As a result, it is difficult for life scientists and gene curators to rapidly get an overall picture of a specific gene from documents that mention its names and synonyms.

Results: In this paper, we present eGIFT (http://biotm.cis.udel.edu/eGIFT), a web-based tool that associates informative terms, called iTerms, and sentences containing them, with genes. To associate iTerms with a gene, eGIFT ranks iTerms about the gene based on a score that compares the frequency of occurrence of a term in the gene's literature to its frequency of occurrence in documents about genes in general. To retrieve a gene's documents (Medline abstracts), eGIFT considers all gene names, aliases, and synonyms. Since many of the gene names can be ambiguous, eGIFT applies a disambiguation step to remove matches that do not correspond to this gene. An additional filtering process is applied to retain those abstracts that focus on the gene rather than mention it in passing. eGIFT's information for a gene is pre-computed, and users of eGIFT can search for genes by using a name or an EntrezGene identifier. iTerms are grouped into different categories to facilitate a quick inspection. eGIFT also links an iTerm to sentences mentioning the term to allow users to see the relation between the iTerm and the gene. We evaluated the precision and recall of eGIFT's iTerms for 40 genes; between 88% and 94% of the iTerms were marked as salient by our evaluators, and 94% of the UniProtKB keywords for these genes were also identified by eGIFT as iTerms.

Conclusions: Our evaluations suggest that iTerms capture highly relevant aspects of genes. Furthermore, by showing sentences containing these terms, eGIFT can provide a quick description of a specific gene. eGIFT helps not only life scientists to survey results of high-throughput experiments, but also annotators to find articles describing gene aspects and functions.
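The ranking idea in the Results section (comparing how often a term occurs in a gene's literature with how often it occurs in documents about genes in general) can be illustrated with a minimal sketch. The abstract does not give eGIFT's actual formula, so the score used here (difference of document frequencies) is an assumption, as are the function names `document_frequency` and `rank_terms` and the whitespace tokenization.

```python
from collections import Counter


def document_frequency(docs):
    """Return, for each term, the fraction of documents that contain it."""
    if not docs:
        raise ValueError("at least one document is required")
    counts = Counter()
    for doc in docs:
        # Count each term once per document (naive whitespace tokenization).
        counts.update(set(doc.lower().split()))
    return {term: n / len(docs) for term, n in counts.items()}


def rank_terms(gene_docs, background_docs, top_n=5):
    """Rank terms by an assumed score: gene-literature document frequency
    minus background document frequency. This is NOT eGIFT's published
    formula, only a stand-in with the same intent."""
    gene_df = document_frequency(gene_docs)
    background_df = document_frequency(background_docs)
    scores = {
        term: freq - background_df.get(term, 0.0)
        for term, freq in gene_df.items()
    }
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


if __name__ == "__main__":
    gene_docs = ["kinase binds dna", "kinase repairs dna damage"]
    background_docs = ["gene binds dna", "gene expression dna", "protein gene"]
    for term, score in rank_terms(gene_docs, background_docs):
        print(f"{term}\t{score:.3f}")
```

Terms that are common everywhere (such as "dna" in the toy data) score lower than terms concentrated in the gene's own abstracts, which is the behavior the abstract describes.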

Springer - Publisher Connector

Seeded Bayesian Networks: Constructing genetic networks from microarray data

Author: AI Saeed
AI Saeed
AJ Hartemink
Amira Djebbari
AV Werhli
D Heckerman
D Husmeier
DC Weaver
DH Wolpert
DM Chickering
DM Chickering
E Frank
G Bastos
J McEntyre
JF Rual
John Quackenbush
JR Nevins
JW Harbour
M Kanehisa
ME Ross
ME Ross
N Friedman
N Friedman
O Gevaert
P Le Phillip
P Shannon
PT Spellman
R Castelo
S Acid
S Aref
S Imoto
T Akutsu
T Chen
T Fawcett
TH Cormen
TK Jenssen
TR Golub
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract

Background: DNA microarrays and other genomics-inspired technologies provide large datasets that often include hidden patterns of correlation between genes reflecting the complex processes that underlie cellular metabolism and physiology. The challenge in analyzing large-scale expression data has been to extract biologically meaningful inferences regarding these processes (often represented as networks) in an environment where the datasets are often imperfect and biological noise can obscure the actual signal. Although many techniques have been developed in an attempt to address these issues, to date their ability to extract meaningful and predictive network relationships has been limited. Here we describe a method that draws on prior information about gene-gene interactions to infer biologically relevant pathways from microarray data. Our approach consists of using preliminary networks derived from the literature and/or protein-protein interaction data as seeds for a Bayesian network analysis of microarray results.

Results: Through a bootstrap analysis of gene expression data derived from a number of leukemia studies, we demonstrate that seeded Bayesian Networks have the ability to identify high-confidence gene-gene interactions which can then be validated by comparison to other sources of pathway data.

Conclusion: The use of network seeds greatly improves the ability of Bayesian Network analysis to learn gene interaction networks from gene expression data. We demonstrate that the use of seeds derived from the biomedical literature or high-throughput protein-protein interaction data, or the combination, provides improvement over a standard Bayesian Network analysis, allowing networks involving dynamic processes to be deduced from the static snapshots of biological systems that represent the most common source of microarray data. Software implementing these methods has been included in the widely used TM4 microarray analysis package.
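The bootstrap analysis mentioned in the Results can be sketched as follows: resample the expression samples with replacement, learn a network on each resample, and report how often each edge reappears. The abstract does not describe the learning procedure in detail, so this sketch swaps in a deliberately simple stand-in learner (a correlation threshold that is more lenient for seed edges) in place of Bayesian network structure learning, and it treats edges as undirected. All names and thresholds (`toy_learner`, `bootstrap_edge_confidence`, `strict`, `lenient`) are hypothetical and are not taken from the paper or from TM4.

```python
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def toy_learner(samples, genes, seed_edges, strict=0.8, lenient=0.5):
    """Stand-in for structure learning (an assumption, not the paper's
    method): keep an undirected edge if |correlation| passes a threshold,
    with a lower threshold for edges supplied as prior 'seeds'."""
    edges = set()
    for a, b in combinations(genes, 2):
        edge = frozenset((a, b))
        r = abs(pearson([s[a] for s in samples], [s[b] for s in samples]))
        if r >= (lenient if edge in seed_edges else strict):
            edges.add(edge)
    return edges


def bootstrap_edge_confidence(samples, genes, seed_edges, n_boot=200, rng=None):
    """Fraction of bootstrap resamples in which each edge is learned."""
    rng = rng or random.Random(0)
    counts = {}
    for _ in range(n_boot):
        # Resample the expression profiles with replacement.
        resample = [rng.choice(samples) for _ in samples]
        for edge in toy_learner(resample, genes, seed_edges):
            counts[edge] = counts.get(edge, 0) + 1
    return {edge: c / n_boot for edge, c in counts.items()}


if __name__ == "__main__":
    genes = ["g1", "g2", "g3"]
    samples = [{"g1": i, "g2": 2 * i, "g3": (-1) ** i} for i in range(10)]
    seeds = {frozenset(("g1", "g3"))}
    for edge, conf in bootstrap_edge_confidence(samples, genes, seeds).items():
        print(sorted(edge), round(conf, 2))
```

Edges with confidence close to 1.0 are the ones that survive resampling. In a real analysis the stand-in learner would be replaced by a proper Bayesian network search initialized from the seed network.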

Harvard University - DASH

Springer - Publisher Connector

Public Library of Science (PLOS)

Computational Methods for Protein Identification from Mass Spectrometry Data

Author: A Frank
A Ganapathy
A Keller
A Keller
A Keller
A Taylor
AI Nesvizhskii
AJ Liska
AJ Liska
AJ Liska
AJ Mackey
B Habermann
B Ma
BC Searle
BE Boyes
C Robertson
CA Hastings
D Fenyo
DA Stead
DB Weatherly
DC Chamrad
DF Hochstrasser
DJ Pappin
DL Tabb
DN Perkins
ED Salin
F Levander
H Wang
HI Field
HJ Joshi
I Beer
I Rogers
I Shadforth
J Arthur
J Eriksson
J Eriksson
J Magnin
J Peng
J Razumovskaya
J Reinders
J Samuelsson
JA Bons
JE Elias
JG Rohrbough
JJ Thomson
JK Eng
JL Joss
JM Hogan
Johanna McEntyre
Jonathan W Arthur
JR Yates III
JWH Wong
K Biemann
KA Resing
KA Resing
KR Coombes
L Huang
Leo McHugh
M Kempka
M Mann
M Tuloup
MA Baldwin
MJ Noga
ML Nielsen
MR Wilkins
NL Anderson
P Hernandez
R Apweiler
R Ullmer
RE Moorea
RJ Arnold
RM Day
S Carr
S Gay
S Orchard
SB Vardeman
SF Altschul
SJ Cordwell
V Bafna
V Dancik
W Zhang
WJ Henzel
WR Pearson
Y Chen
Y Han
Z Zhang
Z Zhang
Publication venue: Public Library of Science
Publication date: 01/02/2008

Protein identification using mass spectrometry is an indispensable computational tool in the life sciences. A dramatic increase in the use of proteomic strategies to understand the biology of living systems generates an ongoing need for more effective, efficient, and accurate computational methods for protein identification. A wide range of computational methods, each with various implementations, are available to complement different proteomic approaches. A solid knowledge of the range of algorithms available and, more critically, the accuracy and effectiveness of these techniques is essential to ensure as many of the proteins as possible, within any particular experiment, are correctly identified. Here, we undertake a systematic review of the currently available methods and algorithms for interpreting, managing, and analyzing biological data associated with protein identification. We summarize the advances in computational solutions as they have responded to corresponding advances in mass spectrometry hardware. The evolution of scoring algorithms and metrics for automated protein identification is also discussed, with a focus on the relative performance of different techniques. We also consider the relative advantages and limitations of different techniques in particular biological contexts. Finally, we present our perspective on future developments in the area of computational protein identification by considering the most recent literature on new and promising approaches to the problem, as well as identifying areas yet to be explored and the potential application of methods from other areas of computational biology.

Translational Systems Biology of Inflammation

Inflammation is a complex, multi-scale biologic response to stress that is also required for repair and regeneration after injury. Despite the repository of detailed data about the cellular and molecular processes involved in inflammation, including some understanding of its pathophysiology, little progress has been made in treating the severe inflammatory syndrome of sepsis. To address the gap between basic science knowledge and therapy for sepsis, a community of biologists and physicians is using systems biology approaches in hopes of yielding basic insights into the biology of inflammation. “Systems biology” is a discipline that combines experimental discovery with mathematical modeling to aid in the understanding of the dynamic global organization and function of a biologic system (cell to organ to organism). We propose the term translational systems biology for the application of similar tools and engineering principles to biologic systems with the primary goal of optimizing clinical practice. We describe the efforts to use translational systems biology to develop an integrated framework to gain insight into the problem of acute inflammation. Progress in understanding inflammation using translational systems biology tools highlights the promise of this multidisciplinary field. Future advances in understanding complex medical problems are highly dependent on methodological advances and integration of the computational systems biology community with biologists and clinicians.

Public Library of Science (PLOS)

D-Scholarship@Pitt

Differentiating Protein-Coding and Noncoding RNA: Challenges and Ambiguities

The assumption that RNA can be readily classified into either protein-coding or non-protein–coding categories has pervaded biology for close to 50 years. Until recently, discrimination between these two categories was relatively straightforward: most transcripts were clearly identifiable as protein-coding messenger RNAs (mRNAs), and readily distinguished from the small number of well-characterized non-protein–coding RNAs (ncRNAs), such as transfer, ribosomal, and spliceosomal RNAs. Recent genome-wide studies have revealed the existence of thousands of noncoding transcripts, whose function and significance are unclear. The discovery of this hidden transcriptome and the implicit challenge it presents to our understanding of the expression and regulation of genetic information has made the need to distinguish between mRNAs and ncRNAs both more pressing and more complicated. In this Review, we consider the diverse strategies employed to discriminate between protein-coding and noncoding transcripts and the fundamental difficulties that are inherent in what may superficially appear to be a simple problem. Misannotations can also run in both directions: some ncRNAs may actually encode peptides, and some of those currently thought to do so may not. Moreover, recent studies have shown that some RNAs can function both as mRNAs and intrinsically as functional ncRNAs, which may be a relatively widespread phenomenon. We conclude that it is difficult to annotate an RNA unequivocally as protein-coding or noncoding, with overlapping protein-coding and noncoding transcripts further confounding this distinction. In addition, the finding that some transcripts can function both intrinsically at the RNA level and to encode proteins suggests a false dichotomy between mRNAs and ncRNAs. Therefore, the functionality of any transcript at the RNA level should not be discounted.

University of Queensland eSpace

Overview of the interactive task in BioCreative V

Author: Afroza K. Irin
Andrew Chatr-Aryamontri
Arighi
Arighi
Arighi
Barbra Ferrell
Cathy H. Wu
Cecilia N. Arighi
Chu-Hsien Su
Comeau
David Campos
David Salgado
Emiliano Pereira
Evangelos Pafilis
Fabio Rinaldi
Gabriela Contreras
Georgios Gkoutos
Hamsa D. Tadepally
Hirschman
Hong-Jie Dai
Hui-Jou Chou
Ingrid Keseler
Jeyakumar Natarajan
Johanna McEntyre
Juliane Fluck
Karen Rothfels
Kimberly Van Auken
Krallinger
Lara Almeida
Lars J. Jensen
Laurel Cooper
Likert
Loukia Tsaprouni
Lucy Chilton
Lynette Hirschman
Marija Milacic
Mary Schaeffer
Matthew Mort
Nancy George
Nicole Vasilevsky
Onkar Singh
Peter McQuilton
Qinghua Wang
Raquel M. Silva
Raul Rodriguez-Esteban
Raymund Stefancsik
Riza Batista-Navarro
Sandra Orchard
Sangya Pundir
Shabbir S. Abdul
Sherri Matis-Mitchell
Shruti Rao
Silvia Jimenez
Socorro Gama-Castro
Sophia Ananiadou
Stanley J. F. Laulederkind
Sumit Madan
Suresh Subramani
Sérgio Matos
Toni R. Jue
Wu
Xiaodong Wang
Yalbi I. Balderas-Martínez
Zhiyong Lu
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2016

Fully automated text mining (TM) systems promote efficient literature searching, retrieval, and review but are not sufficient to produce ready-to-consume curated documents. These systems are not meant to replace biocurators, but instead to assist them in one or more literature curation steps. To do so, the user interface is an important aspect that needs to be considered for tool adoption. The BioCreative Interactive task (IAT) is a track designed for exploring user-system interactions, promoting development of useful TM tools, and providing a communication channel between the biocuration and the TM communities. In BioCreative V, the IAT track followed a format similar to previous interactive tracks, where the utility and usability of TM tools, as well as the generation of use cases, have been the focal points. The proposed curation tasks are user-centric and formally evaluated by biocurators. In BioCreative V IAT, seven TM systems and 43 biocurators participated. Two levels of user participation were offered to broaden curator involvement and obtain more feedback on usability aspects. The full level participation involved training on the system, curation of a set of documents with and without TM assistance, tracking of time-on-task, and completion of a user survey. The partial level participation was designed to focus on usability aspects of the interface and not the performance per se. In this case, biocurators navigated the system by performing pre-designed tasks and were then asked whether they were able to achieve each task and how difficult it was to complete. In this manuscript, we describe the development of the interactive task, from planning to execution, and discuss major findings for the systems tested.

University of Birmingham Research Portal

HAL AMU

The University of Manchester - Institutional Repository

MPG.PuRe

Hal-Diderot

University of Bedfordshire Repository

Online Research @ Cardiff

HAL-Inserm

Copenhagen University Research Information System

Oxford University Research Archive

Structural Biology by NMR: Structure, Dynamics, and Interactions

Author: A Bhattacharya
A Cavalli
A Grishaev
A Grishaev
A Jack
A Loquet
A Yee
AD Mackerell Jr
AE Torda
AJ Nederveen
AJ Nederveen
AK Jha
AS Altieri
AT Brunger
B Lopez-Mendez
BF Volkman
BG Schulze
C Dominguez
C Tang
CA Spronk
D Abergel
D Hamelberg
D Hamelberg
D Shortle
DA Case
DA Snyder
DC Williams Jr
DE Woessner
DM Korzhnev
DS Wishart
DS Wishart
E Ab
EZ Eisenmesser
EZ Eisenmesser
F Castellani
F Zhang
FAA Mulder
G Bouvignies
G Bouvignies
G Bouvignies
G Kontaxis
G Lipari
G Nicastro
GM Clore
GM Clore
H Grubmüller
HJ Dyson
HL Liu
J Iwahara
J Kuszewski
J Meiler
JC Hus
Johanna McEntyre
JP Linge
JP Loria
JR Tolman
K Lindorff-Larsen
K Loth
K Pervushin
K Wüthrich
KA Henzler-Wildman
KB Briggman
L Wang
L Wang
M Habeck
M Habeck
M Kainosho
MD Mukrasch
Michael Nilges
MJ Osborne
N Tjandra
NA Lakomek
P Bernardo
P Güntert
P Vallurupalli
P Vallurupalli
Phineus R. L. Markwick
PR Markwick
PRL Markwick
PRL Markwick
R Bruschweiler
R Cole
R Elber
R Horst
R Sprangers
SB Nabuurs
SB Nabuurs
SL Chang
T Bremi
T Herrmann
TA Ulmer
Thérèse Malliavin
V Hornak
V Tugarinov
V Tugarinov
VA Daragan
VA Feher
W Peti
W Rieping
W Rieping
WF Vranken
Y Kim
Y Shen
YJ Huang
YJ Huang
YJ Huang
YS Jung
Publication venue: Public Library of Science
Publication date: 01/09/2008

The function of bio-macromolecules is determined by both their 3D structure and conformational dynamics. These molecules are inherently flexible systems displaying a broad range of dynamics on time-scales from picoseconds to seconds. Nuclear Magnetic Resonance (NMR) spectroscopy has emerged as the method of choice for studying both protein structure and dynamics in solution. Typically, NMR experiments are sensitive both to structural features and to dynamics, and hence the measured data contain information on both. Despite major progress in both experimental approaches and computational methods, obtaining a consistent view of structure and dynamics from experimental NMR data remains a challenge. Molecular dynamics simulations have emerged as an indispensable tool in the analysis of NMR data.

Public Library of Science (PLOS)