
arXiv.org e-Print Archive

Inference algorithms for gene networks: a statistical mechanics analysis

Author: A Braunstein
A Pagnani
Alberts B
Baillet-Bechet M Braunstein A Pagnani A Weigt M Zecchina R
Banerjee O El Ghaoui L d’Aspremont A Natsoulis G
Braunstein A
Butte A J
Engel A
Gardner E
Gardner E
Hertz J
Kabashima Y
Kabashima Y
Lee S I
M Weigt
Murphy K Mian S
R Zecchina
Ravikumar P Wainwright M J Lafferty J D
Schmidt M Niculescu-Mizil A Murphy K
Tibshirani R
Tria F Pagnani A Weigt M
Publication venue: IOP Publishing
Publication date: 01/01/2008

The inference of gene regulatory networks from high-throughput gene expression data is one of the major challenges in systems biology. This paper analyses and compares two different algorithmic approaches. The first approach uses pairwise correlations between regulated and regulating genes; the second one uses message-passing techniques for inferring activating and inhibiting regulatory interactions. The performance of these two algorithms can be analysed theoretically on well-defined test sets, using tools from the statistical physics of disordered systems such as the replica method. We find that the second algorithm outperforms the first one since it takes into account collective effects of multiple regulators.
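The first approach described in this abstract, ranking candidate regulators by their pairwise correlation with a regulated gene, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `pearson` and `rank_regulators`, the gene names, and the toy expression profiles are all hypothetical, and the choice of Pearson correlation is an assumption.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / sqrt(var_x * var_y)


def rank_regulators(target, candidates):
    """Sort candidate regulators by |correlation| with the target gene.

    The sign of each score hints at activation (+) or inhibition (-).
    """
    scored = [(name, pearson(profile, target)) for name, profile in candidates.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical toy data: one regulated gene measured in five conditions.
target = [1.0, 2.0, 3.0, 4.0, 5.0]
candidates = {
    "gene_a": [2.0, 4.1, 5.9, 8.2, 10.0],  # rises with the target
    "gene_b": [5.0, 3.9, 3.1, 2.0, 1.1],   # falls as the target rises
    "gene_c": [1.0, -1.0, 1.0, -1.0, 1.0],  # unrelated oscillation
}
ranking = rank_regulators(target, candidates)
```

Each candidate is scored in isolation here, so a scheme of this kind cannot see the collective effects of multiple regulators that the abstract credits for the advantage of the message-passing approach.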

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Indirect two-sided relative ranking: a robust similarity measure for gene expression data

Author: CM Perou
DE Arking
DE Martin
E Chávez
E Hubbell
ER DeLong
G Natsoulis
G Wei
GJ Kaspers
GJ Kaspers
IM Chakravarti
J Lamb
J Lamb
J Lu
JL DeRisi
KP Seiler
Lise Getoor
LJ van't Veer
Louis Licamele
OG Troyanskaya
R Pieters
SL Pomeroy
T Hongo
TR Golub
W Liu
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: A large amount of gene expression data exists in the public domain. This data has been generated under a variety of experimental conditions. Unfortunately, these experimental variations have generally prevented researchers from accurately comparing and combining this wealth of data, which still hides many novel insights. Results: In this paper we present a new method, which we refer to as indirect two-sided relative ranking, for comparing gene expression profiles that is robust to variations in experimental conditions. This method extends the current best approach, which is based on comparing the correlations of the up- and down-regulated genes, by introducing a comparison based on the correlations in rankings across the entire database. Because our method is robust to experimental variations, it allows a greater variety of gene expression data to be combined, which, as we show, leads to richer scientific discoveries. Conclusions: We demonstrate the benefit of our proposed indirect method on several datasets. We first evaluate the ability of the indirect method to retrieve compounds with similar therapeutic effects across known experimental barriers, namely vehicle and batch effects, on two independent datasets (one private and one public). We show that our indirect method significantly improves upon the previous state-of-the-art method, with a substantial improvement in recall at rank 10 of 97.03% and 49.44% on the two datasets, respectively. Next, we demonstrate that our indirect method results in improved accuracy for classification in several additional datasets. These datasets demonstrate the use of our indirect method for classifying cancer subtypes, predicting drug sensitivity/resistance, and classifying (related) cell types. Even in the absence of a known (i.e., labeled) experimental barrier, the improvement of the indirect method in each of these datasets is statistically significant.
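The evaluation metric named in this abstract, recall at rank 10, can be sketched as follows. The abstract does not give the paper's exact definition, so this uses the common one (the fraction of relevant items found among the top k retrieved); the function name `recall_at_k` and the compound identifiers are hypothetical.

```python
def recall_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of relevant items that appear among the top-k retrieved items."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("recall is undefined without relevant items")
    top_k = set(ranked_ids[:k])  # a set, so a repeated id is counted once
    return len(top_k & relevant) / len(relevant)


# Hypothetical retrieval result: compounds ordered by decreasing similarity.
ranked = ["d3", "d7", "d1", "d9"]
relevant = {"d1", "d2"}
example_recall = recall_at_k(ranked, relevant, k=3)  # one of two relevant items retrieved
```

Under this definition, an improvement in recall at rank 10 means that more of the therapeutically similar compounds appear among the ten most similar profiles.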

Springer - Publisher Connector

Digital Repository at the University of Maryland

Biogenesis of mitochondrial proteins

Open Access LMU ( Ludwig-Maximilians-Univ. München)

Predictive Power Estimation Algorithm (PPEA) - A New Algorithm to Reduce Overfitting for Genomic Biomarker Discovery

Author: A Ben-Dor
A Moreau
A Pachot
Aaron T. Smith
AC Gavin
AL Barabasi
B Ganter
C Fan
C Lu
C Sima
Craig E. Thomas
DF Ransohoff
DF Ransohoff
DH Adams
DL Mendrick
E Vittinghoff
ER Dougherty
FD Sistare
G Natsoulis
G Natsoulis
George H. Searfoss
GH John
GW Donaldson
I Guyon
I Guyon
IDJ Bross
J Liu
J Ozer
JE Peterson
JH Cai
Jiangang Liu
JW Eun
Keith Dunker
Keith M. Goldstein
L Coussens
MA Olayioye
MR Fielden
N Dessì
N Zidek
P Peduzzi
Peter Csermely
PR Bushel
R Kohavi
R Tibshirani
Robert A. Jolly
S Das
Shuyu Li
T Bo
Tao Wei
TP Ryan
TR Golub
Vladimir N. Uversky
W Luo
X Fan
X Zhang
Y Saeys
Publication venue: Public Library of Science
Publication date: 01/01/2011

Toxicogenomics promises to aid in predicting adverse effects, understanding the mechanisms of drug action or toxicity, and uncovering unexpected or secondary pharmacology. However, modeling adverse effects using high-dimensional and high-noise genomic data is prone to over-fitting. Models constructed from such data sets often consist of a large number of genes with no obvious functional relevance to the biological effect the model intends to predict, which can make the modeling results challenging to interpret. To address these issues, we developed a novel algorithm, Predictive Power Estimation Algorithm (PPEA), which estimates the predictive power of each individual transcript through an iterative two-way bootstrapping procedure. By repeatedly enforcing that the sample number is larger than the transcript number in each iteration of modeling and testing, PPEA reduces the potential risk of overfitting. We show with three different case studies that: (1) PPEA can quickly derive a reliable rank order of predictive power of individual transcripts in a relatively small number of iterations, (2) the top-ranked transcripts tend to be functionally related to the phenotype they are intended to predict, (3) using only the most predictive top-ranked transcripts greatly facilitates development of a multiplex assay such as qRT-PCR as a biomarker, and (4) more importantly, we were able to demonstrate that a small number of genes identified from the top-ranked transcripts are highly predictive of phenotype, as their expression changes distinguished adverse from nonadverse effects of compounds in completely independent tests. Thus, we believe that the PPEA model effectively addresses the over-fitting problem and can be used to facilitate genomic biomarker discovery for predictive toxicology and drug responses.
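The iterative two-way bootstrapping idea in this abstract can be sketched roughly as below. This is not the published PPEA: the abstract does not specify the classifier or the scoring rule, so the nearest-centroid model, the out-of-bag accuracy credit, and every name here (`estimate_predictive_power`, `nearest_centroid_accuracy`, `make_demo_data`) are assumptions made for illustration. The one feature taken from the abstract is that each iteration resamples both samples and transcripts while keeping the transcript count below the sample count.

```python
import random


def nearest_centroid_accuracy(train, test, features):
    """Fit per-class centroids on train and return accuracy on test.

    Stand-in classifier (an assumption, not the paper's model).
    """
    centroids = {}
    for label in (0, 1):
        rows = [x for x, y in train if y == label]
        if not rows:
            return None  # a class is missing from this resample; skip it
        centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in features]
    correct = 0
    for x, y in test:
        dist = {
            label: sum((x[f] - c) ** 2 for f, c in zip(features, centroid))
            for label, centroid in centroids.items()
        }
        correct += int(min(dist, key=dist.get) == y)
    return correct / len(test)


def estimate_predictive_power(data, n_iter=300, n_features=3, seed=0):
    """Average out-of-bag accuracy of the models each transcript took part in."""
    n_samples = len(data)
    n_transcripts = len(data[0][0])
    assert n_features < n_samples, "keep transcripts fewer than samples"
    rng = random.Random(seed)
    totals = [0.0] * n_transcripts
    counts = [0] * n_transcripts
    for _ in range(n_iter):
        # Bootstrap the samples; the rows never drawn form the test set.
        drawn = [rng.randrange(n_samples) for _ in range(n_samples)]
        in_bag = set(drawn)
        train = [data[i] for i in drawn]
        test = [data[i] for i in range(n_samples) if i not in in_bag]
        if not test:
            continue
        # Resample the transcripts: a small random subset per iteration.
        features = rng.sample(range(n_transcripts), n_features)
        accuracy = nearest_centroid_accuracy(train, test, features)
        if accuracy is None:
            continue
        for f in features:
            totals[f] += accuracy
            counts[f] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


def make_demo_data(n_samples=20, n_transcripts=8, seed=1):
    """Hypothetical data: only transcript 0 separates the two classes."""
    rng = random.Random(seed)
    data = []
    for i in range(n_samples):
        label = i % 2
        profile = [rng.uniform(-1.0, 1.0) for _ in range(n_transcripts)]
        profile[0] += 4.0 * label
        data.append((profile, label))
    return data


scores = estimate_predictive_power(make_demo_data())
ranking = sorted(range(len(scores)), key=lambda t: scores[t], reverse=True)
```

In this toy setting the informative transcript should rise to the top of `ranking`, because models that include it classify the held-out samples far better than models built from noise transcripts alone.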

USFSP Digital Archive

Scholar Commons - University of South Florida

Functional analysis of multiple genomic signatures demonstrates that classification algorithms choose phenotype-related genes

Author: A Barla
A Bugrim
A Guryanov
A Oberthuer
A Subramanian
AL Barabasi
AL Boulesteix
C Furlanello
CM Perou
D Dosymbekov
DW Parsons
EK Lobenhofer
F Murtagh
FL Kiechle
G Jurman
G Natsoulis
G Natsoulis
H Bonnefoi
HP Fischer
HY Chang
J C Corton
J Cohen
J Dopazo
JD Shaughnessy Jr
JJ Chen
JW Eun
KI Goh
KR Hess
L Ein-Dor
L Shi
LD Wood
LJ van ‘t Veer
M Ashburner
M Bessarabova
M Chen
M Dudoladova
M Kanehisa
M Vidal
MA Troester
ME Cusick
MR Fielden
R Ihaka
R J Brennan
R S Thomas
R Shah
R Shen
RA Fisher
RS Thomas
S Dudoit
S Jones
S Siegel
T Ideker
T Nikolskaya
T Serebryiskaya
T Shi
T Sorlie
W Huang da
W Shi
W Tong
Y Deng
Y Huang
Y Nikolsky
Y Nikolsky
Y Nikolsky
Y Nikolsky
Z Dezso
Z Dezso
Publication venue: Nature Publishing Group
Publication date: 01/01/2010

Gene expression signatures of toxicity and clinical response benefit both safety assessment and clinical practice; however, difficulties in connecting signature genes with the predicted end points have limited their application. The Microarray Quality Control Consortium II (MAQCII) project generated 262 signatures for ten clinical and three toxicological end points from six gene expression data sets, an unprecedented collection of diverse signatures that has permitted a wide-ranging analysis of the nature of such predictive models. A comprehensive analysis of the genes of these signatures and their nonredundant unions using ontology enrichment, biological network building and interactome connectivity analyses demonstrated the link between gene signatures and the biological basis of their predictive power. Different signatures for a given end point were more similar at the level of biological properties and transcriptional control than at the gene level. Signatures tended to be enriched in function and pathway in an end point-specific and model-specific manner, and showed a topological bias for incoming interactions. Importantly, the level of biological similarity between different signatures for a given end point correlated positively with the accuracy of the signature predictions. These findings will aid the understanding and application of predictive genomic signatures, and support their broader application in predictive medicine.

Aquila Digital Community (University of Southern Mississippi, USM)

Archivio della ricerca - Fondazione Bruno Kessler

A tryptophan-rich peptide acts as a transcription activation domain

Author: A Traven
AJ Courey
B Chatton
BS Negrutskii
C Francklyn
CC Wang
CC Wang
Chen-Huan Lin
Chia-Pei Chang
Chien-Chia Wang
CP Chang
CW Carter Jr
FT Zenke
G Natsoulis
G Simos
Grace Lin
HL Tang
I Sadowski
KJ Chang
L Kuras
L Maréchal-Drouard
M Francin
M Kaminska
M Mirande
M Pelchat
M Ptashne
MJ Carrozza
MR Green
N Mermod
P Schimmel
R Giege
SA Martinis
SA Martinis
WC Chiu
Y Wu
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Eukaryotic transcription activators normally consist of a sequence-specific DNA-binding domain (DBD) and a transcription activation domain (AD). While many sequence patterns and motifs have been defined for DBDs, ADs do not share easily recognizable motifs or structures. Results: We report herein that the N-terminal domain of yeast valyl-tRNA synthetase can function as an AD when fused to a DNA-binding protein, LexA, and turn on reporter genes with distinct LexA-responsive promoters. The transcriptional activity was mainly attributed to a five-residue peptide, WYDWW, near the C-terminus of the N domain. Remarkably, the pentapeptide per se retained much of the transcriptional activity. Mutations which substituted tryptophan residues for both of the non-tryptophan residues in the pentapeptide (resulting in W5) significantly enhanced its activity (~1.8-fold), while mutations which substituted aromatic residues with alanine residues severely impaired its activity. Accordingly, a much more active peptide, heptatryptophan (W7), was produced, which elicited ~3-fold higher activity than that of the native pentapeptide and the N domain. Further study indicated that W7 mediates transcription activation through interacting with the general transcription factor, TFIIB. Conclusions: Since W7 shares no sequence homology or features with any known transcription activators, it may represent a novel class of AD.

Springer - Publisher Connector

Public Library of Science (PLOS)

Analysis and Computational Dissection of Molecular Signature Multiplicity

Author: A Ploner
Alexander Statnikov
B Hammer
CF Aliferis
CF Aliferis
CF Aliferis
CF Aliferis
CF Aliferis
Constantin F. Aliferis
DL Gold
E Dougherty
F Azuaje
F Wagner
G Balazsi
G Natsoulis
I Guyon
I Tsamardinos
J Pearl
J Pearl
J Peña
J Shawe-Taylor
JP Ioannidis
L Ein-Dor
L Ein-Dor
L Li
LR Grate
M Hollander
P Roepman
RL Somorjai
S Michiels
S Ramaswamy
Scott Markel
SM Weiss
T Chu
TR Golub
TS Furey
X Qiu
Publication venue: Public Library of Science
Publication date: 01/05/2010

Molecular signatures are computational or mathematical models created to diagnose disease and other phenotypes and to predict clinical outcomes and response to treatment. It is widely recognized that molecular signatures constitute one of the most important translational and basic science developments enabled by recent high-throughput molecular assays. A perplexing phenomenon that characterizes high-throughput data analysis is the ubiquitous multiplicity of molecular signatures. Multiplicity is a special form of data analysis instability in which different analysis methods used on the same data, or different samples from the same population, lead to different but apparently maximally predictive signatures. This phenomenon has far-reaching implications for biological discovery and development of next-generation patient diagnostics and personalized treatments. Currently the causes and interpretation of signature multiplicity are unknown, and several, often contradictory, conjectures have been made to explain it. We present a formal characterization of signature multiplicity and a new efficient algorithm that offers theoretical guarantees for extracting the set of maximally predictive and non-redundant signatures independent of distribution. The new algorithm identifies exactly the set of optimal signatures in controlled experiments and yields signatures with significantly better predictivity and reproducibility than previous algorithms in human microarray gene expression datasets. Our results shed light on the causes of signature multiplicity, provide computational tools for studying it empirically and introduce a framework for in silico bioequivalence of this important new class of diagnostic and personalized medicine modalities.

Public Library of Science (PLOS)

Application of Biomarkers in Cancer Risk Management: Evaluation from Stochastic Clonal Evolutionary and Dynamic System Optimization Points of View

Author: A Jemal
A Tanemura
AY Yakovlev
BA Weir
BJ Flehinger
BJ Reid
Brian J. Reid
C Greenman
CJ Ye
CL Sawyers
Claus O. Wilke
CM Croce
D Hanahan
DF Ransohoff
EG Luebeck
EP Diamandis
ER Fearon
G Natsoulis
HH Heng
HH Heng
HY Chen
J Handl
JB O'Connell
JE Cohen
JJ Lee
K Shedden
KD Siegmunda
LD Wood
LJ van 't Veer
M Dettling
M Esteller
M Zelen
MA Nowak
ME Robson
MT Barrett
N Gerges
P Ao
PA Wingo
Patricia L. Blount
PC Galipeau
PC Nowell
PC Prorok
R Beroukhim
R Etzioni
S Frank
S Jones
S Jones
SJ Lee
SM Hanash
Thomas L. Vaughan
WY Tan
X Li
Xiaohong Li
Publication venue: Public Library of Science
Publication date: 01/02/2011

Aside from primary prevention, early detection remains the most effective way to decrease mortality associated with the majority of solid cancers. Previous cancer screening models are largely based on classification of at-risk populations into three conceptually defined groups (normal, cancer without symptoms, and cancer with symptoms). Unfortunately, this approach has achieved limited success in reducing cancer mortality. With advances in molecular biology and genomic technologies, many candidate somatic genetic and epigenetic “biomarkers” have been identified as potential predictors of cancer risk. However, none have yet been validated as robust predictors of progression to cancer or shown to reduce cancer mortality. In this Perspective, we first define the necessary and sufficient conditions for precise prediction of future cancer development and early cancer detection within a simple physical model framework. We then evaluate cancer risk prediction and early detection from a dynamic clonal evolution point of view, examining the implications of dynamic clonal evolution of biomarkers and the application of clonal evolution for cancer risk management in clinical practice. Finally, we propose a framework to guide future collaborative research between mathematical modelers and biomarker researchers to design studies to investigate and model dynamic clonal evolution. This approach will allow optimization of available resources for cancer control and intervention timing based on molecular biomarkers in predicting cancer among various risk subsets that dynamically evolve over time.

The population biology and evolutionary significance of Ty elements in Saccharomyces cerevisiae

Author: A. E. Shrimpton
A. J. Kingsman
A. K. W. Taguchi
A. Kurlandzka
A. L. Koch
A. S. Kondrashov
B. Charlesworth
B. Errede
B. G. Hall
B. G. Hall
B. J. Fitzpatrick
C. E. Paquin
C. M. Wilke
C. M. Wilke
C. Morawetz
C. N. Giroux
C. Paquin
C. Paquin
D. J. Clark
D. J. Finnegan
D. J. Garfinkel
D. J. Garfinkel
E. G. Pasyukova
E. G. Pasyukova
F. A. Laski
F. Muller
F. W. Stahl
G. Cornelis
G. Natsoulis
G. R. Fink
G. Simchen
H. Eibel
H. Eibel
H. Iida
H. L. Klein
H. Xu
J. A. Barnett
J. Adams
J. B. Stavenhagen
J. D. Boeke
J. D. Boeke
J. D. Boeke
J. D. Boeke
J. D. Boeke
J. D. Boeke
J. F. McDonald
J. F. McDonald
J. Gafner
J. Gatner
J. J. Clare
J. J. Clare
J. Lopilato
J. Maynard Smith
J. Mellor
J. Mellor
J. Mellor
J. R. Cameron
J. R. Warmington
K. G. Weinstock
L. Chao
L. Chao
L. E. Orgel
L. J. Hansen
L. J. Hansen
M. B. Pedersen
M. B. Pedersen
M. B. Pedersen
M. G. Goebl
M. J. Curcio
M. Rolfe
M. Rose
M. Syvanen
N. Kleckner
N. L. Kaplan
P. J. Farabaugh
P. J. Kretschmer
P. Nevers
P. Perrot
P. Philippsen
R. A. Voelker
R. C. Lewontin
R. H. MacArthur
R. Modi
R. Rothstein
R. Stucka
R. T. Elder
R. T. Elder
S. A. Sawyer
S. B. Sandmeyer
S. D. Yougren
S. E. Adams
S. Misra
S. Picologlou
S. Picologlou
S. Sawyer
S. Scherer
S. W. Liebman
T. B. Oyen
T. F. C. Mackay
T. F. C. Mackay
T. McClanahan
V. M. Williamson
W. F. Doolittle
W. J. Conover
W. Wilson
Y. Toh-e
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/1992

The basic structure and properties of Ty elements are considered with special reference to their role as agents of evolutionary change. Ty elements may generate genetic variation for fitness by their action as mutagens, as well as by providing regions of portable homology for recombination. The mutational spectra generated by Ty1 transposition events may, due to their target specificity and gene regulatory capabilities, possess a higher frequency of adaptively favorable mutations than spectra resulting from other types of mutational processes. Laboratory strains contain between 25 and 35 elements, and in both these and industrial strains the insertions appear quite stable. In contrast, a wide variation in Ty number is seen in wild isolates, with a lower average number per genome. Factors which may determine Ty copy number in populations include transposition rates (dependent on Ty copy number and mating type), stabilization of Ty elements in the genome, and selection for and against Ty insertions in the genome. Although the average effect of Ty transpositions is deleterious, populations initiated with a single clone containing a single Ty element steadily accumulated Ty elements over 1,000 generations. Direct evidence that Ty transposition events can be selectively favored is provided by experiments in which populations containing large amounts of variability for Ty1 copy number were maintained for ∼100 generations in a homogeneous environment. At their termination, the frequency of clones containing 0 Ty elements had decreased to ∼0.0, and the populations had become dominated by a small number of clones containing >0 Ty elements. No such reduction in variability was observed in populations maintained in a structured environment, though changes in Ty number were observed. The implications of genetic (mating type and ploidy) changes and environmental fluctuations for the long-term persistence of Ty elements within the S. cerevisiae species group are discussed.
Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/42799/1/10709_2004_Article_BF00133718.pd