Numérisation de Documents Anciens Mathématiques

Differential expression analysis with global network adjustment

Author: A Antonellis
A Zellner
AE Hoerl
AI Su
D Bates
DB Dahl
E Choy
EJ Cosgrove
H Zou
J Friedman
J Ruan
J Schoumans
J Wettenhall
Jannine D Cody
Jonathan A Gelfond
Joseph G Ibrahim
JT Leek
M Gustafsson
M Newton
Mayetri Gupta
Ming-Hui Chen
R Development Core Team
R Tibshirani
RJ Prill
S Pounds
SC Smith
SM Siepka
T Barrett
T Barrett
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

Background: Large-scale chromosomal deletions or other non-specific perturbations of the transcriptome can alter the expression of hundreds or thousands of genes, and it is of biological interest to understand which genes are most profoundly affected. We present a method for predicting a gene’s expression as a function of other genes thereby accounting for the effect of transcriptional regulation that confounds the identification of genes differentially expressed relative to a regulatory network. The challenge in constructing such models is that the number of possible regulator transcripts within a global network is on the order of thousands, and the number of biological samples is typically on the order of 10. Nevertheless, there are large gene expression databases that can be used to construct networks that could be helpful in modeling transcriptional regulation in smaller experiments. Results: We demonstrate a type of penalized regression model that can be estimated from large gene expression databases, and then applied to smaller experiments. The ridge parameter is selected by minimizing the cross-validation error of the predictions in the independent out-sample. This tends to increase the model stability and leads to a much greater degree of parameter shrinkage, but the resulting biased estimation is mitigated by a second round of regression. Nevertheless, the proposed computationally efficient “over-shrinkage” method outperforms previously used LASSO-based techniques. In two independent datasets, we find that the median proportion of explained variability in expression is approximately 25%, and this results in a substantial increase in the signal-to-noise ratio allowing more powerful inferences on differential gene expression leading to biologically intuitive findings. We also show that a large proportion of gene dependencies are conditional on the biological state, which would be impossible with standard differential expression methods. 
Conclusions: By adjusting for the effects of the global network on individual genes, both the sensitivity and reliability of differential expression measures are greatly improved.
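The ridge-parameter selection described above can be illustrated with a minimal sketch. It assumes a single predictor with no intercept, a hypothetical candidate grid of penalties, and squared error on an independent held-out sample. The function names are illustrative and are not taken from the paper, and the second round of regression is not shown.

```python
def ridge_slope(xs, ys, lam):
    """Closed-form ridge estimate for one predictor without intercept:
    beta = sum(x*y) / (sum(x*x) + lam). Larger lam shrinks beta toward 0."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def held_out_error(beta, xs, ys):
    """Mean squared prediction error on an independent sample."""
    return sum((y - beta * x) ** 2 for x, y in zip(xs, ys)) / len(xs)


def select_lambda(train, held_out, lambdas):
    """Pick the penalty whose fitted slope predicts the held-out sample best."""
    train_x, train_y = train
    out_x, out_y = held_out
    return min(
        lambdas,
        key=lambda lam: held_out_error(ridge_slope(train_x, train_y, lam), out_x, out_y),
    )
```

In this toy setting, a training sample whose slope overstates the held-out relationship leads to a non-zero penalty being selected, which is the shrinkage behaviour the abstract refers to.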

Springer - Publisher Connector

Carolina Digital Repository

Enlighten

Radon and Remedial Action in Spokane River Valley Residences: An Interim Report

Author: Fisk W. J.
Grimsrud D. T.
Moed B. A.
Prill R. J.
Sextro R. G.
Turk B. H.
Publication venue: Lawrence Berkeley Laboratory
Publication date: 01/03/1986

Fifty-six percent of 46 residences monitored in the Spokane River Valley in eastern Washington/northern Idaho have indoor radon concentrations above the National Council for Radiation Protection (NCRP) guidelines of 8 pCi/L. Indoor levels were over 20 pCi/L in eight homes, and ranged up to 132 pCi/L in one house. Radon concentrations declined by factors of 4 to 38 during summer months. Measurements of soil emanation rates, domestic water supply concentrations, and building material flux rates indicate that diffusion of radon does not significantly contribute to the high concentrations observed. Rather, radon entry is dominated by pressure-driven bulk soil gas transport, aggravated by the local subsurface soil composition and structure. A variety of radon control strategies are being evaluated in 14 of these homes. Sub-surface ventilation by depressurization and overpressurization, basement overpressurization, and crawlspace ventilation are capable of successfully reducing radon levels below 5 pCi/L in these homes. House ventilation is appropriate in buildings with low to moderate concentrations, while sealing of cracks has been relatively ineffective.

UNT Digital Library

Network 'small-world-ness': a quantitative method for determining canonical network equivalence

Author: A Barrat
A Ozgur
A Roxin
A Wagner
AL Barabasi
B Efron
D Lusseau
DE Knuth
DJ Watts
DJ Watts
DS Bassett
H Ebel
H Jeong
H Jeong
J Saramaki
JG White
K Klemm
Kevin Gurney
L Tian
LA Adamic
LA Amaral
LF Lago-Fernandez
LR Little
M Barahona
M Bollobas
M Faloutsos
M Huxham
M Kaiser
MA Janssen
MA Stephens
Mark D. Humphries
MD Humphries
ME Newman
ME Newman
ME Newman
ME Newman
MEJ Newman
MEJ Newman
MEJ Newman
MEJ Newman
MJ Conyon
MJ Keeling
ND Martinez
O Sporns
Olaf Sporns
P Sen
PS Bearman
R Albert
R Cohen
R de Castro
R Khanin
R Milo
RF Cancho
RJ Prill
S Achard
S Boccaletti
S Delre
S Valverde
T Nishikawa
TI Netoff
V Braitenberg
V Latora
VP Zhigulin
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2008

Background: Many technological, biological, social, and information networks fall into the broad class of 'small-world' networks: they have tightly interconnected clusters of nodes, and a shortest mean path length that is similar to a matched random graph (same number of nodes and edges). This semi-quantitative definition leads to a categorical distinction ('small/not-small') rather than a quantitative, continuous grading of networks, and can lead to uncertainty about a network's small-world status. Moreover, systems described by small-world networks are often studied using an equivalent canonical network model, the Watts-Strogatz (WS) model. However, the process of establishing an equivalent WS model is imprecise and there is a pressing need to discover ways in which this equivalence may be quantified. Methodology/Principal Findings: We defined a precise measure of 'small-world-ness' S based on the trade-off between high local clustering and short path length. A network is now deemed a 'small-world' if S > 1, an assertion which may be tested statistically. We then examined the behavior of S on a large data-set of real-world systems. We found that all these systems were linked by a linear relationship between their S values and the network size n. Moreover, we show a method for assigning a unique Watts-Strogatz (WS) model to any real-world network, and show analytically that the WS models associated with our sample of networks also show linearity between S and n. Linearity between S and n is not, however, inevitable, and neither is S maximal for an arbitrary network of given size. Linearity may, however, be explained by a common limiting growth process. Conclusions/Significance: We have shown how the notion of a small-world network may be quantified. Several key properties of the metric are described and the use of WS canonical models is placed on a more secure footing.
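As a rough illustration of the trade-off between clustering and path length, the sketch below assumes S is the ratio of normalised clustering to normalised path length, S = (C / C_rand) / (L / L_rand). That formula is not stated in the abstract. The matched random graph is approximated with the standard Erdős–Rényi estimates C_rand ≈ k/n and L_rand ≈ ln(n)/ln(k), where k is the mean degree; these approximations and all names are assumptions made for this example.

```python
import math


def small_worldness(clustering, path_length, n_nodes, mean_degree):
    """Ratio of normalised clustering to normalised mean shortest path length.

    Random-graph reference values use Erdos-Renyi approximations
    (an assumption of this sketch, not a measured null model).
    """
    c_rand = mean_degree / n_nodes                        # expected clustering
    l_rand = math.log(n_nodes) / math.log(mean_degree)    # expected path length
    gamma = clustering / c_rand      # > 1 when clustering exceeds random
    lam = path_length / l_rand       # ~ 1 when paths are as short as random
    return gamma / lam
```

Under this definition a network with random-like path length but much higher clustering yields S well above 1, while a graph matching the random reference on both quantities yields S = 1.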

The University of Manchester - Institutional Repository

White Rose Research Online

Differential Dynamic Properties of Scleroderma Fibroblasts in Response to Perturbation of Environmental Stimuli

Author: A Dempster
A Kremling
A Kremling
A Werhli
A-L Barabási
AR Cabral
B Binder
C Rangel
DR Vinson
EC LeRoy
EC LeRoy
EO Voit
Frank C. Arnett
G Arcangeli
H de Jong
Hao Xiong
HE Assmus
HN Claman
I Nachman
J Herrero
J Stelling
JE Gerriets
K Ogata
KA Kim
L Chen
M Aldana
M Jinnin
M Xiong
MJ Beal
Momiao Xiong
N Dojer
NC Martins
O Wolkenhauer
O Wolkenhauer
R Derynck
R Steuer
Raya Khanin
RJ Prill
RJ Prill
S Gibson
S Kauffman
S Martin
SA Jimenez
T Akutsu
T Kailath
XD Zhou
Xiaodong Zhou
Xinjian Guo
Y Pan
Y Wei
Publication venue: Public Library of Science
Publication date: 01/01/2008

Diseases are believed to arise from dysregulation of biological systems (pathways) perturbed by environmental triggers. Biological systems as a whole are not just the sum of their components; rather, they are complex, dynamic systems that change over time in response to internal and external perturbation. In the past, biologists have mainly focused on studying either functions of isolated genes or steady-states of small biological pathways. However, it is systems dynamics that plays an essential role in giving rise to cellular functions such as growth, differentiation, division and apoptosis, and to the dysfunction that causes disease. Biological phenomena of the entire organism are not only determined by steady-state characteristics of the biological systems, but also by intrinsic dynamic properties of biological systems, including stability, transient-response, and controllability, which determine how the systems maintain their functions and performance under a broad range of random internal and external perturbations. As a proof of principle, we examine signal transduction pathways and genetic regulatory pathways as biological systems. We employ widely used state-space equations in systems science to model biological systems, and use expectation-maximization (EM) algorithms and the Kalman filter to estimate the parameters in the models. We apply the developed state-space models to human fibroblasts obtained from the autoimmune fibrosing disease, scleroderma, and then perform dynamic analysis of a partial TGF-β pathway in both normal and scleroderma fibroblasts stimulated by silica. We find that the TGF-β pathway under perturbation by silica shows significant differences in dynamic properties between normal and scleroderma fibroblasts. Our findings may open a new avenue in exploring the functions of cells and the mechanisms operative in disease development.
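To make the state-space idea concrete, here is a minimal sketch of a scalar Kalman filter for the model x_t = a·x_{t-1} + w_t, y_t = x_t + v_t. It assumes the parameters a, q (process noise variance) and r (measurement noise variance) are already known. The EM step that would estimate them, and the multivariate pathway model used in the paper, are not shown, and all names are illustrative.

```python
def kalman_filter_1d(observations, a, q, r, x0=0.0, p0=1.0):
    """Return filtered state estimates for a scalar linear-Gaussian model."""
    x, p = x0, p0
    estimates = []
    for y in observations:
        # Predict: propagate the state and its variance through the dynamics.
        x_pred = a * x
        p_pred = a * a * p + q
        # Update: blend prediction and measurement using the Kalman gain.
        gain = p_pred / (p_pred + r)
        x = x_pred + gain * (y - x_pred)
        p = (1.0 - gain) * p_pred
        estimates.append(x)
    return estimates
```

With a = 1 and q = 0 the filter reduces to a running weighted average of the measurements and the prior, which is a convenient sanity check before moving to a full pathway model.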

DigitalCommons@The Texas Medical Center

Texas A&M Repository

Phenotype Prediction Using Regularized Regression on Genetic Data in the DREAM5 Systems Genetics B Challenge

Author: A de la Fuente
A Roses
AA Alizadeh
AD Weston
BJ Chen
Bonnie Berger
EE Schadt
EE Schadt
George Tucker
GM Furnival
H Zou
I Ruczinski
J Friedman
L Zhou
LJ van 't Veer
M West
Mark Isalan
MV Rockman
Po-Ru Loh
R Tibshirani
RB Brem
RJ Prill
TR Golub
V Emilsson
Y Benjamini
Y Chen
Z Kutalik
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2011

A major goal of large-scale genomics projects is to enable the use of data from high-throughput experimental methods to predict complex phenotypes such as disease susceptibility. The DREAM5 Systems Genetics B Challenge solicited algorithms to predict soybean plant resistance to the pathogen Phytophthora sojae from training sets including phenotype, genotype, and gene expression data. The challenge test set was divided into three subcategories, one requiring prediction based on only genotype data, another on only gene expression data, and the third on both genotype and gene expression data. Here we present our approach, primarily using regularized regression, which received the best-performer award for subchallenge B2 (gene expression only). We found that despite the availability of 941 genotype markers and 28,395 gene expression features, optimal models determined by cross-validation experiments typically used fewer than ten predictors, underscoring the importance of strong regularization in noisy datasets with far more features than samples. We also present substantial analysis of the training and test setup of the challenge, identifying high variance in performance on the gold standard test sets.
Funding: National Science Foundation (U.S.) Graduate Research Fellowship Program; National Defense Science and Engineering Graduate Fellowship

DSpace@MIT

Aberdeen University Research

Hsp90 orchestrates transcriptional regulation by Hsf1 and cell wall remodelling by MAPK signalling during thermal adaptation in a pathogenic yeast

Author: A Ali
A Mitchell
A Plaine
A Roetzer
A Roetzer
A Walther
A Winkler
AA Duina
AE Piispanen
Alex Andrianopoulos
Alistair J. P. Brown
AP Gasch
B Eisman
B Enjalbert
B Enjalbert
B Enjalbert
BK Jakobsen
C Csank
C Giardina
C Queitsch
C San Jose
CA Munro
Carol Munro
CH de Dios
CJ Nobile
D Chen
DA Davis
DA Smith
DF Jarosz
DF Jarosz
DF Smith
DM Arana
DR Wysong
E Kussell
E Leberer
E Nikolaou
E Roman
EA Craig
EA Elion
ET Burt
F Estruch
F Lamoth
F Navarro-Garcia
F Navarro-Garcia
F Navarro-García
F Sherman
FA Guhad
G Degols
G Wiederrecht
G Wiederrecht
GJ Gallo
H Garreau
H Lavoie
HC Causton
I Tirosh
IV Ene
J Chen
J Zou
JL Brewster
JR Blankenship
K-H Chen
LA Walker
LA Walker
LE Cowen
Leah E. Cowen
Louise Walker
M Ramsdale
M Taipale
MD Leach
MD Leach
MD Leach
MD Leach
ME Feder
Michelle D. Leach
MM Bradford
N Shulga
P Gónzalez-Párraga
P Hawle
PD Cantero
PK Sorger
PK Sorger
R Alonso-Monge
R Alonso-Monge
R Diez-Orejas
RA Monge
RB Wilson
RS Hegde
RS Shapiro
S Bates
S Diezmann
S Kuge
S Nicholls
S Nicholls
S Nicholls
SD Singh
SH Millson
SK Prill
SL LaFayette
SL Rutherford
SM Noble
SM O'Rourke
SM Roe
Susan Budge
TT Wimalasena
VM Bruno
VM Bruno
WA Fonzi
WM Toone
X Zhang
Y Guo
Y Kamada
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/12/2012

Acknowledgments: We thank Rebecca Shapiro for creating CaLC1819, CaLC1855 and CaLC1875, Gillian Milne for help with EM, Aaron Mitchell for generously providing the transposon insertion mutant library, Jesus Pla for generously providing the hog1 hst7 mutant, and Cathy Collins for technical assistance. Peer reviewed.

FigShare

Identifying Biological Network Structure, Predicting Network Behavior, and Classifying Network State With High Dimensional Model Representation (HDMR)

This work presents an adapted Random Sampling - High Dimensional Model Representation (RS-HDMR) algorithm for synergistically addressing three key problems in network biology: (1) identifying the structure of biological networks from multivariate data, (2) predicting network response under previously unsampled conditions, and (3) inferring experimental perturbations based on the observed network state. RS-HDMR is a multivariate regression method that decomposes network interactions into a hierarchy of non-linear component functions. Sensitivity analysis based on these functions provides a clear physical and statistical interpretation of the underlying network structure. The advantages of RS-HDMR include efficient extraction of nonlinear and cooperative network relationships without resorting to discretization, prediction of network behavior without mechanistic modeling, robustness to data noise, and favorable scalability of the sampling requirement with respect to network size. As a proof-of-principle study, RS-HDMR was applied to experimental data measuring the single-cell response of a protein-protein signaling network to various experimental perturbations. A comparison to network structure identified in the literature and through other inference methods, including Bayesian and mutual-information based algorithms, suggests that RS-HDMR can successfully reveal a network structure with a low false positive rate while still capturing non-linear and cooperative interactions. RS-HDMR identified several higher-order network interactions that correspond to known feedback regulations among multiple network species and that were unidentified by other network inference methods. Furthermore, RS-HDMR has a better ability to predict network response under unsampled conditions in this application than the best statistical inference algorithm presented in the recent DREAM3 signaling-prediction competition. 
RS-HDMR can discern and predict differences in network state that arise from sources ranging from intrinsic cell-cell variability to altered experimental conditions, such as when drug perturbations are introduced. This ability ultimately allows RS-HDMR to accurately classify the experimental conditions of a given sample based on its observed network state.

Princeton University Open Access Repository

DSpace@MIT

FigShare

Time lagged information theoretic approaches to the reverse engineering of gene regulatory networks

Author: A Rao
AA Margolin
Chaoyang Zhang
D Heckerman
D Marbach
Edward J Perkins
I Schmulevich
I Shmulevich
J Dougherty
Junker H Björn
K Murphy
L Chen
L Shoudan
M Hecker
M Kanehisa
M Kanehisa
M Kanehisa
M Zou
MH Hansen
Ping Gong
Preetam Ghosh
PT Spellman
RJ Prill
SA Kauffman
T Akutsu
T Chen
TM Cover
V Chaitankar
Vijender Chaitankar
W Zhao
W Zhao
X Zhang
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: A number of models and algorithms have been proposed in the past for gene regulatory network (GRN) inference; however, none of them addresses the effects of the size of time-series microarray expression data in terms of the number of time-points. In this paper, we study this problem by analyzing the behaviour of three algorithms based on information theory and dynamic Bayesian network (DBN) models. These algorithms were implemented on different sizes of data generated by synthetic networks. Experiments show that the inference accuracy of these algorithms reaches a saturation point after a specific data size, brought about by a saturation in the pair-wise mutual information (MI) metric; hence there is a theoretical limit on the inference accuracy of information theory based schemes that depends on the number of time points of micro-array data used to infer GRNs. This illustrates the fact that MI might not be the best metric to use for GRN inference algorithms. To circumvent the limitations of the MI metric, we introduce a new method of computing time lags between any pair of genes and present the pair-wise time lagged Mutual Information (TLMI) and time lagged Conditional Mutual Information (TLCMI) metrics. Next we use these new metrics to propose novel GRN inference schemes which provide higher inference accuracy based on the precision and recall parameters. Results: It was observed that beyond a certain number of time-points (i.e., a specific size) of micro-array data, the performance of the algorithms measured in terms of the recall-to-precision ratio saturated due to the saturation in the calculated pair-wise MI metric with increasing data size. The proposed algorithms were compared to existing approaches on four different biological networks. The resulting networks were evaluated based on the benchmark precision and recall metrics and the results favour our approach.
Conclusions: To alleviate the effects of data size on information theory based GRN inference algorithms, novel time lag based information theoretic approaches to infer gene regulatory networks have been proposed. The results show that the time lags of regulatory effects between any pair of genes play an important role in GRN inference schemes.
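The idea of a time lagged mutual information can be sketched as follows, assuming expression series that have already been discretised into a small number of levels. The lag is found here by a simple scan over candidate lags, which is an assumption of this example and not the lag-estimation procedure of the paper. Function names are illustrative.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits for two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def time_lagged_mi(regulator, target, lag):
    """MI between regulator at time t and target at time t + lag."""
    if lag == 0:
        return mutual_information(regulator, target)
    return mutual_information(regulator[:-lag], target[lag:])


def best_lag(regulator, target, max_lag):
    """Lag in 0..max_lag with the highest time lagged MI (first one on ties)."""
    return max(range(max_lag + 1), key=lambda lag: time_lagged_mi(regulator, target, lag))
```

Note that each extra lag shortens the overlapping window, so with few time points the estimates at large lags rest on very little data.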

Aquila Digital Community

Springer - Publisher Connector

A Relative Variation-Based Method to Unraveling Gene Regulatory Networks

Author: A Greenfield
A Madar
A Pinna
AA Margolin
BE Perrin
D Marbach
D Marbach
F Ferrazzi
Frank Emmert-Streib
H de Jong
I Cantone
J Schäfer
JJ Faith
JJ Rice
JM Lattin
KM Zhou
KY Yip
L Ljung
PE Meyer
R Opgen-Rhein
RJ Prill
S Martin
T Akutsu
T Schaffter
T Zhou
Tong Zhou
TS Gardner
Y Wang
Yali Wang
ZZ Hu
Publication venue: Public Library of Science
Publication date: 20/02/2012

Gene regulatory network (GRN) reconstruction is essential in understanding the functioning and pathology of a biological system. Extensive models and algorithms have been developed to unravel a GRN. The DREAM project aims to clarify both advantages and disadvantages of these methods from an application viewpoint. An interesting yet surprising observation is that compared with complicated methods like those based on nonlinear differential equations, etc., methods based on simple statistics, such as the so-called Z-score, usually perform better. A fundamental problem with the Z-score, however, is that direct and indirect regulations cannot be easily distinguished. To overcome this drawback, a relative expression level variation (RELV) based GRN inference algorithm is suggested in this paper, which consists of three major steps. Firstly, on the basis of wild type and single gene knockout/knockdown experimental data, the magnitude of RELV of a gene is estimated. Secondly, the probability for the existence of a direct regulation from a perturbed gene to a measured gene is estimated, which is further utilized to estimate whether a gene can be regulated by other genes. Finally, the normalized RELVs are modified to make genes with an estimated zero in-degree have smaller RELVs in magnitude than the other genes, which is used afterwards in queuing possibilities of the existence of direct regulations among genes and therefore leads to an estimate of the GRN topology. This method can in principle avoid the so-called cascade errors under certain situations. Computational results with the Size 100 sub-challenges of DREAM3 and DREAM4 show that, compared with the Z-score based method, prediction performances can be substantially improved, especially the AUPR specification. Moreover, it can even outperform the best team of both DREAM3 and DREAM4.
Furthermore, the high precision of the obtained most reliable predictions shows that the suggested algorithm may be very helpful in guiding biological experiment designs.