416 research outputs found

Positional dependence of transcriptional inhibition by DNA torsional stress in yeast chromosomes

Author: Benjamin Piña
Esposito F
Freeman LA
Gasch AP
Joaquim Roca
Ricky S Joshi
Wang JC
Publication venue: Nature Publishing Group
Publication date: 07/01/2010

How DNA helical tension is constrained along the linear chromosomes of eukaryotic cells is poorly understood. In this study, we induced the accumulation of DNA (+) helical tension in Saccharomyces cerevisiae cells and examined how DNA transcription was affected along yeast chromosomes. The results revealed that, whereas the overwinding of DNA produced a general impairment of transcription initiation, genes situated at <100 kb from the chromosomal ends gradually escaped from the transcription stall. This novel positional effect seemed to be a simple function of the gene distance to the telomere: it occurred evenly in all 32 chromosome extremities and was independent of the atypical structure and transcription activity of subtelomeric chromatin. These results suggest that DNA helical tension dissipates at chromosomal ends and, therefore, provides a functional indication that yeast chromosome extremities are topologically open. The gradual escape from the transcription stall along the chromosomal flanks also indicates that friction restrictions to DNA twist diffusion, rather than tight topological boundaries, might suffice to confine DNA helical tension along eukaryotic chromatin.

Crossref

PubMed Central

Digital.CSIC

Effect of microarray data heterogeneity on regulatory gene module discovery

Author: A Tanay
AJ Saldanha
Alok Mishra
AP Gasch
Duncan Gillies
E Segal
L Hubert
P Spellman
Z Bar-Joseph
Publication venue: Springer Science and Business Media LLC

Crossref

Network integration meets network dynamics

Author: AP Gasch
G Karlebach
JJ Tyson
MW Covert
PV Missiuro
T Shlomi
Teresa M Przytycka
TM Przytycka
Y-A Kim
YC Wang
Yoo-Ah Kim
Publication venue: BioMed Central
Publication date: 01/01/2010

Molecular interaction networks provide a window on the workings of the cell. However, combining various types of networks into one coherent large-scale dynamic model remains a formidable challenge. A recent paper in BMC Systems Biology describes a promising step in this direction.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Multi-membership gene regulation in pathway based microarray analysis

Author: A Goesmann
AB Khodursky
Annette M Payne
AP Gasch
D Cavalieri
D Greenbaum
E Panteris
FA Kolpakov
FR Blattner
G Russo
I Rojas
JH Holland
JL DeRisi
KD Dahlquist
L Stryer
M Kanehisa
M Quadroni
M Schena
P Grosu
P Shannon
PC Champe
PD Karp
R Hamming
RK Brouwer
S Kirkpatrick
S Pavlidis
S Swift
SJ Russell
Stelios P Pavlidis
Stephen M Swift
T Toyoda
Z Michalewicz
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2011

This article is available through the Brunel Open Access Publishing Fund. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Background: Gene expression analysis has been intensively researched for more than a decade. Recently, there has been elevated interest in the integration of microarray data analysis with other types of biological knowledge in a holistic analytical approach. We propose a methodology that can be used for pathway based microarray data analysis, based on the observation that a substantial proportion of genes present in biochemical pathway databases are members of a number of distinct pathways. Our methodology aims towards establishing the state of individual pathways, by identifying those truly affected by the experimental conditions based on the behaviour of such genes. For that purpose it considers all the pathways in which a gene participates and the general census of gene expression per pathway. Results: We utilise hill climbing, simulated annealing and a genetic algorithm to analyse the consistency of the produced results, through the application of fuzzy adjusted Rand indexes and Hamming distance. All algorithms produce highly consistent genes to pathways allocations, revealing the contribution of genes to pathway functionality, in agreement with current pathway state visualisation techniques, with the simulated annealing search proving slightly superior in terms of efficiency. Conclusions: We show that the expression values of genes, which are members of a number of biochemical pathways or modules, are the net effect of the contribution of each gene to these biochemical processes. 
We show that by manipulating the pathway and module contribution of such genes to follow underlying trends we can interpret microarray results centred on the behaviour of these genes. The work was sponsored by the studentship scheme of the School of Information Systems, Computing and Mathematics, Brunel University.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Brunel University Research Archive

A classification-based framework for predicting and analyzing gene regulatory response

Author: AJ Hartemink
Anshul Kundaje
AP Gasch
Chris H Wiggins
Christina Leslie
CI Holmberg
D Pe'er
D Pollard
DC Raitt
E Ramil
E Segal
ER Gansner
HJ Bussemaker
I Ota
I Pedruzzi
J Ihmels
JD Hughes
JT Lin
M Middendorf
MA Beer
Manuel Middendorf
Mihir Shah
P Zarzov
RE Schapire
TI Lee
VK Vyas
W Hoeffding
Y Pilpel
Yoav Freund
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: We have recently introduced a predictive framework for studying gene transcriptional regulation in simpler organisms using a novel supervised learning algorithm called GeneClass. GeneClass is motivated by the hypothesis that in model organisms such as Saccharomyces cerevisiae, we can learn a decision rule for predicting whether a gene is up- or down-regulated in a particular microarray experiment based on the presence of binding site subsequences ("motifs") in the gene's regulatory region and the expression levels of regulators such as transcription factors in the experiment ("parents"). GeneClass formulates the learning task as a classification problem — predicting +1 and -1 labels corresponding to up- and down-regulation beyond the levels of biological and measurement noise in microarray measurements. Using the Adaboost algorithm, GeneClass learns a prediction function in the form of an alternating decision tree, a margin-based generalization of a decision tree. METHODS: In the current work, we introduce a new, robust version of the GeneClass algorithm that increases stability and computational efficiency, yielding a more scalable and reliable predictive model. The improved stability of the prediction tree enables us to introduce a detailed post-processing framework for biological interpretation, including individual and group target gene analysis to reveal condition-specific regulation programs and to suggest signaling pathways. Robust GeneClass uses a novel stabilized variant of boosting that allows a set of correlated features, rather than single features, to be included at nodes of the tree; in this way, biologically important features that are correlated with the single best feature are retained rather than decorrelated and lost in the next round of boosting. 
Other computational developments include fast matrix computation of the loss function for all features, allowing scalability to large datasets, and the use of abstaining weak rules, which results in a more shallow and interpretable tree. We also show how to incorporate genome-wide protein-DNA binding data from ChIP-chip experiments into the GeneClass algorithm, and we use an improved noise model for gene expression data. RESULTS: Using the improved scalability of Robust GeneClass, we present larger scale experiments on a yeast environmental stress dataset, training and testing on all genes and using a comprehensive set of potential regulators. We demonstrate the improved stability of the features in the learned prediction tree, and we show the utility of the post-processing framework by analyzing two groups of genes in yeast (the protein chaperones and a set of putative targets of the Nrg1 and Nrg2 transcription factors) and suggesting novel hypotheses about their transcriptional and post-transcriptional regulation. Detailed results and Robust GeneClass source code are available for download from

Crossref

Springer - Publisher Connector

Columbia University Academic Commons

PubMed Central

Simple integrative preprocessing preserves what is shared in data sources

Author: Abhishek Tripathi
AP Gasch
Arto Klami
G Dennis
GR Lanckriet
H Hotelling
HC Causton
J Kettenring
J Nikkilä
JA Berger
JDR Farquhar
M Girolami
ME Ross
PT Spellman
Samuel Kaski
Y Yamanishi
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: The bioinformatics data analysis toolbox needs general-purpose, fast and easily interpretable preprocessing tools that perform data integration during exploratory data analysis. Our focus is on vector-valued data sources, each consisting of measurements of the same entity but on different variables, and on tasks where source-specific variation is considered noisy or not interesting. Principal components analysis of all sources combined together is an obvious choice if it is not important to distinguish between data source-specific and shared variation. Canonical Correlation Analysis (CCA) focuses on mutual dependencies and discards source-specific "noise" but it produces a separate set of components for each source. Results: It turns out that components given by CCA can be combined easily to produce a linear and hence fast and easily interpretable feature extraction method. The method fuses together several sources, such that the properties they share are preserved. Source-specific variation is discarded as uninteresting. We give the details and implement them in a software tool. The method is demonstrated on gene expression measurements in three case studies: classification of cell cycle regulated genes in yeast, identification of differentially expressed genes in leukemia, and defining stress response in yeast. The software package is available at http://www.cis.hut.fi/projects/mi/software/drCCA/. Conclusion: We introduced a method for the task of data fusion for exploratory data analysis, when statistical dependencies between the sources and not within a source are interesting. The method uses canonical correlation analysis in a new way for dimensionality reduction, and inherits its good properties of being simple, fast, and easily interpretable as a linear projection.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Variations in Stress Sensitivity and Genomic Expression in Diverse S. cerevisiae Isolates

Author: A Ferrando
A Gabriel
A Goffeau
AD Basehoar
AK Pahlman
AM Deutschbauer
AP Gasch
Audrey P. Gasch
B Dunn
CM Wilke
CR Landry
CT Harbison
D Cavalieri
D Charlesworth
D Wang
Daniel J. Kvitek
DB Berry
DM Ruderfer
E Aa
E Besnard
EA Winzeler
EJ Foss
EM Torres
EO Perlstein
F Ronquist
F Sherman
G Ben-Ari
G Giaever
G Liti
G Yvert
Gil McVean
GK Smyth
H Sinha
HS Kim
I Tirosh
J Ronald
J Schacherer
JC Fay
Jessica L. Will
JH McCusker
JJ Infante
JK McKay
JL Legras
JM Cherry
JM Raser
JP Gerke
JP Townsend
JR Johnston
JR Pollack
K Mitsui
K Spitze
K Tamura
KM Brown
LM Steinmetz
M Gaisne
M Kellis
M Primig
MB Eisen
MD Robinson
OR Homann
P Marullo
PD Sniegowski
R Lyne
RB Brem
RK Mortimer
S Fogel
S Nogami
T Gatbonton
TR Hughes
U Bond
W Wei
WJ Blake
XH Hu
Y Benjamini
Z Gu
Publication venue: Public Library of Science
Publication date: 01/01/2008

Interactions between an organism and its environment can significantly influence phenotypic evolution. A first step toward understanding this process is to characterize phenotypic diversity within and between populations. We explored the phenotypic variation in stress sensitivity and genomic expression in a large panel of Saccharomyces strains collected from diverse environments. We measured the sensitivity of 52 strains to 14 environmental conditions, compared genomic expression in 18 strains, and identified gene copy-number variations in six of these isolates. Our results demonstrate a large degree of phenotypic variation in stress sensitivity and gene expression. Analysis of these datasets reveals relationships between strains from similar niches, suggests common and unique features of yeast habitats, and implicates genes whose variable expression is linked to stress resistance. Using a simple metric to suggest cases of selection, we found that strains collected from oak exudates are phenotypically more similar than expected based on their genetic diversity, while sake and vineyard isolates display more diverse phenotypes than expected under a neutral model. We also show that the laboratory strain S288c is phenotypically distinct from all of the other strains studied here, in terms of stress sensitivity, gene expression, Ty copy number, mitochondrial content, and gene-dosage control. These results highlight the value of understanding the genetic basis of phenotypic variation and raise caution about using laboratory strains for comparative genomics.

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

A specialized learner for inferring structured cis-regulatory modules

Author: A Agresti
AJ Miller
AP Gasch
D Karolchik
E Segal
GE Crooks
Keith Noto
MA Beer
Mark Craven
N Rajewsky
PA Devijver
Q Zhou
S Aerts
S Keleş
S Sinha
TI Lee
TL Bailey
TM Mitchell
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: The process of transcription is controlled by systems of transcription factors, which bind to specific patterns of binding sites in the transcriptional control regions of genes, called cis-regulatory modules (CRMs). We present an expressive and easily comprehensible CRM representation which is capable of capturing several aspects of a CRM's structure and distinguishing between DNA sequences which do or do not contain it. We also present a learning algorithm tailored for this domain, and a novel method to avoid overfitting by controlling the expressivity of the model. RESULTS: We are able to find statistically significant CRMs more often than a current state-of-the-art approach on the same data sets. We also show experimentally that each aspect of our expressive CRM model space makes a positive contribution to the learned models on yeast and fly data. CONCLUSION: Structural aspects are an important part of CRMs, both in terms of interpreting them biologically and learning them accurately. Source code for our algorithm is available at

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Validating module network learning algorithms using simulated data

Author: A Battle
A Butte
AA Petti
AJ Butte
Anagha Joshi
AP Gasch
CE Shannon
CT Harbison
D Pe'er
E Segal
Eric Bonnet
HW Ma
J Kasturi
J Sinkkonen
K Basso
K Lemmens
KA Heller
Kathleen Marchal
Koenraad Van Leemput
LH Hartwell
M Ashburner
MA Beer
Martin Kuiper
MJL de Hoon
N Friedman
NM Luscombe
Piet van Remortel
S Maere
Steven Maere
T Ideker
T Van den Bulcke
Tim Van den Bulcke
Tom Michoel
X Xu
Y Garten
Yvan Saeys
Yves Van de Peer
Z Bar-Joseph
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2007

In recent years, several authors have used probabilistic graphical models to learn expression modules and their regulatory programs from gene expression data. Here, we demonstrate the use of the synthetic data generator SynTReN for the purpose of testing and comparing module network learning algorithms. We introduce a software package for learning module networks, called LeMoNe, which incorporates a novel strategy for learning regulatory programs. Novelties include the use of a bottom-up Bayesian hierarchical clustering to construct the regulatory programs, and the use of a conditional entropy measure to assign regulators to the regulation program nodes. Using SynTReN data, we test the performance of LeMoNe in a completely controlled situation and assess the effect of the methodological changes we made with respect to an existing software package, namely Genomica. Additionally, we assess the effect of various parameters, such as the size of the data set and the amount of noise, on the inference performance. Overall, application of Genomica and LeMoNe to simulated data sets gave comparable results. However, LeMoNe offers some advantages, one of them being that the learning process is considerably faster for larger data sets. Additionally, we show that the location of the regulators in the LeMoNe regulation programs and their conditional entropy may be used to prioritize regulators for functional validation, and that the combination of the bottom-up clustering strategy with the conditional entropy-based assignment of regulators improves the handling of missing or hidden regulators. Comment: 13 pages, 6 figures + 2 pages, 2 figures supplementary information

arXiv.org e-Print Archive

Crossref

Springer - Publisher Connector

Ghent University Academic Bibliography

PubMed Central

Edinburgh Research Explorer

Archivsystem Ask23

HAL-CEA

UNCLES: Method for the identification of genes differentially consistently co-expressed in a specific subset of datasets

Author: A Huber
A Prelić
AA Shabalin
AP Gasch
Asoke K. Nandi
B Abu-Jamous
Basel Abu-Jamous
C Koch
CH Wade
CT Harbison
D Dikicioglu
D Liu
DA Orlando
David J. Roberts
IS Dhillon
J Bahler
J Yang
JK Choi
JK Limb
JM Pena
JM Stuart
KC Li
KY Yeung
L Lazzeroni
LP Zhao
MB Eisen
P Cahan
P Grandi
PC Roberts
PT Spellman
R Fa
R Lletı́a
R Nilsson
RJ Cho
RM Piro
Rui Fa
S Chu
S Fujii
S Sharma
S Vega-Pons
T Hayata
T Murali
T Pramila
TC Fleischer
VA Gennarino
X Liu
Y Cheng
Y Kluger
Z Tao
Publication venue: Springer Science and Business Media LLC
Publication date: 04/06/2015

Background: Collective analysis of the increasingly emerging gene expression datasets is required. The recently proposed binarisation of consensus partition matrices (Bi-CoPaM) method can combine clustering results from multiple datasets to identify the subsets of genes which are consistently co-expressed in all of the provided datasets in a tuneable manner. However, results validation and parameter setting are issues that complicate the design of such methods. Moreover, although it is a common practice to test methods by application to synthetic datasets, the mathematical models used to synthesise such datasets are usually based on approximations which may not always be sufficiently representative of real datasets. Results: Here, we propose an unsupervised method for the unification of clustering results from multiple datasets using external specifications (UNCLES). This method has the ability to identify the subsets of genes consistently co-expressed in a subset of datasets while being poorly co-expressed in another subset of datasets, and to identify the subsets of genes consistently co-expressed in all given datasets. We also propose the M-N scatter plots validation technique and adopt it to set the parameters of UNCLES, such as the number of clusters, automatically. Additionally, we propose an approach for the synthesis of gene expression datasets using real data profiles in a way which combines the ground-truth-knowledge of synthetic data and the realistic expression values of real data, and therefore overcomes the problem of faithfulness of synthetic expression data modelling. By application to those datasets, we validate UNCLES while comparing it with other conventional clustering methods, and of particular relevance, biclustering methods. We further validate UNCLES by application to a set of 14 real genome-wide yeast datasets as it produces focused clusters that conform well to known biological facts. 
Furthermore, in-silico-based hypotheses regarding the function of a few previously unknown genes in those focused clusters are drawn. Conclusions: The UNCLES method, the M-N scatter plots technique, and the expression data synthesis approach will have wide application for the comprehensive analysis of genomic and other sources of multiple complex biological datasets. Moreover, the derived in-silico-based biological hypotheses represent subjects for future functional studies. The National Institute for Health Research (NIHR) under its Programme Grants for Applied Research Programme (Grant Reference Number RP-PG-0310-1004)

Jyväskylä University Digital Archive

Crossref

Springer - Publisher Connector

PubMed Central

Brunel University Research Archive