Regulatory Motif Finding by Logic Regression

Author: Keles Sunduz
van der Laan Mark J.
Vulpe Chris
Publication venue: Collection of Biostatistics Research Archive
Publication date: 30/03/2004

Multiple transcription factors coordinately control transcriptional regulation of genes in eukaryotes. Although multiple computational methods consider the identification of individual transcription factor binding sites (TFBSs), very few focus on the interactions between these sites. We consider finding transcription factor binding sites and their context specific interactions using microarray gene expression data. We devise a hybrid approach called LogicMotif composed of a TFBS identification method combined with the new regression methodology logic regression of Ruczinski et al. (2003). LogicMotif has two steps. First, potential binding sites are identified from transcription control regions of genes of interest. Various available methods can be used in this first step when the genes of interest can be divided into groups such as up and down regulated. For this step, we also develop a simple univariate regression and extension method, MFURE, to extract candidate TFBSs from a large number of genes when microarray gene expression data are available. MFURE provides an alternative method for this step when partitioning of the genes into disjoint groups is not preferred. This first step aims to identify individual sites within gene groups of interest or sites that are correlated with the gene expression outcome. In the second step, logic regression is used to build a predictive model of the outcome of interest (either gene expression or up and down regulation) using these potential sites. This two-fold approach creates a rich, diverse set of potential binding sites in the first step and builds regression or classification models in the second step using logic regression, which is particularly good at identifying complex interactions. LogicMotif is applied to two publicly available data sets. A genome-wide gene expression data set of Saccharomyces cerevisiae is used for validation. The regression models obtained are interpretable and the biological implications are in agreement with the known results. This analysis suggests that LogicMotif provides biologically more reasonable regression models than previous analysis of this data set with standard linear regression methods. Another data set of Saccharomyces cerevisiae illustrates the use of LogicMotif in classification questions by building a model that discriminates between up and down regulated genes in iron copper deficiency. LogicMotif identified an inductive and two repressor motifs in this data set. The inductive motif matches the binding site of the transcription factor Aft1p that has a key role in regulation of the uptake process. One of the novel repressor sites is highly present in transcription control regions of FeS genes. This site could represent a TFBS for an unknown transcription factor involved in repression of genes encoding FeS proteins in iron deficiency. We established the stability of the method to the type of outcome variable by using both continuous and binary outcome variables for this data set. Our results indicate that logic regression used in combination with cluster/group operating binding site identification methods or with our proposed method MFURE is a powerful and flexible alternative to linear regression based motif finding methods.
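
The two-step idea in this abstract (screen candidate sites one at a time against expression, then combine them with Boolean logic) can be illustrated with a toy sketch. This is not the authors' MFURE or logic regression code: the function names, the presence/absence encoding of k-mers, and the ranking by absolute slope are all assumptions made for illustration.

```python
def kmer_present(seq, kmer):
    """Return 1 if kmer occurs anywhere in seq, else 0."""
    return 1 if kmer in seq else 0


def univariate_slope(x, y):
    """Least-squares slope of y regressed on x; 0.0 if x has no variance."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        return 0.0
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx


def screen_kmers(promoters, expression, k):
    """Step 1 (hypothetical): rank every observed k-mer by |slope|."""
    kmers = sorted({s[i:i + k] for s in promoters
                    for i in range(len(s) - k + 1)})
    scored = []
    for kmer in kmers:
        x = [kmer_present(s, kmer) for s in promoters]
        scored.append((kmer, univariate_slope(x, expression)))
    # Largest absolute slope first; ties broken alphabetically.
    scored.sort(key=lambda item: (-abs(item[1]), item[0]))
    return scored


def and_feature(promoters, kmer_a, kmer_b):
    """Step 2 (hypothetical): a Boolean AND of two sites, the kind of
    term a logic regression model could place in a logic tree."""
    return [kmer_present(s, kmer_a) & kmer_present(s, kmer_b)
            for s in promoters]


# Made-up promoter fragments and expression values.
promoters = ["ACGTACGT", "TTTTACGT", "GGGGCCCC", "CCCCGGGG"]
expression = [2.0, 2.0, 0.0, 0.0]
ranking = screen_kmers(promoters, expression, 4)
```

A real analysis would search over logic trees (AND, OR, NOT) with a fitting procedure such as simulated annealing; the sketch only shows what a single Boolean predictor looks like.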

Collection Of Biostatistics Research Archive

Validating module network learning algorithms using simulated data

Author: A Battle
A Butte
AA Petti
AJ Butte
Anagha Joshi
AP Gasch
CE Shannon
CT Harbison
D Pe'er
E Segal
Eric Bonnet
HW Ma
J Kasturi
J Sinkkonen
K Basso
K Lemmens
KA Heller
Kathleen Marchal
Koenraad Van Leemput
LH Hartwell
M Ashburner
MA Beer
Martin Kuiper
MJL de Hoon
N Friedman
NM Luscombe
Piet van Remortel
S Maere
Steven Maere
T Ideker
T Van den Bulcke
Tim Van den Bulcke
Tom Michoel
X Xu
Y Garten
Yvan Saeys
Yves Van de Peer
Z Bar-Joseph
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2007

In recent years, several authors have used probabilistic graphical models to learn expression modules and their regulatory programs from gene expression data. Here, we demonstrate the use of the synthetic data generator SynTReN for the purpose of testing and comparing module network learning algorithms. We introduce a software package for learning module networks, called LeMoNe, which incorporates a novel strategy for learning regulatory programs. Novelties include the use of a bottom-up Bayesian hierarchical clustering to construct the regulatory programs, and the use of a conditional entropy measure to assign regulators to the regulation program nodes. Using SynTReN data, we test the performance of LeMoNe in a completely controlled situation and assess the effect of the methodological changes we made with respect to an existing software package, namely Genomica. Additionally, we assess the effect of various parameters, such as the size of the data set and the amount of noise, on the inference performance. Overall, application of Genomica and LeMoNe to simulated data sets gave comparable results. However, LeMoNe offers some advantages, one of them being that the learning process is considerably faster for larger data sets. Additionally, we show that the location of the regulators in the LeMoNe regulation programs and their conditional entropy may be used to prioritize regulators for functional validation, and that the combination of the bottom-up clustering strategy with the conditional entropy-based assignment of regulators improves the handling of missing or hidden regulators. Comment: 13 pages, 6 figures + 2 pages, 2 figures supplementary information.
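
To make the "conditional entropy measure" mentioned above concrete, here is a minimal sketch of scoring candidate regulators for one node of a regulation program. It assumes the node splits conditions into two groups and that each regulator's expression has already been discretized; the exact measure used in LeMoNe may differ, and all names below are hypothetical.

```python
from collections import Counter
from math import log2


def conditional_entropy(split, states):
    """H(split | states) in bits for two equal-length label sequences."""
    n = len(split)
    joint = Counter(zip(states, split))
    marginal = Counter(states)
    h = 0.0
    for (state, _), count in joint.items():
        # p(state, split) * log2 p(split | state)
        h -= (count / n) * log2(count / marginal[state])
    return h


def best_regulator(split, regulators):
    """Name of the regulator whose state leaves least uncertainty
    about the split (ties broken alphabetically)."""
    return min(sorted(regulators),
               key=lambda name: conditional_entropy(split, regulators[name]))


# Made-up example: four conditions split into left/right children.
split = ["L", "L", "R", "R"]
regulators = {
    "regA": [0, 0, 1, 1],  # state determines the split exactly
    "regB": [0, 1, 0, 1],  # state says nothing about the split
}
chosen = best_regulator(split, regulators)
```

A lower conditional entropy means the regulator's state explains the partition of conditions better, which is why the minimum is selected here.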

arXiv.org e-Print Archive

Crossref

Springer - Publisher Connector

Ghent University Academic Bibliography

PubMed Central

Edinburgh Research Explorer

The EM Algorithm and the Rise of Computational Biology

Author: Jun S. Liu
Xiaodan Fan
Yuan Yuan
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2010

In the past decade computational biology has grown from a cottage industry with a handful of researchers to an attractive interdisciplinary field, catching the attention and imagination of many quantitatively-minded scientists. Of interest to us is the key role played by the EM algorithm during this transformation. We survey the use of the EM algorithm in a few important computational biology problems surrounding the "central dogma" of molecular biology: from DNA to RNA and then to proteins. Topics of this article include sequence motif discovery, protein sequence alignment, population genetics, evolutionary models and mRNA expression microarray data analysis. Comment: Published at http://dx.doi.org/10.1214/09-STS312 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org).

arXiv.org e-Print Archive

CiteSeerX

Crossref

Meta-analysis of massively parallel reporter assays enables prediction of regulatory function across cell types.

Author: Ahituv Nadav
Kreimer Anat
Yan Zhongxia
Yosef Nir
Publication venue: eScholarship, University of California
Publication date: 01/09/2019

Deciphering the potential of noncoding loci to influence gene regulation has been the subject of intense research, with important implications in understanding genetic underpinnings of human diseases. Massively parallel reporter assays (MPRAs) can measure regulatory activity of thousands of DNA sequences and their variants in a single experiment. With an increasing number of publicly available MPRA data sets, one can now develop data-driven models which, given a DNA sequence, predict its regulatory activity. Here, we performed a comprehensive meta-analysis of several MPRA data sets in a variety of cellular contexts. We first applied an ensemble of methods to predict MPRA output in each context and observed that the most predictive features are consistent across data sets. We then demonstrate that predictive models trained in one cellular context can be used to predict MPRA output in another, with loss of accuracy attributed to cell-type-specific features. Finally, we show that our approach achieves top performance in the Fifth Critical Assessment of Genome Interpretation "Regulation Saturation" Challenge for predicting effects of single-nucleotide variants. Overall, our analysis provides insights into how MPRA data can be leveraged to highlight functional regulatory regions throughout the genome and can guide effective design of future experiments by better prioritizing regions of interest.

eScholarship - University of California

FAM129B, an antioxidative protein, reduces chemosensitivity by competing with Nrf2 for Keap1 binding.

Author: Cheng Jing-Yan
Cheng Kai-Chun
Hsu Huan-Ming
Liang Yuh-Jin
Lin Ruey-Jen
Wang Sheng-Hung
Wu Jen-Chine
Yu Alice L
Yu John
Yu Jyh-Cherng
Publication venue: eScholarship, University of California
Publication date: 01/07/2019

Background: The transcription factor Nrf2 is a master regulator of antioxidant response. While Nrf2 activation may counter increasing oxidative stress in aging, its activation in cancer can promote cancer progression and metastasis, and confer resistance to chemotherapy and radiotherapy. Thus, Nrf2 has been considered as a key pharmacological target. Unfortunately, there are no specific Nrf2 inhibitors for therapeutic application. Moreover, high Nrf2 activity in many tumors without Keap1 or Nrf2 mutations suggests that alternative mechanisms of Nrf2 regulation exist. Methods: Interaction of FAM129B with Keap1 is demonstrated by immunofluorescence, colocalization, co-immunoprecipitation and mammalian two-hybrid assay. Antioxidative function of FAM129B is analyzed by measuring ROS levels with DCF/flow cytometry, Nrf2 activation using luciferase reporter assay and determination of downstream gene expression by qPCR and western blotting. Impact of FAM129B on in vivo chemosensitivity is examined in mice bearing breast and colon cancer xenografts. The clinical relevance of FAM129B is assessed by qPCR in breast cancer samples and data mining of publicly available databases. Findings: We have demonstrated that FAM129B in cancer promotes Nrf2 activity by reducing its ubiquitination through competition with Nrf2 for Keap1 binding via its DLG and ETGE motifs. In addition, FAM129B reduces chemosensitivity by augmenting Nrf2 antioxidative signaling and confers poor prognosis in breast and lung cancer. Interpretation: These findings demonstrate the important role of FAM129B in Nrf2 activation and antioxidative response, and identify FAM129B as a potential therapeutic target. Funding: The Chang Gung Medical Foundation (Taiwan) and the Ministry of Science and Technology (Taiwan).

eScholarship - University of California

Learning “graph-mer” Motifs that Predict Gene Expression Trajectories in Development

Author: Leslie Christina
Li Xuejing
Panea Casandra
Reinke Valerie
Wiggins Chris H.
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2010

A key problem in understanding transcriptional regulatory networks is deciphering what cis regulatory logic is encoded in gene promoter sequences and how this sequence information maps to expression. A typical computational approach to this problem involves clustering genes by their expression profiles and then searching for overrepresented motifs in the promoter sequences of genes in a cluster. However, genes with similar expression profiles may be controlled by distinct regulatory programs. Moreover, if many gene expression profiles in a data set are highly correlated, as in the case of whole organism developmental time series, it may be difficult to resolve fine-grained clusters in the first place. We present a predictive framework for modeling the natural flow of information, from promoter sequence to expression, to learn cis regulatory motifs and characterize gene expression patterns in developmental time courses. We introduce a cluster-free algorithm based on a graph-regularized version of partial least squares (PLS) regression to learn sequence patterns (represented by graphs of k-mers, or “graph-mers”) that predict gene expression trajectories. Applying the approach to wildtype germline development in Caenorhabditis elegans, we found that the first and second latent PLS factors mapped to expression profiles for oocyte and sperm genes, respectively. We extracted both known and novel motifs from the graph-mers associated with these germline-specific patterns, including novel CG-rich motifs specific to oocyte genes. We found evidence supporting the functional relevance of these putative regulatory elements through analysis of positional bias, motif conservation and in situ gene expression. This study demonstrates that our regression model can learn biologically meaningful latent structure and identify potentially functional motifs from subtle developmental time course expression data.

CiteSeerX

Columbia University Academic Commons

Harvard University - DASH

Directory of Open Access Journals

PubMed Central

Identification of Yeast Transcriptional Regulation Networks Using Multivariate Random Forests

Author: A Boorsma
A Bureau
A Gasch
A Kundaje
A Tanay
A Thalamuthu
B Futcher
C Koch
CJ McInerny
CT Harbison
D Das
E Segal
Eric P. Xing
H Althoefer
HM Bussemaker
HW Mewes
J Bähler
J Ernst
JD Hughes
JR Quinlan
JR Warner
JS Chang
KJ Archer
KL Lunetta
L Breiman
L Kaufman
M Kato
M Segal
Mark R. Segal
MB Eisen
N Zhang
P Sudarsanam
PT Spellman
R Diaz-Uriarte
R Tibshirani
RAM de Bruin
RJ Cho
S Chu
S Dudoit
S Keles
S Tavazoie
SA Burchett
SA Raithatha
TM Phuong
U Schlecht
Y Benjamini
Y Pilpel
Yuanyuan Xiao
Publication venue: Public Library of Science
Publication date: 01/06/2009

The recent availability of whole-genome scale data sets that investigate complementary and diverse aspects of transcriptional regulation has spawned an increased need for new and effective computational approaches to analyze and integrate these large scale assays. Here, we propose a novel algorithm, based on random forest methodology, to relate gene expression (as derived from expression microarrays) to sequence features residing in gene promoters (as derived from DNA motif data) and transcription factor binding to gene promoters (as derived from tiling microarrays). We extend the random forest approach to model a multivariate response as represented, for example, by time-course gene expression measures. An analysis of the multivariate random forest output reveals complex regulatory networks, which consist of cohesive, condition-dependent regulatory cliques. Each regulatory clique features homogeneous gene expression profiles and common motifs or synergistic motif groups. We apply our method to several yeast physiological processes: cell cycle, sporulation, and various stress conditions. Our technique displays excellent performance with regard to identifying known regulatory motifs, including high order interactions. In addition, we present evidence of the existence of an alternative MCB-binding pathway, which we confirm using data from two independent cell cycle studies and two other physiological processes. Finally, we have uncovered elaborate transcription regulation refinement mechanisms involving PAC and mRRPE motifs that govern essential rRNA processing. These include intriguing instances of differing motif dosages and differing combinatorial motif control that promote regulatory specificity in rRNA metabolism under differing physiological processes.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Transcription factor binding site prediction with multivariate gene expression data

Author: Speed Terence P.
Wildermuth Mary C.
Zhang Nancy R.
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2007

Multi-sample microarray experiments have become a standard experimental method for studying biological systems. A frequent goal in such studies is to unravel the regulatory relationships between genes. During the last few years, regression models have been proposed for the de novo discovery of cis-acting regulatory sequences using gene expression data. However, when applied to multi-sample experiments, existing regression based methods model each individual sample separately. To better capture the dynamic relationships in multi-sample microarray experiments, we propose a flexible method for the joint modeling of promoter sequence and multivariate expression data. In higher order eukaryotic genomes expression regulation usually involves combinatorial interaction between several transcription factors. Experiments have shown that spacing between transcription factor binding sites can significantly affect their strength in activating gene expression. We propose an adaptive model building procedure to capture such spacing dependent cis-acting regulatory modules. We apply our methods to the analysis of microarray time-course experiments in yeast and in Arabidopsis. These experiments exhibit very different dynamic temporal relationships. For both data sets, we have found all of the well-known cis-acting regulatory elements in the related context, as well as being able to predict novel elements. Comment: Published at http://dx.doi.org/10.1214/10.1214/07-AOAS142 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
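
The abstract's point that spacing between binding sites matters can be turned into a simple sequence feature. The sketch below is an illustration only, not the paper's adaptive model building procedure: it assumes motifs are exact strings, measures spacing between match start positions, and uses a hypothetical `max_gap` threshold.

```python
def occurrences(seq, motif):
    """Start positions of all (possibly overlapping) matches of motif."""
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


def min_spacing(seq, motif_a, motif_b):
    """Smallest distance between start positions of the two motifs,
    or None if either motif is absent from seq."""
    pos_a = occurrences(seq, motif_a)
    pos_b = occurrences(seq, motif_b)
    if not pos_a or not pos_b:
        return None
    return min(abs(i - j) for i in pos_a for j in pos_b)


def spaced_pair_feature(seq, motif_a, motif_b, max_gap):
    """1 if both motifs occur within max_gap of each other, else 0."""
    distance = min_spacing(seq, motif_a, motif_b)
    return int(distance is not None and distance <= max_gap)


# Made-up promoter fragment: "ACG" starts at 0 and "GGC" at 8.
promoter = "ACGTTTTTGGCC"
gap = min_spacing(promoter, "ACG", "GGC")
```

A feature like `spaced_pair_feature` could then enter a regression on expression alongside single-motif terms, so that a motif pair contributes only when its sites are close enough.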

arXiv.org e-Print Archive

CiteSeerX

Crossref

ScholarlyCommons@Penn

Quantitative evaluation and reversion analysis of the attractor landscapes of an intracellular regulatory network for colorectal cancer

Author: Dongkwan Shin
Kwang-Hyun Cho
Sea Choi
Yunseong Kim
Publication venue: Springer Nature
Publication date: 01/01/2017

The molecular profiles of CMS cancer cells, statistical significance analysis of reversion targets, and synergistic effect analysis of every two nodes inhibition. (XLSX 67 kb)

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

FigShare