72 research outputs found
A Statistical Model to Determine Multiple Binding Sites of a Transcription Factor on DNA Using ChIP-seq Data
Protein-DNA interaction is vital to many biological processes in cells such as cell division, embryo development and regulating gene expression. Chromatin Immunoprecipitation followed by massively parallel sequencing (ChIP-seq) is a new technology that can reveal protein binding sites in the genome with superior accuracy. Although many methods have been proposed to find binding sites from ChIP-seq data, they can find only one binding site within a short region of the genome. In this study we introduce a statistical model to identify multiple binding sites of a transcription factor within a short region of the genome using ChIP-seq data. Mapped sequence reads from the ChIP-seq experiments are modeled as the sum of observations from an unknown number of Poisson distributions. The rate parameters of these Poisson distributions are treated as a function of the underlying distribution of the tags, which depends on the locations of the binding sites and their intensity parameters. For the parameter estimation of the model, two major approaches are discussed: a Bayesian method and the EM algorithm. For the Bayesian method, the reversible jump Markov chain Monte Carlo (RJMCMC) method is used for computation. An extensive simulation study was performed for the selection of proposal methods and priors in RJMCMC as well as for the comparison of model selection criteria in the EM algorithm. Real ChIP-seq datasets for transcription factors STAT1 and ZNF143 were used to demonstrate the performance of the proposed model. The results from the multiple binding sites model were compared with existing peak-calling programs.
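As a rough illustration of the kind of model this abstract describes, the sketch below treats per-position read counts as independent Poisson observations whose rate is a background level plus one bump per binding site. The Gaussian bump shape, the `background` and `width` values, and the function names are assumptions made for this example only; the abstract does not specify the tag distribution, and this is not the authors' implementation.

```python
import math

def site_rate(position, sites, background=0.1, width=50.0):
    """Poisson rate at a genomic position: background plus one bump per site.

    `sites` is a list of (location, intensity) pairs. The Gaussian bump,
    `background`, and `width` are illustrative assumptions, not taken
    from the abstract.
    """
    rate = background
    for location, intensity in sites:
        rate += intensity * math.exp(-0.5 * ((position - location) / width) ** 2)
    return rate

def poisson_log_likelihood(counts, sites, **kwargs):
    """Log-likelihood of per-position read counts under independent Poissons."""
    total = 0.0
    for position, count in enumerate(counts):
        rate = site_rate(position, sites, **kwargs)
        # log P(count | rate) = count*log(rate) - rate - log(count!)
        total += count * math.log(rate) - rate - math.lgamma(count + 1)
    return total

# Toy counts built deterministically by rounding a two-site rate profile.
true_sites = [(100, 5.0), (220, 3.0)]
counts = [round(site_rate(p, true_sites)) for p in range(400)]

# Compare a one-site explanation with the two-site explanation.
one_site = poisson_log_likelihood(counts, [(100, 5.0)])
two_sites = poisson_log_likelihood(counts, true_sites)
print(two_sites > one_site)
```

On this toy data the two-site configuration scores a higher log-likelihood than the one-site configuration. A full treatment would also estimate the locations and intensities and penalize the number of sites, which is where the EM model selection criteria or RJMCMC mentioned in the abstract would come in.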
Establishment of Zygotic Transcription and Chromatin Organization in the Early Drosophila Embryo
During animal development, the first cell divisions of a fertilized egg are under maternal control. Zygotic transcription largely begins during the midblastula transition (MBT), but some genes may be transcribed earlier. How gene activation is accomplished is poorly understood. For example, it is unclear whether any prepatterned markers, e.g. paused Pol II or histone modifications, are present at genes prior to activation.
In this study, I systematically investigated the dynamics of Pol II recruitment, histone modifications, and nucleosome accessibility in tightly staged Drosophila embryos to understand the establishment of zygotic transcription and chromatin organization during early embryogenesis.
Supported by evidence from histological assays, I found that the chromatin initially is loosely packed and there are no pre-recruited general transcription factors or prepatterned histone modifications in the pre-MBT embryos for the global zygotic genome activation (ZGA). In addition, widespread Pol II pausing at developmental genes is established during the MBT for later activation while massive de novo Pol II recruitment occurs. Moreover, by comparing genes activated during MBT and ~110 genes strongly occupied by Pol II in the pre-MBT embryos, I found that the lack of Pol II pausing at pre-MBT genes correlates with strong core promoters that contain the TATA-box and binding motif of Zelda, a general activator recently identified for Drosophila ZGA.
Taken together, the function of core promoters might be an underappreciated mechanism for the general regulation of the de novo establishment of chromatin structure during early Drosophila embryogenesis. It is possible that this mechanism has evolved for adapting to the quick divisions in Drosophila early embryogenesis, and this may also be true for the ZGA of vertebrate organisms with similar dividing patterns, e.g. zebrafish and Xenopus.
Programming and reprogramming cellular identity
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Biology, 2008. Includes bibliographical references. Every cell in the human body contains the same genetic information, with few exceptions, yet each cell type enacts a distinct gene expression program to allow for highly specialized functions. These tightly controlled programs are the results of transcriptional regulation, by transcription factors and chromatin regulators, as well as post-transcriptional regulation, mediated in part by microRNAs (miRNAs). Additionally, cells must respond to external cues, and signal transduction pathways converge on gene regulatory machinery to shape cellular identity. The work presented here focuses on the mechanisms by which transcription factors, chromatin regulators, miRNAs and signal transduction pathways coordinately regulate two particular medically important gene expression programs: (1) the program that controls pluripotency in embryonic stem (ES) cells, giving these cells the capacity to differentiate into every adult cell type, and (2) the program that allows regulatory T (Treg) cells to prevent autoimmunity by suppressing the response of self-reactive conventional T cells. Genomic investigations of the core regulatory circuitry of each of these cell types presented here provide new insight into the genetics of pluripotency and autoimmunity, and suggest a strategy for reprogramming based on chemical manipulation of the cellular programs that control cell identity. By Alexander Marson. Ph.D.
The Super Elongation Complex (SEC) in Development and Disease
Chromosomal translocations involving the mixed lineage leukemia (MLL) gene are associated with infant acute leukemia. MLL has a large number of translocation partners that share very little sequence similarity, yet their translocations into MLL result in the pathogenesis of leukemia. To define the molecular reason why these translocations result in leukemogenesis, I purified several of the commonly occurring MLL chimeras and identified a novel Super Elongation Complex (SEC) associated with all chimeras purified. SEC consists of the RNA Pol II elongation factors ELL1-3, P-TEFb, and several frequent MLL-translocation partners. SEC is one of the most active P-TEFb complexes and is required for the proper expression of MLL chimera target genes and the oncogene MYC, suggesting that the regulation of transcription elongation checkpoint control (TECC) by SEC could play essential roles in leukemia.
Paused Pol II has been proposed to be associated with loci that respond rapidly to environmental stimuli. My studies in mouse ES cells demonstrated that SEC is required for rapid transcriptional activation of genes, many of which contain paused Pol II. However, SEC is also required for the activation of the Cyp26a1 gene, which does not contain detectable Pol II, yet responds much more rapidly to retinoic acid than those paused genes, suggesting that paused Pol II is not a prerequisite for rapid gene activation. Furthermore, Ell3, a member of the ELL family of proteins, predominantly occupies poised, active, and inactive enhancers of many developmental genes in ES cells. Ell3’s association with enhancers is required for setting up proper Pol II occupancy at the promoter-proximal regions of neighboring genes, providing a yet to be discovered mechanism for the transition from Ell3’s presence at poised enhancers in ES cells to Ell2’s role in the release of paused Pol II during gene activation.
Untangling Mechanisms of the Architectural Protein CTCF
The three-dimensional structure of chromatin regulates gene expression by facilitating contacts between promoters and regulatory elements, demarcating regions of activity and repression and compartmentalising the genome into Topologically Associating Domains (TADs). The architectural protein CTCF is an established mediator of chromatin conformation; however, attempts to deplete CTCF have had inconsistent consequences on chromatin and gene expression, which has prevented full comprehension of its role. The aim of this work is to clarify CTCF function and regulation. We began by performing a transient (144 h) RNAi knockdown to deplete CTCF in LNCaP prostate cancer cells. CTCF ChIP-seq following knockdown revealed that 2949/25505 CTCF sites reproducibly remained bound following RNAi. CTCF RNAi knockdown was next performed in IMR-90 normal lung fibroblasts and revealed the same persistent subset of CTCF sites. To investigate the functionality of these sites we performed CRISPR-Cas9n experiments on candidate persistent sites in LNCaP cells. We chose a locus with a repressive loop containing eight genes, which had lost CTCF binding at 10/12 sites, but had not undergone any changes in gene expression. CRISPR-Cas9 of the 2 persistent sites caused activation of all genes within the loop and concordant alteration to the loop structure when measured by 3C. Hi-C demonstrated a general maintenance of TADs and that persistent sites were located at TAD boundaries. Bioinformatic profiling showed that these sites are constitutively bound in cell types of both normal and disease conditions and derived from different germ layers. We have identified a novel subset of CTCF sites that are persistent following CTCF RNAi and have shown evidence that maintenance of these sites can impede changes to chromatin conformation and gene expression. These results help clarify the current ambiguity in understanding the function and regulation of CTCF.
Dissecting the transcriptional regulatory network of embryonic stem cells
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Biology, 2008. Includes bibliographical references. The process by which a single fertilized egg develops into a human being with over 200 cell types, each with a distinct gene expression pattern controlling its cellular state, is poorly understood. An understanding of the transcriptional regulatory networks that establish and maintain gene expression programs in mammalian cells is fundamental to understanding development and should provide the foundation for improved diagnosis and treatment of disease. Although it is not yet feasible to map the entirety of these networks in vertebrate cells, recent work in embryonic stem (ES) cells has demonstrated that core features of the network can be discovered by focusing on key transcriptional regulators and their target genes. Here, I describe important insights that have emerged from such studies and highlight how similar approaches can be used to discover the core networks of other vertebrate cell types. Knowledge of the regulatory networks controlling gene expression programs and cell states can guide efforts to reprogram cell states and holds great promise for both disease therapeutics and regenerative medicine. By Megan F. Cole. Ph.D.
Genome-wide profiling of p53-regulated enhancer RNAs uncovers a subset of enhancers controlled by a lncRNA
p53 binds enhancers to regulate key target genes. Here, we globally mapped p53-regulated enhancers by looking at enhancer RNA (eRNA) production. Intriguingly, while many p53-induced enhancers contained p53-binding sites, most did not. As long non-coding RNAs (lncRNAs) are prominent regulators of chromatin dynamics, we hypothesized that p53-induced lncRNAs contribute to the activation of enhancers by p53. Among p53-induced lncRNAs, we identified LED and demonstrate that its suppression attenuates p53 function. Chromatin-binding and eRNA expression analyses show that LED associates with and activates strong enhancers. One prominent target of LED was located at an enhancer region within the CDKN1A gene, which encodes a potent p53-responsive cell cycle inhibitor. LED knockdown reduces CDKN1A enhancer induction and activity, and cell cycle arrest following p53 activation. Finally, promoter-associated hypermethylation analysis shows silencing of LED in human tumours. Thus, our study identifies a new layer of complexity in the p53 pathway and suggests its dysregulation in cancer.
ZFX Mediates Non-canonical Oncogenic Functions of the Androgen Receptor Splice Variant 7 in Castrate-Resistant Prostate Cancer
Androgen receptor splice variant 7 (AR-V7) is crucial for prostate cancer progression and therapeutic resistance. We show that, independent of ligand, AR-V7 binds both androgen-responsive elements (AREs) and non-canonical sites distinct from full-length AR (AR-FL) targets. Consequently, AR-V7 not only recapitulates AR-FL's partial functions but also regulates an additional gene expression program uniquely via binding to gene promoters rather than ARE enhancers. AR-V7 binding and AR-V7-mediated activation at these unique targets do not require FOXA1 but rely on ZFX and BRD4. Knockdown of ZFX or select unique targets of AR-V7/ZFX, or BRD4 inhibition, suppresses growth of castration-resistant prostate cancer cells. We also define an AR-V7 direct target gene signature that correlates with AR-V7 expression in primary tumors, differentiates metastatic prostate cancer from normal, and predicts poor prognosis. Thus, AR-V7 has both ARE/FOXA1 canonical and ZFX-directed non-canonical regulatory functions in the evolution of anti-androgen therapeutic resistance, providing information to guide effective therapeutic strategies. By cistrome profiling of endogenous androgen receptor (AR) versus an AR splice variant, AR-V7, Cai et al. uncovered non-canonical pathways uniquely targeted by AR-V7 and ZFX, a previously unknown AR-V7 partner. Targeting cofactors (ZFX or BRD4) or non-canonical downstream pathways of AR-V7 provides potential therapeutic ways for treating prostate cancer
Transcriptional regulation of adipose insulin resistance
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Biological Engineering, 2012. Page 168 blank. Cataloged from PDF version of thesis. Includes bibliographical references (p. 155-167). Insulin resistance is a condition that underlies type 2 diabetes and various cardiovascular diseases. It is highly associated with obesity, making it a pressing medical problem in the face of the obesity epidemic. The obesity association also makes adipose tissue the target of interest for ongoing research. Previous work on adipose insulin resistance has largely been focused on deciphering the signaling defects and abnormal adipokine secretion profiles. There is increasing awareness that transcriptional control is a source of dysregulation as well as an avenue of therapeutic intervention for insulin resistance. However, knowledge of transcriptional regulation and dysregulation of adipose insulin resistance remains fragmentary. Here, we present a genome-wide perspective on transcriptional regulation of adipocyte biology and adipose insulin resistance. We made use of the latest high-throughput sequencing technology to interrogate different aspects of transcriptional regulation, namely, histone modifications, protein-DNA interactions, and chromatin accessibility in adipocytes. In combination with the transcriptional outcomes measured by microarray and RNA-sequencing, we (1) characterized a largely unknown histone modification, H3K56 acetylation, in human adipocytes, and (2) set up four diverse in vitro insulin resistance models in mouse adipocytes and analyzed them in parallel with mouse adipose tissues from diet-induced obese mice. In both cases, through computational analysis of the experimentally identified cis-regulatory regions, we identified existing and novel trans-regulators responsible for adipose transcriptional regulation. Furthermore, by comprehensive pathway analysis of the in vitro models and mouse models, we identified aspects of in vivo adipose insulin resistance that are captured by the different in vitro models. Taken together, our studies present a systems view on adipose transcriptional regulation, which provides a wealth of novel resources for gaining insights into adipose biology and insulin resistance. By Kin Yui Alice Lo. Ph.D.
The FaceBase Consortium: A comprehensive program to facilitate craniofacial research
The FaceBase Consortium consists of ten interlinked research and technology projects whose goal is to generate craniofacial research data and technology for use by the research community through a central data management and integrated bioinformatics hub. Funded by the National Institute of Dental and Craniofacial Research (NIDCR) and currently focused on studying the development of the middle region of the face, the Consortium will produce comprehensive datasets of global gene expression patterns, regulatory elements and sequencing; will generate anatomical and molecular atlases; will provide human normative facial data and other phenotypes; will conduct follow-up studies of a completed genome-wide association study; will generate independent data on the genetics of craniofacial development; will build repositories of animal models and of human samples and data for community access and analysis; and will develop software tools and animal models for analyzing, functionally testing and integrating these data. The FaceBase website (http://www.facebase.org) will serve as a web home for these efforts, providing interactive tools for exploring these datasets, together with discussion forums and other services to support and foster collaboration within the craniofacial research community.