
    An NF-κB Transcription-Factor-Dependent Lineage-Specific Transcriptional Program Promotes Regulatory T Cell Identity and Function

    Both conventional T (Tconv) cells and regulatory T (Treg) cells are activated through ligation of the T cell receptor (TCR) complex, leading to the induction of the transcription factor NF-κB. In Tconv cells, NF-κB regulates expression of genes essential for T cell activation, proliferation, and function. However, the role of NF-κB in Treg function remains unclear. We conditionally deleted canonical NF-κB members p65 and c-Rel in developing and mature Treg cells and found they have unique but partially redundant roles. c-Rel was critical for thymic Treg development, while p65 was essential for mature Treg identity and maintenance of immune tolerance. Transcriptome and NF-κB p65 binding analyses demonstrated a lineage-specific, NF-κB-dependent transcriptional program, enabled by enhanced chromatin accessibility. These dual roles of canonical NF-κB in Tconv and Treg cells highlight the functional plasticity of the NF-κB signaling pathway and underscore the need for more selective strategies to therapeutically target NF-κB.

    Role of Cis-regulatory Elements in Transcriptional Regulation: From Evolution to 4D Interactions

    Transcriptional regulation is the principal mechanism for establishing cell-type-specific gene activity: it explores, with high precision, an almost infinite space of combinations of regulatory elements and transcription factors. Recent efforts have mapped thousands of candidate regulatory elements, a large portion of which are cell-type specific, yet it is still unclear what fraction of these elements is functional, what genes these elements regulate, or how they are established in a cell-type-specific manner. In this dissertation, I will discuss methods and approaches I developed to better understand the role of regulatory elements and transcription factors in gene expression regulation. First, by comparing the transcriptome and chromatin landscape between mouse and human innate immune cells, I showed that specific gene expression programs are regulated by highly conserved regulatory elements that contain a set of constrained sequence motifs, which can successfully classify gene induction in both species. Next, using chromatin interactions, I accurately defined functional enhancers and their target genes. This fine mapping dramatically improved the prediction of transcriptional changes. Finally, we built a supervised learning approach to detect the short DNA sequence motifs that regulate the activation of regulatory elements following LPS stimulation. This approach detected several transcription factors that are critical in remodeling the epigenetic landscape both across time and across individuals. Overall, this thesis addresses several important aspects of cis-regulatory elements in transcriptional regulation and starts to derive principles and models of gene-expression regulation that address the fundamental question: “How do cis-regulatory elements drive cell-type-specific transcription?”

    BROCKMAN: deciphering variance in epigenomic regulators by k-mer factorization

    Background: Variation in chromatin organization across single cells can help shed important light on the mechanisms controlling gene expression, but scale, noise, and sparsity pose significant challenges for interpretation of single cell chromatin data. Here, we develop BROCKMAN (Brockman Representation Of Chromatin by K-mers in Mark-Associated Nucleotides), an approach to infer variation in transcription factor (TF) activity across samples through unsupervised analysis of the variation in DNA sequences associated with an epigenomic mark. Results: BROCKMAN represents each sample as a vector of epigenomic-mark-associated DNA word frequencies, and decomposes the resulting matrix to find hidden structure in the data, followed by unsupervised grouping of samples and identification of the TFs that distinguish groups. Applied to single cell ATAC-seq, BROCKMAN readily distinguished cell types, treatments, batch effects, experimental artifacts, and cycling cells. We show that each variable component in the k-mer landscape reflects a set of co-varying TFs, which are often known to physically interact. For example, in K562 cells, AP-1 TFs were central determinants of variability in chromatin accessibility through their variable expression levels and diverse interactions with other TFs. We provide a theoretical basis for why cooperative TF binding (and any associated epigenomic mark) is inherently more variable than non-cooperative binding. Conclusions: BROCKMAN and related approaches will help gain a mechanistic understanding of the trans determinants of chromatin variability between cells, treatments, and individuals. Keywords: Single-cell, Epigenome, Chromatin, scATAC-seq, K-mer, N-gram, Factorization, Decomposition, Clustering, Transcription factor. National Human Genome Research Institute (U.S.) (Centers of Excellence in Genomic Science Grant); Howard Hughes Medical Institute (Centers of Excellence in Genomic Science Grant)
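The representation described in this abstract (each sample as a vector of DNA word frequencies, followed by a matrix decomposition) can be illustrated with a minimal sketch. This is not the BROCKMAN implementation: the function names `kmer_frequencies` and `decompose`, the choice of k = 2, the toy sequences, and the use of a plain SVD on the column-centered matrix are all assumptions made for illustration.

```python
from itertools import product

import numpy as np


def kmer_frequencies(sequences, k=2):
    """Pool k-mer counts over one sample's sequences and normalize to frequencies."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = np.zeros(len(kmers))
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            j = index.get(seq[i:i + k])  # k-mers with non-ACGT symbols are skipped
            if j is not None:
                counts[j] += 1
    total = counts.sum()
    return counts / total if total > 0 else counts


def decompose(samples, k=2, n_components=2):
    """Build a samples x k-mers matrix, center its columns, and factor it by SVD."""
    matrix = np.vstack([kmer_frequencies(s, k) for s in samples])
    centered = matrix - matrix.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # sample coordinates
    loadings = vt[:n_components]                      # k-mer weights per component
    return matrix, scores, loadings


# Hypothetical toy data: three "samples", each a list of mark-associated sequences.
samples = [
    ["ACGTACGT", "ACGTAC"],
    ["ACGTACGA", "CGTACG"],
    ["TTTTAAAA", "TTAATT"],
]
matrix, scores, loadings = decompose(samples)
print(scores[:, 0])  # the first component separates the third sample from the other two
```

In this toy setting the loadings indicate which k-mers drive each component; mapping such k-mers back to TFs, as the abstract describes, would need a motif database and is outside this sketch.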

    Computational Modelling of Human Transcriptional Regulation by an Information Theory-based Approach

    ChIP-seq experiments can identify the genome-wide binding site motifs of a transcription factor (TF) and determine its sequence specificity. Multiple algorithms were developed to derive TF binding site (TFBS) motifs from ChIP-seq data, including the entropy minimization-based Bipad, which can derive both contiguous and bipartite motifs. Prior studies applying these algorithms to ChIP-seq data analyzed only a small number of top peaks with the highest signal strengths, biasing their resultant position weight matrices (PWMs) towards consensus-like, strong binding sites; nor did they derive bipartite motifs, which prevented accurate modelling of the binding behavior of dimeric TFs. This thesis presents a novel motif discovery pipeline that adds recursive masking and thresholding functionalities to Bipad to improve detection of primary binding motifs. Analyzing 765 ENCODE ChIP-seq datasets with this pipeline generated contiguous and bipartite information theory-based PWMs (iPWMs) for 93 sequence-specific TFs, discovered 23 cofactor motifs for 127 TFs, and revealed six high-confidence novel motifs. The accuracy of these iPWMs was determined via four independent validation methods, including detection of experimentally proven TFBSs, explanation of effects of characterized SNPs, comparison with previously published motifs, and statistical analyses. Novel cofactor motifs supported previously unreported TF coregulatory interactions. This thesis further presents a unified framework to identify variants in hereditary breast and ovarian cancer (HBOC), successfully applying these iPWMs to prioritize TFBS variants in 20 complete genes of HBOC patients. The spatial distribution and information composition of cis-regulatory modules (e.g. TFBS clusters) in promoters substantially determine gene expression patterns and TF target genes.
    Multiple algorithms were developed to detect TFBS clusters, including the information density-based clustering (IDBC) algorithm, which simultaneously considers the spatial and information densities of TFBSs. Prior studies predicting tissue-specific gene expression levels and differentially expressed (DE) TF targets used log likelihood ratios to quantify TFBS strengths and merged adjacent TFBSs into clusters. This thesis presents a machine learning framework that uses the Bray-Curtis function to quantify the similarity between tissue-wide expression profiles of genes, and IDBC-identified clusters from iPWM-detected TFBSs to predict gene expression profiles and DE direct TF targets. Multiple clusters enable gene expression to be robust against TFBS mutations.
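Two quantities named in this abstract, the information carried by a binding-site model and the Bray-Curtis similarity between expression profiles, can be sketched as follows. This is an illustration under stated assumptions rather than the thesis's pipeline: the function names are hypothetical, the information measure assumes a uniform background of 0.25 per base with no small-sample correction, and the Bray-Curtis form shown is the standard one for non-negative values.

```python
import numpy as np


def information_content(freqs, background=0.25):
    """Per-position information in bits for a (positions x 4) base-frequency matrix.

    Assumes a uniform background; a zero frequency contributes zero bits.
    """
    freqs = np.asarray(freqs, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(freqs > 0, freqs * np.log2(freqs / background), 0.0)
    return terms.sum(axis=1)


def bray_curtis_similarity(x, y):
    """1 - sum|x - y| / sum(x + y) for non-negative profiles; 1.0 means identical."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    denom = np.sum(x + y)
    if denom == 0:
        return 1.0  # two all-zero profiles are treated as identical
    return 1.0 - np.abs(x - y).sum() / denom


# Hypothetical 3-position motif: fully conserved, two-way degenerate, uninformative.
motif = [
    [1.0, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.25, 0.25, 0.25, 0.25],
]
bits = information_content(motif)

# Hypothetical expression profiles of two genes across three tissues.
similarity = bray_curtis_similarity([6, 7, 4], [10, 0, 6])
```

Here a fully conserved position carries 2 bits and a uniform position carries 0 bits, which is why strong, consensus-like sites dominate when only the top peaks are analyzed.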

    DEEP MOTIF DASHBOARD: VISUALIZING AND UNDERSTANDING GENOMIC SEQUENCES USING DEEP NEURAL NETWORKS

    Deep neural network (DNN) models have recently obtained state-of-the-art prediction accuracy for the transcription factor binding site (TFBS) classification task. However, it remains unclear how these approaches identify meaningful DNA sequence signals and give insights as to why TFs bind to certain locations. In this paper, we propose a toolkit called the Deep Motif Dashboard (DeMo Dashboard), which provides a suite of visualization strategies to extract motifs, or sequence patterns, from deep neural network models for TFBS classification. We demonstrate how to visualize and understand three important DNN models: convolutional, recurrent, and convolutional-recurrent networks. Our first visualization method finds a test sequence's saliency map, which uses first-order derivatives to describe the importance of each nucleotide in making the final prediction. Second, considering that recurrent models make predictions in a temporal manner (from one end of a TFBS sequence to the other), we introduce temporal output scores, indicating the prediction score of a model over time for a sequential input. Lastly, a class-specific visualization strategy finds the optimal input sequence for a given TFBS positive class via stochastic gradient optimization. Our experimental results indicate that a convolutional-recurrent architecture performs the best among the three architectures. The visualization techniques indicate that CNN-RNN makes predictions by modeling both motifs and the dependencies among them.
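The saliency-map idea in this abstract (first-order derivatives of the prediction with respect to each input nucleotide) can be shown on a toy model. The sketch below is not the DeMo Dashboard code and does not use a deep network: it assumes a single-layer logistic model over a one-hot encoded sequence so that the gradient can be written analytically, and it reduces the gradient to one value per position by multiplying with the one-hot input, which is an assumption of this example. All names and weights are hypothetical.

```python
import numpy as np

ALPHABET = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a (length x 4) one-hot matrix; only A, C, G, T are accepted."""
    x = np.zeros((len(seq), 4))
    for i, base in enumerate(seq.upper()):
        x[i, ALPHABET.index(base)] = 1.0
    return x


def predict(x, weights, bias):
    """Toy binding score: a sigmoid of a weighted sum over the one-hot input."""
    return 1.0 / (1.0 + np.exp(-(np.sum(weights * x) + bias)))


def saliency(seq, weights, bias):
    """Absolute first-order derivative of the score at each observed nucleotide."""
    x = one_hot(seq)
    p = predict(x, weights, bias)
    grad = p * (1.0 - p) * weights  # d(score)/d(x) for the logistic model
    return np.abs((grad * x).sum(axis=1))


# Hypothetical weights: the model rewards G at position 1 and, less strongly, C at position 2.
weights = np.zeros((4, 4))
weights[1, ALPHABET.index("G")] = 2.0
weights[2, ALPHABET.index("C")] = 1.0
bias = -1.0

sal = saliency("AGCT", weights, bias)
print(sal)  # positions 1 and 2 are nonzero; positions 0 and 3 are zero
```

For a real convolutional or recurrent model the same quantity would be obtained with automatic differentiation rather than a hand-derived gradient.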