253 research outputs found

    In silico regulatory analysis for exploring human disease progression

    © 2008 Holloway et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution Licens

    Uncovering the Transcription Factor Network Underlying Mammalian Sex Determination

    Understanding transcriptional regulation in development and disease is one of the central questions in modern biology. The current working model is that Transcription Factors (TFs) combinatorially bind to specific regions of the genome and drive the expression of groups of genes in a cell-type specific fashion. In organisms with large genomes, particularly mammals, TFs bind to enhancer regions that are often several kilobases away from the genes they regulate, which makes identifying the regulators of gene expression difficult. In order to overcome these obstacles and uncover transcriptional regulatory networks, we used an approach combining expression profiling and genome-wide identification of enhancers followed by motif analysis. Further, we applied these approaches to uncover the TFs important in mammalian sex determination.
    Using expression data from a panel of 19 human cell lines we identified genes showing patterns of cell-type specific up-regulation, down-regulation and constitutive expression. We then utilized matched DNase-seq data to assign DNase Hypersensitivity Sites (DHSs) to each gene based on proximity. These DHSs were scanned for matches to motifs and compiled to generate scores reflecting the presence of TF binding sites (TFBSs) in each gene's putative regulatory regions. We used a sparse logistic regression classifier to classify differentially regulated groups of genes. Comparing our approach to proximal promoter regions, we discovered that using sequence features in regions of open chromatin provided significant performance improvement. Crucially, we discovered both known and novel regulators of gene expression in different cell types. For some of these TFs, we found cell-type specific footprints indicating direct binding to their cognate motifs.
    The mammalian gonad is an excellent system to study cell fate determination processes and the dynamic regulation orchestrated by TFs in development. At embryonic day (E) 10.5, the bipotential gonad initiates either testis development in XY embryos, or ovarian development in XX embryos. Genetic studies over the last 3 decades have revealed about 30 genes important in this process, but there are still significant gaps in our understanding. Specifically, we do not know the network of TFs and their specific combinations that cause the rapid changes in gene expression observed during gonadal fate commitment. Further, more than half the cases of human sex reversal are as yet unexplained.
    To apply the methods we developed to identify regulators of gene expression to the gonad, we took two approaches. First, we carried out a careful dissection of the transcriptional dynamics during gonad differentiation in the critical window between E11.0 and E12.0. We profiled the transcriptome at 6 equally spaced time points and developed a Hidden Markov Model to reveal the cascades of transcription that drive the differentiation of the gonad. We discovered that while the ovary maintains its transcriptional state at this early stage, concurrent up- and down-regulation of hundreds of genes are orchestrated by the testis pathway. We also compared two different strains of mice with differential susceptibility to XY male-to-female sex reversal. This analysis revealed that in the C57BL/6J strain, the male pathway is delayed by ~5 hours, likely explaining the increased susceptibility to sex reversal in this strain. Finally, we validated the function of Lmo4, a transcriptional co-factor up-regulated in XY gonads at E11.6 in both strains. RNAi mediated knockdown of Lmo4 in primary gonadal cells led to the down-regulation of male pathway genes including key regulators such as Sox9 and Fgf9.
    To find the enhancers in the XY gonad, we conducted DNase-seq in E13.5 XY supporting cells. In addition, we conducted ChIP-seq for H3K27ac, a mark correlated with enhancer activity. Further, we conducted motif analysis to reveal novel regulators of sex determination. Our work is an important step towards combining expression and chromatin profiling data to assemble transcriptional networks and is applicable to several systems.
    Dissertatio

    Discovering cell-type dynamics in the nervous system by single-cell transcriptomics

    The mammalian nervous system is arguably the most intricate system known to science. At its basis lie highly specialized single cells, specifically interacting to ensure everything from normal functionality to complex behavior and cognition. For over a century, neuroscientists have been fascinated by the diversity of cell types that make up the nervous system, and have sought ever-new strategies to characterize them. With the advance of single-cell transcriptomics, particularly RNA-seq, a new toolbox has become available for molecular cell type classification. In this thesis, I will discuss the development of relevant technologies leading up to cellular taxonomy studies, the concept of cell types on a more generalized level, and focus on cell type characterization in the context of continuous, dynamic processes such as development and maturation. Further, I will present the results of two published papers and two manuscripts, as well as preliminary data from our lab’s biggest effort so far, to build an atlas of cell types across the entire nervous system. In paper I, we describe previously uncharacterized heterogeneity in the CNS myelinating cell population, the oligodendrocytes (OL). We delineate the continuous maturation process from oligodendrocyte progenitors (OPCs), via a number of distinct stages, to mature OLs. In paper II, we use single-cell RNA-seq to explore neurons in the sympathetic nervous system, describing seven distinct types. Retrograde and developmental tracing directly associated two of the cell types with distinct functions as erector muscle neurons. Paper III describes the development and application of STRT-seq-2i, a 5’ single-cell RNA-seq platform adapted to a high-throughput 9600-well plate. We discuss technical aspects, throughput and flexibility, as well as results from cortical samples of fresh mouse cells and human post mortem nuclei. In paper IV, we performed high-throughput unbiased sampling of early postnatal and adult mouse dentate gyrus, a region known for postnatal and maintained adult neurogenesis. We describe distinct stages in the developmental trajectory that hold for both early postnatal and adult neurogenesis. Overall, this thesis aims to shed light on molecular cell-type dynamics in different contexts, as well as discuss key concepts emerging and reevaluated along with the technological advances in the field.

    Computational Integrative Models for Cellular Conversion: Application to Cellular Reprogramming and Disease Modeling

    The groundbreaking identification of only four transcription factors that are able to induce pluripotency in any somatic cell upon perturbation stimulated the discovery of numerous instructive factors triggering different cellular conversions. Such conversions are highly significant to regenerative medicine with its ultimate goal of replacing or regenerating damaged and lost cells. Precise directed conversion of damaged cells into healthy cells offers the tantalizing prospect of promoting regeneration in situ. With the advent of high-throughput sequencing technologies, the distinct transcriptional and accessible chromatin landscapes of several cell types have been characterized. This characterization provided clear evidence for the existence of cell type specific gene regulatory networks determined by their distinct epigenetic landscapes that control cellular phenotypes. Further, these networks are known to dynamically change during the ectopic expression of genes initiating cellular conversions and stabilize again to represent the desired phenotype. Over the years, several computational approaches have been developed to leverage the large amounts of high-throughput datasets for a systematic prediction of instructive factors that can potentially induce desired cellular conversions. To date, the most promising approaches rely on the reconstruction of gene regulatory networks for a panel of well-studied cell types, predominantly from transcriptional data alone. Though useful, these methods are not designed for newly identified cell types as their frameworks are restricted only to the panel of cell types originally incorporated. More importantly, these approaches rely mainly on gene expression data and cannot account for the cell type specific regulations modulated by the interplay of the transcriptional and epigenetic landscape. In this thesis, a computational method for reconstructing cell type specific gene regulatory networks is proposed that aims at addressing the aforementioned limitations of current approaches. This method integrates transcriptomics, chromatin accessibility assays and available prior knowledge about gene regulatory interactions for predicting instructive factors that can potentially induce desired cellular conversions. Its application to the prioritization of drugs for reverting pathologic phenotypes and the identification of instructive factors for inducing the cellular conversion of adipocytes into osteoblasts underlines the potential to assist in the discovery of novel therapeutic interventions.

    Learning the Regulatory Code of Gene Expression

    Data-driven machine learning is the method of choice for predicting molecular phenotypes from nucleotide sequence, modeling gene expression events including protein-DNA binding, chromatin states as well as mRNA and protein levels. Deep neural networks automatically learn informative sequence representations and interpreting them enables us to improve our understanding of the regulatory code governing gene expression. Here, we review the latest developments that apply shallow or deep learning to quantify molecular phenotypes and decode the cis-regulatory grammar from prokaryotic and eukaryotic sequencing data. Our approach is to build from the ground up, first focusing on the initiating protein-DNA interactions, then specific coding and non-coding regions, and finally on advances that combine multiple parts of the gene and mRNA regulatory structures, achieving unprecedented performance. We thus provide a quantitative view of gene expression regulation from nucleotide sequence, concluding with an information-centric overview of the central dogma of molecular biology.

    Analysis, Visualization, and Machine Learning of Epigenomic Data

    The goal of the Encyclopedia of DNA Elements (ENCODE) project has been to characterize all the functional elements of the human genome. These elements include expressed transcripts and genomic regions bound by transcription factors (TFs), occupied by nucleosomes, occupied by nucleosomes with modified histones, or hypersensitive to DNase I cleavage, etc. Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is an experimental technique for detecting TF binding in living cells, and the genomic regions bound by TFs are called ChIP-seq peaks. ENCODE has performed and compiled results from tens of thousands of experiments, including ChIP-seq, DNase, RNA-seq and Hi-C. These efforts have culminated in two web-based resources from our lab, Factorbook and SCREEN, for the exploration of epigenomic data for both human and mouse. Factorbook is a peak-centric resource presenting data such as motif enrichment and histone modification profiles for transcription factor binding sites computed from ENCODE ChIP-seq data. SCREEN provides an encyclopedia of ~2 million regulatory elements, including promoters and enhancers, identified using ENCODE ChIP-seq and DNase data, with an extensive UI for searching and visualization. While we have successfully utilized the thousands of available ENCODE ChIP-seq experiments to build the Encyclopedia and visualizers, we have also struggled with the practical and theoretical inability to assay every possible experiment on every possible biosample under every conceivable biological scenario. We have used machine learning techniques to predict TF binding sites and enhancer locations, and demonstrate that machine learning is critical to help decipher functional regions of the genome.

    Development and Application of Next-Generation Sequencing Methods to Profile Cellular Translational Dynamics

    The transmission of genetic information from the transcription of DNA to RNA and the subsequent translation of RNA into protein is often abstracted into a linear process. However, as methods and technologies to measure the genomic, transcriptomic, and proteomic content of cells have advanced, so too has our understanding that the transmission of genetic information does not always flow in a lossless manner. For instance, changes observed in messenger RNA (mRNA) abundance are not always retained at the proteomic level. Indeed, a diverse array of mechanisms have been identified that exert regulatory control over this transmission of information. Next-generation short read sequencing has driven many of these insights and provided increasingly nuanced understanding of these regulatory mechanisms. However, the continued development and application of sequencing methodologies and analytics are required to properly contextualize many of these insights on a more global scale. Ribosome profiling is one such recent advancement which enriches for ribosome-protected fragments of mRNA; sequencing and analysis of these ribosome-protected mRNA fragments enables profiling of the translational content of a sample. The aim of this dissertation is to address the need for the development and application of statistical and analytical algorithms to profile the regulatory factors that contribute to the translational dynamics in cells. In the first chapter, I survey the development and application of next-generation sequencing methods for the profiling and computational analysis of translation and translational dynamics. In the second chapter of this thesis, I present SPECtre, a software package that identifies regions of active translation through measurement of the translational engagement of ribosomes over a transcript. 
SPECtre achieves high sensitivity and specificity in its classification of regions undergoing translation by leveraging the codon-dependent elongation of peptides; this tri-nucleotide periodicity is evident in the alignment of ribosome profiling sequence reads to a reference transcriptome. SPECtre classifies actively translated transcripts according to their coherence in read coverage over a region to an optimal tri-nucleotide signal. In the third chapter, I describe the application of SPECtre to identify the translation of upstream-initiated open-reading frames that may regulate differentiation in a neuron-like cell model. uORFs are transcripts that result from the initiation of translation from AUG, and under certain biological constraints, from non-AUG sequences localized in the 5’ untranslated regions of annotated protein-coding genes. Subsets of these uORFs have been implicated in the regulation of their downstream protein-coding genes in yeast, mice and humans. In this chapter, I provide further evidence for this regulation as well as the spatial context for the functional consequences of uORF translation on downstream protein-coding genes in a neuron-like cell line model of differentiation. Finally, in the fourth chapter, I outline a strategy using our coherence-based translational scoring algorithm to profile ribosomal engagement over chimeric gene fusion breakpoints in prostate cancer. Here, known breakpoints from current annotation databases are integrated with novel junctions nominated by existing whole genome and transcriptomic gene fusion detection algorithms, and the translational profile over these chimeric junctions using SPECtre is measured. This provides an additional layer of translational evidence to known and novel gene fusion breakpoints in prostate cancer. 
Ongoing development of a database and visualization platform based on these results will enable integrative insights into the transcriptional and translational topology of these breakpoints.
    PHD, Bioinformatics, University of Michigan, Horace H. Rackham School of Graduate Studies
    https://deepblue.lib.umich.edu/bitstream/2027.42/144106/1/stonyc_1.pd
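
The abstract above says that ribosome profiling reads show a tri-nucleotide periodicity over actively translated regions. As a rough illustration of that idea only, the Python sketch below scores a per-nucleotide coverage vector by the fraction of its signal power that falls at a period of three nucleotides. This is a simplified stand-in and not SPECtre's coherence-based score; the function name `period3_power_fraction` and the toy coverage vectors are hypothetical.

```python
import cmath
import math


def period3_power_fraction(coverage):
    """Return the fraction of non-constant signal power at a 3-nt period.

    `coverage` is a list of per-nucleotide read counts over a candidate
    region (hypothetical input format). Values near 1 suggest a strong
    codon-like periodicity; values near 0 suggest none.
    """
    n = len(coverage) - len(coverage) % 3  # trim to a whole number of codons
    if n < 6:
        raise ValueError("need at least two codons of coverage")
    window = coverage[:n]
    mean = sum(window) / n
    centered = [c - mean for c in window]  # drop the constant component
    total = sum(x * x for x in centered)
    if total == 0:
        return 0.0  # flat coverage carries no periodic signal
    # Discrete Fourier coefficient at frequency 1/3 cycles per nucleotide.
    coef = sum(x * cmath.exp(-2j * math.pi * i / 3)
               for i, x in enumerate(centered))
    # By Parseval's theorem the spectrum sums to n * total; for real input
    # the period-3 power is split across a conjugate pair, hence the 2.
    return 2 * abs(coef) ** 2 / (n * total)


if __name__ == "__main__":
    in_frame = [10, 1, 1] * 20      # toy coverage peaking once per codon
    alternating = [1, 0] * 30       # toy coverage with a 2-nt period
    print(period3_power_fraction(in_frame))
    print(period3_power_fraction(alternating))
```

A coherence measure, as named in the abstract, compares the observed coverage against a reference tri-nucleotide signal rather than measuring raw power, so this sketch should be read as a conceptual aid only.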