Deconvolving sequence features that discriminate between overlapping regulatory annotations.
Genomic loci with regulatory potential can be annotated with various properties. For example, genomic sites bound by a given transcription factor (TF) can be divided according to whether they are proximal or distal to known promoters. Sites can be further labeled according to the cell types and conditions in which they are active. Given such a collection of labeled sites, it is natural to ask what sequence features are associated with each annotation label. However, discovering such label-specific sequence features is often confounded by overlaps between the labels; e.g., if regulatory sites specific to a given cell type are also more likely to be promoter-proximal, it is difficult to assess whether motifs identified in that set of sites are associated with the cell type or with promoters. To meet this challenge, we developed SeqUnwinder, a principled approach to deconvolving interpretable discriminative sequence features associated with overlapping annotation labels. We demonstrate the novel analysis abilities of SeqUnwinder using three examples. Firstly, SeqUnwinder is able to unravel sequence features associated with the dynamic binding behavior of TFs during motor neuron programming from features associated with chromatin state in the initial embryonic stem cells. Secondly, we characterize distinct sequence properties of multi-condition and cell-specific TF binding sites after controlling for uneven associations with promoter proximity. Finally, we demonstrate the scalability of SeqUnwinder to discover cell-specific sequence features from over one hundred thousand genomic loci that display DNase I hypersensitivity in one or more ENCODE cell lines.
An integrative view of the regulatory and transcriptional landscapes in mouse hematopoiesis.
Thousands of epigenomic data sets have been generated in the past decade, but it is difficult for researchers to effectively use all the data relevant to their projects. Systematic integrative analysis can help meet this need, and the VISION project was established for validated systematic integration of epigenomic data in hematopoiesis. Here, we systematically integrated extensive data recording epigenetic features and transcriptomes from many sources, including individual laboratories and consortia, to produce a comprehensive view of the regulatory landscape of differentiating hematopoietic cell types in mouse. By using IDEAS as our integrative and discriminative epigenome annotation system, we identified and assigned epigenetic states simultaneously along chromosomes and across cell types, precisely and comprehensively. Combining nuclease accessibility and epigenetic states produced a set of more than 200,000 candidate cis-regulatory elements (cCREs) that efficiently capture enhancers and promoters. The transitions in epigenetic states of these cCREs across cell types provided insights into mechanisms of regulation, including decreases in numbers of active cCREs during differentiation of most lineages, transitions from poised to active or inactive states, and shifts in nuclease accessibility of CTCF-bound elements. Regression modeling of epigenetic states at cCREs and gene expression produced a versatile resource to improve selection of cCREs potentially regulating target genes. These resources are available from our VISION website to aid research in genomics and hematopoiesis.
Funding: National Institute of Diabetes and Digestive and Kidney Diseases (grant number R24DK106766-01A1), the National Human Genome Research Institute (grant number U54HG006998
Computational modeling of gene regulation, gamete formation, and embryo implantation.
DNA located in genes is transcribed into RNA, which is translated into protein. The regulation of transcription and translation is carried out by several factors, including a gene’s primary sequence, cis-regulatory elements (CREs) in non-coding DNA regions, epigenetic marks on the histones that compact DNA, and trans-binding factors (or proteins). The differential expression of a gene is crucial for establishing lineage-specific cell identity and phenotypic variability. Mutation or dysregulation may lead to natural variation within a population or to aberrant gene expression and disease; trait-associated variation is known to be enriched in putative CREs, supporting their role in the origins of disease. Understanding the mechanisms by which CREs interact with one another and their cellular environment to regulate transcription may inform knowledge of biological pathways and provide a crucial foundation for developing new treatments. Further, because all DNA is passed to an offspring from their parents, it is important to understand not just the effects of coding and non-coding variation on expression, but also how genetic material is passed to future generations. These dissertation chapters apply modeling approaches to large amounts of genetic and gene expression data in order to 1) better understand how the sequence and epigenetic makeup of CREs impact gene expression within hematopoiesis; 2) scan for selfish genetic elements that are preferentially passed to offspring within human sperm samples; and 3) predict implantation success for euploid embryos given gene expression profiles. Our models within Chapters 2-4 describe the impact of CREs within the blood cell lineage, connecting CREs to putative target genes and establishing that the hematopoietic CREs were enriched for blood-trait-associated genetic variation. Within Chapter 5, we find no compelling evidence of selfish genetic elements within a large sample of human sperm. Finally, within Chapter 6, we identify genes that appear to affect the success of IVF embryo implantation by acting through regulation of translation.