4 research outputs found

    Deconvolving sequence features that discriminate between overlapping regulatory annotations.

    Genomic loci with regulatory potential can be annotated with various properties. For example, genomic sites bound by a given transcription factor (TF) can be divided according to whether they are proximal or distal to known promoters. Sites can be further labeled according to the cell types and conditions in which they are active. Given such a collection of labeled sites, it is natural to ask what sequence features are associated with each annotation label. However, discovering such label-specific sequence features is often confounded by overlaps between the labels; e.g., if regulatory sites specific to a given cell type are also more likely to be promoter-proximal, it is difficult to assess whether motifs identified in that set of sites are associated with the cell type or with promoters. To meet this challenge, we developed SeqUnwinder, a principled approach to deconvolving interpretable discriminative sequence features associated with overlapping annotation labels. We demonstrate the novel analysis abilities of SeqUnwinder using three examples. First, SeqUnwinder is able to unravel sequence features associated with the dynamic binding behavior of TFs during motor neuron programming from features associated with chromatin state in the initial embryonic stem cells. Second, we characterize distinct sequence properties of multi-condition and cell-specific TF binding sites after controlling for uneven associations with promoter proximity. Finally, we demonstrate the scalability of SeqUnwinder to discover cell-specific sequence features from over one hundred thousand genomic loci that display DNase I hypersensitivity in one or more ENCODE cell lines.

    COMPUTATIONAL MODELING OF GENE REGULATION, GAMETE FORMATION, AND EMBRYO IMPLANTATION

    DNA located in genes is transcribed into RNA which is translated into protein. The regulation of transcription and translation is carried out by several factors including a gene’s primary sequence, cis-regulatory elements (CREs) in non-coding DNA regions, epigenetic marks on the histones which compact DNA, and trans-binding factors (or proteins). The differential expression of a gene is crucial for establishing lineage-specific cell identity and phenotypic variability. Mutation or dysregulation may lead to natural variation within a population or aberrant gene expression and disease; trait-associated variation is known to be enriched in putative CREs, supporting their role in the origins of disease. Understanding the mechanisms by which CREs interact with one another and their cellular environment to regulate transcription may inform knowledge of biological pathways and provide a crucial foundation for developing new treatments. Further, because all DNA is passed to an offspring from their parents, it is important to understand not just the outcomes on expression due to coding and non-coding variation, but also how genetic material is passed to future generations. These dissertation chapters apply modeling approaches to large amounts of genetic and gene expression data in order to 1) better understand how the sequence and epigenetic makeup of CREs impact gene expression within hematopoiesis; 2) scan for selfish genetic elements which are preferentially passed to offspring within human sperm samples; and 3) predict implantation success for euploid embryos given gene expression profiles. Our models within Chapters 2-4 describe the impact of CREs within the blood cell lineage, connecting CREs to putative target genes, and establishing that the hematopoietic CREs were enriched for blood-trait associated genetic variation. Within Chapter 5, we find no compelling evidence of selfish genetic elements within a large sample of human sperm. Finally, within Chapter 6, we identify some genes which seem to impact the success of IVF embryo implantation by acting through regulation of translation.