9 research outputs found

    Methods for identifying regulatory grammars

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2013. Cataloged from PDF version of thesis. Includes bibliographical references (p. [37]-40). Recent advancements in sequencing technology have made it possible to study the mechanisms of gene regulation, such as protein-DNA binding, at greater resolution and on a greater scale than was previously possible. We present an expectation-maximization learning algorithm that identifies enriched spatial relationships between motifs in sets of DNA sequences. For example, the method will identify spatially constrained motifs colocated in the same regulatory region. We apply our method to biological sequence data and recover previously known prokaryotic promoter spacing constraints, demonstrating that joint learning of motifs and spacing constraints is superior to other methods for this task. By Tahin Fahmid Syed. S.M.

    High-throughput mapping of regulatory DNA

    Quantifying the effects of cis-regulatory DNA on gene expression is a major challenge. Here, we present the multiplexed editing regulatory assay (MERA), a high-throughput CRISPR-Cas9–based approach that analyzes the functional impact of the regulatory genome in its native context. MERA tiles thousands of mutations across ~40 kb of cis-regulatory genomic space and uses knock-in green fluorescent protein (GFP) reporters to read out gene activity. Using this approach, we obtain quantitative information on the contribution of cis-regulatory regions to gene expression. We identify proximal and distal regulatory elements necessary for expression of four embryonic stem cell–specific genes. We show a consistent contribution of neighboring gene promoters to gene expression and identify unmarked regulatory elements (UREs) that control gene expression but do not have typical enhancer epigenetic or chromatin features. We compare thousands of functional and nonfunctional genotypes at a genomic location and identify the base pair–resolution functional motifs of regulatory elements. National Institutes of Health (U.S.) (1U01HG007037

    Predicting genomic interactions using deep learning

    Thesis: E.C.S., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019. Cataloged from PDF version of thesis. Includes bibliographical references (pages 39-43). Physical promoter-enhancer and CTCF-CTCF interactions organize the human genome in three dimensions and contribute to the regulation of gene expression. Hi-C and related approaches have enabled profiling of these interactions, though how the instructions for these interactions are encoded in the genome is still largely not understood. We develop a deep learning model, Deep3DGenome, to predict genomic interactions using both genomic sequence data and chromatin features. We find that a machine learning model that has anchor-specific modules and uses rich chromatin features outperforms previous approaches at predicting 3D interactions. By Tahin Fahmid Syed. E.C.S.

    Broad metabolic sensitivity profiling of a prototrophic yeast deletion collection

    Background: Genome-wide sensitivity screens in yeast have been immensely popular following the construction of a collection of deletion mutants of non-essential genes. However, the auxotrophic markers in this collection preclude experiments on minimal growth medium, one of the most informative metabolic environments. Here we present quantitative growth analysis for mutants in all 4,772 non-essential genes from our prototrophic deletion collection across a large set of metabolic conditions. Results: The complete collection was grown in environments consisting of one of four possible carbon sources paired with one of seven nitrogen sources, for a total of 28 different well-defined metabolic environments. The relative contributions to mutants' fitness of each carbon and nitrogen source were determined using multivariate statistical methods. The mutant profiling recovered known and novel genes specific to the processing of nutrients and accurately predicted functional relationships, especially for metabolic functions. A benchmark of genome-scale metabolic network modeling is also given to demonstrate the level of agreement between current in silico predictions and hitherto unavailable experimental data. Conclusions: These data address a fundamental deficiency in our understanding of the model eukaryote Saccharomyces cerevisiae and its response to the most basic of environments. While choice of carbon source has the greatest impact on cell growth, specific effects due to nitrogen source and interactions between the nutrients are frequent. We demonstrate utility in characterizing genes of unknown function and illustrate how these data can be integrated with other whole-genome screens to interpret similarities between seemingly diverse perturbation types.

    Perspectives on ENCODE

    The Encyclopedia of DNA Elements (ENCODE) Project launched in 2003 with the long-term goal of developing a comprehensive map of functional elements in the human genome. These included genes, biochemical regions associated with gene regulation (for example, transcription factor binding sites, open chromatin, and histone marks) and transcript isoforms. The marks serve as sites for candidate cis-regulatory elements (cCREs) that may serve functional roles in regulating gene expression [1]. The project has been extended to model organisms, particularly the mouse. In the third phase of ENCODE, nearly a million and more than 300,000 cCRE annotations have been generated for human and mouse, respectively, and these have provided a valuable resource for the scientific community.

    Expanded encyclopaedias of DNA elements in the human and mouse genomes

    The human and mouse genomes contain instructions that specify RNAs and proteins and govern the timing, magnitude, and cellular context of their production. To better delineate these elements, phase III of the Encyclopedia of DNA Elements (ENCODE) Project has expanded analysis of the cell and tissue repertoires of RNA transcription, chromatin structure and modification, DNA methylation, chromatin looping, and occupancy by transcription factors and RNA-binding proteins. Here we summarize these efforts, which have produced 5,992 new experimental datasets, including systematic determinations across mouse fetal development. All data are available through the ENCODE data portal (https://www.encodeproject.org), including phase II ENCODE [1] and Roadmap Epigenomics [2] data. We have developed a registry of 926,535 human and 339,815 mouse candidate cis-regulatory elements, covering 7.9 and 3.4% of their respective genomes, by integrating selected datatypes associated with gene regulation, and constructed a web-based server (SCREEN; http://screen.encodeproject.org) to provide flexible, user-defined access to this resource. Collectively, the ENCODE data and registry provide an expansive resource for the scientific community to build a better understanding of the organization and function of the human and mouse genomes.