33 research outputs found

    Motifs, binding, and expression : computational studies of transcriptional regulation

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009. By Kenzie Daniel MacIsaac. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (p. 141-150).
    Organisms must control gene expression in response to developmental, nutritional, or other environmental cues. This process is known as transcriptional regulation and occurs through complex networks of proteins interacting with specific regulatory sites in the genome. Recently, high-throughput variations of experimental techniques like transcriptional profiling and chromatin immunoprecipitation have emerged and taken on increasing importance in the study of regulatory processes. Mining these experiments for useful biological information requires methods that can handle large quantities of noisy data and integrate information from disparate experimental sources in a principled manner. Not coincidentally, computational and statistical methods for analyzing these data have increasingly become a focal point of research efforts. In this thesis we address three key challenges in the analysis of genomic sequence, protein localization, and expression data: (1) learning representations of the specific binding interactions that determine connectivity in regulatory networks, (2) developing physically grounded models describing these interactions, and (3) relating binding to its ultimate effect on the expression of regulated genes. To this end, we present several different algorithms and modeling techniques and apply them to real biological data in yeast, mouse, and human. Our results demonstrate the utility of leveraging multiple sources of information for improving motif analyses of chromatin immunoprecipitation data. Phylogenetic conservation information and knowledge of an immunoprecipitated protein's DNA binding domain are both shown to have great value in this context. We next present a biophysically motivated framework for modeling protein-DNA interactions and show how it leads to very natural algorithms for analyzing the binding specificity of an immunoprecipitated protein, and jointly analyzing protein localization data for multiple regulators or multiple conditions. Finally, we present an analysis of transcriptional coregulator binding in a variety of mouse tissues and a method for predicting which proteins form complexes with the coregulator based purely on the sequence of the regions it binds. We detail a simple but powerful model relating regulator binding to gene expression, and show how the position of regulatory regions is of crucial importance for predicting the expression level of nearby genes.

    An improved map of conserved regulatory sites for Saccharomyces cerevisiae

    BACKGROUND: The regulatory map of a genome consists of the binding sites for proteins that determine the transcription of nearby genes. An initial regulatory map for S. cerevisiae was recently published using six motif discovery programs to analyze genome-wide chromatin immunoprecipitation data for 203 transcription factors. The programs were used to identify sequence motifs that were likely to correspond to the DNA-binding specificity of the immunoprecipitated proteins. We report improved versions of two conservation-based motif discovery algorithms, PhyloCon and Converge. Using these programs, we create a refined regulatory map for S. cerevisiae by reanalyzing the same chromatin immunoprecipitation data. RESULTS: Applying the same conservative criteria that were applied in the original study, we find that PhyloCon and Converge each separately discover more known specificities than the combination of all six programs in the previous study. Combining the results of PhyloCon and Converge, we discover significant sequence motifs for 36 transcription factors that were previously missed. The new set of motifs identifies 636 more regulatory interactions than the previous one. The new network contains 28% more regulatory interactions among transcription factors, evidence of greater cross-talk between regulators. CONCLUSION: Combining two complementary computational strategies for conservation-based motif discovery improves the ability to identify the specificity of transcriptional regulators from genome-wide chromatin immunoprecipitation data. The increased sensitivity of these methods significantly expands the map of yeast regulatory sites without the need to alter any of the thresholds for statistical significance. The new map of regulatory sites reveals a more elaborate and complex view of the yeast genetic regulatory network than was observed previously.

    Education Practical Strategies for Discovering Regulatory DNA Sequence Motifs

    Many functionally important regions of the genome can be recognized by searching for sequence patterns, or "motifs." Aside from the genes themselves, examples include CpG islands, often present in promoter regions, and splice sites that denote intron/exon boundaries. Other motifs of great interest correspond to sites bound by regulatory proteins. Differential expression of genes in response to environmental and developmental cues depends on the action of these proteins, which are also known as transcription factors. Identifying the regulatory motifs bound by transcription factors can provide crucial insight into the mechanisms of transcriptional regulation. However, the search for these sites is challenging because a single regulatory protein will often recognize a variety of similar sequences. In this tutorial, we review computational techniques, termed "motif discovery," to lear

    Scanning for Motifs with PWMs

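    The listing gives only the title of this item, so the following is a minimal sketch of what scanning a sequence with a position weight matrix (PWM) generally involves, not the method from the tutorial itself. The count matrix, the pseudocount, the uniform background frequency, and the function names `build_pwm`, `score_window`, and `scan` are all assumptions made for illustration.

```python
import math

# Hypothetical counts for a 4-position motif (one dict per position).
# These numbers are invented for illustration and come from no dataset.
COUNTS = [
    {"A": 1, "C": 0, "G": 0, "T": 9},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 8, "C": 0, "G": 0, "T": 2},
    {"A": 0, "C": 9, "G": 1, "T": 0},
]


def build_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2-odds scores.

    Assumes a uniform background and adds a pseudocount so that
    unseen bases get a finite (negative) score.
    """
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column[base] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def score_window(pwm, window):
    """Sum the log-odds score of each base in a window of motif length."""
    return sum(column[base] for column, base in zip(pwm, window))


def scan(pwm, sequence, threshold=0.0):
    """Return (position, score) for every window scoring above threshold.

    Only the given strand is scanned; windows containing characters
    other than A, C, G, T are skipped.
    """
    sequence = sequence.upper()
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue
        score = score_window(pwm, window)
        if score > threshold:
            hits.append((start, score))
    return hits


if __name__ == "__main__":
    pwm = build_pwm(COUNTS)
    for position, score in scan(pwm, "ACGTTGACGGA"):
        print(position, round(score, 2))
```

    Summing log-odds scores lets a site that differs slightly from the most common sequence still score above the threshold, which fits the observation in the tutorial abstract that one regulatory protein often recognizes a variety of similar sequences. In practice the threshold and background model would need to be chosen for the genome being scanned.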

    Motif Discovery Workflow


    Resources

