15 research outputs found
Integrating genomic conservation data with motif discovery
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2004. Includes bibliographical references (leaves 94-99). We formalize a probabilistic model of inter-species sequence conservation for motif discovery, and demonstrate that adding large-scale genomic conservation data to an existing motif discovery procedure improves the quality of that procedure's results. Existing motif discovery algorithms reveal binding motifs that are statistically over-represented in small sets of promoter regions. To the extent that binding motifs form a reliable part of a cell's regulatory apparatus, and that apparatus is preserved across closely related species, these binding motifs should also be conserved in the corresponding genomes. Previous studies have tried to assess levels of conservation in genomic fragments of several yeast species. Our approach computes the conditional probability of inter-species sequences, and uses this probability measure to maximize the likelihood of the data from different species with a motif model. By Timothy W. Danford. S.M.
Dissecting the spatial structure of overlapping transcription in budding yeast
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010. Cataloged from PDF version of thesis. Includes bibliographical references (p. 95-102). This thesis presents a computational and algorithmic method for the analysis of high-resolution transcription data in the budding yeast Saccharomyces cerevisiae. We begin by describing a computational system for storing and retrieving spatially-mapped genomic data. This system forms the infrastructure for a novel algorithmic approach to detect and recover instances of same-strand overlapping transcripts in high-resolution expression experiments. We then apply these algorithms to a set of transcription experiments in budding yeast in order to identify potential sites of same-strand overlapping transcripts that may be involved in novel forms of transcriptional regulation. By Timothy Danford. Ph.D.
Zebrafish promoter microarrays identify actively transcribed embryonic genes.
We have designed a zebrafish genomic microarray to identify DNA-protein interactions in the proximal promoter regions of over 11,000 zebrafish genes. Using these microarrays, together with chromatin immunoprecipitation with an antibody directed against tri-methylated lysine 4 of Histone H3, we demonstrate the feasibility of this method in zebrafish. This approach will allow investigators to determine the genomic binding locations of DNA-interacting proteins during development and expedite the assembly of the genetic networks that regulate embryogenesis. Rights: This article is licensed under the BioMed Central licence at http://www.biomedcentral.com/about/license, which is similar to the 'Creative Commons Attribution Licence'.
Core transcriptional regulatory circuitry in human hepatocytes
We mapped the transcriptional regulatory circuitry for six master regulators in human hepatocytes using chromatin immunoprecipitation and high-resolution promoter microarrays. The results show that these regulators form a highly interconnected core circuitry, and reveal the local regulatory network motifs created by regulator–gene interactions. Autoregulation was a prominent theme among these regulators. We found that hepatocyte master regulators tend to bind promoter regions combinatorially and that the number of transcription factors bound to a promoter corresponds with observed gene expression. Our studies reveal portions of the core circuitry of human hepatocytes.
Analysis of the mouse embryonic stem cell regulatory networks obtained by ChIP-chip and ChIP-PET
Background: Genome-wide approaches have begun to reveal the transcriptional networks responsible for pluripotency in embryonic stem (ES) cells. Chromatin immunoprecipitation (ChIP), followed either by hybridization to a microarray platform (ChIP-chip) or by DNA sequencing (ChIP-PET), has identified binding targets of the ES cell transcription factors OCT4 and NANOG in humans and mice, respectively. These studies have provided an outline of the transcriptional framework involved in maintaining pluripotency. Recent evidence from comparisons of multiple technologies suggests that expanding these datasets using different platforms would provide a useful resource for examining the mechanisms underlying pluripotency regulation. Results: We have now identified OCT4 and NANOG genomic targets in mouse ES cells by ChIP-chip and provided the means to compare these data with previously reported ChIP-PET results in mouse ES cells. We have mapped the sequences of OCT4 and NANOG binding events from each dataset to genomic coordinates, providing a valuable resource to facilitate a better understanding of the ES cell regulatory circuitry. Interestingly, although considerable differences are observed in OCT4 and NANOG occupancy as identified by each method, a substantial number of targets in both datasets are enriched for genes that have known roles in cell-fate specification and that are differentially expressed upon Oct4 or Nanog knockdown. Conclusion: This study suggests that each dataset is a partial representation of the overall ES cell regulatory circuitry, and through integrating binding data obtained by ChIP-chip and ChIP-PET, the methods presented here provide a useful means for integrating datasets obtained by different techniques in the future. National Institutes of Health (U.S) (RO1-HD045022); National Institutes of Health (U.S) (R37-CA084198); National Institutes of Health (U.S) (HG002688
Rethinking Data-Intensive Science Using Scalable Analytics Systems
"Next generation" data acquisition technologies are allowing scientists to collect exponentially more data at a lower cost. These trends are broadly impacting many scientific fields, including genomics, astronomy, and neuroscience. We can attack the problem caused by exponential data growth by applying horizontally scalable techniques from current analytics systems to accelerate scientific processing pipelines. In this paper, we describe ADAM, an example genomics pipeline that leverages the open-source Apache Spark and Parquet systems to achieve a 28x speedup over current genomics pipelines, while reducing cost by 63%. From building this system, we were able to distill a set of techniques for implementing scientific analyses efficiently using commodity "big data" systems. To demonstrate the generality of our architecture, we then implement a scalable astronomy image processing system which achieves a 2.8-8.9x improvement over the state-of-the-art MPI-based system.