    Computation for ChIP-seq and RNA-seq studies

    Genome-wide measurements of protein-DNA interactions and transcriptomes are increasingly done by deep DNA sequencing methods (ChIP-seq and RNA-seq). The power and richness of these counting-based measurements comes at the cost of routinely handling tens to hundreds of millions of reads. Whereas early adopters necessarily developed their own custom computer code to analyze the first ChIP-seq and RNA-seq datasets, a new generation of more sophisticated algorithms and software tools is emerging to assist in the analysis phase of these projects. Here we describe the multilayered analyses of ChIP-seq and RNA-seq datasets, discuss the software packages currently available to perform tasks at each layer and describe some upcoming challenges and features for future analysis tools. We also discuss how software choices and uses are affected by specific aspects of the underlying biology and data structure, including genome size, positional clustering of transcription factor binding sites, transcript discovery and expression quantification.

    Multi-Stage Modeling of the Kinetics of Activation of CaMKII

    Ca²⁺/calmodulin-dependent protein kinase 2 (CaMKII) plays an important role in induction of long-term potentiation and formation of memory. It is abundant in dendritic spines, and is activated when Ca²⁺ flows into the postsynaptic cytosol through open NMDA-type glutamate receptors. Its function is fine-tuned through interaction with other proteins as well as through subunit interactions and regulatory autophosphorylation. We have undertaken a multi-stage project to study the critical kinetics of activation of CaMKII in the spine by combining modeling and experimental studies. We are using computational modeling and simulations on various platforms, coupled with biochemical experiments in vitro, and eventually in vivo, to understand CaMKII regulation. The project includes the following steps:

    1. Determining the parameters governing activation of a monomeric subunit. The CaMKII holoenzyme is a large dodecamer of similar, homologous subunits held together by interactions between the association domains located at the carboxyl end of each subunit. Individual, monomeric subunits can be expressed recombinantly by removing the association domain. Computer simulations of activation of monomeric CaMKII by Ca²⁺/calmodulin at both saturating and non-saturating concentrations in a test tube have helped to identify the binding parameters that are most crucial for modeling of regulation of CaMKII and thus have indicated the most useful biochemical assays to measure those parameters (Pepke et al., 2010). We are using these measurements to fine-tune our model of activation of individual catalytic subunits.

    2. Building a model of the holoenzyme. Because a CaMKII holoenzyme contains 12 similar subunits, each of which can exist in several states, the holoenzyme can have a large number of state combinations. Thus, modeling the entire holoenzyme requires a computational framework that avoids the ensuing combinatorial complexity. The stochastic simulator MCell provides an elegant, rule-based way of modeling state changes in the CaMKII holoenzyme.

    3. Modeling cooperativity that arises from the dodecameric structure of CaMKII. Autophosphorylation at threonine-286, which activates CaMKII subunits, is an inter-subunit event. Thus, it is greatly facilitated by the close proximity of subunits in the holoenzyme. In addition, the subunits within the holoenzyme are arranged as dimers, which appears to result in cooperativity in the binding of Ca²⁺/CaM to individual subunits of the dimer (Chao et al., 2010). An accurate model of activation of subunits in the holoenzyme and their autophosphorylation will allow us to explore the effects of cooperativity on CaMKII activation on various time scales.

    4. Modeling CaMKII within the context of a postsynaptic spine. CaMKII interacts with a variety of other proteins, both in the postsynaptic density (PSD), close to major sources of Ca²⁺ influx, and in other parts of the spine. In the fourth stage of this project we plan to implement kinetic models of activation of CaMKII in the context of an MCell model of Ca²⁺ influx into a spine upon activation of NMDA-type glutamate receptors (Keller et al., 2008; Keller et al., 2011, submitted). We will explore the effects of different localization and numbers of CaMKII holoenzymes in the spine on CaMKII activation.

    References:

    Pepke, S., Kinzer-Ursem, T., Mihalas, S., and Kennedy, M.B. (2010). A dynamic model of interactions of Ca²⁺, calmodulin, and catalytic subunits of Ca²⁺/calmodulin-dependent protein kinase II. PLoS Comput Biol 6, e1000675.

    Chao, L.H., Pellicena, P., Deindl, S., Barclay, L.A., Schulman, H., and Kuriyan, J. (2010). Intersubunit capture of regulatory segments is a component of cooperative CaMKII activation. Nat Struct Mol Biol 17, 264-272.

    Keller, D.X., Franks, K.M., Bartol, T.M., Jr., and Sejnowski, T.J. (2008). Calmodulin activation by calcium transients in the postsynaptic density of dendritic spines. PLoS ONE 3, e2045.

    Keller, D.X., Bartol, T.M., Kinney, J.P., Kennedy, M.B., Bajaj, C., Harris, K.M., and Sejnowski, T.J. Regulation of synaptic calcium transients in reconstructed dendritic spines of hippocampal CA1 pyramidal neurons, submitted.
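
The combinatorial complexity mentioned in step 2 can be made concrete with a short calculation. This is only an illustrative sketch: the abstract says each subunit "can exist in several states" without giving a number, so the per-subunit state counts below are assumptions, and the count ignores any symmetry of the holoenzyme that would make some arrangements equivalent.

```python
# Illustrative only: how fast the number of holoenzyme configurations grows.
# SUBUNITS comes from the abstract (dodecamer); the per-subunit state counts
# tried below are hypothetical, since the abstract does not specify them.
SUBUNITS = 12


def holoenzyme_state_count(states_per_subunit: int, subunits: int = SUBUNITS) -> int:
    """Number of ordered state assignments across all subunits (no symmetry reduction)."""
    return states_per_subunit ** subunits


for k in (2, 4, 6):  # assumed numbers of states per subunit
    print(f"{k} states per subunit -> {holoenzyme_state_count(k):,} combinations")
```

Even with a handful of states per subunit, enumerating every holoenzyme configuration explicitly becomes impractical, which is the motivation the abstract gives for a rule-based framework.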

    High resolution mapping of Twist to DNA in Drosophila embryos: Efficient functional analysis and evolutionary conservation

    Cis-regulatory modules (CRMs) function by binding sequence specific transcription factors, but the relationship between in vivo physical binding and the regulatory capacity of factor-bound DNA elements remains uncertain. We investigate this relationship for the well-studied Twist factor in Drosophila melanogaster embryos by analyzing genome-wide factor occupancy and testing the functional significance of Twist occupied regions and motifs within regions. Twist ChIP-seq data efficiently identified previously studied Twist-dependent CRMs and robustly predicted new CRM activity in transgenesis, with newly identified Twist-occupied regions supporting diverse spatiotemporal patterns (>74% positive, n = 31). Some, but not all, candidate CRMs require Twist for proper expression in the embryo. The Twist motifs most favored in genome ChIP data (in vivo) differed from those most favored by Systematic Evolution of Ligands by EXponential enrichment (SELEX) (in vitro). Furthermore, the majority of ChIP-seq signals could be parsimoniously explained by a CABVTG motif located within 50 bp of the ChIP summit and, of these, CACATG was most prevalent. Mutagenesis experiments demonstrated that different Twist E-box motif types are not fully interchangeable, suggesting that the ChIP-derived consensus (CABVTG) includes sites having distinct regulatory outputs. Further analysis of position, frequency of occurrence, and sequence conservation revealed significant enrichment and conservation of CABVTG E-box motifs near Twist ChIP-seq signal summits, preferential conservation of ±150 bp surrounding Twist occupied summits, and enrichment of GA- and CA-repeat sequences near Twist occupied summits. Our results show that high resolution in vivo occupancy data can be used to drive efficient discovery and dissection of global and local cis-regulatory logic.
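
To make the CABVTG consensus concrete, here is a minimal sketch of scanning a sequence for that motif near a summit. In the IUPAC nucleotide code, B stands for C, G or T and V stands for A, C or G, so CACATG is one instance of CABVTG. The function names, the toy sequence and the reading of "within 50 bp" as "the whole motif lies inside summit ± 50" are assumptions made for illustration; this is not the authors' pipeline. Because the reverse complement of CABVTG is again CABVTG, scanning one strand is enough for this particular motif.

```python
import re

# Subset of the standard IUPAC nucleotide codes, written as regex classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "B": "[CGT]",   # not A
    "V": "[ACG]",   # not T
    "N": "[ACGT]",
}


def motif_to_regex(motif):
    """Compile an IUPAC motif into a regex; the lookahead allows overlapping hits."""
    body = "".join(IUPAC[base] for base in motif.upper())
    return re.compile("(?=(" + body + "))")


def motifs_near_summit(seq, summit, motif="CABVTG", window=50):
    """Return (start, matched_sequence) for hits lying entirely within summit +/- window.

    `summit` is a 0-based index into `seq`; both names are hypothetical.
    """
    start = max(0, summit - window)
    end = min(len(seq), summit + window + 1)
    pattern = motif_to_regex(motif)
    return [(start + m.start(), m.group(1))
            for m in pattern.finditer(seq[start:end].upper())]


# Toy sequence: CACATG fits the consensus, CAATTG does not (A is not allowed at B).
toy = "TTTTCACATGAAAACAATTGTT"
print(motifs_near_summit(toy, summit=10))
```

Narrowing the `window` argument restricts hits to the immediate neighbourhood of the summit, which is the kind of positional filter the abstract describes.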

    Integrating and mining the chromatin landscape of cell-type specificity using self-organizing maps

    We tested whether self-organizing maps (SOMs) could be used to effectively integrate, visualize, and mine diverse genomics data types, including complex chromatin signatures. A fine-grained SOM was trained on 72 ChIP-seq histone modifications and DNase-seq data sets from six biologically diverse cell lines studied by The ENCODE Project Consortium. We mined the resulting SOM to identify chromatin signatures related to sequence-specific transcription factor occupancy, sequence motif enrichment, and biological functions. To highlight clusters enriched for specific functions such as transcriptional promoters or enhancers, we overlaid onto the map additional data sets not used during training, such as ChIP-seq, RNA-seq, CAGE, and information on cis-acting regulatory modules from the literature. We used the SOM to parse known transcriptional enhancers according to the cell-type-specific chromatin signature, and we further corroborated this pattern on the map by EP300 (also known as p300) occupancy. New candidate cell-type-specific enhancers were identified for multiple ENCODE cell types in this way, along with new candidates for ubiquitous enhancer activity. An interactive web interface was developed to allow users to visualize and custom-mine the ENCODE SOM. We conclude that large SOMs trained on chromatin data from multiple cell types provide a powerful way to identify complex relationships in genomic data at user-selected levels of granularity
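
For readers unfamiliar with self-organizing maps, the sketch below shows the basic training loop on toy data: find the best-matching unit for each input vector, then pull that unit and its grid neighbours toward the input, with the learning rate and neighbourhood radius shrinking over time. It is a generic textbook-style SOM written with the standard library only. The grid size, decay schedule, function names and toy profiles are all assumptions, and it is not the implementation or the scale used in the study.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return (row, col) of the unit whose prototype is closest to x."""
    best, best_d = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((a - b) ** 2 for a, b in zip(w, x))
            if d < best_d:
                best_d, best = d, (r, c)
    return best


def train_som(data, rows, cols, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Train a small rectangular SOM; returns weights[row][col] -> prototype vector."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = sigma0 or max(rows, cols) / 2.0
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x = data[i]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                   # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)   # shrinking neighbourhood radius
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-grid_d2 / (2.0 * sigma * sigma))
                    w = weights[r][c]
                    for k in range(dim):
                        w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


# Toy usage: made-up signal profiles over three hypothetical data tracks.
profiles = [[1.0, 0.0, 0.1], [0.9, 0.1, 0.0], [0.0, 1.0, 0.9], [0.1, 0.9, 1.0]]
som = train_som(profiles, rows=3, cols=3)
for p in profiles:
    print(p, "->", best_matching_unit(som, p))
```

Data sets that were not used in training can be laid over a trained map in the same way, by looking up the best-matching unit for each new vector, which is the general idea behind the overlays described in the abstract.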

    A User's Guide to the Encyclopedia of DNA Elements (ENCODE)

    The mission of the Encyclopedia of DNA Elements (ENCODE) Project is to enable the scientific and medical communities to interpret the human genome sequence and apply it to understand human biology and improve health. The ENCODE Consortium is integrating multiple technologies and approaches in a collective effort to discover and define the functional elements encoded in the human genome, including genes, transcripts, and transcriptional regulatory regions, together with their attendant chromatin states and DNA methylation patterns. In the process, standards to ensure high-quality data have been implemented, and novel algorithms have been developed to facilitate analysis. Data and derived results are made available through a freely accessible database. Here we provide an overview of the project and the resources it is generating and illustrate the application of ENCODE data to interpret the human genome

    Automated Diagnosis and Control of Complex Systems

    Livingstone2 is a reusable, artificial intelligence (AI) software system designed to assist spacecraft, life support systems, chemical plants, or other complex systems by operating with minimal human supervision, even in the face of hardware failures or unexpected events. The software diagnoses the current state of the spacecraft or other system, and recommends commands or repair actions that will allow the system to continue operation. Livingstone2 is an enhancement of the Livingstone diagnosis system that was flight-tested onboard the Deep Space One spacecraft in 1999. This version tracks multiple diagnostic hypotheses, rather than just a single hypothesis as in the previous version. It is also able to revise diagnostic decisions made in the past when additional observations become available. In such cases, the original Livingstone might arrive at an incorrect hypothesis. Re-architecting and re-implementing the system in C++ has increased performance. Usability has been improved by creating a set of development tools that is closely integrated with the Livingstone2 engine. In addition to the core diagnosis engine, Livingstone2 includes a compiler that translates diagnostic models written in a Java-like language into Livingstone2's language, and a broad set of graphical tools for model development.

    Single-Cell Transcriptome Analysis Reveals Dynamic Changes in lncRNA Expression during Reprogramming

    Cellular reprogramming highlights the epigenetic plasticity of the somatic cell state. Long noncoding RNAs (lncRNAs) have emerging roles in epigenetic regulation, but their potential functions in reprogramming cell fate have been largely unexplored. We used single-cell RNA sequencing to characterize the expression patterns of over 16,000 genes, including 437 lncRNAs, during defined stages of reprogramming to pluripotency. Self-organizing maps (SOMs) were used as an intuitive way to structure and interrogate transcriptome data at the single-cell level. Early molecular events during reprogramming involved the activation of Ras signaling pathways, along with hundreds of lncRNAs. Loss-of-function studies showed that activated lncRNAs can repress lineage-specific genes, while lncRNAs activated in multiple reprogramming cell types can regulate metabolic gene expression. Our findings demonstrate that reprogramming cells activate defined sets of functionally relevant lncRNAs and provide a resource to further investigate how dynamic changes in the transcriptome reprogram cell state
