
    Reusable, extensible, and modifiable R scripts and Kepler workflows for comprehensive single set ChIP-seq analysis

    BACKGROUND: There has been an enormous expansion in the use of chromatin immunoprecipitation followed by sequencing (ChIP-seq) technologies. Analysis of large-scale ChIP-seq datasets involves a complex series of steps and the production of several specialized graphical outputs. A number of systems have emphasized custom development of ChIP-seq pipelines. These systems either rely on custom programming of a single, complex pipeline or supply libraries of modules, and they do not produce the full range of outputs commonly generated for ChIP-seq datasets. It is desirable to have more comprehensive pipelines, in particular ones addressing common metadata tasks, such as pathway analysis, and pipelines producing standard complex graphical outputs. It is advantageous if these are highly modular systems, available both as turnkey pipelines and as individual modules, that are easily comprehensible, modifiable and extensible, allowing rapid alteration in response to new analysis developments in this growing area. Furthermore, it is advantageous if these pipelines allow data provenance tracking. RESULTS: We present a set of 20 ChIP-seq analysis software modules implemented in the Kepler workflow system; most (18/20) were also implemented as standalone, fully functional R scripts. The set consists of four full turnkey pipelines and 16 component modules. The turnkey pipelines in Kepler allow data provenance tracking. Implementation emphasized the use of common R packages and widely used external tools (e.g., MACS for peak finding), along with custom programming. This software provides comprehensive solutions and easily repurposed code blocks for ChIP-seq analysis and pipeline creation. Tasks include mapping raw reads, peak finding via MACS, summary statistics, peak location statistics, summary plots centered on the transcription start site (TSS), gene ontology, pathway analysis, and de novo motif finding, among others.
CONCLUSIONS: These pipelines range from those performing a single task to those performing full analyses of ChIP-seq data. The pipelines are supplied as Kepler workflows, which allow data provenance tracking, and, in the majority of cases, as standalone R scripts. These pipelines are designed for ease of modification and repurposing.
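One of the listed tasks is a summary plot centered on the TSS. The sketch below shows the binning step behind such a plot: it counts peak summits in fixed-width bins relative to TSSs, flipping offsets for minus-strand genes so that positive values are always downstream. The function name, the input tuples, and the 1 kb window with 100 bp bins are hypothetical choices for illustration; this is not the authors' R or Kepler code.

```python
from collections import Counter


def tss_profile(peaks, tss_list, window=1000, bin_size=100):
    """Count peak summits in bins relative to every TSS within +/- window bp.

    peaks:    list of (chrom, summit_position)
    tss_list: list of (chrom, tss_position, strand), strand "+" or "-"
    Returns {bin_start_offset: count}; negative offsets are upstream.
    """
    # Index TSSs by chromosome so each peak is compared only with its own chromosome.
    by_chrom = {}
    for chrom, pos, strand in tss_list:
        by_chrom.setdefault(chrom, []).append((pos, strand))

    counts = Counter()
    for chrom, summit in peaks:
        for tss, strand in by_chrom.get(chrom, []):
            offset = summit - tss
            if strand == "-":
                offset = -offset  # orient so that positive means downstream
            if -window <= offset < window:
                # Floor division puts e.g. -50 into the [-100, 0) bin.
                counts[(offset // bin_size) * bin_size] += 1

    # Report every bin, including empty ones, in upstream-to-downstream order.
    return {start: counts[start] for start in range(-window, window, bin_size)}


tss = [("chr1", 5000, "+"), ("chr1", 20000, "-")]
peaks = [("chr1", 5050), ("chr1", 4950), ("chr1", 19850), ("chr2", 5000)]
profile = tss_profile(peaks, tss)
```

Plotting the returned counts against the bin offsets gives the familiar TSS-centered profile; a peak near two TSSs is counted once for each, which is one of several reasonable conventions.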

    Mutation in Folate Metabolism Causes Epigenetic Instability and Transgenerational Effects on Development

    SUMMARY: The importance of maternal folate consumption for normal development is well established, yet the molecular mechanism linking folate metabolism to development remains poorly understood. The enzyme methionine synthase reductase (Mtrr) is necessary for utilization of methyl groups from the folate cycle. We found that a hypomorphic mutation of the mouse Mtrr gene results in intrauterine growth restriction, developmental delay, and congenital malformations, including neural tube, heart, and placental defects. Importantly, these defects were dependent upon the Mtrr genotypes of the maternal grandparents. Furthermore, we observed widespread epigenetic instability associated with altered gene expression in the placentas of wild-type grandprogeny of Mtrr-deficient maternal grandparents. Embryo transfer experiments revealed that Mtrr deficiency in mice leads to two distinct, separable phenotypes: adverse effects on their wild-type daughters' uterine environment, leading to growth defects in wild-type grandprogeny, and the appearance of congenital malformations independent of maternal environment that persist for five generations, likely through transgenerational epigenetic inheritance.

    Tiling array data analysis: a multiscale approach using wavelets

    BACKGROUND: Tiling array data are hard to interpret because of noise. The wavelet transform is a widely used signal-processing technique for separating the true signal from noisy data. We therefore attempted to denoise representative tiling array datasets from ChIP-chip experiments using wavelets. In doing this, we used specific wavelet basis functions, Coiflets, because their triangular shape closely resembles the expected profile of true ChIP-chip peaks. RESULTS: In our wavelet-transformed data, we observed that noise tends to be confined to small scales, while the useful signal of interest spans multiple large scales. We were also able to show that wavelet coefficients due to non-specific cross-hybridization follow a log-normal distribution, and we used this fact in developing a thresholding procedure. In particular, wavelets allow one to set an unambiguous, absolute threshold, which has been hard to define in ChIP-chip experiments. One can set this threshold by requiring a similar confidence level at different length scales of the transformed signal. We applied our algorithm to a number of representative ChIP-chip datasets, including those of Pol II and histone modifications, which have a diverse distribution of length scales of biochemical activity, including some broad peaks. CONCLUSIONS: Finally, we benchmarked our method against other approaches for scoring ChIP-chip data using spike-ins on the ENCODE NimbleGen tiling array. This comparison demonstrated excellent performance, with the wavelet method achieving the best overall score.
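The thresholding idea (fit a log-normal to coefficient magnitudes at each scale, then cut at the same confidence level on every scale) can be sketched as below. Two assumptions differ from the abstract: the orthonormal Haar wavelet stands in for Coiflets so the code needs only the standard library (with PyWavelets one could instead call `pywt.wavedec` with a Coiflet such as `"coif1"`), and the log-normal is fitted to all detail coefficients at a scale, assuming most of them are noise, rather than to coefficients known to come from cross-hybridization. The parameter `z` plays the role of the shared confidence level.

```python
import math
import statistics


def haar_decompose(signal, levels):
    """Orthonormal Haar transform: returns (approximation, [details level 1..levels])."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        if len(approx) % 2:
            raise ValueError("signal length must be divisible by 2**levels")
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / math.sqrt(2) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return approx, details


def haar_reconstruct(approx, details):
    """Invert haar_decompose, coarsest level first."""
    for d in reversed(details):
        out = []
        for s, w in zip(approx, d):
            out.extend([(s + w) / math.sqrt(2), (s - w) / math.sqrt(2)])
        approx = out
    return approx


def lognormal_threshold(details, z=2.0):
    """Zero detail coefficients below exp(mu + z*sigma) of a per-scale log-normal fit."""
    kept = []
    for d in details:
        logs = [math.log(abs(c)) for c in d if c != 0]
        if len(logs) < 2:  # too few nonzero coefficients to fit; leave scale as is
            kept.append(list(d))
            continue
        mu, sigma = statistics.mean(logs), statistics.stdev(logs)
        cutoff = math.exp(mu + z * sigma)  # same z, i.e. same confidence, at every scale
        kept.append([c if abs(c) > cutoff else 0.0 for c in d])
    return kept


def denoise(signal, levels=3, z=2.0):
    approx, details = haar_decompose(signal, levels)
    return haar_reconstruct(approx, lognormal_threshold(details, z))
```

Because the cutoff is expressed in units of the fitted spread at each scale, a single `z` yields one absolute threshold per scale, which mirrors the "similar confidence level at different length scales" criterion described above.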

    Direct patterning of periodic semiconductor nanostructures using single-pulse nanosecond laser interference

    We demonstrate an effective method for fabricating large-area periodic two-dimensional semiconductor nanostructures by means of single-pulse laser interference. Using a pulsed nanosecond laser with a wavelength of 355 nm, we obtained precisely ordered square arrays of nanoholes with a periodicity of 300 nm, both on UV photoresist and directly on semiconductor wafers via a resist-free process. We show improved uniformity using a beam-shaping system consisting of cylindrical lenses, with which we demonstrate highly regular arrays over hundreds of square micrometers. We propose that our novel observation of direct pattern transfer to GaAs is due to local congruent evaporation and subsequent droplet etching of the surface. The results show that single-pulse interference can provide a rapid and highly efficient route to wide-area periodic nanostructures on semiconductors and potentially on other engineering materials.
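As a rough check on the numbers above, the standard two-beam interference relation links the fringe period to the wavelength and the half-angle between the beams. The abstract does not state the beam geometry, so treating each axis of the square lattice as a symmetric two-beam pattern is an assumption made only for this estimate:

```latex
\Lambda = \frac{\lambda}{2\sin\theta}
\quad\Longrightarrow\quad
\sin\theta = \frac{\lambda}{2\Lambda} = \frac{355\ \text{nm}}{2 \times 300\ \text{nm}} \approx 0.592,
\qquad \theta \approx 36.3^\circ
```

Under that assumption, each beam would meet the sample at roughly 36° from the normal; a different multi-beam arrangement would change this estimate.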

    Modeling the relative relationship of transcription factor binding and histone modifications to gene expression levels in mouse embryonic stem cells

    Transcription factor (TF) binding and histone modification (HM) are important for the precise control of gene expression. Hence, we constructed statistical models to relate these to gene expression levels in mouse embryonic stem cells. While both TF binding and HMs are highly 'predictive' of gene expression levels (in a statistical, but perhaps not strictly mechanistic, sense), we find that they show distinct differences in the spatial patterning of their predictive strength: TF binding achieved the highest predictive power in a small DNA region centered at the transcription start sites of genes, while the HMs exhibited high predictive power across a wide region around genes. Intriguingly, our results suggest that TF binding and HMs are redundant, in a strict statistical sense, for predicting gene expression. We also show that our TF and HM models are cell-line specific; specifically, TF binding and HMs are more predictive of gene expression in the same cell line, and differential gene expression between cell lines is predictable from differential HMs. Finally, we found that models trained solely on protein-coding genes are predictive of expression levels of microRNAs, suggesting that their regulation by TFs and HMs may share a mechanism similar to that for protein-coding genes.
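The "spatial patterning of predictive strength" can be illustrated by scoring each bin around the TSS separately. The sketch below uses the simplest possible model, a one-variable linear fit of log2 expression on the signal in a single bin, whose R² equals the squared Pearson correlation. The abstract does not specify the authors' models, so the function names, the per-bin single-predictor model, and the toy data are all assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def predictive_power_by_bin(signal_by_gene, expression):
    """R^2 of log2(expression) regressed on the signal in each bin.

    signal_by_gene: {gene: [signal in bin 0, bin 1, ...]} (same bins for all genes)
    expression:     {gene: expression level > 0}
    """
    genes = sorted(signal_by_gene)
    n_bins = len(signal_by_gene[genes[0]])
    y = [math.log2(expression[g]) for g in genes]
    # For a single predictor, the least-squares R^2 is the squared correlation.
    return [pearson([signal_by_gene[g][b] for g in genes], y) ** 2
            for b in range(n_bins)]


# Toy data: bin 1 tracks expression perfectly, bin 2 carries no information.
signal = {"g1": [5, 0, 1], "g2": [1, 1, 1], "g3": [4, 2, 1], "g4": [2, 3, 1]}
expr = {"g1": 1, "g2": 2, "g3": 4, "g4": 8}
r2 = predictive_power_by_bin(signal, expr)
```

Plotting `r2` against bin position for TF and HM signals would show the contrast the abstract describes: a sharp peak near the TSS versus a broad plateau.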

    YesWorkflow: A User-Oriented, Language-Independent Tool for Recovering Workflow Information from Scripts

    Scientific workflow management systems offer features for composing complex computational pipelines from modular building blocks, for executing the resulting automated workflows, and for recording the provenance of data products resulting from workflow runs. Despite the advantages such features provide, many automated workflows continue to be implemented and executed outside of scientific workflow systems because of the convenience and familiarity of scripting languages (such as Perl, Python, R, and MATLAB) and the high productivity many scientists experience when using these languages. YesWorkflow is a set of software tools that aim to provide such users of scripting languages with many of the benefits of scientific workflow systems. YesWorkflow requires neither the use of a workflow engine nor the overhead of adapting code to run effectively in such a system. Instead, YesWorkflow enables scientists to annotate existing scripts with special comments that reveal the computational modules and dataflows otherwise implicit in these scripts. YesWorkflow tools extract and analyze these comments, represent the scripts in terms of entities based on the typical scientific workflow model, and provide graphical renderings of this workflow-like view of the scripts. Future versions of YesWorkflow will also allow the prospective provenance of the data products of these scripts to be queried in ways similar to those available to users of scientific workflow systems.
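To make the annotation idea concrete, the sketch below embeds a small script marked up with YesWorkflow-style `@begin`, `@end`, `@in`, and `@out` comment tags, then recovers its blocks and dataflow. The extractor is a hypothetical toy written for illustration, not the YesWorkflow tools; the real tools also understand further tags (for example `@param`) that this toy ignores, and the annotated pipeline steps (`call_peaks`, `summarize`) are invented names that are never executed.

```python
import re

# A script annotated with YesWorkflow-style comments (kept as a string, not run).
SCRIPT = '''
# @begin chipseq_summary
# @in raw_reads
# @out peak_stats
# @begin call_peaks
# @in raw_reads
# @out peaks
peaks = call_peaks(raw_reads)
# @end call_peaks
# @begin summarize
# @in peaks
# @out peak_stats
peak_stats = summarize(peaks)
# @end summarize
# @end chipseq_summary
'''

TAG = re.compile(r"@(begin|end|in|out)\s+(\S+)")


def extract_blocks(source):
    """Return {block: {"in": [...], "out": [...], "parent": name or None}}."""
    stack, blocks = [], {}
    for line in source.splitlines():
        comment = line.partition("#")[2]  # only text after '#' is inspected
        for tag, name in TAG.findall(comment):
            if tag == "begin":
                parent = stack[-1] if stack else None
                stack.append(name)
                blocks[name] = {"in": [], "out": [], "parent": parent}
            elif tag == "end":
                if not stack or stack[-1] != name:
                    raise ValueError(f"unmatched @end {name}")
                stack.pop()
            else:
                if not stack:
                    raise ValueError(f"@{tag} {name} outside any block")
                blocks[stack[-1]][tag].append(name)
    if stack:
        raise ValueError(f"unclosed @begin {stack[-1]}")
    return blocks


def dataflow_edges(blocks):
    """(producer, data, consumer) edges between blocks sharing a parent."""
    edges = []
    for p, pb in blocks.items():
        for c, cb in blocks.items():
            if p != c and pb["parent"] == cb["parent"]:
                edges.extend((p, data, c) for data in pb["out"] if data in cb["in"])
    return edges


blocks = extract_blocks(SCRIPT)
edges = dataflow_edges(blocks)
```

Matching outputs to inputs by name is what turns the comments into a workflow-like graph without running the script, which is the core of the approach described above.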

    Transcription Factors Bind Thousands of Active and Inactive Regions in the Drosophila Blastoderm

    Identifying the genomic regions bound by sequence-specific regulatory factors is central both to deciphering the complex DNA cis-regulatory code that controls transcription in metazoans and to determining the range of genes that shape animal morphogenesis. We used whole-genome tiling arrays to map sequences bound in Drosophila melanogaster embryos by the six maternal and gap transcription factors that initiate anterior–posterior patterning. We find that these sequence-specific DNA binding proteins bind with quantitatively different specificities to highly overlapping sets of several thousand genomic regions in blastoderm embryos. Specific high- and moderate-affinity in vitro recognition sequences for each factor are enriched in bound regions. This enrichment, however, is not sufficient to explain the pattern of binding in vivo and varies in a context-dependent manner, demonstrating that higher-order rules must govern targeting of transcription factors. The more highly bound regions include all of the over 40 well-characterized enhancers known to respond to these factors as well as several hundred putative new cis-regulatory modules clustered near developmental regulators and other genes with patterned expression at this stage of embryogenesis. The new targets include most of the microRNAs (miRNAs) transcribed in the blastoderm, as well as all major zygotically transcribed dorsal–ventral patterning genes, whose expression we show to be quantitatively modulated by anterior–posterior factors. In addition to these highly bound regions, there are several thousand regions that are reproducibly bound at lower levels. However, these poorly bound regions are, collectively, far more distant from genes transcribed in the blastoderm than highly bound regions; are preferentially found in protein-coding sequences; and are less conserved than highly bound regions. 
Together, these observations suggest that many of these poorly bound regions are not involved in early embryonic transcriptional regulation, and a significant proportion may be nonfunctional. Surprisingly, for five of the six factors, their recognition sites are not unambiguously more constrained evolutionarily than the immediate flanking DNA, even in more highly bound and presumably functional regions, indicating that comparative DNA sequence analysis is limited in its ability to identify functional transcription factor targets.
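The abstract reports that in vitro recognition sequences are enriched in bound regions. One common way to quantify such enrichment, shown here as a generic sketch rather than the authors' procedure, is to count regions that contain at least one motif match and apply a one-sided hypergeometric test alongside a fold-enrichment ratio. The function names, the presence/absence counting, and the toy counts are assumptions.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def motif_enrichment(bound_with_motif, bound_total, background_with_motif, background_total):
    """Fold enrichment and one-sided p-value for motif presence in bound regions."""
    N = bound_total + background_total          # all regions considered
    K = bound_with_motif + background_with_motif  # regions containing the motif
    if K == 0:
        raise ValueError("motif absent from all regions; enrichment undefined")
    fold = (bound_with_motif / bound_total) / (K / N)
    # Probability of seeing at least this many motif-containing bound regions by chance.
    p = hypergeom_sf(bound_with_motif, N, K, bound_total)
    return fold, p


fold, p = motif_enrichment(3, 3, 0, 3)
```

In a real analysis the background would be matched for length and GC content; as the abstract notes, enrichment alone does not explain where factors bind in vivo.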

    Capsaicin-Induced Changes in LTP in the Lateral Amygdala Are Mediated by TRPV1

    The transient receptor potential vanilloid type 1 (TRPV1) channel is a well-recognized polymodal signal detector that is activated by painful stimuli such as capsaicin. Here, we show that TRPV1 is expressed in the lateral nucleus of the amygdala (LA). Although the central amygdala displays the highest neuronal density, the highest density of TRPV1-labeled neurons was found within the nuclei of the basolateral complex of the amygdala. Capsaicin specifically changed the magnitude of long-term potentiation (LTP) in the LA in brain slices of mice, depending on the anesthetic (ether or isoflurane) used before euthanasia. After ether anesthesia, capsaicin had a suppressive effect on LA-LTP in both patch-clamp and extracellular recordings. The capsaicin-induced reduction of LTP was completely blocked by the nitric oxide synthase (NOS) inhibitor L-NAME and was absent in neuronal NOS-deficient as well as TRPV1-deficient mice. The specific cannabinoid receptor type 1 (CB1) antagonist AM 251 also reduced the inhibitory effect of capsaicin on LA-LTP, suggesting that stimulation of TRPV1 provokes the generation of anandamide in the brain, which seems to inhibit NO synthesis. After isoflurane anesthesia before euthanasia, capsaicin caused a TRPV1-mediated increase in the magnitude of LA-LTP. Therefore, our results also indicate that the choice of anesthetic is an important consideration when brain plasticity and the action of endovanilloids are evaluated. In summary, our results demonstrate that TRPV1 may be involved in the amygdala control of learning mechanisms.