18 research outputs found

    Multi-species integrative biclustering

    We describe an algorithm, multi-species cMonkey, for the simultaneous biclustering of heterogeneous multiple-species data collections and apply the algorithm to a group of bacteria containing Bacillus subtilis, Bacillus anthracis, and Listeria monocytogenes. The algorithm reveals evolutionary insights into the surprisingly high degree of conservation of regulatory modules across these three species and allows data and insights from well-studied organisms to complement the analysis of related but less well-studied organisms.

    An experimentally supported model of the Bacillus subtilis global transcriptional regulatory network

    Organisms from all domains of life use gene regulation networks to control cell growth, identity, function, and responses to environmental challenges. Although accurate global regulatory models would provide critical evolutionary and functional insights, they remain incomplete, even for the best-studied organisms. Efforts to build comprehensive networks are confounded by challenges including network scale, degree of connectivity, complexity of organism–environment interactions, and difficulty of estimating the activity of regulatory factors. Taking advantage of the large number of known regulatory interactions in Bacillus subtilis and two transcriptomics datasets (including one with 38 separate experiments collected specifically for this study), we use a new combination of network component analysis and model selection to simultaneously estimate transcription factor activities and learn a substantially expanded transcriptional regulatory network for this bacterium. In total, we predict 2,258 novel regulatory interactions and recall 74% of the previously known interactions. We obtained experimental support for 391 (out of 635 evaluated) novel regulatory edges (62% accuracy), thus significantly increasing our understanding of various cell processes, such as spore formation.

    Comparative Microbial Modules Resource: Generation and Visualization of Multi-species Biclusters

    The increasing abundance of large-scale, high-throughput datasets for many closely related organisms provides opportunities for comparative analysis via the simultaneous biclustering of datasets from multiple species. These analyses require a reformulation of how to organize multi-species datasets and visualize the results of comparative genomics data analyses. Recently, we developed a method, multi-species cMonkey, which integrates heterogeneous high-throughput datatypes from multiple species to identify conserved regulatory modules. Here we present an integrated data visualization system, built upon the Gaggle, enabling exploration of our method's results (available at http://meatwad.bio.nyu.edu/cmmr.html). The system can also be used to explore other comparative genomics datasets and outputs from other data analysis procedures, such as results from other multiple-species clustering programs or from independent clustering of different single-species datasets. We provide an example use of our system for two bacteria, Escherichia coli and Salmonella Typhimurium. We illustrate the use of our system by exploring conserved biclusters involved in nitrogen metabolism, uncovering a putative function for yjjI, a currently uncharacterized gene that we predict to be involved in nitrogen assimilation.

    “Same difference”: comprehensive evaluation of four DNA methylation measurement platforms

    Background: DNA methylation in CpG context is fundamental to the epigenetic regulation of gene expression in higher eukaryotes. Changes in methylation patterns are implicated in many diseases, cellular differentiation, imprinting, and other biological processes. Techniques that enrich for biologically relevant genomic regions with high CpG content are desired, since, depending on the size of an organism’s methylome, the depth of sequencing required to cover all CpGs can be prohibitively expensive. Currently, restriction enzyme-based reduced representation bisulfite sequencing and its modified protocols are widely used to study methylation differences. Recently, Agilent Technologies, Roche NimbleGen, and Illumina have ventured to both reduce sequencing costs and capture CpGs of known biological relevance by marketing in-solution custom-capture hybridization platforms. We aimed to evaluate the similarities and differences of these four methods, considering that each platform targets approximately 10–13% of the human methylome. Results: Overall, the regions covered per platform were as expected: targeted capture-based methods covered >95% of their designed regions, whereas the restriction enzyme-based method covered >70% of the expected fragments. While the total number of CpG loci shared by all methods was low, ~24% of any platform, the methylation levels of CpGs covered by all platforms were concordant. Annotation of CpG loci with genomic features revealed roughly the same proportions of feature annotations across the four platforms. Targeted capture methods comprise similar types and coverage of annotations and, relative to the targeted methods, the restriction enzyme method covers fewer promoters (~9%), CpG shores (~8%) and unannotated loci (~11%). Conclusions: Although all methods are largely consistent in terms of covered CpG loci, the commercially available capture methods cover nearly all CpG sites in their target regions with few off-target loci and cover similar proportions of annotated CpG loci, whereas the restriction-based enrichment results in more off-target and unannotated CpG loci. Quality of DNA is very important for restriction-based enrichment, and starting material can be low. Conversely, quality of the starting material is less important for capture methods, and at least twice the amount of starting material is required. Pricing is marginally less for restriction-based enrichment, and the number of samples that can be prepared is not restricted to the number of capture reactions a kit supports. However, the advantage of capture libraries is the ability to custom design areas of interest. The choice of technique would be decided by the number of samples, the quality and quantity of DNA available, and the biological areas of interest, since comparable data are obtained from all platforms.

    Multiplexing of ChIP-Seq Samples in an Optimized Experimental Condition Has Minimal Impact on Peak Detection

    Multiplexing samples in sequencing experiments is a common approach to maximize information yield while minimizing cost. In most cases the number of samples that are multiplexed is determined by financial considerations or experimental convenience, with limited understanding of the effects on the experimental results. Here we set out to examine the impact of multiplexing ChIP-seq experiments on the ability to identify a specific epigenetic modification. We performed peak detection analyses to determine the effects of multiplexing. These include false discovery rates; size, position and statistical significance of detected peaks; and changes in gene annotation. We found that, for the histone marker H3K4me3, one can multiplex up to 8 samples (7 IP + 1 input) at ~21 million single-end reads each and still detect over 90% of all peaks found when using a full lane for a single sample (~181 million reads). Furthermore, no variations are introduced by indexing or lane batch effects and, importantly, there is no significant reduction in the number of genes with neighboring H3K4me3 peaks. We conclude that, for a well-characterized antibody and, therefore, a model IP condition, multiplexing 8 samples per lane is sufficient to capture most of the biological signal.

    MOESM1 of “Same difference”: comprehensive evaluation of four DNA methylation measurement platforms

    Additional file 1. Table S1. Library input details. Table S2. Target region properties and CpGs covered. Table S3. Sequencing details. Figure S1. Number of CpG-units covered, mean and median coverage per CpG-unit. Figure S2. Intra- and inter-platform CpG-unit overlap and methylation level concordance. Table S4. Intra- and inter-platform details. Figure S3. Overlap of exon annotation of CpG-units as UpSet plot. Figure S4. Overlap of intron annotation of CpG-units as UpSet plot. Figure S5. Overlap of promoter annotation of CpG-units as UpSet plot. Figure S6. Overlap of CpG island annotation of CpG-units as UpSet plot. Figure S7. Overlap of CpG shore annotation of CpG-units as UpSet plot. Figure S8. Overlap of unannotated CpG-units as UpSet plot.

    Peak characteristics.

    A) P-values for detected peaks shift towards reduced significance as multiplexing increases. B) The difference between peak apex positions in the multiplexed libraries and peak apex positions in the non-multiplexed library is consistent across all multiplexing levels, while its variability increases as multiplexing increases. C) Peak width distributions show a marginal reduction across multiplexing levels.

    ChIP-seq multiplexing sequencing scheme.

    The ChIP-seq multiplexing titration scheme consists of: one whole lane of ChIP sample (1-plex), one whole lane of input sample (1-plex), two lanes with 4 samples (4-plex) of 2 ChIP and 2 input samples in each lane, one lane with 6 samples (6-plex) of 1 input and 5 ChIP samples, and one lane with 8 samples (8-plex) of 1 input and 7 ChIP samples. Sample labels correspond to the sample type and the Illumina TruSeq index used (e.g. ChIP-5 is the IP library with index number 5).