42 research outputs found

    Comparison of Pattern Detection Methods in Microarray Time Series of the Segmentation Clock

    While genome-wide gene expression data are generated at an increasing rate, the repertoire of approaches for pattern discovery in these data is still limited. Identifying subtle patterns of interest in large amounts of noisy data (tens of thousands of profiles) remains a challenge. A microarray time series was recently generated to study the transcriptional program of the mouse segmentation clock, a biological oscillator associated with the periodic formation of the segments of the body axis. A method related to Fourier analysis, the Lomb-Scargle periodogram, was used to detect periodic profiles in the dataset, leading to the identification of a novel set of cyclic genes associated with the segmentation clock. Here, we applied four distinct mathematical methods to the same microarray time series dataset to identify significant patterns in gene expression profiles. These methods, called Phase consistency, Address reduction, Cyclohedron test and Stable persistence, are based on different conceptual frameworks that are either hypothesis- or data-driven. Some of the methods, unlike Fourier transforms, do not depend on the assumption that the pattern of interest is periodic. Remarkably, these methods blindly identified the expression profiles of known cyclic genes as the most significant patterns in the dataset. Many candidate genes predicted by more than one approach appeared to be true positive cyclic genes and will be of particular interest for future research. In addition, these methods predicted novel candidate cyclic genes that were consistent with previous biological knowledge and experimental validation in mouse embryos.
Our results demonstrate the utility of these novel pattern detection strategies, notably for detection of periodic profiles, and suggest that combining several distinct mathematical approaches to analyze microarray datasets is a valuable strategy for identifying genes that exhibit novel, interesting transcriptional patterns.
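
The abstract above names the Lomb-Scargle periodogram as the Fourier-related method used earlier to detect periodic profiles. The following is a minimal pure-Python sketch of the classical periodogram for one unevenly sampled profile. It is an illustration, not the authors' implementation: the function name `lomb_scargle_power`, the sampling times, the synthetic profile and the candidate periods are all assumptions made for this example.

```python
import math


def lomb_scargle_power(times, values, periods):
    """Classical Lomb-Scargle power of `values` at each candidate period."""
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    powers = []
    for period in periods:
        omega = 2.0 * math.pi / period
        # The offset tau makes the sine and cosine terms orthogonal
        # on the (possibly uneven) sampling grid.
        tau = math.atan2(
            sum(math.sin(2.0 * omega * t) for t in times),
            sum(math.cos(2.0 * omega * t) for t in times),
        ) / (2.0 * omega)
        cos_terms = [math.cos(omega * (t - tau)) for t in times]
        sin_terms = [math.sin(omega * (t - tau)) for t in times]
        power = 0.0
        for basis in (cos_terms, sin_terms):
            norm = sum(b * b for b in basis)
            if norm > 1e-12:  # skip a degenerate basis instead of dividing by zero
                projection = sum(y * b for y, b in zip(centered, basis))
                power += 0.5 * projection * projection / norm
        powers.append(power)
    return powers


# Hypothetical, unevenly sampled profile with a period of 2 time units.
times = [0.0, 0.3, 0.7, 1.1, 1.6, 1.9, 2.4, 2.8, 3.1,
         3.7, 4.0, 4.6, 5.2, 5.5, 6.1, 6.4, 7.0]
values = [math.sin(2.0 * math.pi * t / 2.0) for t in times]
candidate_periods = [1.0, 1.5, 2.0, 3.0, 4.0]

powers = lomb_scargle_power(times, values, candidate_periods)
best_period = candidate_periods[powers.index(max(powers))]
print(best_period)
```

Applied to tens of thousands of profiles, a score like this would still need a significance estimate and a multiple-testing correction, neither of which is sketched here.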

    Comprehensive Structural and Substrate Specificity Classification of the Saccharomyces cerevisiae Methyltransferome

    Methylation is one of the most common chemical modifications of biologically active molecules and it occurs in all life forms. Its functional role is very diverse and involves many essential cellular processes, such as signal transduction, transcriptional control, biosynthesis, and metabolism. Here, we provide further insight into enzymatic methylation in S. cerevisiae by conducting a comprehensive structural and functional survey of all the methyltransferases encoded in its genome. Using distant homology detection and fold recognition, we found that the S. cerevisiae methyltransferome comprises 86 MTases (53 well-known and 33 putative with unknown substrate specificity). Structural classification of their catalytic domains shows that these enzymes may adopt nine different folds, the most common being the Rossmann-like fold. We also analyzed the domain architecture of these proteins and identified several new domain contexts. Interestingly, we found that the majority of MTase genes are periodically expressed during the yeast metabolic cycle. This finding, together with the calculated isoelectric point, fold assignment and cellular localization, was used to develop a novel approach for predicting substrate specificity. Using this approach, we predicted the general substrates for 24 of 33 putative MTases and confirmed these predictions experimentally in both cases tested. Finally, we show that, in S. cerevisiae, methylation is carried out by 34 RNA MTases, 32 protein MTases, eight small molecule MTases, three lipid MTases, and nine MTases with still unknown substrate specificity.

    Stochasticity of replication forks' speeds plays a key role in the dynamics of DNA replication.

    Eukaryotic DNA replication is elaborately orchestrated to duplicate the genome in a timely and faithful manner. Replication initiates at multiple origins from which replication forks emanate and travel bi-directionally. The complex spatio-temporal regulation of DNA replication remains incompletely understood. To study it, computational models of DNA replication have been developed in S. cerevisiae. However, in spite of the experimental evidence of forks' speed stochasticity, all models assumed that forks' speeds are the same. Here, we present the first model of DNA replication assuming that speeds vary stochastically between forks. Utilizing data from both wild-type and hydroxyurea-treated yeast cells, we show that our model is more accurate than models assuming constant forks' speed and faithfully reconstructs the dynamics of DNA replication, starting both from population-wide data and from data reflecting fork movement in individual cells. Completing replication in a timely manner is a challenge due to its stochasticity; we propose an empirically derived modification to replication speed based on the distance to the approaching fork, which promotes timely completion of replication. In summary, our work reveals a key role that stochasticity of the forks' speed plays in the dynamics of DNA replication. We show that without including stochasticity of forks' speed it is not possible to accurately reconstruct the movement of individual replication forks, as measured by DNA combing.

    Acta Crystallographica Section A Foundations of

    Electronic reprint. The crystallographic fast Fourier transform. Recursive symmetry reduction

    Acta Crystallographica Section A Foundations of

    Electronic reprint. The crystallographic fast Fourier transform. IV. FFT-asymmetric units in the reciprocal space

    Strategies for achieving high sequencing accuracy for low diversity samples and avoiding sample bleeding using Illumina platform.

    Sequencing microRNA, reduced representation sequencing, Hi-C technology and any method requiring the use of in-house barcodes result in sequencing libraries with low initial sequence diversity. Sequencing such libraries on the Illumina platform typically produces low quality data due to the limitations of the Illumina cluster calling algorithm. Moreover, even in the case of diverse samples, these limitations cause substantial inaccuracies in multiplexed sample assignment (sample bleeding). Such inaccuracies are unacceptable in clinical applications and in some other fields (e.g. detection of rare variants). Here, we discuss how both the poor quality of low diversity samples and sample bleeding are caused by incorrect detection of clusters on the flowcell during initial sequencing cycles. We propose simple software modifications (Long Template Protocol) that overcome this problem. We present experimental results showing that our Long Template Protocol remarkably increases data quality for low diversity samples, as compared with the standard analysis protocol; it also substantially reduces sample bleeding for all samples. For comprehensiveness, we also discuss and compare experimental results from alternative approaches to sequencing low diversity samples. First, we discuss how the low diversity problem, if caused by barcodes, can be avoided altogether at the barcode design stage. Second and third, we present modified guidelines, which are more stringent than the manufacturer's, for mixing low diversity samples with diverse samples and lowering cluster density, which in our experience consistently produces high quality data from low diversity samples. Fourth and fifth, we present rescue strategies that can be applied when sequencing results in low quality data and when there is no more biological material available. In such cases, we propose that the flowcell be re-hybridized and sequenced again using our Long Template Protocol.
Alternatively, we discuss how analysis can be repeated from saved sequencing images using the Long Template Protocol to increase accuracy.

    Comparison of quality of sequencing of the BLESS samples using Illumina default protocol (template of length 4) and our Long Template Protocol (template of length 20).

    Columns: sample name, template generation protocol used, % of low diversity sample, % of diverse control, cluster density on the flowcell (expressed as a percentage of optimal cluster density), number and percentage of reads with Phred score >30, percentage of reads with intact BLESS barcode among reads with bases with Phred scores >30, normalized for the percentage of the BLESS sample (i.e. if 10% of the sample is BLESS and 10% of the Q30 reads are barcoded, that percentage would be normalized to 100%), and number of reads with intact BLESS barcode mapped to the human genome with 0 errors. Note that samples A4, LT20_1 and LT20_2 are technical replicates and that there is a 70-fold increase in the number of reads with intact BLESS barcode after switching to our Long Template Protocol.
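
The normalization described in the column list above can be written as a one-line calculation. The sketch below is an assumption about how that arithmetic reads, inferred only from the caption's worked example (10% BLESS in the sample and 10% barcoded Q30 reads give 100%); the function name and its argument names are hypothetical.

```python
def normalized_barcode_percentage(barcoded_pct_among_q30, low_diversity_pct):
    """Rescale the share of barcoded Q30 reads by the share of the
    low diversity (BLESS) sample loaded on the flowcell.

    Both arguments are percentages in the range 0-100.
    """
    if not 0 < low_diversity_pct <= 100:
        raise ValueError("low_diversity_pct must be in (0, 100]")
    return 100.0 * barcoded_pct_among_q30 / low_diversity_pct


# The caption's own example: 10% BLESS, 10% of Q30 reads barcoded.
print(normalized_barcode_percentage(10, 10))
```

A value near 100% then means that essentially all reads expected from the low diversity sample carry an intact barcode, regardless of how heavily the sample was diluted with a diverse control.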

    Results of experiments with rehybridized flowcell and Long Template Protocol (template of length 16).

    Columns: lane, cluster density (as in Table 1, http://www.plosone.org/article/info:doi/10.1371/journal.pone.0120520#pone.0120520.t001), tiles processed successfully, total number of reads sequenced, percentage of bases with Phred score >30 (according to Illumina, >85% indicates good quality data), normalized percentage of reads with intact barcode (as in Table 1), and percentage of barcoded reads that mapped to the genome.
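
The "Phred score >30" column above rests on the standard Phred definition, Q = -10 * log10(p), where p is the estimated probability that a base call is wrong, so Q30 corresponds to an error probability of 0.001. The sketch below shows that conversion and a per-read count of bases above Q30. It assumes FASTQ quality strings with the Phred+33 ASCII offset; the function names and the example quality string are hypothetical and are not taken from the paper.

```python
def phred_to_error_probability(q):
    """Convert a Phred quality score to a base-call error probability."""
    return 10 ** (-q / 10)


def percent_bases_above(quality_string, threshold=30, offset=33):
    """Percentage of bases whose Phred score exceeds `threshold`.

    `quality_string` is one FASTQ quality line; `offset` is the ASCII
    offset of the encoding (33 is assumed here).
    """
    scores = [ord(ch) - offset for ch in quality_string]
    return 100.0 * sum(score > threshold for score in scores) / len(scores)


# Hypothetical read: four bases at Q40 ('I') followed by four at Q2 ('#').
print(phred_to_error_probability(30))
print(percent_bases_above("IIII####"))
```

Aggregated over all reads in a lane, the second function gives the quantity that the caption compares against the 85% guideline.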

    Severe dilution combined with lowering cluster density substantially improves data quality from sequencing low diversity samples.

    Shown are median per-base Phred quality scores of our low initial sequence diversity sample (referenced later as A4) (red and black), spiked with 50% PhiX, the maximal dilution recommended by the manufacturer. Extremely low quality scores lead to a practically unusable sample. Also shown are improved scores of the same sample re-sequenced using more dilution and lower cluster density, as we recommend (blue). Lowering cluster density alone does not improve the results if the standard Illumina data analysis protocol is used (green).