121 research outputs found

    Automated Workflow for Preparation of cDNA for Cap Analysis of Gene Expression on a Single Molecule Sequencer

    Background: Cap analysis of gene expression (CAGE) is a 5′ sequence tag technology to globally determine transcriptional starting sites in the genome and their expression levels and has most recently been adapted to the HeliScope single molecule sequencer. Despite significant simplifications in the CAGE protocol, it has until now been a labour-intensive protocol. Methodology: In this study we set out to adapt the protocol to a robotic workflow, which would increase throughput and reduce handling. The automated CAGE cDNA preparation system we present here can prepare 96 ‘HeliScope ready’ CAGE cDNA libraries in 8 days, as opposed to 6 weeks by a manual operator. We compare the results obtained using the same RNA in manual libraries and across multiple automation batches to assess reproducibility. Conclusions: We show that the sequencing was highly reproducible and comparable to manual libraries with an 8-fold increase in productivity. The automated CAGE cDNA preparation system can prepare 96 CAGE sequencing samples simultaneously. Finally we discuss how the system could be used for CAGE on Illumina/SOLiD platforms, RNA-seq and fulllengt

    Large scale variation in the rate of germ-line de novo mutation, base composition, divergence and diversity in humans

    It has long been suspected that the rate of mutation varies across the human genome at a large scale based on the divergence between humans and other species. However, it is now possible to directly investigate this question using the large number of de novo mutations (DNMs) that have been discovered in humans through the sequencing of trios. We investigate a number of questions pertaining to the distribution of mutations using more than 130,000 DNMs from three large datasets. We demonstrate that the amount and pattern of variation differ between datasets at the 1MB and 100KB scales, probably as a consequence of differences in sequencing technology and processing. In particular, datasets show different patterns of correlation to genomic variables such as replication time. Nevertheless, there are many commonalities between datasets, which likely represent true patterns. We show that there is variation in the mutation rate at the 100KB, 1MB and 10MB scale that cannot be explained by variation at smaller scales; however, the level of this variation is modest at large scales: at the 1MB scale we infer that ~90% of regions have a mutation rate within 50% of the mean. Different types of mutation show similar levels of variation and appear to vary in concert, which suggests the pattern of mutation is relatively constant across the genome. We demonstrate that variation in the mutation rate does not generate large-scale variation in GC-content, and hence that mutation bias does not maintain the isochore structure of the human genome. We find that genomic features explain less than 40% of the explainable variance in the rate of DNM. As expected, the rate of divergence between species is correlated to the rate of DNM. However, the correlations are weaker than expected if all the variation in divergence was due to variation in the mutation rate. We provide evidence that this is due to the effect of biased gene conversion on the probability that a mutation will become fixed. In contrast to divergence, we find that most of the variation in diversity can be explained by variation in the mutation rate. Finally, we show that the correlation between divergence and DNM density declines as increasingly divergent species are considered

    Bivalent-Like Chromatin Markers Are Predictive for Transcription Start Site Distribution in Human

    Deep sequencing of 5′ capped transcripts has revealed a variety of transcription initiation patterns, from narrow, focused promoters to wide, broad promoters. Attempts have already been made to model empirically classified patterns, but virtually no quantitative models for transcription initiation have been reported. Even though both genetic and epigenetic elements have been associated with such patterns, the organization of regulatory elements is largely unknown. Here, linear regression models were derived from a pool of regulatory elements, including genomic DNA features, nucleosome organization, and histone modifications, to predict the distribution of transcription start sites (TSS). Importantly, models including both active and repressive histone modification markers, e.g. H3K4me3 and H4K20me1, were consistently found to be much more predictive than models with only single-type histone modification markers, indicating the possibility of “bivalent-like” epigenetic control of transcription initiation. The nucleosome positions are proposed to be coded in the active component of such bivalent-like histone modification markers. Finally, we demonstrated that models trained on one cell type could successfully predict TSS distribution in other cell types, suggesting that these models may have a broader application range

    NIST interlaboratory study on glycosylation analysis of monoclonal antibodies : comparison of results from diverse analytical methods

    Glycosylation is a topic of intense current interest in the development of biopharmaceuticals since it is related to drug safety and efficacy. This work describes results of an interlaboratory study on the glycosylation of the Primary Sample (PS) of NISTmAb, a monoclonal antibody reference material. Seventy-six laboratories from industry, university, research, government, and hospital sectors in Europe, North America, Asia, and Australia submitted a total of 103 reports on glycan distributions. The principal objective of this study was to report and compare results for the full range of analytical methods presently used in the glycosylation analysis of mAbs. Therefore, participation was unrestricted, with laboratories choosing their own measurement techniques. Protein glycosylation was determined in various ways, including at the level of intact mAb, protein fragments, glycopeptides, or released glycans, using a wide variety of methods for derivatization, separation, identification, and quantification. Consequently, the diversity of results was enormous, with the number of glycan compositions identified by each laboratory ranging from 4 to 48. In total, one hundred sixteen glycan compositions were reported, of which 57 compositions could be assigned consensus abundance values. These consensus medians provide community-derived values for NISTmAb PS. Agreement with the consensus medians did not depend on the specific method or laboratory type. The study provides a view of the current state-of-the-art for biologic glycosylation measurement and suggests a clear need for harmonization of glycosylation analysis methods

    Modeling double strand break susceptibility to interrogate structural variation in cancer

    Background: Structural variants (SVs) are known to play important roles in a variety of cancers, but their origins and functional consequences are still poorly understood. Many SVs are thought to emerge from errors in the repair processes following DNA double strand breaks (DSBs). Results: We used experimentally quantified DSB frequencies in cell lines with matched chromatin and sequence features to derive the first quantitative genome-wide models of DSB susceptibility. These models are accurate and provide novel insights into the mutational mechanisms generating DSBs. Models trained in one cell type can be successfully applied to others, but a substantial proportion of DSBs appear to reflect cell type-specific processes. Using model predictions as a proxy for susceptibility to DSBs in tumors, many SV-enriched regions appear to be poorly explained by selectively neutral mutational bias alone. A substantial number of these regions show unexpectedly high SV breakpoint frequencies given their predicted susceptibility to mutation and are therefore credible targets of positive selection in tumors. These putatively positively selected SV hotspots are enriched for genes previously shown to be oncogenic. In contrast, several hundred regions across the genome show unexpectedly low levels of SVs, given their relatively high susceptibility to mutation. These novel coldspot regions appear to be subject to purifying selection in tumors and are enriched for active promoters and enhancers. Conclusions: We conclude that models of DSB susceptibility offer a rigorous approach to the inference of SVs putatively subject to selection in tumors

    Quantitative Models of the Mechanisms That Control Genome-Wide Patterns of Transcription Factor Binding during Early Drosophila Development

    Transcription factors that drive complex patterns of gene expression during animal development bind to thousands of genomic regions, with quantitative differences in binding across bound regions mediating their activity. While we now have tools to characterize the DNA affinities of these proteins and to precisely measure their genome-wide distribution in vivo, our understanding of the forces that determine where, when, and to what extent they bind remains primitive. Here we use a thermodynamic model of transcription factor binding to evaluate the contribution of different biophysical forces to the binding of five regulators of early embryonic anterior-posterior patterning in Drosophila melanogaster. Predictions based on DNA sequence and in vitro protein-DNA affinities alone achieve a correlation of ~0.4 with experimental measurements of in vivo binding. Incorporating cooperativity and competition among the five factors, and accounting for spatial patterning by modeling binding in every nucleus independently, had little effect on prediction accuracy. A major source of error was the prediction of binding events that do not occur in vivo, which we hypothesized reflected reduced accessibility of chromatin. To test this, we incorporated experimental measurements of genome-wide DNA accessibility into our model, effectively restricting predicted binding to regions of open chromatin. This dramatically improved our predictions to a correlation of 0.6–0.9 for various factors across known target genes. Finally, we used our model to quantify the roles of DNA sequence, accessibility, and binding competition and cooperativity. Our results show that, in regions of open chromatin, binding can be predicted almost exclusively by the sequence specificity of individual factors, with a minimal role for protein interactions. We suggest that a combination of experimentally determined chromatin accessibility data and simple computational models of transcription factor binding may be used to predict the binding landscape of any animal transcription factor with significant precision