
    Sashimi plots: Quantitative visualization of RNA sequencing read alignments

    We introduce Sashimi plots, a quantitative multi-sample visualization of mRNA sequencing reads aligned to gene annotations. Sashimi plots are made using alignments (stored in the SAM/BAM format) and gene model annotations (in GFF format), which can be custom-made by the user or obtained from databases such as Ensembl or UCSC. We describe two implementations of Sashimi plots: (1) a stand-alone command line implementation aimed at making customizable publication quality figures, and (2) an implementation built into the Integrated Genome Viewer (IGV) browser, which enables rapid and dynamic creation of Sashimi plots for any genomic region of interest, suitable for exploratory analysis of alternatively spliced regions of the transcriptome. Isoform expression estimates output by the MISO program can optionally be plotted along with Sashimi plots. Sashimi plots can be used to quickly screen differentially spliced exons along genomic regions of interest and can be used in publication quality figures. The Sashimi plot software and documentation are available from: http://genes.mit.edu/burgelab/miso/docs/sashimi.html
    Comment: 2 figures

    Transcriptome-wide Mapping Reveals Widespread Dynamic-Regulated Pseudouridylation of ncRNA and mRNA

    Pseudouridine is the most abundant RNA modification, yet except for a few well-studied cases, little is known about the modified positions and their function(s). Here, we develop Ψ-seq for transcriptome-wide quantitative mapping of pseudouridine. We validate Ψ-seq with spike-ins and de novo identification of previously reported positions and discover hundreds of unique sites in human and yeast mRNAs and snoRNAs. Perturbing pseudouridine synthases (PUS) uncovers which pseudouridine synthase modifies each site and their target sequence features. mRNA pseudouridylation depends on both site-specific and snoRNA-guided pseudouridine synthases. Upon heat shock in yeast, Pus7p-mediated pseudouridylation is induced at >200 sites, and PUS7 deletion decreases the levels of otherwise pseudouridylated mRNA, suggesting a role in enhancing transcript stability. rRNA pseudouridine stoichiometries are conserved but reduced in cells from dyskeratosis congenita patients, where the PUS DKC1 is mutated. Our work identifies an enhanced, transcriptome-wide scope for pseudouridine and methods to dissect its underlying mechanisms and function.

    Single-cell transcriptomics reveals bimodality in expression and splicing in immune cells

    Recent molecular studies have shown that, even when derived from a seemingly homogenous population, individual cells can exhibit substantial differences in gene expression, protein levels and phenotypic output [1–5], with important functional consequences [4, 5]. Existing studies of cellular heterogeneity, however, have typically measured only a few pre-selected RNAs [1, 2] or proteins [5, 6] simultaneously, because genomic profiling methods [3] could not be applied to single cells until very recently [7–10]. Here we use single-cell RNA sequencing to investigate heterogeneity in the response of mouse bone-marrow-derived dendritic cells (BMDCs) to lipopolysaccharide. We find extensive, and previously unobserved, bimodal variation in messenger RNA abundance and splicing patterns, which we validate by RNA-fluorescence in situ hybridization for select transcripts. In particular, hundreds of key immune genes are bimodally expressed across cells, surprisingly even for genes that are very highly expressed at the population average. Moreover, splicing patterns demonstrate previously unobserved levels of heterogeneity between cells. Some of the observed bimodality can be attributed to closely related, yet distinct, known maturity states of BMDCs; other portions reflect differences in the usage of key regulatory circuits. For example, we identify a module of 137 highly variable, yet co-regulated, antiviral response genes. Using cells from knockout mice, we show that variability in this module may be propagated through an interferon feedback circuit, involving the transcriptional regulators Stat2 and Irf7. Our study demonstrates the power and promise of single-cell genomics in uncovering functional diversity between cells and in deciphering cell states and circuits.
    Funding: National Institutes of Health (U.S.) (NIH Postdoctoral Fellowship (1F32HD075541-01)); Charles H. Hood Foundation (Postdoctoral Fellowship); National Institutes of Health (U.S.) (NIH grant U54 AI057159); National Institutes of Health (U.S.) (NIH New Innovator Award (DP2 OD002230)); National Institutes of Health (U.S.) (NIH CEGS Award (1P50HG006193-01)); National Institutes of Health (U.S.) (NIH Pioneer Awards (5DP1OD003893-03)); National Institutes of Health (U.S.) (NIH Pioneer Awards (DP1OD003958-01)); Broad Institute of MIT and Harvard; Broad Institute of MIT and Harvard (Klarman Cell Observatory)

    Single-cell RNA-seq reveals dynamic paracrine control of cellular variation

    High-throughput single-cell transcriptomics offers an unbiased approach for understanding the extent, basis and function of gene expression variation between seemingly identical cells. Here we sequence single-cell RNA-seq libraries prepared from over 1,700 primary mouse bone-marrow-derived dendritic cells spanning several experimental conditions. We find substantial variation between identically stimulated dendritic cells, in both the fraction of cells detectably expressing a given messenger RNA and the transcript’s level within expressing cells. Distinct gene modules are characterized by different temporal heterogeneity profiles. In particular, a ‘core’ module of antiviral genes is expressed very early by a few ‘precocious’ cells in response to uniform stimulation with a pathogenic component, but is later activated in all cells. By stimulating cells individually in sealed microfluidic chambers, analysing dendritic cells from knockout mice, and modulating secretion and extracellular signalling, we show that this response is coordinated by interferon-mediated paracrine signalling from these precocious cells. Notably, preventing cell-to-cell communication also substantially reduces variability between cells in the expression of an early-induced ‘peaked’ inflammatory module, suggesting that paracrine signalling additionally represses part of the inflammatory program. Our study highlights the importance of cell-to-cell communication in controlling cellular heterogeneity and reveals general strategies that multicellular populations can use to establish complex dynamic responses.
    Funding: National Human Genome Research Institute (U.S.). Centers of Excellence in Genomic Science (1P50HG006193-01); National Institutes of Health (U.S.). Pioneer Award (DP1OD003958-01); Howard Hughes Medical Institute; Broad Institute of MIT and Harvard. Klarman Cell Observatory

    Large-scale discovery of insertion hotspots and preferential integration sites of human transposed elements

    Throughout evolution, eukaryotic genomes have been invaded by transposable elements (TEs). Little is known about the factors leading to genomic proliferation of TEs, their preferred integration sites and the molecular mechanisms underlying their insertion. We analyzed hundreds of thousands of nested TEs in the human genome, i.e. insertions of TEs into existing ones. We first discovered that most TEs insert within specific ‘hotspots’ along the targeted TE. In particular, retrotransposed Alu elements contain a non-canonical single nucleotide hotspot for insertion of other Alu sequences. We next devised a method for identification of integration sequence motifs of inserted TEs that are conserved within the targeted TEs. This method revealed novel sequence motifs characterizing insertions of various important TE families: Alu, hAT, ERV1 and MaLR. Finally, we performed a global assessment to determine the extent to which young TEs tend to nest within older transposed elements and identified a 4-fold higher tendency of TEs to insert into existing TEs than to insert within non-TE intergenic regions. Our analysis demonstrates that TEs are highly biased to insert within certain TEs, in specific orientations and within specific targeted TE positions. TE nesting events also reveal new characteristics of the molecular mechanisms underlying transposition.

    Transcriptome-wide discovery of circular RNAs in Archaea

    Circular RNA forms have been described in all domains of life. Such RNAs were shown to have diverse biological functions, including roles in the life cycle of viral and viroid genomes, and in maturation of permuted tRNA genes. Despite their potentially important biological roles, discovery of circular RNAs has so far been mostly serendipitous. We have developed circRNA-seq, a combined experimental/computational approach that enriches for circular RNAs and allows profiling their prevalence in a whole-genome, unbiased manner. Application of this approach to the archaeon Sulfolobus solfataricus P2 revealed multiple circular transcripts, a subset of which was further validated independently. The identified circular RNAs included expected forms, such as excised tRNA introns and rRNA processing intermediates, but were also enriched with non-coding RNAs, including C/D box RNAs and RNase P, as well as circular RNAs of unknown function. Many of the identified circles were conserved in Sulfolobus acidocaldarius, further supporting their functional significance. Our results suggest that circular RNAs, and particularly circular non-coding RNAs, are more prevalent in archaea than previously recognized, and might have yet unidentified biological roles. Our study establishes a specific and sensitive approach for identification of circular RNAs using RNA-seq, and can readily be applied to other organisms.

    Detection and Removal of Biases in the Analysis of Next-Generation Sequencing Reads

    Since the emergence of next-generation sequencing (NGS) technologies, great effort has been put into the development of tools for analysis of the short reads. In parallel, knowledge is increasing regarding biases inherent in these technologies. Here we discuss four different biases we encountered while analyzing various Illumina datasets. These biases are due to both biological and statistical effects that in particular affect comparisons between different genomic regions. Specifically, we encountered biases pertaining to the distributions of nucleotides across sequencing cycles, to mappability, to contamination of pre-mRNA with mRNA, and to non-uniform hydrolysis of RNA. Most of these biases are not specific to one analyzed dataset, but are present across a variety of datasets and within a variety of genomic contexts. Importantly, some of these biases correlated in a highly significant manner with biological features, including transcript length, gene expression levels, conservation levels, and exon-intron architecture, which misleadingly lends credibility to results that arise from the biases. We also demonstrate the relevance of these biases in the context of analyzing an NGS dataset mapping transcriptionally engaged RNA polymerase II (RNAPII) in the context of exon-intron architecture, and show that elimination of these biases is crucial for avoiding erroneous interpretation of the data. Collectively, our results highlight several important pitfalls, challenges and approaches in the analysis of NGS reads.
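
    As an illustration of the first bias named in this abstract (nucleotide distributions across sequencing cycles), the following minimal sketch tabulates the fraction of each base at every read position. It is not the authors' method; the function name `per_cycle_base_fractions` and the toy reads are assumptions made for this example only.

```python
from collections import Counter


def per_cycle_base_fractions(reads):
    """Return one dict per sequencing cycle (read position) mapping each
    base in "ACGTN" to its fraction among reads that reach that cycle."""
    counts = []  # counts[i] is a Counter of bases observed at cycle i
    for read in reads:
        for i, base in enumerate(read.upper()):
            if i == len(counts):
                counts.append(Counter())
            counts[i][base] += 1

    fractions = []
    for cycle_counts in counts:
        total = sum(cycle_counts.values())
        fractions.append({b: cycle_counts[b] / total for b in "ACGTN"})
    return fractions


# Toy reads, made up for illustration (not data from the paper).
reads = ["ACGT", "ACGA", "TCGT", "ACG"]
profile = per_cycle_base_fractions(reads)
for cycle, fracs in enumerate(profile, start=1):
    print(cycle, fracs)
```

    In an unbiased library the per-cycle fractions would be expected to stay roughly flat along the read; a strong drift at particular cycles is the kind of pattern that would warrant a closer look.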

    Alu Exonization Events Reveal Features Required for Precise Recognition of Exons by the Splicing Machinery

    Despite decades of research, the question of how the mRNA splicing machinery precisely identifies short exonic islands within the vast intronic oceans remains to a large extent obscure. In this study, we analyzed Alu exonization events, aiming to understand the requirements for correct selection of exons. Comparison of exonizing Alus to their non-exonizing counterparts is informative because Alus in these two groups have retained high sequence similarity but are perceived differently by the splicing machinery. We identified and characterized numerous features used by the splicing machinery to discriminate between Alu exons and their non-exonizing counterparts. Of these, the most novel is secondary structure: Alu exons in general and their 5′ splice sites (5′ss) in particular are characterized by decreased stability of local secondary structures with respect to their non-exonizing counterparts. We detected numerous further differences between Alu exons and their non-exonizing counterparts, among others in terms of exon–intron architecture and strength of splicing signals, enhancers, and silencers. Support vector machine analysis revealed that these features allow a high level of discrimination (AUC = 0.91) between exonizing and non-exonizing Alus. Moreover, the computationally derived probabilities of exonization significantly correlated with the biological inclusion level of the Alu exons, and the model could also be extended to general datasets of constitutive and alternative exons. This indicates that the features detected and explored in this study provide the basis not only for precise exon selection but also for the fine-tuned regulation thereof, manifested in cases of alternative splicing.
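
    To make the reported AUC concrete: the area under the ROC curve equals the probability that a randomly chosen positive example (here, an exonizing Alu) receives a higher classifier score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it directly from that pairwise definition. The scores are invented for illustration and the function name `auc` is an assumption; this is not the authors' SVM pipeline.

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) definition:
    the fraction of positive/negative pairs in which the positive example
    scores higher, counting ties as half a win."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical classifier scores, made up for this example.
exonizing = [0.9, 0.8, 0.4]
non_exonizing = [0.7, 0.3, 0.2]
print(auc(exonizing, non_exonizing))  # 8 of 9 pairs are ranked correctly
```

    On this scale 0.5 corresponds to random ranking and 1.0 to perfect separation, so the abstract's value of 0.91 means that about 91% of such pairs are ranked correctly.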