Different patterns of alternative splicing between total RNA-seq and polyribosomal RNA-seq.
<p>(A) Six classes of alternative splicing events in the two samples. RI-SI: retained intron or skipped intron. RE-SE: retained exon or skipped exon. IWI-TWI: initiation within intron or termination within intron. AA: alternative acceptor. AD: alternative donor. ATE: alternative terminal exon. A G-test was used to calculate likelihood ratio statistics. (B) Distribution of retained intron sizes predicted by PASA in total RNA-seq (top circle) and polyribosomal RNA-seq (bottom circle).</p>
Polyribosomal RNA-Seq Reveals the Decreased Complexity and Diversity of the Arabidopsis Translatome
<div><p>Recent RNA-seq studies reveal that the transcriptomes of animals and plants are more complex than previously thought, leading to the inclusion of many more splice isoforms in annotated genomes. However, it is possible that a significant proportion of these transcripts are spurious isoforms that do not contribute to functional proteins. One current hypothesis is that commonly used mRNA extraction methods isolate both pre-mature (nuclear) mRNA and mature (cytoplasmic) mRNA, and that incompletely spliced pre-mature mRNAs account for a large proportion of the spurious transcripts. To investigate this, we compared a traditional RNA-seq dataset (total RNA-seq) and a ribosome-bound RNA-seq dataset (polyribosomal RNA-seq) from <i>Arabidopsis thaliana</i>. An integrative framework that combined <i>de novo</i> assembly and genome-guided assembly was applied to reconstruct transcriptomes for the two datasets. Up to 44.8% of the <i>de novo</i> assembled transcripts in the total RNA-seq sample were of low abundance, whereas only 0.09% of those in the polyribosomal RNA-seq <i>de novo</i> assembly were of low abundance. The final round of assembly using PASA (Program to Assemble Spliced Alignments) resulted in more transcript assemblies in the total RNA-seq sample than in the polyribosomal sample. Comparison of alternative splicing (AS) patterns between total and polyribosomal RNA-seq showed a significant difference (G-test, p-value &lt; 0.01) in intron retention events: 46.4% of AS events in the total sample were intron retention, whereas only 23.5% showed evidence of intron retention in the polyribosomal sample. It is likely that a large proportion of retained introns in total RNA-seq result from incompletely spliced pre-mature mRNA. Overall, this study demonstrated that polyribosomal RNA-seq technology decreased the complexity and diversity of the coding transcriptome by eliminating pre-mature mRNAs, especially those of low abundance.</p></div>
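The G-test mentioned in the abstract can be illustrated with a minimal sketch. The abstract reports only percentages, so the counts below are hypothetical, chosen to mirror 46.4% and 23.5% of an assumed 1,000 AS events per sample; the function name `g_test_2x2` is also an assumption, not the authors' code.

```python
import math

def g_test_2x2(table):
    """Return (G, p) for a 2x2 table of counts, using G = 2 * sum(O * ln(O / E))."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if observed > 0:  # a zero cell contributes nothing to the sum
                g += observed * math.log(observed / expected)
    g = max(2.0 * g, 0.0)  # guard against tiny negative rounding error
    # With 1 degree of freedom, the chi-square tail probability is erfc(sqrt(G / 2)).
    p = math.erfc(math.sqrt(g / 2.0))
    return g, p

# Hypothetical counts: rows are samples, columns are
# (intron retention events, all other AS events).
counts = [[464, 536],   # total RNA-seq (illustrative only)
          [235, 765]]   # polyribosomal RNA-seq (illustrative only)
g, p = g_test_2x2(counts)
print(f"G = {g:.1f}, p = {p:.3g}")
```

With real data, the per-class event counts from the assembly would replace the illustrative table; the resulting G and p depend on those counts, not on the percentages alone.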
Coverage profiles along <i>A</i>. <i>thaliana</i> chromosomes and TAIR10 annotated CDS.
<p>(A) Distribution of RNA-seq read density along chromosome length is shown for total RNA-seq (left) and polyribosomal RNA-seq (right). The y axis represents the log2 scale of median read density. (B) Distribution of the RNA-seq read coverage along the length of the transcriptional unit. The log2 scale of median depth of coverage along the length of each individual TAIR10 annotated cDNA was calculated and plotted against the relative length of the transcriptional unit for the total RNA-seq and polyribosomal RNA-seq. (C) Coverage over the length of TAIR10 annotated CDS. Box-and-whisker plots depict the coverage calculated as the percentage of bases along the length of the cDNA sequence that were supported by reads from the total and polyribosomal RNA-seq datasets. The bottom and top of the boxes represent the 25<sup>th</sup> and 75<sup>th</sup> percentiles, respectively. The lines within boxes represent the medians.</p>
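The coverage measure in panel (C), the percentage of bases supported by at least one read, could be computed along the following lines. This is a sketch under assumptions: alignments are given as 0-based half-open `(start, end)` intervals on a single cDNA, and the function name `breadth_of_coverage` is hypothetical.

```python
def breadth_of_coverage(length, alignments):
    """Percentage of positions in [0, length) covered by at least one alignment."""
    covered = 0
    current_end = 0  # right edge of the region already counted
    for start, end in sorted(alignments):
        start = max(start, current_end, 0)  # skip bases already counted
        end = min(end, length)              # clip to the sequence length
        if end > start:
            covered += end - start
            current_end = end
    return 100.0 * covered / length

# Two overlapping reads on a 100-bp sequence cover positions 0..74.
print(breadth_of_coverage(100, [(0, 50), (25, 75)]))
```

Sorting the intervals first means overlapping reads are counted once, which is what distinguishes this breadth measure from depth of coverage.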
Alternative splicing of <i>ATGSTF11</i> (A) and <i>AFC2</i> (B) genes.
<p>TAIR10 gene models, full-length cDNA (FL-cDNA), assembled transcripts, and read alignments in the total RNA-seq and polyribosomal RNA-seq datasets are listed from top to bottom. Retained introns in total RNA-seq are highlighted using red rectangles. In the read-alignment tracks, red and blue represent forward and reverse reads, respectively.</p>
Unique features of LRR genes in <i>Tetrahymena</i>.
<p>(a) LRR gene exon length distributions for all 10 Tetrahymena species. Every species shows an exon peak at 90 bp, representing the exactly 90-bp exon arrays. An inset shows the detailed exon distribution range from 85 to 95 bp in length. From right to left (inset), species are T. thermophila, T. malaccensis, T. elliotti, T. pyriformis, T. vorax, T. borealis, T. canadensis, T. empidokyrea (mosquito parasite), T. shanghaiensis, and T. paravorax. (b) LRR gene TTHERM_000586765, an example of a 90-bp exon array gene masked by at least 1 of the 8 de novo–identified MAC LRR gene CRSs. (c) Extreme phase 2 bias of introns among 90-bp exon-containing LRR genes in 10 Tetrahymena species. The 10 concentric circles represent the 10 species, from inside to outside: T. thermophila, T. malaccensis, T. elliotti, T. pyriformis, T. vorax, T. borealis, T. canadensis, T. empidokyrea (mosquito parasite), T. shanghaiensis, and T. paravorax. (d) Highly variable numbers of 90-bp exons in different LRR genes in all 10 species. The numbers of 90-bp exons in different genes were used to make the dot plot. From left to right, species are T. thermophila, T. malaccensis, T. elliotti, T. pyriformis, T. vorax, T. borealis, T. canadensis, T. empidokyrea (mosquito parasite), T. shanghaiensis, and T. paravorax. The color scheme is the same as in panel (a). Numerical data underlying this panel are listed in S2 Data. CRS, consensus repeat sequence; LRR, leucine-rich repeat; MAC, macronucleus; RNA-Seq, RNA sequencing.</p>
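Intron phase, as referred to in panel (c), is the number of coding bases preceding the intron modulo 3. Because 90 is a multiple of 3, every intron in a tandem array of 90-bp exons has the same phase as the intron before the array. The sketch below illustrates this with a hypothetical gene structure; the function name and the exon lengths are assumptions, not data from the study.

```python
def intron_phases(coding_exon_lengths):
    """Phase of each intron: coding bases upstream of the intron, modulo 3.

    Assumes the lengths cover coding sequence only, starting at the start codon.
    """
    phases = []
    coding_bases = 0
    for length in coding_exon_lengths[:-1]:  # no intron follows the last exon
        coding_bases += length
        phases.append(coding_bases % 3)
    return phases

# Hypothetical gene: a 92-bp first coding exon, three 90-bp exons, a final exon.
print(intron_phases([92, 90, 90, 90, 40]))
```

In this illustrative case the first intron is phase 2, and each following 90-bp exon preserves that phase, so the whole array consists of phase 2 introns.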
Evolutionary model of the observed innovation in LRR genes with tandem 90-bp exons.
<p>(a) Diagram of a Tetrahymena cell. (b) Key to LRR gene-related symbols. (c) Typical MIC chromosome showing the biased distribution of key genetic elements. Central green circle: centromere (not yet fully assembled and characterized). Red and blue shading: biased chromosomal distribution of youngest and most conserved genes, respectively. Darkest color: highest concentration. Pink dashed line above the chromosome: biased chromosomal distribution of TEs, REP included, and other repeated sequences. (d) Multiple exon-shuffling mechanisms proposed to explain how pericentromeric and subtelomeric regions of the MIC genome function as LRR gene innovation centers. (1) Unequal crossing over between 2 different exons of the same LRR gene leads to alleles with more and fewer tandem repeats. (2) Unequal crossing over between exons in 2 different LRR genes leads to exon duplications and deletions. (3) REP retrotransposition into a preexisting LRR gene (step 1), followed by possible (not yet demonstrated) REP-mediated retrotransduction of LRR gene repeats into another LRR gene (step 2), would lead to a net increase in the number of LRR gene repeats. Pink line: transcript resulting from cotranscription of REP and 1 LRR gene repeat. Note that the right branch represents a co-retrotransposition of REP and an LRR repeat, which leads to dispersal of the LRR repeats and could potentially mediate further ectopic recombination. (4) REP copies, also being repeated sequences, can undergo unequal crossing over, with consequences similar to those of mechanism 2. (e) Representative product of the above mechanisms: LRR gene with long tandem arrays of 90-bp exons. LRR, leucine-rich repeat; MAC, macronucleus; MIC, micronucleus; non-LTR, non-long terminal repeat; REP, REP-type retrotransposon; TE, transposable element.</p>
Phylogenetic tree and estimated divergence times of 10 morphologically similar <i>Tetrahymena</i> species.
<p>(a) Maximum likelihood species tree, using 198 one-to-one orthologs, 104,434 amino acid sites, and 1,000 bootstraps. Green boxes, estimated divergence times for each node; orange hexagons, number of shared ortholog groups for each node; gray bar, geologic timescale. (b) Overall cell morphology and (c) oral apparatus, as revealed by silver staining. C, Cenozoic; M, Mesozoic; Mya, million years ago; N, Neoproterozoic; P, Paleozoic.</p>
Phylogeny and MIC chromosome distribution of 12 nearly identical 90-bp exons in different <i>T</i>. <i>thermophila</i> LRR genes give evidence of extensive ectopic recombination.
<p>Right: intron/exon diagram of the 10 LRR genes containing the twelve 90-bp exons (shown in red) that share between 88 and 90 identical nucleotides. Listed above each gene: MIC chromosome location. L or R indicates the left or right arm of the chromosome. Left: a maximum likelihood phylogenetic tree based exclusively on these twelve 90-bp exons. Note that (a) this is the largest group of nearly identical 90-bp exons and (b) TTHERM_001443819 and TTHERM_00001659049 both have 2 exons that belong to this group. Identical 90-bp exons share the same symbol: yellow asterisk (*) or number sign (#). chr3, Chromosome 3; LRR, leucine-rich repeat; MIC, micronucleus; mid-arm, near the middle of chromosome arms; NA, not available (the gene is located in a still-unassembled region of the MIC genome); pCen, pericentromeric region.</p>
Pericentromeric and subtelomeric regions of MIC chromosomes are gene innovation centers.
<p>Circos (http://circos.ca/) diagram mapping the frequency of various properties associated with rapidly evolving genes to the 5 chromosomes of the T. thermophila MIC genome. Chromosomes (after omitting IESs) were divided into approximately 1 Mb bins. Values were normalized for the total number of genes and plotted for each bin. SSG indicates the density distribution of all species-specific genes; y-axis is the number of genes. CSG indicates the density distribution of the most highly conserved genes (i.e., ortholog category X); y-axis is the number of genes. LRR indicates the density distribution of species-specific LRR genes; y-axis is the number of genes. Ka/Ks indicates the distribution of Ka/Ks ratios, plotted as the median value of each bin. TD indicates the distribution of tandem gene duplication frequencies; y-axis is the percentage of tandemly duplicated genes in each bin. GE indicates the gene expression level during vegetative growth (SPP medium), plotted as the median FPKM value. Note that the current chromosome-level assembly was generated from short reads (e.g., Illumina), and the centromeric and some of the subtelomeric regions are still incompletely assembled, which is the likely reason for the weak patterns seen at some chromosome termini (for example, both termini of chr2). chr2, Chromosome 2; CSG, conserved genes; FPKM, fragments per kilobase of exon per million reads mapped; GE, gene expression; IES, internal eliminated sequence; Ka/Ks, ratio of nonsynonymous to synonymous substitutions; LRR, leucine-rich repeat; MIC, micronucleus; SPP, Super Proteose Peptone; SSG, species-specific gene.</p>
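The binning described in this caption (approximately 1 Mb bins, with Ka/Ks and FPKM summarized as per-bin medians) might be sketched as follows. The input layout, a list of `(position, value)` pairs for one chromosome, and the function name `median_per_bin` are assumptions made for illustration; the values shown are not from the study.

```python
from collections import defaultdict
from statistics import median

def median_per_bin(genes, bin_size=1_000_000):
    """Group (position, value) pairs into fixed-size bins and take each bin's median."""
    bins = defaultdict(list)
    for position, value in genes:
        bins[position // bin_size].append(value)  # bin index along the chromosome
    return {index: median(values) for index, values in sorted(bins.items())}

# Hypothetical genes on one chromosome: (start coordinate in bp, Ka/Ks ratio).
example = [(100, 1.0), (500_000, 3.0), (1_200_000, 5.0)]
print(median_per_bin(example))
```

The median is less sensitive than the mean to a few extreme genes in a bin, which is one plausible reason to summarize ratios such as Ka/Ks this way.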
Top gene domains that contribute to the high MAC genome divergence in the 10 species.
<p>(a) Percentage of species-specific genes in each species. (b) Heat map of the top 7 categories of domains found in species-specific genes: the bottom row (gray background) shows the total number of genes containing each domain category in all 10 species. CNBD, cyclic nucleotide-binding; GFR, growth factor receptor cysteine-rich; LRR, leucine-rich repeat; MAC, macronucleus; P-loop NTPase, P-loop-containing nucleoside triphosphate hydrolase; PK, protein kinase; TPR, tetratricopeptide repeat; WD40, WD40 repeat.</p>
