MOESM1 of Optimized sequencing depth and de novo assembler for deeply reconstructing the transcriptome of the tea plant, an economically important plant species
Additional file 1: Figure S1. Venn diagram of the transcripts shared among the different assemblers. Figure S2. Transcripts mapped to coding sequences and genome sequences at different coverage and identity thresholds. Figure S3. Assembly quality of transcripts at different expression levels. Figure S4. Completeness of the assembled transcripts at different expression levels. Figure S5. Comparison of assembly performance between two replicate datasets randomly selected from eight representative tissues of the tea plant. Figure S6. Statistics of the transcriptome assemblies generated by Bridger with different amounts of sequencing data from replicate 2. Table S1. Summary of transcriptome assemblies of the tea plant in previous studies. Table S2. Summary of the data used in this study. Table S3. Coverage of transcripts mapped to the reference genome. Table S4. Statistics of the Bridger assembly using different k-mer values. (a) Assembly characteristics; (b) completeness assessment using BUSCO; (c) length distribution. Table S5. Assembly statistics. (a) Apical bud; (b) flower; (c) fruit; (d) second young leaf; (e) mature leaf in summer; (f) first young leaf; (g) root; (h) stem. Table S6. Length distribution of assemblies. (a) Apical bud; (b) flower; (c) fruit; (d) second young leaf; (e) mature leaf in summer; (f) first young leaf; (g) root; (h) stem. Table S7. BUSCO evaluation of assemblies. (a) Apical bud; (b) flower; (c) fruit; (d) second young leaf; (e) mature leaf in summer; (f) first young leaf; (g) root; (h) stem. Table S8. Assembly statistics. (a) Apical bud and first young leaf; (b) apical bud and root. Table S9. Statistics of the pooled assembly. (a) Assembly characteristics; (b) length distribution; (c) BUSCO evaluation. Table S10. Runtime (hours) of each assembler with different amounts of input data. (a) 0.5 Gb; (b) 1 Gb; (c) 3 Gb. Table S11. Completeness of tea plant transcriptomes generated using PacBio technology.
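Several of the tables above summarize "assembly characteristics". The captions do not say which metrics are included, but a common contiguity statistic in such summaries is N50. The minimal Python sketch below shows how it can be computed from transcript lengths. The function name and the example lengths are hypothetical, and whether N50 appears in these particular tables is not stated here.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a set of transcript (or contig) lengths.

    N50 is the largest length L such that sequences of length >= L
    together contain at least half of all assembled bases.
    """
    if not lengths or min(lengths) < 0:
        raise ValueError("need at least one non-negative length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the bases are covered
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    # Unreachable: after the last sequence, running == total
    raise AssertionError("unreachable for non-negative lengths")


# Hypothetical transcript lengths in bp, for illustration only
print(n50([2000, 1500, 1000, 800, 500, 200]))  # 1500
```

Sorting in descending order keeps the logic close to the definition. For millions of transcripts, the same result can be obtained from a length histogram without a full sort.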
Summary of the chloroplast genome sequencing, assembly and features.
a<p>LSC, large single copy;</p>b<p>SSC, small single copy;</p>c<p>IR, inverted repeats.</p>
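The footnotes name the three region types of the chloroplast genome. In the usual quadripartite layout, one LSC and one SSC region are separated by two identical copies of the inverted repeat (IRa and IRb). The total genome length therefore counts the IR twice. The minimal Python sketch below illustrates this bookkeeping. The class name and the region sizes are hypothetical and are not the values reported in the table.

```python
from dataclasses import dataclass


@dataclass
class ChloroplastRegions:
    """Region lengths (bp) of a quadripartite chloroplast genome."""
    lsc: int  # large single copy
    ssc: int  # small single copy
    ir: int   # length of ONE inverted repeat copy (IRa and IRb are identical)

    def total_length(self) -> int:
        # The circle is LSC + IRb + SSC + IRa, so the IR length counts twice
        return self.lsc + self.ssc + 2 * self.ir

    def ir_fraction(self) -> float:
        # Share of the genome made up of both inverted-repeat copies
        return 2 * self.ir / self.total_length()


# Hypothetical region sizes, not values from the table
genome = ChloroplastRegions(lsc=86_000, ssc=18_000, ir=26_000)
print(genome.total_length())  # 156000
```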
RNA editing detected by mapping transcriptome reads.
a<p>Strands are indicated as “+” (positive strand) and “−” (negative strand);</p>b<p>Base on the positive strand;</p>c<p>Number of transcriptome reads supporting the corresponding base substitution;</p>d<p>Underlining indicates the edited base.</p>
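Footnote c counts the reads that support each substitution. The minimal sketch below turns such counts into an editing frequency. It assumes the common plastid C-to-U edit, which appears as C→T in cDNA reads, and it reports bases on the positive strand as in footnote b. A C-to-U edit in a minus-strand gene therefore shows up as G→A on the positive strand. The function name is hypothetical. Using only reads with the unedited or edited base as the denominator is an assumption of this sketch, not the authors' stated method.

```python
from collections import Counter

# Complement used to express a minus-strand edit on the positive strand
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def editing_frequency(strand: str, ref_base_plus: str, read_bases: str,
                      edit: tuple[str, str] = ("C", "T")) -> float:
    """Fraction of informative reads at one site that carry the edited base.

    strand: '+' or '-', the strand of the transcribed gene
    ref_base_plus: reference base on the positive strand
    read_bases: bases seen in mapped reads at this site, on the positive strand
    edit: (unedited, edited) in transcript sense; C-to-U is read as C->T
    """
    src, dst = edit
    if strand == "-":
        # A C->T change on the minus strand is G->A on the positive strand
        src, dst = COMPLEMENT[src], COMPLEMENT[dst]
    elif strand != "+":
        raise ValueError("strand must be '+' or '-'")
    if ref_base_plus.upper() != src:
        raise ValueError(f"reference base {ref_base_plus} is not an editable {src}")
    counts = Counter(read_bases.upper())
    # Reads showing any other base (likely errors) are ignored
    informative = counts[src] + counts[dst]
    return counts[dst] / informative if informative else 0.0


print(editing_frequency("+", "C", "TTTTC"))  # 0.8
```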
Core reactions of fatty acid biosynthesis reconstructed based on the <i>de novo</i> assembly and annotation of <i>C. oleifera</i> transcriptome.
<p>During fatty acid biosynthesis, a two-carbon unit is added in each cycle, and the four-step cycle is repeated until the appropriate chain length is reached, yielding different types of fatty acids. The identified enzymes are shown in boxes and abbreviated as follows: ACC, acetyl-CoA carboxylase (EC: 6.4.1.2); MAT, malonyl-CoA ACP transacylase (EC: 2.3.1.39); KAS, beta-ketoacyl-ACP synthase (KAS I, EC: 2.3.1.41; KAS II, EC: 2.3.1.179; KAS III, EC: 2.3.1.180); KAR, beta-ketoacyl-ACP reductase (EC: 1.1.1.100); HAD, beta-hydroxyacyl-ACP dehydrase (EC: 4.2.1.-); EAR, enoyl-ACP reductase (EC: 1.3.1.9); AAD, acyl-ACP desaturase (EC: 1.14.19.2); OAH, oleoyl-ACP hydrolase (EC: 3.1.2.14); FatA, acyl-ACP thioesterase A (EC: 3.1.2.-); Δ<sup>12</sup>D, Δ<sup>12</sup>(ω<sup>6</sup>)-desaturase (EC: 1.14.19.6). Circled numbers indicate how many times the condensation reaction has been repeated.</p>
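Each of the four steps in a cycle has its own enzyme: condensation (KAS), reduction (KAR), dehydration (HAD) and a second reduction (EAR). The first condensation joins acetyl-CoA (C2) with malonyl-ACP. As a result, the acyl chain after n condensations has 2 + 2n carbons, so seven condensations give C16 and eight give C18. In plant plastids, the first condensation is usually attributed to KAS III, elongation up to C16 to KAS I, and the final step to C18 to KAS II. The small Python sketch below reproduces this bookkeeping. The function names are hypothetical.

```python
def chain_length_after(condensations: int) -> int:
    """Carbon count of the acyl-ACP after a given number of condensations.

    The first condensation joins acetyl-CoA (C2) and malonyl-ACP, giving C4;
    every further cycle adds one more two-carbon unit.
    """
    if condensations < 1:
        raise ValueError("at least one condensation is needed")
    return 2 + 2 * condensations


def kas_isoform(cycle: int) -> str:
    """Condensing enzyme usually assigned to each cycle in plant plastids."""
    if cycle == 1:
        return "KAS III"  # C2 + C2 -> C4
    if 2 <= cycle <= 7:
        return "KAS I"    # C4 -> ... -> C16
    if cycle == 8:
        return "KAS II"   # C16 -> C18
    raise ValueError("plastid de novo synthesis normally stops at C18")


for cycle in range(1, 9):
    print(cycle, kas_isoform(cycle), f"C{chain_length_after(cycle)}")
```

The loop prints one row per cycle, ending with `8 KAS II C18`, the saturated C18 chain that AAD can then desaturate.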
Contradiction between Plastid Gene Transcription and Function Due to Complex Posttranscriptional Splicing: An Exemplary Study of <em>ycf15</em> Function and Evolution in Angiosperms
<div><p>Plant chloroplast genes are usually co-transcribed, but their posttranscriptional splicing is complex and remains largely unresolved. Based on the first complete sequences of three <i>Camellia</i> (Theaceae) chloroplast genomes, we comprehensively analyzed the evolutionary patterns of <i>ycf15</i>, a plastid gene whose function and evolution are paradoxical, along the inferred angiosperm phylogeny. Many species in separate lineages, including the three species reported here, contain an intact <i>ycf15</i> gene in their chloroplast genomes. Even so, the phylogenetic mixture of intact and obviously disabled <i>ycf15</i> genes implies that all of them are non-functional. Neither intracellular gene transfer (IGT) nor horizontal gene transfer (HGT) could explain this distributional anomaly. However, transcriptome analyses revealed that <i>ycf15</i> was transcribed as part of a polycistronic precursor transcript containing <i>ycf2</i>, <i>ycf15</i> and antisense <i>trnL-CAA</i>. Surprisingly, the transcriptome assembly covered nearly the complete <i>Camellia</i> chloroplast genome. Many non-coding regions, including pseudogenes, were mapped by multiple transcripts, indicating that pseudogene transcription is widespread. Our results suggest that plastid posttranscriptional splicing may involve complex cleavage of non-functional genes.</p> </div>
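The statement that the transcriptome assembly covers nearly the complete chloroplast genome is a breadth-of-coverage claim. Given transcript alignment coordinates on the genome, the covered fraction is the merged length of all overlapping intervals divided by the genome length. In the minimal sketch below, the coordinates are hypothetical and stand in for the output of whatever aligner is used. The sketch assumes 0-based half-open intervals and, for simplicity, ignores alignments that wrap around the origin of the circular genome. The function name is hypothetical.

```python
def covered_fraction(intervals: list[tuple[int, int]], genome_length: int) -> float:
    """Fraction of genome positions covered by at least one interval.

    intervals: (start, end) pairs, 0-based, half-open, start < end, and not
    wrapping around the origin of the circular genome (simplification).
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    covered = 0
    run_start = run_end = None
    for start, end in sorted(intervals):
        if not (0 <= start < end <= genome_length):
            raise ValueError(f"bad interval {(start, end)}")
        if run_end is None or start > run_end:
            # Gap before this interval: close the previous merged run
            if run_end is not None:
                covered += run_end - run_start
            run_start, run_end = start, end
        else:
            # Overlapping or touching: extend the current merged run
            run_end = max(run_end, end)
    if run_end is not None:
        covered += run_end - run_start
    return covered / genome_length


# Hypothetical alignments on a 400 bp toy "genome"
print(covered_fraction([(0, 100), (50, 150), (200, 300)], 400))  # 0.625
```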
The map of the three <i>Camellia</i> chloroplast genome sequences.
<p>Genes on the outside of the map are transcribed in the clockwise direction and genes on the inside of the map are transcribed in the counterclockwise direction. The dashed area in the inner circle indicates the GC content of the chloroplast genome.</p>
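The inner circle plots GC content along the genome. The minimal sketch below shows a sliding-window calculation of this kind. The window and step sizes are hypothetical, and ambiguous bases such as N are excluded from the denominator. For simplicity, the circular genome is treated as a linear string, so windows do not wrap around the origin.

```python
from collections.abc import Iterator


def gc_content(seq: str) -> float:
    """G+C fraction among unambiguous bases (A, C, G, T) of a sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def gc_windows(seq: str, window: int, step: int) -> Iterator[tuple[int, float]]:
    """Yield (0-based start, GC fraction) for windows along a linear sequence."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    for start in range(0, len(seq) - window + 1, step):
        yield start, gc_content(seq[start:start + window])


print(list(gc_windows("GGGGAAAA", window=4, step=4)))  # [(0, 1.0), (4, 0.0)]
```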
Quantitative RT-PCR validation of 17 candidate lipid-related genes in the <i>C. oleifera</i> transcriptome.
<p>Seventeen candidate unigenes involved in lipid metabolism, including the (<b>a</b>) fatty acid and (<b>b</b>) TAG pathways, were selected for quantitative RT-PCR analysis. Error bars represent the standard error of the mean of three biological replicates (each with three nested technical replicates). Results represent the mean (± SD) of the three experiments. The translation elongation factor 1-alpha (TEF) gene was used as the internal reference gene.</p>
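The caption does not state how relative expression was calculated. The sketch below assumes the widely used 2^−ΔΔCt (Livak) method, with TEF as the reference gene and roughly 100% amplification efficiency. It also assumes that technical replicates are averaged first, giving one value per biological replicate. All Ct values and function names are hypothetical. Because the caption mentions both SEM and SD, the summary helper returns both. SEM is SD divided by √n, so with three replicates it is about 0.58 × SD.

```python
from math import sqrt
from statistics import mean, stdev


def delta_delta_ct_fold_change(ct_target: float, ct_ref: float,
                               ct_target_cal: float, ct_ref_cal: float) -> float:
    """Relative expression by the 2^-ddCt method (assumes ~100% efficiency).

    ct_ref / ct_ref_cal: Ct of the reference gene (e.g. TEF) in the
    sample and in the calibrator sample, respectively.
    """
    delta_sample = ct_target - ct_ref              # normalize sample to reference gene
    delta_calibrator = ct_target_cal - ct_ref_cal  # normalize calibrator the same way
    return 2 ** -(delta_sample - delta_calibrator)


def summarize(values: list[float]) -> tuple[float, float, float]:
    """Mean, sample standard deviation and standard error of the mean."""
    if len(values) < 2:
        raise ValueError("need at least two replicates")
    sd = stdev(values)
    return mean(values), sd, sd / sqrt(len(values))


# Hypothetical mean Ct values, one row per biological replicate
folds = [
    delta_delta_ct_fold_change(22.0, 18.0, 24.0, 18.0),
    delta_delta_ct_fold_change(22.5, 18.1, 24.0, 18.0),
    delta_delta_ct_fold_change(21.8, 17.9, 24.0, 18.0),
]
print(summarize(folds))
```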
Transcriptome Analysis of the Oil-Rich Tea Plant, <i>Camellia oleifera</i>, Reveals Candidate Genes Related to Lipid Metabolism
<div><p>Background</p><p>Driven by the need for sustainable sources of nutritionally important fatty acids and by rising concerns about the environmental impact of fossil oil, oil plants have received increasing attention. As an important oil-rich plant in China, <i>Camellia oleifera</i> has played a vital role in nutrition, biofuel production and the supply of chemical feedstocks. However, the lack of a <i>C. oleifera</i> genome sequence and the scarcity of genetic information have greatly hampered the efficient use of its abundant germplasm in modern breeding of this woody oil plant.</p><p>Results</p><p>Here, using the 454 GS-FLX sequencing platform, we generated approximately 600,000 RNA-Seq reads from four tissues of <i>C. oleifera</i>. These reads were trimmed and assembled into 104,842 non-redundant putative transcripts with a total length of ∼38.9 Mb, more than 218-fold the <i>C. oleifera</i> sequences deposited in GenBank (as of March 2014). Based on BLAST similarity searches, nearly 42.6% of the transcripts could be annotated with known genes, conserved domains or Gene Ontology (GO) terms. Comparison with the cultivated tea tree, <i>C. sinensis</i>, identified 3,022 pairs of orthologs, of which 211 showed evidence of positive selection. Pathway analysis identified most of the genes potentially involved in lipid metabolism. Evolutionary analysis of omega-6 fatty acid desaturase (<i>FAD2</i>) genes from 20 oil plants unexpectedly suggests that parallel evolution may have occurred between <i>C. oleifera</i> and <i>Olea europaea</i>. Additionally, more than 2,300 simple sequence repeats (SSRs) and 20,200 single-nucleotide polymorphisms (SNPs) were detected in the <i>C. oleifera</i> transcriptome.</p>
<p>Conclusions</p><p>The generated transcriptome considerably increases the number of <i>C. oleifera</i> sequences in public databases and provides an unprecedented opportunity to discover genes associated with lipid metabolic pathways in <i>C. oleifera</i>. It will greatly facilitate the breeding of new <i>C. oleifera</i> varieties with higher yields and better quality.</p></div>
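The abstract reports more than 2,300 SSRs but not the search parameters. The minimal sketch below detects microsatellites with regular expressions, using a backreference to require consecutive copies of the same 1 to 6 nt motif. The minimum repeat numbers are chosen for illustration and are not the authors' settings, and the function names are hypothetical. Motifs that are themselves repeats of a shorter unit (for example "ATAT") are skipped so that each SSR is reported only once, under its shortest motif.

```python
import re

# Minimum number of tandem copies per motif length (illustrative, not from the source)
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def _is_primitive(motif: str) -> bool:
    # "ATAT" is just "AT" twice; such motifs occur inside their own
    # doubled string once the first and last characters are removed
    return motif not in (motif + motif)[1:-1]


def find_ssrs(seq: str) -> list[tuple[int, str, int]]:
    """Return (0-based start, motif, number of copies) for each SSR found."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-mer followed by at least (min_rep - 1) further copies of itself
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if _is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)


print(find_ssrs("GG" + "AT" * 6 + "C" + "A" * 10 + "GC"))
# [(2, 'AT', 6), (15, 'A', 10)]
```

Real SSR tools also merge nearby compound repeats and handle imperfect repeats. This sketch finds only perfect tandem repeats.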
Phylogenetic analyses of the <i>FAD2</i> genes among 20 oil-plants.
<p>(<b>a</b>) Alignment of the Cole|AFK31315 (<i>C. oleifera</i>, AFK31315), Cche|AGH32914 (<i>C. chekiangoleosa</i>, AGH32914) and ColeFAD2 (ColeIsotig4522:451–1599) amino acid sequences. Solid black lines indicate conserved amino acids. The filled boxes mark the three H-boxes: HECGH (red box), HRRHH (blue box) and HVAHH (green box). Positions (left) are numbered according to the <i>FAD2</i> gene of <i>C. chekiangoleosa</i> (AGH32914). The three differing amino acids are shown in black uppercase letters. Multiple sequence alignment was performed using the ClustalW package <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0104150#pone.0104150-Chenna1" target="_blank">[58]</a>, <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0104150#pone.0104150-Larkin1" target="_blank">[59]</a>. (<b>b</b>) Phylogenetic tree constructed from the amino acid sequences. The asterisk indicates the <i>FAD2</i> gene (ColeFAD2) detected in the assembled <i>C. oleifera</i> transcriptome (ColeIsotig4522:451–1599). I–V denote the five groups into which the 20 oil plants are classified by sequence similarity.
The GenBank accession numbers and full species names for the genes used here are: Scom|CAA63432 (<i>Solanum commersonii</i>, CAA63432); Atha|NP_187819 (<i>Arabidopsis thaliana</i>, NP_187819); Hann|AAL68982 (<i>Helianthus annuus</i>, AAL68982); Brap|CAD30827 (<i>Brassica rapa</i>, CAD30827); Sole|BAC22091 (<i>Spinacia oleracea</i>, BAC22091); Oeur|AAL93620 (<i>Olea europaea</i>, AAL93620); Pgra|AAO37754 (<i>Punica granatum</i>, AAO37754); Oeur|AAW63041 (<i>Olea europaea</i>, AAW63041); Gmax|BAD89862 (<i>Glycine max</i>, BAD89862); Hbra|AAY87459 (<i>Hevea brasiliensis</i>, AAY87459); Jcur|ABA41034 (<i>Jatropha curcas</i>, ABA41034); Ptom|ABC41578 (<i>Populus tomentosa</i>, ABC41578); Vmon|ABL86147 (<i>Vernicia montana</i>, ABL86147); Lusi|ACF49507 (<i>Linum usitatissimum</i>, ACF49507); Rcom|002530704 (<i>Ricinus communis</i>, XP_002530704); Ahyp|ACZ06072 (<i>Arachis hypogaea</i>, ACZ06072); Pvul|ADO17551 (<i>Phaseolus vulgaris</i>, ADO17551); Vfor|AEE69020 (<i>Vernicia fordii</i>, AEE69020); Vlab|AEI60128 (<i>Vitis labrusca</i>, AEI60128); Cole|AFK31315 (<i>C. oleifera</i>, AFK31315); Cche|AGH32914 (<i>C. chekiangoleosa</i>, AGH32914).</p>
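Panel (a) highlights three conserved histidine boxes (HECGH, HRRHH and HVAHH). The minimal sketch below locates exact matches of these motifs in a protein sequence. The toy sequence is invented for illustration and is not a real FAD2 protein. H-boxes can vary slightly between species, and exact matching would miss such variants. The function name is hypothetical.

```python
H_BOX_MOTIFS = ("HECGH", "HRRHH", "HVAHH")


def find_h_boxes(protein: str) -> dict[str, list[int]]:
    """Return the 1-based start positions of each exact H-box motif."""
    protein = protein.upper()
    hits: dict[str, list[int]] = {}
    for motif in H_BOX_MOTIFS:
        positions = []
        start = protein.find(motif)
        while start != -1:
            positions.append(start + 1)  # convert to 1-based numbering
            start = protein.find(motif, start + 1)
        hits[motif] = positions
    return hits


# Invented toy sequence, not a real FAD2 protein
print(find_h_boxes("MAHECGHLLHRRHHAAHVAHHK"))
# {'HECGH': [3], 'HRRHH': [10], 'HVAHH': [17]}
```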
ML phylogram of 55 taxa based on <i>ycf15</i> gene sequences.
<p>The analyzed genes include both intact and disabled sequences. The tree has a −lnL of 1983.94. Bootstrap support values >50% are shown at nodes. Red branches indicate species in which an intact <i>ycf15</i> gene is present. The scale bar represents 0.01 substitutions per site.</p>
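The tree mixes intact and disabled <i>ycf15</i> copies, but the captions do not state the criteria used to tell them apart. The sketch below assumes a common first-pass screen. It flags a possible frameshift (length not divisible by 3), a missing start codon, a missing terminal stop codon and premature stop codons. Plastid genes can use non-ATG start codons, or have start codons created by C-to-U RNA editing, so the accepted start codons are a parameter. The function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def disabling_features(cds: str, start_codons: tuple[str, ...] = ("ATG",)) -> list[str]:
    """List ORF defects that would mark a coding sequence as disabled.

    An empty list means the sequence passes this simple intactness screen.
    """
    cds = cds.upper()
    problems = []
    if len(cds) % 3 != 0:
        problems.append("length not a multiple of 3 (possible frameshift)")
    # Split into complete codons only; a trailing partial codon is dropped
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    if not codons:
        return problems + ["no complete codons"]
    if codons[0] not in start_codons:
        problems.append(f"no start codon (found {codons[0]})")
    if codons[-1] not in STOP_CODONS:
        problems.append("no terminal stop codon")
    # 1-based codon numbers of stops before the final codon
    internal = [i + 1 for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]
    if internal:
        problems.append(f"premature stop codon(s) at codon(s) {internal}")
    return problems


print(disabling_features("ATGAAATGA"))     # []
print(disabling_features("ATGTAAAAATGA"))  # ['premature stop codon(s) at codon(s) [2]']
```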
