Automated annotation of the <i>G. bimaculatus de novo</i> transcriptome assembly using Gene Predictor.
<p>(A) Comparison of the proportion of non-redundant assembly sequences, isotigs and singletons that obtained a significant BLAST hit against <b>nr</b> (black bars), and of those that were assigned a putative ortholog by Gene Predictor (GP; white bars), based on the reciprocal best BLAST hit with the <i>Drosophila melanogaster</i> proteome (see <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0061479#pone.0061479.s004" target="_blank">Table S1</a>). (B) Comparison of the proportion of sequences with a significant BLAST hit in <b>nr</b> that also had a putative orthology assignment based on Gene Predictor (dark grey bars). All sequences assigned putative orthologs by Gene Predictor also had significant BLAST hits in <b>nr</b> (light grey bars).</p>
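The reciprocal best hit criterion can be illustrated with a minimal sketch. This is not Gene Predictor's code: it assumes BLAST results have already been parsed into hypothetical `(query, subject, evalue, bitscore)` tuples, and the default E-value cutoff of 1e-5 is an assumption, since the caption does not state the cutoff Gene Predictor used.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical parsed BLAST row: (query_id, subject_id, evalue, bitscore)
Hit = Tuple[str, str, float, float]


def best_hits(hits: Iterable[Hit], max_evalue: float = 1e-5) -> Dict[str, str]:
    """Map each query to its highest-bitscore subject that passes the E-value cutoff."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, evalue, bitscore in hits:
        if evalue > max_evalue:
            continue  # discard non-significant hits
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(forward: Iterable[Hit], reverse: Iterable[Hit],
                         max_evalue: float = 1e-5) -> Dict[str, str]:
    """Keep transcript -> protein pairs only when each is the other's best hit."""
    fwd = best_hits(forward, max_evalue)  # transcript -> fly protein
    rev = best_hits(reverse, max_evalue)  # fly protein -> transcript
    return {t: p for t, p in fwd.items() if rev.get(p) == t}


forward = [("isotig1", "FBpp1", 1e-50, 200.0),
           ("isotig1", "FBpp2", 1e-30, 150.0),
           ("isotig2", "FBpp2", 1e-20, 90.0)]
reverse = [("FBpp1", "isotig1", 1e-50, 200.0),
           ("FBpp2", "isotig3", 1e-40, 180.0),
           ("FBpp2", "isotig2", 1e-20, 90.0)]
print(reciprocal_best_hits(forward, reverse))
```

Here `isotig2` has a best hit to `FBpp2`, but `FBpp2` hits `isotig3` more strongly, so only the `isotig1`/`FBpp1` pair is retained. This stricter requirement is why fewer sequences receive an orthology assignment than obtain a BLAST hit.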
Phylogenetic comparison of proportion of known proteomes represented in the <i>G. bimaculatus de novo</i> assembled transcriptome.
<p>The number (bold) and percentage (bold italics) of proteome sequences with a putative <i>G. bimaculatus</i> ortholog in the <i>de novo</i> transcriptome assembly are shown for selected animals with sequenced genomes (based on top BLAST hit, E-value cutoff 1e-5). Proteomes were predicted from genome sequence sources as shown in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0061479#pone.0061479.s004" target="_blank">Table S1</a>. Numbers in large font in red and blue ovals indicate the average proportion of sequences from all tested insect and deuterostome proteomes, respectively, represented in the <i>G. bimaculatus</i> transcriptome.</p>
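The per-proteome numbers and the group averages can be sketched as below. The sketch assumes each proteome's top BLAST hit E-values against the transcriptome are already available as a dictionary (a hypothetical input format), and treats an E-value equal to the cutoff as passing, which is also an assumption.

```python
from statistics import mean
from typing import Dict, Iterable, Tuple


def percent_represented(proteome_ids: Iterable[str],
                        top_hit_evalues: Dict[str, float],
                        cutoff: float = 1e-5) -> Tuple[int, float]:
    """Count proteome sequences whose top hit passes the cutoff; return (count, percent)."""
    ids = list(proteome_ids)
    n = sum(1 for pid in ids
            if pid in top_hit_evalues and top_hit_evalues[pid] <= cutoff)
    return n, 100.0 * n / len(ids)


def group_average(percentages: Iterable[float]) -> float:
    """Average percentage across a group of proteomes (e.g. all insects tested)."""
    return mean(percentages)


# Hypothetical proteome of four proteins; p4 has no hit at all
evalues = {"p1": 1e-30, "p2": 1e-3, "p3": 1e-6}
print(percent_represented(["p1", "p2", "p3", "p4"], evalues))
```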
Coding region analysis of <i>G. bimaculatus de novo</i> transcriptome assembly sequences without significant BLAST hits in nr.
<p>Assembly products that failed to obtain significant BLAST hits in <b>nr</b> (white) were examined for the presence of coding regions (green) using ESTScan <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0061479#pone.0061479-Iseli1" target="_blank">[52]</a>. Assembly sequences thus predicted to contain coding regions were examined for the presence of known protein domains (yellow) using InterProScan <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0061479#pone.0061479-Zdobnov2" target="_blank">[53]</a>, <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0061479#pone.0061479-Quevillon1" target="_blank">[54]</a>. Results are shown separately for isotigs (A), singletons (B) and all non-redundant assembly products (C). See also <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0061479#pone-0061479-t003" target="_blank">Table 3</a>.</p>
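The nested filtering described above (no <b>nr</b> hit, then predicted coding region, then known domain) amounts to successive set intersections. This sketch does not run ESTScan or InterProScan; it assumes their outputs have already been reduced to hypothetical sets of sequence IDs.

```python
from typing import Dict, Iterable


def coding_cascade(no_nr_hit: Iterable[str],
                   has_coding_region: Iterable[str],
                   has_domain: Iterable[str]) -> Dict[str, int]:
    """Count sequences at each nested step of the filtering cascade."""
    no_hit = set(no_nr_hit)
    coding = no_hit & set(has_coding_region)  # coding region predicted (ESTScan step)
    with_domain = coding & set(has_domain)    # known domain found (InterProScan step)
    return {"no_nr_hit": len(no_hit),
            "coding": len(coding),
            "coding_with_domain": len(with_domain)}


print(coding_cascade({"a", "b", "c", "d"}, {"a", "b", "x"}, {"b", "y"}))
```

Intersecting at each step ensures that IDs reported by a downstream tool but absent from the upstream set (here `x` and `y`) are not counted.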
Large-scale Orthopteran transcriptome resources to date.
<p><sup>1</sup>Data from <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0061479#pone.0061479-Ma1" target="_blank">[73]</a>, <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0061479#pone.0061479-Kang1" target="_blank">[74]</a>.</p>
<p><sup>2</sup>Data from <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0061479#pone.0061479-Danley1" target="_blank">[75]</a>.</p>
<p><sup>3</sup>Data from <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0061479#pone.0061479-Badisco1" target="_blank">[76]</a>.</p>
<p><sup>4</sup>Data from <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0061479#pone.0061479-Chen1" target="_blank">[72]</a>.</p>
<p><sup>5</sup>Data from this report.</p>
<p><sup>6</sup>L = larval stage. nd = data not reported in the relevant publication <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0061479#pone.0061479-Chen1" target="_blank">[72]</a>, <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0061479#pone.0061479-Ma1" target="_blank">[73]</a>, <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0061479#pone.0061479-Kang1" target="_blank">[74]</a>, <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0061479#pone.0061479-Badisco1" target="_blank">[76]</a>.</p>
<p><sup>7</sup>“N50” refers to isotig N50 from the <i>G. bimaculatus de novo</i> transcriptome assembly; mean contig length is shown for all other orthopteran transcriptome resources in this table.</p>
<p><sup>8</sup># singletons are shown for the <i>G. bimaculatus de novo</i> transcriptome assembly; # single ESTs (not incorporated into contigs) are shown for all other orthopteran transcriptome resources in this table.</p>
<p><sup>9</sup># unique BLAST hits against <b>nr</b> are shown for the <i>G. bimaculatus de novo</i> transcriptome assembly; # unigenes are shown for all other orthopteran transcriptome resources in this table.</p>
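Because footnote 7 contrasts isotig N50 with mean contig length, a short sketch of both statistics may help show why they are not directly interchangeable. It uses the standard N50 definition: the length L such that contigs of length ≥ L together contain at least half of all assembled bases. The example lengths are invented for illustration.

```python
from statistics import mean
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Smallest length L such that contigs >= L cover at least half the total bases."""
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids float rounding
            return length
    raise AssertionError("unreachable for non-empty input")


lengths = [2, 3, 4, 5, 6, 8, 10]  # hypothetical contig lengths
print(n50(lengths), mean(lengths))
```

N50 is weighted toward long contigs, so for the same assembly it is typically larger than the mean contig length (6 versus about 5.4 here).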
Statistical comparison of isotig and singleton nucleotide sequence lengths according to BLAST annotation and predicted protein-coding status.
<p>Values shown are <i>p</i> values from Welch's t-tests; numerical values are given where <i>p</i>≥0.05.</p>
<p><sup>***</sup><i>p</i><0.0001;</p>
<p><sup>*</sup><i>p</i><0.05.</p>
<p><sup>1</sup>BLAST E-value cutoff is 1e-5 for all hits reported in this table.</p>
<p><sup>2</sup><b>nr</b> = NCBI non-redundant database.</p>
<p><sup>3</sup>NRAS = all non-redundant assembly products (isotigs or singletons) regardless of BLAST results against <b>nr</b>.</p>
<p><sup>4</sup>Numbers of sequences in each category are shown in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0061479#pone-0061479-g009" target="_blank">Figure 9</a>. Mean, median, maximum and minimum values for each category are shown in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0061479#pone-0061479-t003" target="_blank">Tables 3</a> and <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0061479#pone-0061479-t004" target="_blank">4</a>.</p>
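Welch's t-test compares two means without assuming equal variances, which suits length distributions with very different spreads (e.g. isotigs versus singletons). The sketch below computes only the t statistic and the Welch–Satterthwaite degrees of freedom using the standard library; obtaining the <i>p</i> value additionally requires the t-distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`. The sample lengths are invented for illustration.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    va = variance(a) / len(a)  # squared standard error of mean(a), sample variance
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


a = [1, 2, 3, 4, 5]      # hypothetical lengths, group 1
b = [2, 4, 6, 8, 10]     # hypothetical lengths, group 2
t, df = welch_t(a, b)
print(round(t, 4), round(df, 4))
```

Unlike Student's t-test, the degrees of freedom are generally non-integer and shrink when one group's variance dominates.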
Sequence extension and gene discovery in the <i>G. bimaculatus</i> Hedgehog and Hippo pathways.
<p>(A) The <i>de novo</i> transcriptome assembly of <i>G. bimaculatus</i> newly identifies most members of the <i>hedgehog</i> pathway (red), from which only the <i>hedgehog</i> ligand (blue) was previously known (GenBank accession AB044709). (B) The transcriptome also adds significant sequence data to the fragments of many genes in the Hippo signaling pathway that had been previously identified (green). Seven genes of the known pathway were not identified in the transcriptome (yellow, white), two of which lack any sequence data in GenBank (white). GenBank accessions for previously identified sequences are as follows: <i>discs overgrown</i> (<i>dco</i>): AB443442; <i>expanded</i> (<i>ex</i>): AB378099; <i>warts</i> (<i>wts</i>): AB300574; <i>cyclin E</i> (<i>cycE</i>): AB378067; <i>hippo</i> (<i>hpo</i>): AB378070; <i>inhibitor of apoptosis protein</i> (<i>diap1</i>): AB378071; <i>mob as tumor suppressor</i> (<i>mats</i>): AB378072; <i>yorkie</i> (<i>yki</i>): AB378076; <i>scaffold protein salvador</i> (<i>sav</i>): AB378074; <i>Merlin</i> (<i>Mer</i>): AB378073; <i>Kibra</i>: DC445461.</p>
Comparison of sequences lacking significant BLAST hits to nr, with <i>Laupala kohalensis</i> and <i>Locusta migratoria</i> databases.
<p>(A–C) Assembly products that failed to obtain significant BLAST hits to <b>nr</b> (white) were examined for significant similarity (magenta) to transcripts from at least one of <i>L. migratoria</i> or <i>L. kohalensis</i> <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0061479#pone.0061479-Chen1" target="_blank">[72]</a>, <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0061479#pone.0061479-Ma1" target="_blank">[73]</a>, <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0061479#pone.0061479-Kang1" target="_blank">[74]</a>, <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0061479#pone.0061479-Danley1" target="_blank">[75]</a>. (A′–C′) Assembly sequences thus identified were parsed into sequences with significant hits among only <i>L. kohalensis</i> sequences (red), only <i>L. migratoria</i> sequences (blue), or both (yellow). Results are shown separately for isotigs (A, A′), singletons (B, B′) and all non-redundant assembly products (C, C′).</p>
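The partition into "<i>L. kohalensis</i> only", "<i>L. migratoria</i> only" and "both" follows directly from set algebra. This sketch assumes the BLAST searches against each orthopteran database have already been reduced to hypothetical sets of query IDs with significant hits.

```python
from typing import Dict, Iterable, Set


def partition_hits(no_nr_hit: Iterable[str],
                   kohalensis_hits: Iterable[str],
                   migratoria_hits: Iterable[str]) -> Dict[str, Set[str]]:
    """Split sequences lacking nr hits by which orthopteran database they match."""
    pool = set(no_nr_hit)
    lk = pool & set(kohalensis_hits)   # similar to L. kohalensis transcripts
    lm = pool & set(migratoria_hits)   # similar to L. migratoria transcripts
    return {"kohalensis_only": lk - lm,
            "migratoria_only": lm - lk,
            "both": lk & lm,
            "neither": pool - lk - lm}


parts = partition_hits({"a", "b", "c", "d", "e"}, {"a", "b", "z"}, {"b", "c"})
print({k: sorted(v) for k, v in parts.items()})
```

The four resulting sets are disjoint and together cover every sequence in the input pool.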
Principal protein domain composition of <i>G. bimaculatus</i> transcriptome sequences with highest similarity to <i>Laupala kohalensis</i> or <i>Locusta migratoria</i> sequences.
<p>Relative proportions of the top 25 protein domains encoded by <i>G. bimaculatus</i> transcriptome sequences with significant similarity to sequences from <i>L. kohalensis</i> (A), <i>L. migratoria</i> (B), or sequences from <b>nr</b> (C). Protein domain nomenclature from Pfam <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0061479#pone.0061479-Bateman2" target="_blank">[102]</a> as follows: AdoHcyase_NAD: PF00670; Ank: PF00023; ATP-gua_Ptrans/N: PF02807; BTB/POZ: PF00651; C2: PF00168; DUF (combined): n/a; EFG domains (combined): n/a; efhand/like: PF09279; F-box: PF00646; Glyco_hydro (combined): n/a; GTP_EFTU domains: PF00009; Laps: PF10169; LRR_1: PF00560; Metallophos: PF00149; Myb_DNA-binding (combined): n/a; OS-D: PF03392; PARP: PF00644; PGAMP: PF07644; Pkinase: PF00069; Ras: PF00071; Ribosomal (combined): n/a; RRM_1: PF00076; RVT_1: PF00078; ubiquitin: PF00240; zinc finger (combined): n/a. “Combined” indicates that multiple Pfam accessions are combined.</p>
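The ranking and the "combined" categories can be sketched with `collections.Counter`. Two assumptions are labeled here: the `merge` mapping (which Pfam accessions were pooled into each combined category) is a hypothetical input not specified in the caption, and proportions are normalized within the top N domains rather than over all domains.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple


def top_domain_proportions(domain_hits: Iterable[str],
                           merge: Optional[Dict[str, str]] = None,
                           top_n: int = 25) -> List[Tuple[str, float]]:
    """Rank domains by frequency and return proportions within the top N."""
    merge = merge or {}
    # Relabel accessions that belong to a pooled ("combined") category
    counts = Counter(merge.get(d, d) for d in domain_hits)
    top = counts.most_common(top_n)
    total = sum(c for _, c in top)
    return [(name, c / total) for name, c in top]


hits = ["PF00069", "PF00069", "PF00076", "PF00023", "PF00023", "PF00023"]
print(top_domain_proportions(hits, top_n=2))
```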
Distribution of average coverage (bp/contig) within contigs produced by <i>de novo</i> assembly of the <i>G. bimaculatus</i> transcriptome.
<p>The coverage within contigs is calculated by dividing the total number of base pairs contained in the reads used to construct a contig by the length of that contig.</p>
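The coverage definition above is a single ratio; the sketch below applies it to one contig. The input format (a list of read lengths assigned to the contig plus the contig length) is an assumption about how assembler output might be summarized.

```python
from typing import Sequence


def average_coverage(read_lengths: Sequence[int], contig_length: int) -> float:
    """Total base pairs in the contig's reads divided by the contig length."""
    if contig_length <= 0:
        raise ValueError("contig length must be positive")
    return sum(read_lengths) / contig_length


# Hypothetical contig of 625 bp built from three reads (1,250 bp in total)
print(average_coverage([400, 350, 500], 625))
```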
Assessment of gene discovery and read length capacity of the <i>G. bimaculatus de novo</i> assembled transcriptome.
<p>(A) Randomly selected subsets of the trimmed reads were assembled using Newbler v2.5 in 10% increments, up to and including 100% of trimmed reads. For each subassembly, the number of unique BLAST hits against the NCBI non-redundant database (<b>nr</b>) with an E-value cutoff of 1e-10 (red; left axis) and the average coverage per base pair (blue; right axis) were calculated (see text for details). The number of unique BLAST hits did not increase further once 90% of reads (3,795,085 reads) had been assembled, while the coverage per base pair continued to increase as reads were added to the assembly. (B) Isotig length distribution for each subassembly created as described in (A). (C) Isotig length distribution of each subassembly for isotigs ≥4 kb. High numbers (≥50) of isotigs over 4 kb in length are achieved only when ≥40% of reads (1,686,646 reads) are assembled.</p>
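The subsampling design and the saturation check can be sketched as follows. This does not call Newbler or BLAST. It assumes independent random subsets at each increment, drawn with a fixed seed (whether the original subsets were nested is not stated), and a hypothetical list of unique-hit counts per subassembly. "Saturation" is taken here as the first increment whose count is never exceeded by any larger subassembly.

```python
import random
from typing import List, Sequence, Tuple


def subsample_series(reads: Sequence[str], seed: int = 0) -> List[Tuple[int, List[str]]]:
    """Draw random read subsets at 10%, 20%, ..., 100% of the input."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    pool = list(reads)
    return [(pct, rng.sample(pool, len(pool) * pct // 100))
            for pct in range(10, 101, 10)]


def saturation_point(pcts: Sequence[int], unique_hits: Sequence[int]) -> int:
    """First percentage whose hit count is never exceeded at larger subsets."""
    for i, pct in enumerate(pcts):
        if all(h <= unique_hits[i] for h in unique_hits[i:]):
            return pct
    raise ValueError("empty input")


pcts = list(range(10, 101, 10))
hits = [100, 180, 240, 280, 300, 310, 315, 318, 320, 320]  # hypothetical counts
print(saturation_point(pcts, hits))
```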