Automated annotation of the G. bimaculatus de novo transcriptome assembly using Gene Predictor.

Author: Ben Ewen-Campen (410109)
Cassandra G. Extavour (410112)
Hadley W. Horch (410110)
Siegfried Roth (223271)
Taro Mito (410111)
Victor Zeng (410108)

(A) Comparison of the proportion of non-redundant assembly sequences, isotigs and singletons that obtained a significant BLAST hit against nr (black bars), and those that were assigned a putative orthology by Gene Predictor (GP; white bars), based on the best reciprocal top BLAST hit with the Drosophila melanogaster proteome (see Table S1). (B) Comparison of the proportion of sequences with a significant BLAST hit in nr that also had a putative orthology assignment based on Gene Predictor (dark grey bars). All sequences assigned putative orthologs by Gene Predictor also had significant BLAST hits in nr (light grey bars).
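
The reciprocal-best-hit criterion mentioned in this caption can be illustrated with a minimal sketch. It is a generic version of the idea, not Gene Predictor's actual implementation; the function names, the tuple layout (query, subject, bit score) and the example identifiers are all assumptions made for illustration.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    `hits` is an iterable of (query, subject, bitscore) tuples; this layout
    is an assumption for the sketch, not a format taken from the paper.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward, reverse):
    """Keep query/subject pairs that are each other's best hit."""
    fwd = best_hits(forward)   # e.g. assembly sequence -> proteome
    rev = best_hits(reverse)   # e.g. proteome -> assembly sequence
    return {q: s for q, s in fwd.items() if rev.get(s) == q}


# Made-up identifiers and scores, for illustration only.
forward = [
    ("isotig1", "dmelA", 250.0),
    ("isotig1", "dmelB", 90.0),
    ("isotig2", "dmelB", 120.0),
    ("isotig3", "dmelB", 80.0),
]
reverse = [
    ("dmelA", "isotig1", 245.0),
    ("dmelB", "isotig2", 118.0),
    ("dmelB", "isotig3", 75.0),
]
orthologs = reciprocal_best_hits(forward, reverse)
```

Here `isotig3` has a best forward hit to `dmelB`, but `dmelB` prefers `isotig2`, so `isotig3` receives no putative ortholog under this criterion.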

The Francis Crick Institute

Phylogenetic comparison of proportion of known proteomes represented in the G. bimaculatus de novo assembled transcriptome.

Author: Ben Ewen-Campen (410109)
Cassandra G. Extavour (410112)
Hadley W. Horch (410110)
Siegfried Roth (223271)
Taro Mito (410111)
Victor Zeng (410108)

The number (bold) and percentage (bold italics) of proteome sequences with a putative G. bimaculatus ortholog in the de novo transcriptome assembly are shown for selected animals with sequenced genomes (based on top BLAST hit, E-value cutoff 1e-5). Proteomes were predicted from genome sequence sources as shown in Table S1. Numbers in large font in red and blue ovals indicate average proportion of sequences from all tested insect and deuterostome proteomes, respectively, represented in the G. bimaculatus transcriptome.

The Francis Crick Institute

Coding region analysis of G. bimaculatus de novo transcriptome assembly sequences without significant BLAST hits in nr.

Author: Ben Ewen-Campen (410109)
Cassandra G. Extavour (410112)
Hadley W. Horch (410110)
Siegfried Roth (223271)
Taro Mito (410111)
Victor Zeng (410108)

Assembly products that failed to obtain significant BLAST hits in nr (white) were examined for the presence of coding regions (green) using EST Scan [52]. Assembly sequences thus predicted to contain coding regions were examined for the presence of known coding domains (yellow) using InterPro Scan [53], [54]. Results are shown separately for isotigs (A), singletons (B) and all non-redundant assembly products (C). See also Table 3.

The Francis Crick Institute

Large-scale Orthopteran transcriptome resources to date.

Author: Ben Ewen-Campen (410109)
Cassandra G. Extavour (410112)
Hadley W. Horch (410110)
Siegfried Roth (223271)
Taro Mito (410111)
Victor Zeng (410108)

1. Data from [73], [74].
2. Data from [75].
3. Data from [76].
4. Data from [72].
5. Data from this report.
6. L = larval stage. nd = data not reported in the relevant publication [72], [73], [74], [76].
7. “N50” refers to isotig N50 from the G. bimaculatus de novo transcriptome assembly; mean contig length is shown for all other orthopteran transcriptome resources in this table.
8. # singletons are shown for the G. bimaculatus de novo transcriptome assembly; # single ESTs (not incorporated into contigs) are shown for all other orthopteran transcriptome resources in this table.
9. # unique BLAST hits against nr are shown for the G. bimaculatus de novo transcriptome assembly; # unigenes are shown for all other orthopteran transcriptome resources in this table.
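
Because the table notes contrast N50 with mean contig length, a short sketch of the standard N50 definition may help: it is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembled length. The function name and the example lengths below are assumptions for illustration and are not values from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Made-up isotig lengths, for illustration only.
example_lengths = [100, 200, 300, 400, 500]
example_n50 = n50(example_lengths)
example_mean = sum(example_lengths) / len(example_lengths)
```

For these made-up lengths the N50 is larger than the mean, which is one reason the two statistics are not directly comparable across the rows of such a table.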

The Francis Crick Institute

Statistical comparison of isotig and singleton nucleotide sequence lengths according to BLAST annotation and predicted protein-coding status.

Author: Ben Ewen-Campen (410109)
Cassandra G. Extavour (410112)
Hadley W. Horch (410110)
Siegfried Roth (223271)
Taro Mito (410111)
Victor Zeng (410108)

Values shown are p≥0.05 value results of a Welch's t-test. *** = p<0.0001; * = p<0.05.
1. BLAST E-value cutoff is 1e-5 for all hits reported in this table.
2. nr = NCBI non-redundant database.
3. NRAS = all non-redundant assembly products (isotigs or singletons) regardless of BLAST results against nr.
4. Numbers of sequences in each category are shown in Figure 9. Mean, median, maximum and minimum values for each category are shown in Tables 3 and 4.
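
As a reminder of what a Welch's t-test computes, the sketch below derives the test statistic and the Welch–Satterthwaite degrees of freedom for two samples with unequal variances, using only the Python standard library. It stops short of the p-value, which requires the t-distribution CDF. The function name and the sequence lengths are hypothetical and are not data from the paper.

```python
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, degrees of freedom) for Welch's unequal-variance t-test."""
    # Per-sample variance of the mean (sample variance divided by sample size).
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical sequence lengths (arbitrary units) for two categories.
with_hit = [2, 4, 6, 8]
without_hit = [1, 2, 3]
t_stat, dof = welch_t(with_hit, without_hit)
```

Unlike Student's t-test, this form does not pool the two variances, which suits comparisons between categories of very different size and spread.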

The Francis Crick Institute

Sequence extension and gene discovery in the G. bimaculatus Hedgehog and Hippo pathways.

Author: Ben Ewen-Campen (410109)
Cassandra G. Extavour (410112)
Hadley W. Horch (410110)
Siegfried Roth (223271)
Taro Mito (410111)
Victor Zeng (410108)

(A) The de novo transcriptome assembly of G. bimaculatus newly identifies most members of the hedgehog pathway (red), from which only the hedgehog ligand (blue) was previously known (GenBank accession AB044709). (B) The transcriptome also adds significant sequence data to the fragments of many genes in the Hippo signaling pathway that had been previously identified (green). Seven genes of the known pathway were not identified in the transcriptome (yellow, white), two of which lack any sequence data in GenBank (white). GenBank accessions for previously identified sequences are as follows: discs overgrown (dco): AB443442; expanded (ex): AB378099; warts (wts): AB300574; cyclin E (cycE): AB378067; hippo (hpo): AB378070; inhibitor of apoptosis protein (diap1): AB378071; mob as tumor suppressor (mats): AB378072; yorkie (yki): AB378076; scaffold protein salvador (sav): AB378074; Merlin (Mer): AB378073; Kibra: DC445461.

The Francis Crick Institute

Comparison of sequences lacking significant BLAST hits to nr, with Laupala kohalensis and Locusta migratoria databases.

Author: Ben Ewen-Campen (410109)
Cassandra G. Extavour (410112)
Hadley W. Horch (410110)
Siegfried Roth (223271)
Taro Mito (410111)
Victor Zeng (410108)

(A–C) Assembly products that failed to obtain significant BLAST hits to nr (white) were examined for significant similarity (magenta) to transcripts from at least one of L. migratoria or L. kohalensis [72], [73], [74], [75]. (A′–C′) Assembly sequences thus identified were parsed into sequences with significant hits among only L. kohalensis sequences (red), only L. migratoria sequences (blue), or both (yellow). Results are shown separately for isotigs (A, A′), singletons (B, B′) and all non-redundant assembly products (C, C′).

The Francis Crick Institute

Principal protein domain composition of G. bimaculatus transcriptome sequences with highest similarity to Laupala kohalensis or Locusta migratoria sequences.

Author: Ben Ewen-Campen (410109)
Cassandra G. Extavour (410112)
Hadley W. Horch (410110)
Siegfried Roth (223271)
Taro Mito (410111)
Victor Zeng (410108)

Relative proportions of the top 25 protein domains coded by G. bimaculatus transcriptome sequences with significant similarity to sequences from L. kohalensis (A), L. migratoria (B), or sequences from nr (C). Protein domain nomenclature from Pfam [102] as follows: AdoHcyase_NAD: PF00670; Ank: PF00023; ATP-gua_Ptrans/N: PF02807; BTB/POZ: PF00651; C2: PF00168; DUF (combined): n/a; EFG domains (combined): n/a; efhand/like: PF09279; F-box: PF00646; Glyco_hydro (combined): n/a; GTP_EFTU domains: PF00009; Laps: PF10169; LRR_1: PF00560; Metallophos: PF00149; Myb_DNA-binding (combined): n/a; OS-D: PF03392; PARP: PF00644; PGAMP: PF07644; Pkinase: PF00069; Ras: PF00071; Ribosomal (combined): n/a; RRM_1: PF00076; RVT_1: PF00078; ubiquitin: PF00240; zinc finger (combined): n/a. “Combined” indicates that multiple Pfam accessions are combined.

The Francis Crick Institute

Distribution of average coverage (bp/contig) within contigs produced by de novo assembly of the G. bimaculatus transcriptome.

Author: Ben Ewen-Campen (410109)
Cassandra G. Extavour (410112)
Hadley W. Horch (410110)
Siegfried Roth (223271)
Taro Mito (410111)
Victor Zeng (410108)

The coverage within contigs is calculated by dividing the total number of base pairs contained in the reads used to construct a contig by the length of that contig.
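
The calculation described in this caption can be written out as a small sketch. The function name and the read and contig lengths are assumptions chosen for illustration; they are not values from the assembly.

```python
def contig_coverage(read_lengths, contig_length):
    """Average coverage of one contig.

    Total base pairs in the reads used to build the contig, divided by the
    contig's own length, as described in the caption.
    """
    if contig_length <= 0:
        raise ValueError("contig_length must be positive")
    return sum(read_lengths) / contig_length


# Hypothetical contig of 600 bp built from three reads totalling 1,200 bp.
coverage = contig_coverage([400, 350, 450], 600)
```

In this made-up case each base of the contig is covered twice on average.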

The Francis Crick Institute

Assessment of gene discovery and read length capacity of the G. bimaculatus de novo assembled transcriptome.

Author: Ben Ewen-Campen (410109)
Cassandra G. Extavour (410112)
Hadley W. Horch (410110)
Siegfried Roth (223271)
Taro Mito (410111)
Victor Zeng (410108)

(A) Randomly selected subsets of the trimmed reads were assembled using Newbler v2.5 in 10% increments, up to and including 100% of trimmed reads. For each subassembly, the number of unique BLAST hits against the NCBI non-redundant database (nr) with an E-value cutoff of 1e-10 (red; left axis) and the average coverage per base pair (blue; right axis) were calculated (see text for details). The number of unique BLAST hits did not increase after at least 90% of reads (3,795,085 reads) were assembled, while the coverage per base pair continued to increase as reads were added to the assembly. (B) Isotig length distribution for each subassembly created as described in (A). (C) Isotig length distribution of each subassembly for isotigs ≥4 kb. High numbers (≥50) of isotigs over 4 kb in length are achieved only when ≥40% of reads (1,686,646 reads) are assembled.
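
The subsampling step in (A) could be sketched as follows. The caption does not say whether each subset was drawn independently or nested within the next larger one; this sketch assumes independent draws. The function name, the fixed seed and the read identifiers are hypothetical, and the assembly itself (Newbler) is outside the scope of the example.

```python
import random


def read_subsets(read_ids, step=0.1, seed=0):
    """Yield (fraction, subset) pairs for fractions step, 2*step, ..., 1.0.

    Each subset is an independent random sample without replacement
    (an assumption; the caption does not specify the sampling scheme).
    """
    rng = random.Random(seed)      # fixed seed so the draws are repeatable
    n_steps = round(1 / step)
    for i in range(1, n_steps + 1):
        size = round(len(read_ids) * i / n_steps)
        yield i / n_steps, rng.sample(read_ids, size)


# Fifty made-up read identifiers, sampled in 10% increments.
reads = [f"read{i}" for i in range(50)]
subsets = list(read_subsets(reads))
```

Each subset would then be assembled separately, and the number of unique BLAST hits and the coverage recorded per subassembly, to see where gene discovery levels off.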

The Francis Crick Institute
