    Pipeline overview of AGAPE for yeast.

    <p>The pipeline consists of three parts: (a) assembly, (b) annotation, and (c) variation. Cylinders indicate data, shaded cylinders final result data, arrows data flows, rectangles programs, and dotted rectangles external package tools that are not included in our pipeline. After all ambiguous and low-quality reads are discarded, the remaining reads are processed to generate assembly contigs (a). The assembly contigs from (a) are used as input to annotate their genomic features, including both reference ORFs inferred by a homology-based method and non-reference ORFs predicted by <i>ab initio</i> methods (b). Fungal (including yeast) protein and EST databases are used to improve annotation accuracy. In a post-processing annotation step, annotated ORFs are refined and corrected as shown in (b). For variant detection, the reads remaining after the error-correction step are mapped to the reference genome in (c). Procedure (c) then forks into two branches: one for unmapped reads and one for mapped reads. The unmapped reads are assembled into contigs in the manner described in (a), then compared with the assembly contigs from (a) and the annotation results from (b) to identify newly inserted sequences and ORFs that are not present in the reference genome. For the mapped reads, the mapping information is passed to the HugeSeq pipeline, which detects variations, including SNPs, relative to the reference. The SNP calls and the non-reference features identified in (c) can be used for further variation analysis with external tools, e.g. the Galaxy genome diversity tool and various R packages.</p>
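The fork in step (c) can be illustrated with a minimal sketch. It assumes that the mapper writes standard SAM records and that the reads are single-end. Each primary record is routed by the SAM "segment unmapped" flag bit (0x4). The function name and branch labels are hypothetical; they are not part of AGAPE or HugeSeq.

```python
from typing import Iterable, List, Tuple

# Bit values from the SAM format specification.
FLAG_UNMAPPED = 0x4          # segment unmapped
FLAG_SECONDARY = 0x100       # secondary alignment
FLAG_SUPPLEMENTARY = 0x800   # supplementary alignment


def split_by_mapping(sam_lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (mapped, unmapped) read names from SAM alignment lines.

    Header lines ('@...') are skipped, and secondary/supplementary records
    are ignored so that each read is routed to exactly one branch.
    """
    mapped: List[str] = []
    unmapped: List[str] = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])  # QNAME and FLAG columns
        if flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
            continue
        if flag & FLAG_UNMAPPED:
            unmapped.append(name)  # hypothetical branch 1: re-assemble as in (a)
        else:
            mapped.append(name)    # hypothetical branch 2: hand off to variant calling
    return mapped, unmapped


reads = [
    "@HD\tVN:1.6",
    "r1\t0\tchrI\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r1\t256\tchrII\t500\t0\t4M\t*\t0\t0\tACGT\tIIII",  # secondary hit, ignored
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
mapped, unmapped = split_by_mapping(reads)
print(mapped, unmapped)  # ['r1'] ['r2']
```

With paired-end data, both mates share a QNAME, so a real implementation would also need to consider the mate flags.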

    Phylogenetic tree of the non-reference <i>MAL</i> gene family.

    <p>The <i>MAL23</i>, <i>MAL43</i>, <i>MAL63</i>, and <i>MAL64</i> genes are known non-reference features that may be associated with maltose activator function. We included all non-reference <i>MAL</i> activator genes identified in <i>S</i>. <i>cerevisiae</i>, including sequences from this study, sequences from Bergstrom <i>et al</i>. [<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0120671#pone.0120671.ref017" target="_blank">17</a>], and those deposited in the NCBI protein database. The <i>MAL</i> genes have been found in environmental and saké strains but have not been detected in baking or European wine strains. One group of <i>MAL</i> genes in the upper part of the gene tree, detected in the K11, YPS128, YPS163, UWOPS87, UWOPS83, SK1, and DBVPG6044 strains, clusters separately from the other <i>MAL</i> genes.</p>

    Pipeline validation based on annotation results.

    <p>(A) Annotation accuracy of the pipeline is measured using the reference genome assembly as input. Whereas 80% of the ORFs predicted by the homology-only method and 85% of those predicted by MAKER alone are correct, our combined method with refinement steps predicts 98% of ORFs correctly. The combined method also has a lower FDR than either the homology-only or the MAKER-only method. (B) A comparison of our non-reference ORFs with those of Bergstrom <i>et al</i>. [<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0120671#pone.0120671.ref017" target="_blank">17</a>] shows that 77% of the 319 non-reference ORFs reported by Bergstrom <i>et al</i>. are also found in our results from 18 non-S288C strains. We identify 40 non-reference ORFs that were not identified by Bergstrom <i>et al</i>. [<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0120671#pone.0120671.ref017" target="_blank">17</a>], while Bergstrom <i>et al</i>. identify 72 non-reference ORFs that were not found in our study. These differences are presumably due to the strains that are not shared between the two studies.</p>
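The overlap figures in (B) can be checked with simple arithmetic. This sketch assumes that ORFs match one-to-one between the two studies, which the caption does not state explicitly. Under that assumption, 319 − 72 = 247 shared ORFs, and adding our 40 unique ORFs would imply about 287 in our set. FDR is taken as the standard FP / (TP + FP). Both helper names are hypothetical.

```python
def fdr(true_positives: int, false_positives: int) -> float:
    """False discovery rate: fraction of predicted ORFs that are wrong."""
    predicted = true_positives + false_positives
    if predicted == 0:
        raise ValueError("no predictions")
    return false_positives / predicted


def overlap_summary(total_theirs: int, only_theirs: int, only_ours: int) -> dict:
    """Summarize a two-study ORF comparison, assuming one-to-one ORF matches."""
    shared = total_theirs - only_theirs
    return {
        "shared": shared,
        "fraction_of_theirs": shared / total_theirs,
        "total_ours": shared + only_ours,  # only valid under the 1:1 assumption
    }


summary = overlap_summary(total_theirs=319, only_theirs=72, only_ours=40)
print(summary["shared"], round(summary["fraction_of_theirs"], 2), summary["total_ours"])
# 247 0.77 287
```

The 0.77 result agrees with the 77% stated in the caption.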

    Known features not present in the reference genome.

    <p>Annotations for 8 non-reference ORFs identified by our pipeline across the 25 strains are maintained in SGD. (a) <i>MEL1</i> in D273, FL100, JK9, and UWOPS. (b) <i>RTM1</i> in D273 and FL100. (c) <i>MPR1</i> in JK9, RedStar, and Y55. (d) <i>BIO6</i> in K11. K11 is a saké strain, which is consistent with the description that <i>BIO6</i> is present in saké strains. (e) <i>TAT3</i> in RM11_1A, SK1, UWOPS, YPS128, and YPS163. (f) <i>XDH1</i> in RedStar and YS9. (g) <i>MAL64</i> in K11, UWOPS, YPS163, YPS128, and 10560–6B. (h) <i>KHR1</i> in BC187, YS9, FL100, YJM339, Y55, K11, YPS163, DBVPG6044, YPS128, and L1528.</p>

    Phylogenetic inferences and population structure of <i>S</i>. <i>cerevisiae</i> strains from variation.

    <p>(A) A neighbor-joining tree based on non-reference ORFs among 18 <i>S</i>. <i>cerevisiae</i> strains. (B) A neighbor-joining tree based on SNPs relative to the reference among 25 <i>S</i>. <i>cerevisiae</i> strains. The origin of each strain is indicated by the color of the enclosing circle. Strains from similar sources appear close to each other in both trees, but there are some differences (e.g. SK1, K11, and YJM339). (C) Population structure based on SNPs, inferred with the Genome diversity tool in Galaxy. The Galaxy tool also computed cross-validation error scores to select the most appropriate number of clusters (K). In our case, K = 2 and K = 3 gave the lowest cross-validation error scores among the K values tested (0.90 and 0.95, respectively). Colors were generated automatically and do not correspond to the colors used in A and B.</p>
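The caption does not state which distance underlies tree (A). The sketch below is a minimal illustration of how a neighbor-joining tree is built from a pairwise distance matrix. It assumes each strain is encoded as a hypothetical 0/1 presence/absence vector over non-reference ORFs and that strains are compared by Hamming distance. The function names are illustrative rather than AGAPE's, and in practice a library routine such as Biopython's `DistanceTreeConstructor` would normally be used. The five-taxon matrix is a standard textbook example, not data from this study.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def hamming_matrix(profiles: Dict[str, Sequence[int]]) -> Tuple[List[str], List[List[float]]]:
    """Pairwise Hamming distances between 0/1 presence/absence profiles."""
    names = list(profiles)
    n = len(names)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        diff = sum(a != b for a, b in zip(profiles[names[i]], profiles[names[j]]))
        dist[i][j] = dist[j][i] = float(diff)
    return names, dist


def neighbor_joining(names: Sequence[str], dist: Sequence[Sequence[float]]) -> str:
    """Build a neighbor-joining tree and return it as a Newick string."""
    if len(names) < 2:
        raise ValueError("need at least two taxa")
    nodes = list(names)
    d = [list(row) for row in dist]
    while len(nodes) > 2:
        n = len(nodes)
        r = [sum(row) for row in d]  # row sums r_i
        # Join the pair minimizing Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        _, i, j = min(
            ((n - 2) * d[a][b] - r[a] - r[b], a, b)
            for a, b in combinations(range(n), 2)
        )
        # Branch lengths from the new internal node u to i and j.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        joined = f"({nodes[i]}:{li:g},{nodes[j]}:{lj:g})"
        keep = [k for k in range(n) if k not in (i, j)]
        # d(u, k) = (d(i, k) + d(j, k) - d(i, j)) / 2 for each remaining k.
        new_row = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[a][b] for b in keep] + [new_row[idx]] for idx, a in enumerate(keep)]
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [joined]
    # Two nodes remain; place the root at the midpoint of the last edge.
    half = d[0][1] / 2
    return f"({nodes[0]}:{half:g},{nodes[1]}:{half:g});"


labels = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbor_joining(labels, matrix))
# ((c:4,(a:2,b:3):3):1,(d:2,e:1):1);
```

Neighbor joining produces an unrooted tree, so the midpoint root used here is only a drawing convenience.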

    Variations in <i>S</i>. <i>cerevisiae</i> strains.

    <p>(A) Number of non-reference ORFs in 25 <i>S</i>. <i>cerevisiae</i> strains. (B) Number of SNPs relative to the reference. According to the number of SNPs, BY4742, X2180, BY4741, and FY1679 are essentially identical to the reference strain (S288C), and there are no non-reference ORFs in these strains. This supports the notion that these four strains are the same as S288C within experimental error. Comparing the two panels shows that strains with more SNPs tend to have more non-reference ORFs, although some strains (e.g. K11 and YS9) deviate from this pattern.</p>
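The trend between panels (A) and (B) is described only qualitatively. A Spearman rank correlation between per-strain SNP counts and non-reference ORF counts is one way to quantify it. This is offered as an illustrative sketch, not as the authors' analysis. Average ranks handle ties, such as the four S288C-like strains with zero non-reference ORFs. `scipy.stats.spearmanr` computes the same statistic; the pure-Python version below only shows the mechanics, and the example counts are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)


# Hypothetical per-strain counts, for illustration only (not read from the figure).
snp_counts = [10, 15, 40000, 52000, 61000]
orf_counts = [0, 0, 12, 9, 20]
print(round(spearman(snp_counts, orf_counts), 3))
```

A rank-based measure suits this comparison because the SNP counts span several orders of magnitude, while the ORF counts are small integers.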