322 research outputs found

    Comparative Analyses of De Novo Transcriptome Assembly Pipelines for Diploid Wheat

    Gene expression and transcriptome analysis are currently among the main research focuses of a great number of scientists. However, the assembly of raw sequence data to obtain a draft transcriptome of an organism is a complex multi-stage process usually composed of pre-processing, assembling, and post-processing. Each of these stages includes multiple steps such as data cleaning, error correction and assembly validation. Different combinations of steps, as well as different computational methods for the same step, generate transcriptome assemblies with different accuracy. Thus, using a combination that generates more accurate assemblies is crucial for any novel biological discoveries. Implementing accurate transcriptome assembly requires extensive knowledge of the different algorithms, bioinformatics tools and software that can be used in an analysis pipeline. Many pipelines can be represented as automated, scalable scientific workflows that can be run simultaneously on powerful distributed computational resources, such as Campus Clusters, Grids, and Clouds, to speed up the analyses. In this thesis, we 1) compared and optimized de novo transcriptome assembly pipelines for diploid wheat; 2) investigated the impact of a few key parameters for generating accurate transcriptome assemblies, such as digital normalization and error correction methods, de novo assemblers and k-mer length strategies; 3) built a distributed and scalable scientific workflow for blast2cap3, a step from the transcriptome assembly pipeline for protein-guided assembly, using the Pegasus Workflow Management System (WMS); and 4) deployed and examined the scientific workflow for blast2cap3 on two different computational platforms. Based on the analysis performed in this thesis, we conclude that the best transcriptome assembly is produced when the error correction method is used with Velvet Oases and the “multi-k” strategy. Moreover, the performed experiments show that the Pegasus WMS implementation of blast2cap3 reduces the running time by more than 95% compared to its current serial implementation. The results presented in this thesis provide valuable insight for designing a good de novo transcriptome assembly pipeline and show the importance of using scientific workflows for executing computationally demanding pipelines. Advisor: Jitender S. Deogu
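
The "k-mer length strategies" mentioned in this abstract can be illustrated with a minimal sketch. It only shows how the set of k-mers extracted from the same reads changes with k, which is the reason a "multi-k" approach runs an assembler at several k values and combines the results. It is not how Velvet or Oases work internally; the function names, the toy reads and the k values are hypothetical and chosen for illustration only.

```python
from collections import Counter


def kmers(seq, k):
    """Return every overlapping substring of length k in seq (empty if k > len(seq))."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def multi_k_counts(reads, k_values):
    """Count k-mers across all reads, separately for each k in k_values."""
    return {
        k: Counter(kmer for read in reads for kmer in kmers(read, k))
        for k in k_values
    }


# Hypothetical toy reads; real reads are far longer and k is typically in the tens.
toy_reads = ["ACGTACGT", "CGTACGTA"]
counts = multi_k_counts(toy_reads, k_values=(3, 5))

for k, table in counts.items():
    print(k, len(table), "distinct k-mers,", sum(table.values()), "total")
```

Shorter k-mers are shared by more reads but are also more ambiguous, while longer k-mers are more specific but occur less often. That trade-off is the usual motivation for combining assemblies built at several k values.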

    Genomic tools for durum wheat breeding: de novo assembly of Svevo transcriptome and SNP discovery in elite germplasm

    BACKGROUND: The tetraploid durum wheat (Triticum turgidum L. ssp. durum Desf. Husnot) is an important crop which provides the raw material for pasta production and a valuable source of genetic diversity for breeding hexaploid wheat (Triticum aestivum L.). Future breeding efforts to enhance yield potential and climate resilience will increasingly rely on genomics-based approaches to identify and select beneficial alleles. A deeper characterisation of the molecular and functional diversity of the durum wheat transcriptome will be instrumental to more effectively harness its genetic diversity. RESULTS: We report on the de novo transcriptome assembly of durum wheat cultivar 'Svevo'. The transcriptome of four tissues/organs (shoots and roots at the seedling stage, reproductive organs and developing grains) was assembled de novo, yielding 180,108 contigs, with an N50 length of 1121 bp and a mean contig length of 883 bp. Alignment against the transcriptomes of nine plant species identified 43% of transcripts with homology to at least one reference transcriptome. The functional annotation was completed by means of a combination of complementary software. The presence of differential expression between the A- and B-homoeolog copies of the durum wheat tetraploid genome was ascertained by phase reconstruction of polymorphic sites based on the T. urartu transcripts and inferring homoeolog-specific sequences. We observed greater expression divergence between A and B homoeologs in grains than in leaves and roots. The transcriptomes of 13 durum wheat cultivars spanning the breeding period from 1969 to 2005 were analysed for SNP diversity, leading to 95,358 non-rare hemi-SNPs shared among two or more cultivars and 33,747 locus-specific (diploid inheritance) SNPs. CONCLUSIONS: Our study updates and expands the de novo transcriptome reference assembly available for durum wheat. Out of 180,108 assembled transcripts, 13,636 were specific to the Svevo cultivar as compared to the only other reference transcriptome available for durum, thus contributing to the identification of the tetraploid wheat pan-transcriptome. Additionally, the analysis of 13 historically relevant hallmark varieties produced a SNP dataset that could successfully validate the genotyping in tetraploid wheat and provide a valuable resource for genomics-assisted breeding of both tetraploid and hexaploid wheats.
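
The N50 and mean contig length reported in this abstract are standard assembly summary statistics. The sketch below shows how they are commonly computed from a list of contig lengths: N50 is the length of the contig at which the running total, taken from longest to shortest, first reaches half of the total assembled bases. The function names and the toy lengths are hypothetical and are not taken from the Svevo assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("no contig lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half the total defines N50.
        if running >= half_total:
            return length


def mean_length(lengths):
    """Return the arithmetic mean contig length."""
    return sum(lengths) / len(lengths)


# Hypothetical contig lengths in bp, for illustration only.
toy_contigs = [2000, 1500, 1200, 800, 500, 300, 200]
print("N50:", n50(toy_contigs))
print("mean:", round(mean_length(toy_contigs), 1))
```

Because N50 is weighted toward long contigs, it is usually larger than the mean contig length, which matches the relationship between the two values quoted in the abstract.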

    Transcriptome Analysis of Root Development in Wheat Triticum aestivum Using High-Throughput Sequencing Technologies

    Roots provide plants with water, nutrients and anchorage from the soil. Most of our knowledge of the molecular mechanisms of root development comes from the dicot model plant Arabidopsis, and very few studies have been done in monocot crop systems like rice, maize, and wheat. We are studying the very short root (VSR) phenotype in wheat, and the lack of a sequenced reference genome in wheat prompted us to sequence and assemble the root transcriptome of the reference cultivar Chinese Spring (CS). A root transcriptome was assembled from the sequenced reads generated from the root tip and mature root tissues of CS. Approximately 169 million reads were successfully assembled into ~91K transcripts coding for functional proteins. Of these ~91K transcripts, 1,728 were differentially expressed in the root tip as compared to the rest of the mature tissues. Generation of the root reference transcriptome and the availability of a reasonable reference genome sequence for wheat enabled us to analyze gene expression in the long root (LR) and VSR. A total of 4,412 genes were differentially expressed in the VSR compared to the LR root tips. A significant portion of the differentially expressed genes functioning in hormonal responses, regulation of transcription, defense response, reactive oxygen species (ROS), abiotic stress response, lignin biosynthesis, calcium signaling, and autophagy pathways were induced. In addition, several negative regulators of cell proliferation, including homologs of the BIGBROTHER E3 ubiquitin ligase, and negative regulators of root cell elongation, such as genes encoding the FERONIA kinases and a RALF peptide hormone, were also up-regulated in VSR. Consistent with this, a large number of genes for chromatin replication and protein synthesis, including those coding for histones and ribosomal proteins, and cell wall remodeling enzymes, were down-regulated in VSR. The ROS and lignin accumulation in the VSR were further validated by histochemical staining. This research revealed several molecular mechanisms of root development, based on which a working model was proposed to explain VSR development. Although the related pathways identified in Arabidopsis may play a similar role in wheat, the VSR phenotype is probably governed by a unique mechanism that may be cereal- or wheat-specific.

    A high-quality genome of Eragrostis curvula grass provides insights into Poaceae evolution and supports new strategies to enhance forage quality

    The Poaceae constitute a taxon of flowering plants (grasses) that cover almost all of Earth's inhabitable range and comprise some of the genera most commonly used for human and animal nutrition. Many of these crops have been sequenced, like rice, Brachypodium, maize and, more recently, wheat. Some important members are still considered orphan crops, lacking a sequenced genome, but having important traits that make them attractive for sequencing. Among these traits is apomixis, clonal reproduction by seeds, present in some members of the Poaceae like Eragrostis curvula. A de novo, high-quality genome assembly and annotation for E. curvula have been obtained by sequencing 602Mb of a diploid genotype using a strategy that combined long-read length sequencing with chromosome conformation capture. The scaffold N50 for this assembly was 43.41Mb and the annotation yielded 56,469 genes. The availability of this genome assembly has allowed us to identify regions associated with forage quality and to develop strategies to sequence and assemble the complex tetraploid genotypes which harbor the apomixis control region(s). Understanding and subsequently manipulating the genetic drivers underlying apomixis could revolutionize agriculture.
    Fil: Carballo, José. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Centro de Recursos Naturales Renovables de la Zona Semiárida. Universidad Nacional del Sur. Centro de Recursos Naturales Renovables de la Zona Semiárida; Argentina
    Fil: Santos, B. A. C. M. National Institute Of Agricultural Botany; United Kingdom
    Fil: Zappacosta, Diego Carlos. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Centro de Recursos Naturales Renovables de la Zona Semiárida. Universidad Nacional del Sur. Centro de Recursos Naturales Renovables de la Zona Semiárida; Argentina. Universidad Nacional del Sur. Departamento de Agronomía; Argentina
    Fil: Garbus, Ingrid. Universidad Nacional del Sur. Departamento de Biología, Bioquímica y Farmacia; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Centro de Recursos Naturales Renovables de la Zona Semiárida. Universidad Nacional del Sur. Centro de Recursos Naturales Renovables de la Zona Semiárida; Argentina
    Fil: Selva, Juan Pablo. Universidad Nacional del Sur. Departamento de Biología, Bioquímica y Farmacia; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Centro de Recursos Naturales Renovables de la Zona Semiárida. Universidad Nacional del Sur. Centro de Recursos Naturales Renovables de la Zona Semiárida; Argentina
    Fil: Gallo, Cristian Andrés. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Centro de Recursos Naturales Renovables de la Zona Semiárida. Universidad Nacional del Sur. Centro de Recursos Naturales Renovables de la Zona Semiárida; Argentina
    Fil: Diaz, Alejandra Raquel. Universidad Nacional del Sur. Departamento de Biología, Bioquímica y Farmacia; Argentina. National Institute Of Agricultural Botany; United Kingdom. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Centro de Recursos Naturales Renovables de la Zona Semiárida. Universidad Nacional del Sur. Centro de Recursos Naturales Renovables de la Zona Semiárida; Argentina
    Fil: Albertini, Emiliano. Università di Perugia; Italy
    Fil: Cáccamo, Mario José. National Institute Of Agricultural Botany; United Kingdom
    Fil: Echenique, Carmen Viviana. Universidad Nacional del Sur. Departamento de Agronomía; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Centro de Recursos Naturales Renovables de la Zona Semiárida. Universidad Nacional del Sur. Centro de Recursos Naturales Renovables de la Zona Semiárida; Argentin

    Transcriptome-scale homoeolog-specific transcript assemblies of bread wheat

    Extent: 14p. The electronic version of this article is the complete one and can be found online at: http://www.biomedcentral.com/1471-2164/13/492 Background: Bread wheat is one of the world’s most important food crops and considerable efforts have been made to develop genomic resources for this species. This includes an on-going project by the International Wheat Genome Sequencing Consortium to assemble its large and complex genome, which is hexaploid and contains three closely related ‘homoeologous’ copies for each chromosome. This multi-national effort avoids the complications polyploidy entails for correct assembly of the genome by sequencing flow-sorted chromosome arms one at a time. Here we report on an alternate approach, a direct homoeolog-specific assembly of the expressed portion of the genome, the transcriptome. Results: After assessment of the ability of various assemblers to generate homoeolog-specific assemblies, we employed a two-stage assembly process to produce a high-quality assembly of the transcriptome of hexaploid wheat from Roche-454 and Illumina GAIIx paired-end sequence reads. The assembly process made use of a rapid partitioning of expressed sequences into homoeologous clusters, followed by a parallel high-fidelity assembly of each cluster on a 1150-processor compute cloud. We assessed assembly quality through comparison to known wheat gene sequences and found that in ca. 98.5% of cases the assembly was sufficiently accurate for homoeologous triplets to be cleanly separated into either two or three separate contigs. Comparison to publicly available transcript collections suggests that the assembly covers ~75-80% of the complete transcriptome. Conclusions: This work therefore describes the first homoeolog-specific sequence assembly of the wheat transcriptome and provides a reference transcriptome for future wheat research. Furthermore, our assembly methodology is transferable to other polyploid organisms. Andreas W Schreiber, Matthew J Hayden, Kerrie L Forrest, Stephan L Kong, Peter Langridge and Ute Bauman

    Navigating through the uncertainty of genotyping-by-sequencing data in polyploids

    The development of genotyping-by-sequencing (GBS) methods has facilitated genomics studies in non-model species, including polyploids. Variant and genotype calling methods have been established for autopolyploids, but for a species with a complex genome, such as sugarcane, the level of uncertainty within GBS data increases, making trait mapping difficult. Furthermore, variant and genotype calling methods remain a challenge for both recent and ancient allopolyploids (e.g. wheat, maize, soybean, Miscanthus), particularly where the reference genome contains highly similar paralogous sequences that do not pair at meiosis. Alignment of sequence tags to the appropriate position within highly duplicated reference genomes remains a challenge inadequately addressed by existing alignment software. Although some variant calling pipelines can discriminate a paralogous locus from a Mendelian locus, paralogous loci are typically detected only so that they can be excluded from the downstream analysis of genomic studies. We explore the significance of eliminating paralogous loci in downstream analysis using a newly developed pipeline that sorts sequence tags to their correct alignment locations based on the novel Hind/HE statistic. The goal of this study was to evaluate the sorting pipeline’s ability to properly align paralogous loci to the correct position with respect to the reference genome. Three studies were conducted: an analysis of a population of 400 individuals simulated based on Triticum aestivum, a reanalysis of a previously published genome-wide study of fusarium head blight in 273 wheat breeding lines, and a reanalysis of a previously published genome-wide study of traits associated with yield in a Miscanthus diversity panel. Results from the study suggested that filtering sequences using the Hind/HE statistic underlying polyRAD v1.2 may lead to differences in the output of sequences. Further comparison of each output suggested that the output of the novel pipeline, polyRAD, was concentrated in gene-rich regions compared to other standard variant calling pipelines. From this study, we provide recommendations for future users of the polyRAD v1.2 variant calling pipeline. Overall, we suggest that polyRAD v1.2 is more useful for populations of outcrossing species.

    Translational genomics for achieving higher genetic gains in groundnut

    Cultivated groundnut or peanut (Arachis hypogaea), an allopolyploid oilseed crop with a large and complex genome, is one of the most nutritious foods. This crop is grown in more than 100 countries, and low productivity has remained the biggest challenge in the semiarid tropics. Recently, the groundnut research community has witnessed fast progress and achieved several key milestones in genomics research, including genome sequence assemblies of wild diploid progenitors, the wild tetraploid and both subspecies of the cultivated tetraploid, resequencing of diverse germplasm lines, a genome-wide transcriptome atlas and cost-effective high- and low-density genotyping assays. These genomic resources have enabled high-resolution trait mapping using germplasm diversity panels and multi-parent genetic populations, leading to precise gene discovery and diagnostic marker development. Furthermore, the development and deployment of diagnostic markers have facilitated the screening of early generation populations as well as marker-assisted backcrossing breeding, leading to the development and commercialization of some molecular breeding products in groundnut. Several new genomics applications/technologies such as genomic selection, speed breeding, mid-density genotyping assays and genome editing are in the pipeline. The integration of these new technologies holds great promise for developing climate-smart, high-yielding and more nutritious groundnut varieties in the post-genome era.