27 research outputs found

    An imputation platform to enhance integration of rice genetic resources

    As sequencing and genotyping technologies evolve, crop genetics researchers accumulate increasing numbers of genomic data sets from various genotyping platforms on different germplasm panels. Imputation is an effective approach to increase marker density of existing data sets toward the goal of integrating resources for downstream applications. While a number of imputation software packages are available, the limitations to utilization for the rice community include high computational demand and lack of a reference panel. To address these challenges, we develop the Rice Imputation Server, a publicly available web application leveraging genetic information from a globally diverse rice reference panel assembled here. This resource allows researchers to benefit from increased marker density without needing to perform imputation on their own machines. We demonstrate improvements that imputed data provide to rice genome-wide association (GWA) results of grain amylose content and show that the major functional nucleotide polymorphism is tagged only in the imputed data set.

    A high-performance computational workflow to accelerate GATK SNP detection across a 25-genome dataset

    Background: Single-nucleotide polymorphisms (SNPs) are the most widely used form of molecular genetic variation in genetic studies. As reference genomes and resequencing data sets expand exponentially, tools must be in place to call SNPs at a similar pace. The genome analysis toolkit (GATK) is one of the most widely used SNP calling software tools publicly available, but unfortunately, high-performance computing versions of this tool have yet to become widely available and affordable. Results: Here we report an open-source high-performance computing genome variant calling workflow (HPC-GVCW) for GATK that can run on multiple computing platforms from supercomputers to desktop machines. We benchmarked HPC-GVCW on multiple crop species for performance and accuracy, obtaining results comparable to previously published reports (using GATK alone). Finally, we used HPC-GVCW in production mode to call SNPs on a “subpopulation aware” 16-genome rice reference panel with ~3000 resequenced rice accessions. The entire process took ~16 weeks and resulted in the identification of an average of 27.3 M SNPs/genome and the discovery of ~2.3 million novel SNPs that were not present in the flagship reference genome for rice (i.e., IRGSP RefSeq). Conclusions: This study developed an open-source pipeline (HPC-GVCW) to run GATK on HPC platforms, which significantly improved the speed at which SNPs can be called. The workflow is widely applicable as demonstrated successfully for four major crop species with genomes ranging in size from 400 Mb to 2.4 Gb. Using HPC-GVCW in production mode to call SNPs on a 25 multi-crop-reference genome data set produced over 1.1 billion SNPs that were publicly released for functional and breeding studies. For rice, many novel SNPs were identified and were found to reside within genes and open chromatin regions that are predicted to have functional consequences.
Combined, our results demonstrate the usefulness of combining a high-performance SNP calling architecture solution with a subpopulation-aware reference genome panel for rapid SNP discovery and public deployment. © 2024, The Author(s). Open access journal. This item from the UA Faculty Publications collection is made available by the University of Arizona with support from the University of Arizona Libraries.

    Predicting agronomic traits and associated genomic regions in diverse rice landraces using marker stability

    To secure the world’s food supply it is essential that we improve our knowledge of the genetic underpinnings of complex agronomic traits. In this paper, we report our findings from performing trait prediction and association mapping using marker stability in diverse rice landraces. We used the least absolute shrinkage and selection operator as our marker selection algorithm, and considered twelve real agronomic traits and a hundred simulated traits using a population with approximately a hundred thousand markers. For trait prediction, we considered several statistical/machine learning methods. We found that some of the methods considered performed best when preselected markers using marker stability were used. However, our results also show that one might need to make a trade-off between model size and performance for some learning methods. For association mapping, we compared marker stability to the genome-wide efficient mixed-model analysis (GEMMA), and for the simulated traits, we found that marker stability significantly outperforms GEMMA. For the real traits, marker stability successfully identifies multiple associated markers, which often include those selected by GEMMA. Further analysis of the markers selected for the real traits using marker stability showed that they are located in known quantitative trait loci (QTL) according to the QTL Annotation Rice Online database. Furthermore, co-functional network prediction of the selected markers using RiceNet v2 also showed association to known controlling genes. We argue that a wide adoption of the marker stability approach for the prediction of agronomic traits and association mapping could improve global rice breeding efforts.

    Multiple streams of genetic diversity in Japonica rice

    In-depth studies on the genetic diversity of crops indicate that domestication is likely a drawn-out process that differs from the traditional representation of a simple rapid bottleneck. Asian cultivated rice provides a clear picture of multiple foundations of crop diversity. Among them, Japonica rice is likely the group derived from the first human manipulations of this species. We make use of the 3,000 Rice Genomes (3K RG) data set, first described in 2018, to explore the genetic diversity of traditional Japonica rice. After delineating introgressions from the Indica and cAus cultivar groups, we mask these traces to analyse Japonica diversity in more depth. We find differentiation between the established “temperate”, “subtropical” and “tropical” subgroups, and identify stream-like traces of highly divergent sources from broad geographic ranges and subgroups. We characterize five such streams, most visible respectively in: 1) Indonesia, 2) continental Southeast Asia, 3) China, 4) uplands of Japan, and 5) Bhutan. These streams likely consist of ancient alien introgressions propagated through geneflow to different degrees. They currently appear as long genome segments conserved among specific germplasm groups, as well as shorter segments more broadly distributed across diverse germplasm along what could be adaptive corridors. They are all represented in the Japonica component of cBasmati varieties, thought to have emerged over two millennia ago. We thus provide strong evidence that Japonica, the group posited as being the most direct product of a simple domestication process in China, is an aggregate derived from multiple waves of admixture and represents a composite gene pool with ancient Asia-wide population dynamics.