93 research outputs found

    Generating High Density, Low Cost Genotype Data in Soybean [Glycine max (L.) Merr.]

    Obtaining genome-wide genotype information for millions of SNPs in soybean [Glycine max (L.) Merr.] often involves completely resequencing a line at 5X or greater coverage. Currently, hundreds of soybean lines have been resequenced at high depth levels with their data deposited in the NCBI Short Read Archive. This publicly available dataset may be leveraged as an imputation reference panel in combination with skim (low coverage) sequencing of new soybean genotypes to economically obtain high-density SNP information. Ninety-nine soybean lines resequenced at an average of 17.1X were used to generate a reference panel, with over 10 million SNPs called using GATK’s Haplotype Caller tool. Whole genome resequencing at approximately 1X depth was performed on 114 previously ungenotyped experimental soybean lines. Coverages down to 0.1X were analyzed by randomly subsetting raw reads from the original 1X sequence data. SNPs discovered in the reference panel were genotyped in the experimental lines after aligning to the soybean reference genome, and missing markers imputed using Beagle 4.1. Sequencing depth of the experimental lines could be reduced to 0.3X while still retaining an accuracy of 97.8%. Accuracy was inversely related to minor allele frequency, and highly correlated with marker linkage disequilibrium. The high accuracy of skim sequencing combined with imputation provides a low cost method for obtaining dense genotypic information that can be used for various genomics applications in soybean.

    Fine Mapping of the SCN Resistance Locus rhg1-b from PI 88788

    Soybean cyst nematode (SCN) (Heterodera glycines Ichinohe) is the most economically damaging soybean [Glycine max (L.) Merr.] pest in the USA and genetic resistance is a key component for its control. Although SCN resistance is quantitative, the rhg1 locus on chromosome 18 (formerly known as Linkage Group G) confers a high level of resistance. The objective of this study was to fine-map the rhg1-b allele that is derived from plant introduction (PI) 88788. F2 and F3 plants and F3:4 lines from crosses between SCN resistant and susceptible genotypes were tested with genetic markers to identify recombination events close to rhg1-b. Lines developed from these recombinant plants were then tested for resistance to the SCN isolate PA3, which originally had an HG type 0 phenotype, and with genetic markers. Analysis of lines carrying key recombination events positioned rhg1-b between the simple sequence repeat (SSR) markers BARCSOYSSR_18_0090 and BARCSOYSSR_18_0094. This places rhg1-b to a 67-kb region of the ‘Williams 82’ genome sequence. The receptor-like kinase gene that has been previously identified as a candidate for the ‘Peking’-derived SCN resistant rhg1 gene is adjacent to, but outside of, the rhg1-b interval defined in the present study.

    Genome-wide Association Mapping of Qualitatively Inherited Traits in a Germplasm Collection

    Genome-wide association (GWA) has been used as a tool for dissecting the genetic architecture of quantitatively inherited traits. We demonstrate here that GWA can also be highly useful for detecting many major genes governing categorically defined phenotype variants that exist for qualitatively inherited traits in a germplasm collection. Genome-wide association mapping was applied to categorical phenotypic data available for 10 descriptive traits in a collection of ~13,000 soybean [Glycine max (L.) Merr.] accessions that had been genotyped with a 50,000 single nucleotide polymorphism (SNP) chip. A GWA on a panel of accessions of this magnitude can offer substantial statistical power and mapping resolution, and we found that GWA mapping resulted in the identification of strong SNP signals for 24 classical genes as well as several heretofore unknown genes controlling the phenotypic variants in those traits. Because some of these genes had been cloned, we were able to show that the narrow GWA mapping SNP signal regions that we detected for the phenotypic variants had chromosomal bp spans that, with just one exception, overlapped the bp region of the cloned genes, despite local variation in SNP number and nonuniform SNP distribution in the chip set

    SNP Assay Development for Linkage Map Construction, Anchoring Whole-Genome Sequence, and Other Genetic and Genomic Applications in Common Bean.

    A total of 992,682 single-nucleotide polymorphisms (SNPs) was identified as ideal for Illumina Infinium II BeadChip design after sequencing a diverse set of 17 common bean (Phaseolus vulgaris L.) varieties with the aid of next-generation sequencing technology. From these, two BeadChips each with >5000 SNPs were designed. The BARCBean6K_1 BeadChip was selected for the purpose of optimizing polymorphism among market classes and, when possible, SNPs were targeted to sequence scaffolds in the Phaseolus vulgaris 14× genome assembly with sequence lengths >10 kb. The BARCBean6K_2 BeadChip was designed with the objective of anchoring additional scaffolds and to facilitate orientation of large scaffolds. Analysis of 267 F2 plants from a cross of varieties Stampede × Red Hawk with the two BeadChips resulted in linkage maps with a total of 7040 markers including 7015 SNPs. With the linkage map, a total of 432.3 Mb of sequence from 2766 scaffolds was anchored to create the Phaseolus vulgaris v1.0 assembly, which accounted for approximately 89% of the 487 Mb of available sequence scaffolds of the Phaseolus vulgaris v0.9 assembly. A core set of 6000 SNPs (BARCBean6K_3 BeadChip) with high genotyping quality and polymorphism was selected based on the genotyping of 365 dry bean and 134 snap bean accessions with the BARCBean6K_1 and BARCBean6K_2 BeadChips. The BARCBean6K_3 BeadChip is a useful tool for genetics and genomics research and it is widely used by breeders and geneticists in the United States and abroad.

    Application of machine learning in SNP discovery

    Background: Single nucleotide polymorphisms (SNP) constitute more than 90% of the genetic variation, and hence can account for most trait differences among individuals in a given species. Polymorphism detection software PolyBayes and PolyPhred give high false positive SNP predictions even with stringent parameter values. We developed a machine learning (ML) method to augment PolyBayes to improve its prediction accuracy. ML methods have also been successfully applied to other bioinformatics problems in predicting genes, promoters, transcription factor binding sites and protein structures. Results: The ML program C4.5 was applied to a set of features in order to build a SNP classifier from training data based on human expert decisions (True/False). The training data were 27,275 candidate SNP generated by sequencing 1973 STS (sequence tag sites) (12 Mb) in both directions from 6 diverse homozygous soybean cultivars and PolyBayes analysis. Test data of 18,390 candidate SNP were generated similarly from 1359 additional STS (8 Mb). SNP from both sets were classified by experts. After training the ML classifier, it agreed with the experts on 97.3% of test data compared with 7.8% agreement between PolyBayes and experts. The PolyBayes positive predictive values (PPV) (i.e., fraction of candidate SNP being real) were 7.8% for all predictions and 16.7% for those with 100% posterior probability of being real. Using ML improved the PPV to 84.8%, a 5- to 10-fold increase. While both ML and PolyBayes produced a similar number of true positives, the ML program generated only 249 false positives as compared to 16,955 for PolyBayes. The complexity of the soybean genome may have contributed to high false SNP predictions by PolyBayes and hence results may differ for other genomes. Conclusion: A machine learning (ML) method was developed as a supplementary feature to the polymorphism detection software for improving prediction accuracies. The results from this study indicate that a trained ML classifier can significantly reduce human intervention and in this case achieved a 5–10 fold enhanced productivity. The optimized feature set and ML framework can also be applied to all polymorphism discovery software. ML support software is written in Perl and can be easily integrated into an existing SNP discovery pipeline.

    A High Density Integrated Genetic Linkage Map of Soybean and the Development of a 1536 Universal Soy Linkage Panel for Quantitative Trait Locus Mapping

    Single nucleotide polymorphisms (SNPs) are the marker of choice for many researchers due to their abundance and the high-throughput methods available for their multiplex analysis. Only recently have SNP markers been available to researchers in soybean [Glycine max (L.) Merr.] with the release of the third version of the consensus genetic linkage map that added 1141 SNP markers to the map. Our objectives were to add 2500 additional SNP markers to the soybean integrated map and select a set of 1536 SNPs to create a universal linkage panel for high-throughput soybean quantitative trait locus (QTL) mapping. The GoldenGate assay is one high-throughput analysis method capable of genotyping 1536 SNPs in 192 DNA samples over a 3-d period. We designed GoldenGate assays for 3456 SNPs (2956 new plus 500 previously mapped) which were used to screen three recombinant inbred line populations and diverse germplasm. A total of 3000 workable assays were obtained which added about 2500 new SNP markers to create a fourth version of the soybean integrated linkage map. To create a “Universal Soy Linkage Panel” (USLP 1.0) of 1536 SNP loci, SNPs were selected based on even distribution throughout each of the 20 consensus linkage groups and to have a broad range of allele frequencies in diverse germplasm. The 1536 USLP 1.0 will be able to quickly create a comprehensive genetic map in most QTL mapping populations and thus will serve as a useful tool for high-throughput QTL mapping.

    SNP-PHAGE – High throughput SNP discovery pipeline

    BACKGROUND: Single nucleotide polymorphisms (SNPs) as defined here are single base sequence changes or short insertion/deletions between or within individuals of a given species. As a result of their abundance and the availability of high throughput analysis technologies, SNP markers have begun to replace other traditional markers such as restriction fragment length polymorphisms (RFLPs), amplified fragment length polymorphisms (AFLPs) and simple sequence repeat (SSR or microsatellite) markers for fine mapping and association studies in several species. For SNP discovery from chromatogram data, several bioinformatics programs have to be combined to generate an analysis pipeline. Results have to be stored in a relational database to facilitate interrogation through queries or to generate data for further analyses such as determination of linkage disequilibrium and identification of common haplotypes. Although these tasks are routinely performed by several groups, an integrated open source SNP discovery pipeline that can be easily adapted by new groups interested in SNP marker development is currently unavailable. RESULTS: We developed SNP-PHAGE (SNP discovery Pipeline with additional features for identification of common haplotypes within a sequence tagged site (Haplotype Analysis) and GenBank (-dbSNP) submissions). This tool was applied for analyzing sequence traces from diverse soybean genotypes to discover over 10,000 SNPs. This package was developed on UNIX/Linux platform, written in Perl and uses a MySQL database. Scripts to generate a user-friendly web interface are also provided with common queries for preliminary data analysis. A machine learning tool developed by this group for increasing the efficiency of SNP discovery is integrated as a part of this package as an optional feature. The SNP-PHAGE package is being made available open source at . 
CONCLUSION: SNP-PHAGE provides a bioinformatics solution for high throughput SNP discovery, identification of common haplotypes within an amplicon, and GenBank (dbSNP) submissions. SNP selection and visualization are aided through a user-friendly web interface. This tool is useful for analyzing sequence tagged sites (STSs) of genomic sequences, and this software can serve as a starting point for groups interested in developing SNP markers.

    Genomic regions that underlie soybean seed isoflavone content

    Soy products contain isoflavones (genistein, daidzein, and glycitein) that display biological effects when ingested by humans and animals; these effects are species, dose and age dependent. Therefore, the content and quality of isoflavones in soybeans is a key to their biological effect. Our objective was to identify loci that underlie isoflavone content in soybean seeds. The study involved 100 recombinant inbred lines (RIL) from the cross of ‘Essex’ by ‘Forrest,’ two cultivars that contrast for isoflavone content. Isoflavone content of seeds from each RIL was determined by high performance liquid chromatography (HPLC). The distribution of isoflavone content was continuous and unimodal. The heritability estimates on a line mean basis were 79% for daidzein, 22% for genistein, and 88% for glycitein. Isoflavone content of soybean seeds was compared against 150 polymorphic DNA markers in a one-way analysis of variance. Four genomic regions were found to be significantly associated with the isoflavone content of soybean seeds across both locations and years. Molecular linkage group B1 contained a major QTL underlying glycitein content (P = 0.0001, R² = 50.2%), linkage group N contained a QTL for glycitein (P = 0.0033, R² = 11.1%) and a QTL for daidzein (P = 0.0023, R² = 10.3%), and linkage group A1 contained a QTL for daidzein (P = 0.0081, R² = 9.6%). Selection for these chromosomal regions in a marker assisted selection program will allow for the manipulation of amounts and profiles of isoflavone (genistein, daidzein, and glycitein) content of soybean seeds. In addition, tightly linked markers can be used in map based cloning of genes associated with isoflavone content.