130 research outputs found

    QTL mapping and identification of GxE interactions of agronomic and seed quality traits in soybean

    The detection of quantitative trait loci (QTL) in soybean [Glycine max (L.) Merr.] makes it possible to identify chromosomal regions containing genes that regulate the expression of important agronomic and seed quality traits. The objective of this study was to identify QTL and genotype x environment (GxE) interactions contributing to agronomic and seed quality traits in soybean. A recombinant inbred line (RIL) population was created from two prominent ancestors (Essex and Williams) of currently available U.S. cultivars. One hundred simple sequence repeat (SSR) markers spaced throughout the genome were mapped in this population. Agronomic and seed quality traits were measured in six environments spanning two years. QTL were identified with GxE interaction for agronomic and seed quality traits, using composite interval mapping and multiple trait analysis. A total of 11 maturity, six height, seven lodging, two yield, six oil, 13 protein, nine seed size, four palmitate, seven stearate, four oleate, five linoleate, and eight linolenate QTL were detected in this population. The QTL found in this study may be of use in marker-assisted selection to help breeders fine-tune selection and make further genetic gains.

    GENETIC DIVERSITY AND LINKAGE DISEQUILIBRIUM IN WILD SOYBEAN, LANDRACES, ANCESTRAL, AND ELITE SOYBEAN POPULATIONS

    Domestication, founder effects, and artificial selection can impact populations by reducing genome diversity and increasing the extent of linkage disequilibrium (LD). To understand the impact of these genetic bottlenecks and selection on sequence diversity and LD within soybean [Glycine max (L.) Merr.], 111 genes and three chromosomal regions located on linkage groups A2, G, and J were characterized in soybean. Four soybean populations were evaluated: 1) the wild ancestor of soybean (G. soja), 2) the population resulting from domestication (landraces), 3) Asian introductions from which North American cultivars were developed (ancestors), and 4) elite cultivars from the 1980s (elite). A total of 438 single nucleotide polymorphisms (SNPs) and 58 insertions-deletions were discovered within the 102 genes. Sequence diversity was lower than expected in G. soja, with an overall theta equal to 0.00235, and was less than half that value (theta = 0.00115) in the landraces. Domestication eliminated most unique haplotypes: G. soja contained 240 unique haplotypes, while the landraces contained only 42. The founder effect of the introduction of soybean to North America, followed by intensive artificial selection, resulted in only a 30% decrease in nucleotide diversity. A total of 738 SNPs were discovered and genotyped in the four populations throughout three chromosomal regions. In G. soja, LD did not extend past 100 kb, while in the three cultivated soybean populations LD extended from 90 kb up to 600+ kb, most likely as a result of increased inbreeding and domestication. The three chromosomal regions varied in the extent of LD within the populations. G. soja is the greatest resource for unique alleles and may be best suited for fine mapping utilizing association analysis. The landraces do not contain much more variability than the elite cultivars but may have enough diversity to facilitate genetic improvement of elite cultivars. Finally, due to the extended levels of LD in the landraces and the elite cultivars, whole genome association analysis may be possible for the discovery of QTL.
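
The abstract above reports diversity as "theta" without naming the estimator. As a minimal sketch, assuming it refers to Watterson's estimator per site (segregating sites divided by the harmonic sum over sample size and by sequence length), the calculation could look like the following. The function name and the example inputs are hypothetical; only the two theta values come from the abstract.

```python
def watterson_theta(segregating_sites, n_sequences, sequence_length):
    """Watterson's estimator of theta per site.

    theta_W = S / (a_n * L), where a_n = sum_{i=1}^{n-1} 1/i.
    Assumption: this is the estimator behind the "theta" values quoted
    in the abstract; the abstract itself does not say so.
    """
    if n_sequences < 2:
        raise ValueError("need at least two sequences")
    if sequence_length <= 0:
        raise ValueError("sequence length must be positive")
    a_n = sum(1.0 / i for i in range(1, n_sequences))
    return segregating_sites / (a_n * sequence_length)


# Hypothetical inputs, for illustration only (not data from the study).
example_theta = watterson_theta(segregating_sites=10, n_sequences=2, sequence_length=1000)

# Values quoted in the abstract: landrace theta relative to G. soja theta.
theta_soja = 0.00235
theta_landrace = 0.00115
landrace_to_soja_ratio = theta_landrace / theta_soja
print(f"landrace / G. soja theta ratio: {landrace_to_soja_ratio:.3f}")
```

The ratio of the two reported values is below 0.5, which matches the abstract's statement that landrace diversity was less than half that of G. soja.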

    Fine Mapping of the SCN Resistance Locus rhg1-b from PI 88788

    Soybean cyst nematode (SCN) (Heterodera glycines Ichinohe) is the most economically damaging soybean [Glycine max (L.) Merr.] pest in the USA, and genetic resistance is a key component for its control. Although SCN resistance is quantitative, the rhg1 locus on chromosome 18 (formerly known as Linkage Group G) confers a high level of resistance. The objective of this study was to fine-map the rhg1-b allele that is derived from plant introduction (PI) 88788. F2 and F3 plants and F3:4 lines from crosses between SCN resistant and susceptible genotypes were tested with genetic markers to identify recombination events close to rhg1-b. Lines developed from these recombinant plants were then tested with genetic markers and for resistance to the SCN isolate PA3, which originally had an HG type 0 phenotype. Analysis of lines carrying key recombination events positioned rhg1-b between the simple sequence repeat (SSR) markers BARCSOYSSR_18_0090 and BARCSOYSSR_18_0094. This places rhg1-b within a 67-kb region of the ‘Williams 82’ genome sequence. The receptor-like kinase gene that has been previously identified as a candidate for the ‘Peking’-derived SCN resistant rhg1 gene is adjacent to, but outside of, the rhg1-b interval defined in the present study.

    Generating High Density, Low Cost Genotype Data in Soybean [Glycine max (L.) Merr.]

    Obtaining genome-wide genotype information for millions of SNPs in soybean [Glycine max (L.) Merr.] often involves completely resequencing a line at 5X or greater coverage. Currently, hundreds of soybean lines have been resequenced at high depth levels with their data deposited in the NCBI Short Read Archive. This publicly available dataset may be leveraged as an imputation reference panel in combination with skim (low coverage) sequencing of new soybean genotypes to economically obtain high-density SNP information. Ninety-nine soybean lines resequenced at an average of 17.1X were used to generate a reference panel, with over 10 million SNPs called using GATK’s HaplotypeCaller tool. Whole genome resequencing at approximately 1X depth was performed on 114 previously ungenotyped experimental soybean lines. Coverages down to 0.1X were analyzed by randomly subsetting raw reads from the original 1X sequence data. SNPs discovered in the reference panel were genotyped in the experimental lines after aligning to the soybean reference genome, and missing markers were imputed using Beagle 4.1. Sequencing depth of the experimental lines could be reduced to 0.3X while still retaining an accuracy of 97.8%. Accuracy was inversely related to minor allele frequency, and highly correlated with marker linkage disequilibrium. The high accuracy of skim sequencing combined with imputation provides a low cost method for obtaining dense genotypic information that can be used for various genomics applications in soybean.
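
The downsampling and accuracy steps described above can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, reads are stood in for by arbitrary list items, and accuracy is assumed to mean simple genotype concordance between imputed and known calls, which the abstract does not spell out.

```python
import random


def fraction_for_depth(original_depth, target_depth):
    """Fraction of reads to keep to go from original_depth to target_depth."""
    if not 0 < target_depth <= original_depth:
        raise ValueError("target depth must be in (0, original depth]")
    return target_depth / original_depth


def subsample_reads(reads, fraction, seed=0):
    """Randomly keep a fixed fraction of reads (sampling without replacement)."""
    rng = random.Random(seed)
    n_keep = round(fraction * len(reads))
    return rng.sample(reads, n_keep)


def imputation_accuracy(true_genotypes, imputed_genotypes):
    """Share of markers whose imputed genotype equals the known genotype.

    Markers with an unknown true genotype (None) are skipped.
    Assumption: accuracy is plain concordance; the study may define it differently.
    """
    pairs = [(t, i) for t, i in zip(true_genotypes, imputed_genotypes) if t is not None]
    if not pairs:
        raise ValueError("no comparable markers")
    return sum(1 for t, i in pairs if t == i) / len(pairs)


# Going from about 1X to 0.3X means keeping about 30% of the reads.
reads = [f"read_{k}" for k in range(1000)]  # placeholder read identifiers
kept = subsample_reads(reads, fraction_for_depth(1.0, 0.3), seed=1)

# Genotypes coded as 0, 1, 2 copies of the alternate allele (hypothetical data).
accuracy = imputation_accuracy([0, 1, 2, None], [0, 1, 1, 2])
```

In practice the subsampling would be done on FASTQ or BAM files with a dedicated tool, and the imputation itself by Beagle as stated in the abstract; the sketch only shows the bookkeeping around those steps.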

    SNP Assay Development for Linkage Map Construction, Anchoring Whole-Genome Sequence, and Other Genetic and Genomic Applications in Common Bean.

    A total of 992,682 single-nucleotide polymorphisms (SNPs) were identified as ideal for Illumina Infinium II BeadChip design after sequencing a diverse set of 17 common bean (Phaseolus vulgaris L.) varieties with the aid of next-generation sequencing technology. From these, two BeadChips each with >5000 SNPs were designed. The BARCBean6K_1 BeadChip was selected for the purpose of optimizing polymorphism among market classes and, when possible, SNPs were targeted to sequence scaffolds in the Phaseolus vulgaris 14× genome assembly with sequence lengths >10 kb. The BARCBean6K_2 BeadChip was designed with the objective of anchoring additional scaffolds and facilitating orientation of large scaffolds. Analysis of 267 F2 plants from a cross of varieties Stampede × Red Hawk with the two BeadChips resulted in linkage maps with a total of 7040 markers including 7015 SNPs. With the linkage map, a total of 432.3 Mb of sequence from 2766 scaffolds was anchored to create the Phaseolus vulgaris v1.0 assembly, which accounted for approximately 89% of the 487 Mb of available sequence scaffolds of the Phaseolus vulgaris v0.9 assembly. A core set of 6000 SNPs (BARCBean6K_3 BeadChip) with high genotyping quality and polymorphism was selected based on the genotyping of 365 dry bean and 134 snap bean accessions with the BARCBean6K_1 and BARCBean6K_2 BeadChips. The BARCBean6K_3 BeadChip is a useful tool for genetics and genomics research and it is widely used by breeders and geneticists in the United States and abroad.

    A High Density Integrated Genetic Linkage Map of Soybean and the Development of a 1536 Universal Soy Linkage Panel for Quantitative Trait Locus Mapping

    Single nucleotide polymorphisms (SNPs) are the marker of choice for many researchers due to their abundance and the high-throughput methods available for their multiplex analysis. Only recently have SNP markers been available to researchers in soybean [Glycine max (L.) Merr.] with the release of the third version of the consensus genetic linkage map that added 1141 SNP markers to the map. Our objectives were to add 2500 additional SNP markers to the soybean integrated map and select a set of 1536 SNPs to create a universal linkage panel for high-throughput soybean quantitative trait locus (QTL) mapping. The GoldenGate assay is one high-throughput analysis method capable of genotyping 1536 SNPs in 192 DNA samples over a 3-d period. We designed GoldenGate assays for 3456 SNPs (2956 new plus 500 previously mapped), which were used to screen three recombinant inbred line populations and diverse germplasm. A total of 3000 workable assays were obtained, which added about 2500 new SNP markers to create a fourth version of the soybean integrated linkage map. To create a “Universal Soy Linkage Panel” (USLP 1.0) of 1536 SNP loci, SNPs were selected based on even distribution throughout each of the 20 consensus linkage groups and to have a broad range of allele frequencies in diverse germplasm. The 1536-SNP USLP 1.0 will make it possible to quickly create a comprehensive genetic map in most QTL mapping populations and thus will serve as a useful tool for high-throughput QTL mapping.

    Application of machine learning in SNP discovery

    Background: Single nucleotide polymorphisms (SNP) constitute more than 90% of the genetic variation, and hence can account for most trait differences among individuals in a given species. Polymorphism detection software PolyBayes and PolyPhred give high false positive SNP predictions even with stringent parameter values. We developed a machine learning (ML) method to augment PolyBayes to improve its prediction accuracy. ML methods have also been successfully applied to other bioinformatics problems in predicting genes, promoters, transcription factor binding sites and protein structures. Results: The ML program C4.5 was applied to a set of features in order to build a SNP classifier from training data based on human expert decisions (True/False). The training data were 27,275 candidate SNP generated by sequencing 1973 STS (sequence tag sites) (12 Mb) in both directions from 6 diverse homozygous soybean cultivars and PolyBayes analysis. Test data of 18,390 candidate SNP were generated similarly from 1359 additional STS (8 Mb). SNP from both sets were classified by experts. After training, the ML classifier agreed with the experts on 97.3% of test data, compared with 7.8% agreement between PolyBayes and experts. The PolyBayes positive predictive values (PPV) (i.e., fraction of candidate SNP being real) were 7.8% for all predictions and 16.7% for those with 100% posterior probability of being real. Using ML improved the PPV to 84.8%, a 5- to 10-fold increase. While both ML and PolyBayes produced a similar number of true positives, the ML program generated only 249 false positives as compared to 16,955 for PolyBayes. The complexity of the soybean genome may have contributed to high false SNP predictions by PolyBayes and hence results may differ for other genomes. Conclusion: A machine learning (ML) method was developed as a supplementary feature to the polymorphism detection software for improving prediction accuracies. The results from this study indicate that a trained ML classifier can significantly reduce human intervention and in this case achieved a 5–10 fold enhanced productivity. The optimized feature set and ML framework can also be applied to all polymorphism discovery software. ML support software is written in Perl and can be easily integrated into an existing SNP discovery pipeline.
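
The positive predictive values quoted above can be checked against the reported counts. The sketch below assumes that all 18,390 test candidates count as PolyBayes positive calls, so that its true positives are the candidates minus its 16,955 false positives; that reading is an inference from the abstract, not something it states. The function name is hypothetical.

```python
def positive_predictive_value(true_positives, false_positives):
    """PPV = TP / (TP + FP): the fraction of predicted SNP that are real."""
    total = true_positives + false_positives
    if total == 0:
        raise ValueError("no positive predictions")
    return true_positives / total


# Counts quoted in the abstract.
test_candidates = 18_390
polybayes_fp = 16_955
ml_fp = 249

# Assumption: every test candidate is a PolyBayes positive call.
polybayes_tp = test_candidates - polybayes_fp
polybayes_ppv = positive_predictive_value(polybayes_tp, polybayes_fp)
print(round(100 * polybayes_ppv, 1))  # 7.8

# True positives implied by the reported ML PPV of 84.8% and 249 false positives.
ml_ppv = 0.848
ml_tp_implied = ml_ppv * ml_fp / (1 - ml_ppv)

# Fold improvement relative to the two PolyBayes PPVs reported (7.8% and 16.7%).
fold_vs_all = 84.8 / 7.8
fold_vs_high_confidence = 84.8 / 16.7
```

Under that assumption the PolyBayes PPV reproduces the reported 7.8%, the implied ML true-positive count is close to the PolyBayes count (consistent with "a similar number of true positives"), and the two ratios bracket the stated 5- to 10-fold increase.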

    SNP-PHAGE – High throughput SNP discovery pipeline

    BACKGROUND: Single nucleotide polymorphisms (SNPs) as defined here are single base sequence changes or short insertion/deletions between or within individuals of a given species. As a result of their abundance and the availability of high throughput analysis technologies, SNP markers have begun to replace other traditional markers such as restriction fragment length polymorphisms (RFLPs), amplified fragment length polymorphisms (AFLPs) and simple sequence repeat (SSR or microsatellite) markers for fine mapping and association studies in several species. For SNP discovery from chromatogram data, several bioinformatics programs have to be combined to generate an analysis pipeline. Results have to be stored in a relational database to facilitate interrogation through queries or to generate data for further analyses such as determination of linkage disequilibrium and identification of common haplotypes. Although these tasks are routinely performed by several groups, an integrated open source SNP discovery pipeline that can be easily adapted by new groups interested in SNP marker development is currently unavailable. RESULTS: We developed SNP-PHAGE (SNP discovery Pipeline with additional features for identification of common haplotypes within a sequence tagged site (Haplotype Analysis) and GenBank (-dbSNP) submissions). This tool was applied for analyzing sequence traces from diverse soybean genotypes to discover over 10,000 SNPs. This package was developed on a UNIX/Linux platform, is written in Perl and uses a MySQL database. Scripts to generate a user-friendly web interface are also provided with common queries for preliminary data analysis. A machine learning tool developed by this group for increasing the efficiency of SNP discovery is integrated as a part of this package as an optional feature. The SNP-PHAGE package is being made available open source at . CONCLUSION: SNP-PHAGE provides a bioinformatics solution for high throughput SNP discovery, identification of common haplotypes within an amplicon, and GenBank (dbSNP) submissions. SNP selection and visualization are aided through a user-friendly web interface. This tool is useful for analyzing sequence tagged sites (STSs) of genomic sequences, and this software can serve as a starting point for groups interested in developing SNP markers.