Optimising the identification of causal variants across varying genetic architectures in crops
Association studies use statistical links between genetic markers and the phenotype variation across many individuals to identify genes controlling variation in the target phenotype. However, this approach, particularly conducted on a genome-wide scale (GWAS), has limited power to identify the genes responsible for variation in traits controlled by complex genetic architectures. In this study, we employ real-world genotype datasets from four crop species with distinct minor allele frequency distributions, population structures and linkage disequilibrium patterns. We demonstrate that different GWAS statistical approaches provide favourable trade-offs between power and accuracy for traits controlled by different types of genetic architectures. FarmCPU provides the most favourable outcomes for moderately complex traits while a Bayesian approach adopted from genomic prediction provides the most favourable outcomes for extremely complex traits. We assert that by estimating the complexity of genetic architectures for target traits and selecting an appropriate statistical approach for the degree of complexity detected, researchers can substantially improve the ability to dissect the genetic factors controlling complex traits such as flowering time, plant height and yield component
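As an illustration of the basic idea of an association scan, the sketch below scores each marker by its squared correlation with the phenotype on simulated data. It is not FarmCPU or the Bayesian approach evaluated in the study, and it makes no correction for population structure or kinship; the function name, the simulated genotypes and the effect size are all assumptions made for the example.

```python
import numpy as np


def single_marker_scan(genotypes, phenotype):
    """Squared correlation (r^2) between each marker and the phenotype.

    genotypes: (n_individuals, n_markers) array of allele dosages (0, 1, 2).
    phenotype: (n_individuals,) array of trait values.
    Hypothetical helper for illustration; no structure or kinship correction.
    """
    g = genotypes - genotypes.mean(axis=0)   # centre each marker
    y = phenotype - phenotype.mean()         # centre the trait
    cov = g.T @ y
    denom = np.sqrt((g ** 2).sum(axis=0) * (y ** 2).sum())
    # Monomorphic markers have zero variance; report r^2 = 0 for them.
    r = np.divide(cov, denom, out=np.zeros_like(cov, dtype=float), where=denom > 0)
    return r ** 2


# Simulated data (assumption): 200 individuals, 50 markers, one causal marker.
rng = np.random.default_rng(0)
n, m = 200, 50
geno = rng.integers(0, 3, size=(n, m)).astype(float)
causal = 7
pheno = 1.5 * geno[:, causal] + rng.normal(size=n)

r2 = single_marker_scan(geno, pheno)
top_marker = int(np.argmax(r2))
```

With a single large-effect locus, a scan like this recovers the causal marker easily. The study's point is that power and accuracy diverge between methods as the number of causal loci grows and their individual effects shrink.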
Semantic Segmentation of Sorghum Using Hyperspectral Data Identifies Genetic Associations
This study describes the evaluation of a range of approaches to semantic segmentation of hyperspectral images of sorghum plants, classifying each pixel as either nonplant or belonging to one of the three organ types (leaf, stalk, panicle). While many current methods for segmentation focus on separating plant pixels from background, organ-specific segmentation makes it feasible to measure a wider range of plant properties. Manually scored training data for a set of hyperspectral images collected from a sorghum association population was used to train and evaluate a set of supervised classification models. Many algorithms show acceptable accuracy for this classification task. Algorithms trained on sorghum data are able to accurately classify maize leaves and stalks, but fail to accurately classify maize reproductive organs which are not directly equivalent to sorghum panicles. Trait measurements extracted from semantic segmentation of sorghum organs can be used both to identify genes known to control variation in previously measured phenotypes (e.g., panicle size and plant height) and to identify signals for genes controlling traits not previously quantified in this population (e.g., stalk/leaf ratio). Organ level semantic segmentation provides opportunities to identify genes controlling variation in a wide range of morphological phenotypes in sorghum, maize, and other related grain crops
Genotype-Corrector: improved genotype calls for genetic mapping in F2 and RIL populations
F2 and recombinant inbred line (RIL) populations are very commonly used in plant genetic mapping studies. Although genome-wide genetic markers like single nucleotide polymorphisms (SNPs) can be readily identified by a wide array of methods, accurate genotype calling remains challenging, especially for heterozygous loci and missing data due to low sequencing coverage per individual. Therefore, we developed Genotype-Corrector, a program that corrects genotype calls and imputes missing data to improve the accuracy of genetic mapping. Genotype-Corrector can be applied in a wide variety of genetic mapping studies that are based on low coverage whole genome sequencing (WGS) or Genotyping-by-Sequencing (GBS) related techniques. Our results show that Genotype-Corrector achieves high accuracy when applied to both synthetic and real genotype data. Compared with using raw or only imputed genotype calls, the linkage groups built by corrected genotype data show much less noise and significant distortions can be corrected. Additionally, Genotype-Corrector compares favorably to the popular imputation software LinkImpute and Beagle in both F2 and RIL populations
High‑throughput analysis of leaf physiological and chemical traits with VIS–NIR–SWIR spectroscopy: a case study with a maize diversity panel
Hyperspectral reflectance data in the visible, near infrared and shortwave infrared range (VIS–NIR–SWIR, 400–2500 nm) are commonly used to nondestructively measure plant leaf properties. We investigated the usefulness of VIS–NIR–SWIR as a high-throughput tool to measure six leaf properties of maize plants including chlorophyll content (CHL), leaf water content (LWC), specific leaf area (SLA), nitrogen (N), phosphorus (P), and potassium (K). This assessment was performed using the lines of the maize diversity panel. Data were collected from plants grown in greenhouse conditions, as well as in the field under two nitrogen application regimes. Leaf-level hyperspectral data were collected with a VIS–NIR–SWIR spectroradiometer at tasseling. Two multivariate modeling approaches, partial least squares regression (PLSR) and support vector regression (SVR), were employed to estimate the leaf properties from hyperspectral data. Several common vegetation indices (VIs: GNDVI, RENDVI, and NDWI), which were calculated from hyperspectral data, were also assessed to estimate these leaf properties
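The three vegetation indices named above are all normalized differences of two reflectance bands, which can be sketched as follows. The abstract does not state which wavelengths were used, so the band centres below (800/550 nm for GNDVI, 750/705 nm for RENDVI, 860/1240 nm for NDWI) are commonly used choices assumed for illustration, and the spectrum is synthetic rather than measured.

```python
import numpy as np

# Assumed band centres in nm (first band minus second band); the abstract
# does not give the exact wavelengths used in the study.
BANDS = {
    "GNDVI": (800, 550),
    "RENDVI": (750, 705),
    "NDWI": (860, 1240),
}


def band(wavelengths, reflectance, target_nm):
    """Reflectance at the sampled wavelength closest to target_nm."""
    idx = int(np.argmin(np.abs(wavelengths - target_nm)))
    return float(reflectance[idx])


def normalized_difference(a, b):
    """Generic normalized difference index (a - b) / (a + b)."""
    return (a - b) / (a + b)


def vegetation_indices(wavelengths, reflectance):
    """Compute each index in BANDS for one leaf spectrum."""
    return {
        name: normalized_difference(
            band(wavelengths, reflectance, first),
            band(wavelengths, reflectance, second),
        )
        for name, (first, second) in BANDS.items()
    }


# Synthetic step-shaped spectrum (assumption): low reflectance below 750 nm,
# high reflectance from 750 nm upward, sampled at 1 nm from 400 to 2500 nm.
wavelengths = np.arange(400, 2501)
reflectance = np.full(wavelengths.shape, 0.1)
reflectance[wavelengths >= 750] = 0.5

indices = vegetation_indices(wavelengths, reflectance)
```

Unlike PLSR or SVR, which use the full spectrum, each index condenses a spectrum into a single number from two bands, which is why the study assesses them as a simpler alternative.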
3D reconstruction identifies loci linked to variation in angle of individual sorghum leaves
Selection for yield at high planting density has reshaped the leaf canopy of maize, improving photosynthetic productivity in high density settings. Further optimization of canopy architecture may be possible. However, measuring leaf angles, the widely studied component trait of leaf canopy architecture, by hand is a labor and time intensive process. Here, we use multiple, calibrated, 2D images to reconstruct the 3D geometry of individual sorghum plants using a voxel carving based algorithm. Automatic skeletonization and segmentation of these 3D geometries enable quantification of the angle of each leaf for each plant. The resulting measurements are both heritable and correlated with manually collected leaf angles. This automated and scalable reconstruction approach was employed to measure leaf-by-leaf angles for a population of 366 sorghum plants at multiple time points, resulting in 971 successful reconstructions and 3,376 leaf angle measurements from individual leaves. A genome wide association study conducted using aggregated leaf angle data identified a known large effect leaf angle gene, several previously identified leaf angle QTL from a sorghum NAM population, and novel signals. Genome wide association studies conducted separately for three individual sorghum leaves identified a number of the same signals, a previously unreported signal shared across multiple leaves, and signals near the sorghum orthologs of two maize genes known to influence leaf angle. Automated measurement of individual leaves and mapping variants associated with leaf angle reduce the barriers to engineering ideal canopy architectures in sorghum and other grain crops
Meta-analysis identifies pleiotropic loci controlling phenotypic trade-offs in sorghum
Community association populations are composed of phenotypically and genetically diverse accessions. Once these populations are genotyped, the resulting marker data can be reused by different groups investigating the genetic basis of different traits. Because the same genotypes are observed and scored for a wide range of traits in different environments, these populations represent a unique resource to investigate pleiotropy. Here, we assembled a set of 234 separate trait datasets for the Sorghum Association Panel, a group of 406 sorghum genotypes widely employed by the sorghum genetics community. Comparison of genome-wide association studies (GWAS) conducted with two independently generated marker sets for this population demonstrate that existing genetic marker sets do not saturate the genome and likely capture only 35–43% of potentially detectable loci controlling variation for traits scored in this population. While limited evidence for pleiotropy was apparent in cross-GWAS comparisons, a multivariate adaptive shrinkage approach recovered both known pleiotropic effects of existing loci and new pleiotropic effects, particularly significant impacts of known dwarfing genes on root architecture. In addition, we identified new loci with pleiotropic effects consistent with known trade-offs in sorghum development. These results demonstrate the potential for mining existing trait datasets from widely used community association populations to enable new discoveries from existing trait datasets as new, denser genetic marker datasets are generated for existing community association populations
Shared genetic control of root system architecture between Zea mays and Sorghum bicolor
Determining the genetic control of root system architecture (RSA) in plants via large-scale genome-wide association study (GWAS) requires high-throughput pipelines for root phenotyping. We developed CREAMD (Core Root Excavation using Compressed-air), a high-throughput pipeline for the cleaning of field-grown roots, and COFE (Core Root Feature Extraction), a semi-automated pipeline for the extraction of RSA traits from images. CREAMD-COFE was applied to diversity panels of maize (Zea mays) and sorghum (Sorghum bicolor), which consisted of 369 and 294 genotypes, respectively. Six RSA-traits were extracted from images collected from >3,300 maize roots and >1,470 sorghum roots. SNP-based GWAS identified 87 TAS (trait-associated SNPs) in maize, representing 77 genes and 115 TAS in sorghum. An additional 62 RSA-associated maize genes were identified via eRD-GWAS. Among the 139 maize RSA-associated genes (or their homologs), 22 (16%) are known to affect RSA in maize or other species. In addition, 26 RSA-associated genes are co-regulated with genes previously shown to affect RSA and 51 (37% of RSA-associated genes) are themselves trans-eQTL for another RSA-associated gene. Finally, the finding that RSA-associated genes from maize and sorghum included seven pairs of syntenic genes demonstrates the conservation of regulation of morphology across taxa
Quantitative Genetics and Phenomics in Crops Using Statistical and Machine Learning Approaches
Plant biologists seek to meet the growing food demands in the world by developing high yielding and more resilient crop varieties. Advances in both quantitative genetics and high throughput phenotyping have the potential to facilitate this work to improve crop qualities. Genome-wide association studies (GWAS) are approaches to identify the genes controlling variation in phenotype within a species. While many statistical models exist for GWAS, the relative strengths and weaknesses of these models in crop species were not well elucidated. In the first chapter, current GWAS models were evaluated using real world genetic data from four crop species and different assumptions about genetic architecture and heritability. The second chapter presents a new semantic segmentation approach to measure morphological phenotypes in sorghum. This approach lets researchers use automated phenotyping to measure plant traits that previously required time intensive hand measurements of the same plants. Automated phenotyping also makes it easier to measure how the phenotypes of individual plants change over time. The third chapter adopts a statistical approach, a functional PCA model, for conducting GWAS in sorghum using time series data. The approach presented can help researchers better understand the role an individual gene plays in determining plant phenotype over time. Leaf number, together with the timing of leaf emergence, is another important agronomic trait of interest and of use to plant breeders and plant biologists. However, work on the computer vision task of leaf counting has focused on Arabidopsis because that is where the training data has been available. In the last chapter, a new benchmark image dataset was generated, including annotations of the number and position of each leaf in over 150,000 maize and sorghum images. I show that machine learning models trained using this dataset achieve leaf counting performance comparable to humans in maize.
The data, approaches, and conclusions presented in this dissertation provide valuable knowledge to guide the improvement of crop qualities in the future