    High performance computing for large-scale genomic prediction

    In recent decades, genetics has been studied intensively, establishing DNA as the molecule behind genetic inheritance, and since the start of the new millennium, next-generation sequencing methods have made it possible to sample this DNA at ever-decreasing cost. Animal and plant breeders have always used genetic information to predict the agronomic performance of new breeds. Whereas this information was previously gathered from the pedigree of the population under study, genomic information from the DNA also makes it possible to deduce correlations between individuals that share no known ancestors, leading to so-called genomic prediction of agronomic performance. Nowadays, the number of informative markers that can be sampled from a genome ranges from one thousand to one million. Using all this information in a breeding context, where agronomic performance is predicted and optimized for different environmental conditions, is not a straightforward task. Moreover, the number of individuals for which this information is available keeps growing, so sophisticated computational methods are required to analyze these large-scale genomic data sets. This thesis introduces some concepts of high performance computing in a genomic prediction context and shows that analyzing the phenotypic records of large numbers of genotyped individuals leads to better prediction accuracy of agronomic performance in different environments. Finally, it shows that the parts of the DNA that influence agronomic performance under certain environmental conditions can be pinpointed, so that breeders can use this knowledge to select individuals that thrive in the targeted environment.
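The abstract notes that genomic data reveal relationships between individuals with no known common ancestors. One widely used way to quantify this is VanRaden's genomic relationship matrix G = ZZ' / (2 Σ p_j(1 - p_j)). The sketch below is a minimal pure-Python illustration under stated assumptions: genotypes coded as 0/1/2 copies of a counted allele, no missing data, and invented toy genotypes. It is not necessarily the method or implementation used in the thesis, and a high performance version would use optimized linear algebra rather than nested loops.

```python
from typing import List


def vanraden_grm(genotypes: List[List[int]]) -> List[List[float]]:
    """Genomic relationship matrix G = Z Z' / (2 * sum_j p_j (1 - p_j)).

    genotypes: one row per individual, one column per marker, coded as
    0, 1 or 2 copies of the counted allele (no missing values assumed).
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Observed frequency of the counted allele at each marker.
    p = [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(m)]
    # Centre each genotype by its expected value 2p.
    z = [[row[j] - 2.0 * p[j] for j in range(m)] for row in genotypes]
    # This scaling makes G comparable to a pedigree relationship matrix.
    scale = 2.0 * sum(pj * (1.0 - pj) for pj in p)
    if scale == 0.0:
        raise ValueError("all markers are monomorphic")
    return [
        [sum(z[a][k] * z[b][k] for k in range(m)) / scale for b in range(n)]
        for a in range(n)
    ]


if __name__ == "__main__":
    # Toy genotypes for three individuals at four markers (invented data).
    toy = [
        [0, 1, 2, 1],
        [0, 1, 2, 2],
        [2, 1, 0, 0],
    ]
    for row in vanraden_grm(toy):
        print([round(v, 2) for v in row])
```

In the toy data, the first two individuals share most alleles and receive a positive relationship, while the third receives a negative one, even though no pedigree is supplied.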

    A divide-and-conquer approach for genomic prediction in rubber tree using machine learning

    Rubber tree (Hevea brasiliensis) is the main feedstock for commercial rubber; however, its long vegetative cycle has hindered the development of more productive varieties via breeding programs. With the availability of H. brasiliensis genomic data, several linkage maps with associated quantitative trait loci have been constructed and suggested as a tool for marker-assisted selection. Nonetheless, novel genomic strategies are still needed, and genomic selection (GS) may facilitate rubber tree breeding programs aimed at reducing the number of cycles required for performance assessment. Although this methodology has already been shown to be a promising tool for rubber tree breeding, models with greater predictive capability and more practical applicability are still needed. Here, we developed a novel machine learning-based approach for predicting rubber tree stem circumference from molecular markers. Using a divide-and-conquer strategy, we propose a two-stage neural network prediction system: (1) subpopulation prediction and (2) phenotype estimation. This approach yielded higher accuracies than traditional statistical models in a single-environment scenario. By delivering large accuracy improvements, our methodology represents a powerful tool for Hevea GS strategies. The incorporation of machine learning techniques into rubber tree GS therefore represents an opportunity to build more robust models and optimize Hevea breeding programs.
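The two-stage divide-and-conquer idea (first assign an individual to a subpopulation, then predict its phenotype with a model trained only on that subpopulation) can be sketched as follows. This is a hedged illustration, not the authors' system: it swaps both neural networks for simpler stand-ins, a nearest-centroid classifier for stage 1 and k-nearest-neighbour regression for stage 2. The class name, the parameter k and the toy data in the usage note are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Genotype = Sequence[float]


def sq_dist(a: Genotype, b: Genotype) -> float:
    """Squared Euclidean distance between two marker vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class TwoStagePredictor:
    """Stage 1 assigns a subpopulation; stage 2 predicts the phenotype
    using only training individuals from that subpopulation."""

    def __init__(self, k: int = 3) -> None:
        self.k = k
        self.centroids: Dict[str, List[float]] = {}
        self.groups: Dict[str, List[Tuple[Genotype, float]]] = defaultdict(list)

    def fit(self, X: List[Genotype], subpops: List[str], y: List[float]) -> "TwoStagePredictor":
        for x, s, t in zip(X, subpops, y):
            self.groups[s].append((x, t))
        # Stage 1 model: one mean marker vector (centroid) per subpopulation.
        for s, members in self.groups.items():
            m = len(members[0][0])
            self.centroids[s] = [sum(x[j] for x, _ in members) / len(members) for j in range(m)]
        return self

    def predict_subpop(self, x: Genotype) -> str:
        # Assign the subpopulation whose centroid is closest.
        return min(self.centroids, key=lambda s: sq_dist(x, self.centroids[s]))

    def predict(self, x: Genotype) -> float:
        s = self.predict_subpop(x)
        # Stage 2 model: mean phenotype of the k nearest neighbours within subpopulation s.
        nearest = sorted(self.groups[s], key=lambda pair: sq_dist(x, pair[0]))[: self.k]
        return sum(t for _, t in nearest) / len(nearest)
```

For example, `TwoStagePredictor(k=3).fit(X, labels, y).predict(new_genotype)` first routes `new_genotype` to a subpopulation and only then estimates its phenotype. Keeping each stage-2 model local to one subpopulation is the essence of the divide-and-conquer design; any stage-1 classifier and stage-2 regressor could be substituted.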

    Parametric and Nonparametric Statistical Methods for Genomic Selection of Traits with Additive and Epistatic Genetic Architectures

    Parametric and nonparametric methods have been developed for predicting phenotypes. These methods are based on retrospective analyses of empirical data consisting of genotypic and phenotypic scores. Recent reports have indicated that parametric methods are unable to predict phenotypes of traits with known epistatic genetic architectures. Herein, we review parametric methods including least squares regression, ridge regression, Bayesian ridge regression, least absolute shrinkage and selection operator (LASSO), Bayesian LASSO, best linear unbiased prediction (BLUP), Bayes A, Bayes B, Bayes C, and Bayes Cπ. We also review nonparametric methods including the Nadaraya-Watson estimator, reproducing kernel Hilbert space, support vector machine regression, and neural networks. We assess the relative merits of these 14 methods in terms of accuracy and mean squared error (MSE) using simulated genetic architectures consisting of completely additive or two-way epistatic interactions in an F2 population derived from crosses of inbred lines. Each simulated genetic architecture explained either 30% or 70% of the phenotypic variability. The greatest impact on estimates of accuracy and MSE was due to genetic architecture. Parametric methods were unable to predict phenotypic values when the underlying genetic architecture was based entirely on epistasis. Parametric methods were slightly better than nonparametric methods for additive genetic architectures. Distinctions among parametric methods for additive genetic architectures were incremental. Heritability, i.e., the proportion of phenotypic variability explained by the genetic architecture, had the second greatest impact on estimates of accuracy and MSE.
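Why additive (parametric) models fail on purely epistatic traits can be shown with a small exact calculation. The sketch below assumes two unlinked loci in an F2 population, genotypes coded -1/0/+1, and a purely additive-by-additive genotypic value x1*x2. These are simplifying assumptions for illustration, not the paper's simulation design, and the bandwidth value is arbitrary. The marginal least-squares slope on each marker is zero, whereas a Gaussian-kernel Nadaraya-Watson estimate (one of the nonparametric methods reviewed) recovers the interaction.

```python
import math
from itertools import product

# F2 genotype frequencies, genotypes coded -1 (aa), 0 (Aa), +1 (AA).
F2_FREQ = {-1: 0.25, 0: 0.5, 1: 0.25}


def f2_population():
    """All two-locus genotypes with their expected F2 frequencies (unlinked loci)."""
    return [((x1, x2), F2_FREQ[x1] * F2_FREQ[x2]) for x1, x2 in product(F2_FREQ, repeat=2)]


def epistatic_value(g):
    # Purely additive-by-additive genotypic value: no marginal effects by construction.
    return g[0] * g[1]


def additive_slope(pop, locus):
    """Frequency-weighted least-squares slope of the genotypic value on one marker."""
    mx = sum(w * g[locus] for g, w in pop)
    my = sum(w * epistatic_value(g) for g, w in pop)
    cov = sum(w * (g[locus] - mx) * (epistatic_value(g) - my) for g, w in pop)
    var = sum(w * (g[locus] - mx) ** 2 for g, w in pop)
    return cov / var


def nadaraya_watson(pop, query, bandwidth=0.3):
    """Gaussian-kernel Nadaraya-Watson estimate of the genotypic value at `query`."""
    num = den = 0.0
    for g, w in pop:
        d2 = sum((a - b) ** 2 for a, b in zip(g, query))
        k = w * math.exp(-d2 / (2.0 * bandwidth ** 2))
        num += k * epistatic_value(g)
        den += k
    return num / den


if __name__ == "__main__":
    pop = f2_population()
    print("additive slopes:", additive_slope(pop, 0), additive_slope(pop, 1))
    print("NW at (1, 1):", nadaraya_watson(pop, (1, 1)))
    print("NW at (1, -1):", nadaraya_watson(pop, (1, -1)))
```

The zero slopes follow from E[x1 · x1x2] = E[x1²]E[x2] = 0, because each F2 genotype code has mean zero at its locus. Any model built only from marginal marker effects therefore sees no signal, while the kernel estimate, which compares whole genotypes, returns values close to +1 and -1 at the two queried genotypes.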

    DRYP 1.0: a parsimonious hydrological model of DRYland Partitioning of the water balance

    Dryland regions are characterized by water scarcity and face major challenges under climate change. One difficulty is anticipating how rainfall will be partitioned into evaporative losses, groundwater, soil moisture and runoff (the water balance) in the future, which has important implications for water resources and dryland ecosystems. To estimate the water balance effectively, however, hydrological models in drylands need to capture the key processes at the appropriate spatiotemporal scales, including spatially restricted and temporally brief rainfall, high evaporation rates, transmission losses and focused groundwater recharge. Lack of available data and the high computational cost of explicitly representing ephemeral surface-groundwater interactions restrict the usefulness of most hydrological models in these environments. We therefore developed a parsimonious hydrological model (DRYP) that incorporates the key processes of water partitioning in dryland regions, and we tested it in the data-rich Walnut Gulch Experimental Watershed against measurements of streamflow, soil moisture and evapotranspiration. Overall, DRYP showed skill in quantifying the main components of the dryland water balance, including monthly observations of streamflow (Nash-Sutcliffe efficiency (NSE) ~0.7), evapotranspiration (NSE > 0.6) and soil moisture (NSE ~0.7). The model showed that evapotranspiration consumes > 90 % of the total precipitation input to the catchment and that < 1 % leaves the catchment as streamflow. More than 90 % of the overland flow generated in the catchment is lost through ephemeral channels as transmission losses. However, only ~35 % of the total transmission losses percolate to the groundwater aquifer as focused groundwater recharge, whereas the rest is lost to the atmosphere as riparian evapotranspiration.
In summary, DRYP is a modular, versatile and parsimonious Python-based model that can be used to anticipate and plan for climatic and anthropogenic changes to water fluxes and storage in dryland regions.
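The skill scores quoted above are Nash-Sutcliffe efficiencies, NSE = 1 - Σ(obs - sim)² / Σ(obs - mean(obs))². A minimal sketch of this standard formula for comparing a simulated series (e.g. monthly streamflow) with observations; the function name is illustrative and is not taken from DRYP's code.

```python
from typing import Sequence


def nash_sutcliffe(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2).

    1.0 is a perfect fit; 0.0 means the model is no better than predicting
    the mean of the observations; negative values are worse than the mean.
    """
    if len(observed) != len(simulated) or not observed:
        raise ValueError("need two non-empty series of equal length")
    mean_obs = sum(observed) / len(observed)
    # Squared simulation errors versus squared deviations from the observed mean.
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    if sst == 0:
        raise ValueError("observed series is constant; NSE is undefined")
    return 1.0 - sse / sst
```

Because the denominator is the variance of the observations, an NSE of ~0.7 means the model's squared errors are about 30 % of the variability a mean-only benchmark would leave unexplained.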

    Visual analytics for relationships in scientific data

    Domain scientists hope to address grand scientific challenges by exploring the abundance of data generated and made available through modern high-throughput techniques. Typical scientific investigations can make use of novel visualization tools that enable dynamic formulation and fine-tuning of hypotheses to aid in evaluating the sensitivity of key parameters. Such general tools should be applicable to many disciplines, allowing biologists to develop an intuitive understanding of the structure of coexpression networks and discover genes that reside in critical positions of biological pathways, intelligence analysts to decompose social networks, and climate scientists to model and extrapolate future climate conditions. Using a graph as a universal data representation of correlation, our visualization tool combines several techniques that, used together, provide new analytical capabilities. It integrates graph layout, qualitative subgraph extraction through a novel 2D user interface, quantitative subgraph extraction using graph-theoretic algorithms or by querying an optimized B-tree, dynamic level-of-detail graph abstraction, and template-based fuzzy classification using neural networks. We demonstrate the system using real-world workflows from several large-scale studies. Parallel coordinates has proven to be a scalable visualization and navigation framework for multivariate data. However, when data with thousands of variables are at hand, there is no comprehensive solution for selecting the right set of variables and ordering them to uncover important or potentially insightful patterns. We present algorithms that rank axes based on the importance of bivariate relationships among the variables, and we showcase the efficacy of the proposed system by demonstrating autonomous detection of patterns in a modern large-scale dataset of time-varying climate simulations.
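The abstract does not state which measure of bivariate importance its axis-ranking algorithms use. As a hedged sketch only, the code below assumes absolute Pearson correlation as the pairwise score and a simple greedy heuristic: start from the strongest pair, then repeatedly attach the remaining variable with the strongest relationship to either end of the current axis order. Both the metric and the heuristic are illustrative assumptions, not the authors' algorithms.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant variable carries no bivariate relationship
    return sxy / math.sqrt(sxx * syy)


def order_axes(columns: Dict[str, Sequence[float]]) -> List[str]:
    """Greedy parallel-coordinates axis order placing strongly related variables side by side."""
    names = list(columns)
    if len(names) < 3:
        return names
    score: Dict[Tuple[str, str], float] = {}
    for a, b in combinations(names, 2):
        score[(a, b)] = score[(b, a)] = abs(pearson(columns[a], columns[b]))
    # Seed the order with the most strongly related pair of variables.
    order = list(max(combinations(names, 2), key=lambda pair: score[pair]))
    remaining = set(names) - set(order)
    while remaining:
        candidates = sorted(remaining)  # sorted for deterministic tie-breaking
        best_left = max(candidates, key=lambda v: score[(v, order[0])])
        best_right = max(candidates, key=lambda v: score[(order[-1], v)])
        # Extend whichever end gains the stronger adjacent relationship.
        if score[(best_left, order[0])] >= score[(order[-1], best_right)]:
            order.insert(0, best_left)
            remaining.remove(best_left)
        else:
            order.append(best_right)
            remaining.remove(best_right)
    return order
```

Computing all pairwise scores is quadratic in the number of variables. With thousands of variables, a real system would need the kind of high-throughput scoring and variable selection the abstract describes, rather than this exhaustive loop.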

    Plant phenomics, from sensors to knowledge

    Major improvements in crop yield are needed to keep pace with population growth and climate change. While plant breeding efforts have greatly benefited from advances in genomics, profiling the crop phenome (i.e., the structure and function of plants) associated with allelic variants and environments remains a major technical bottleneck. Here, we review the conceptual and technical challenges facing plant phenomics. We first discuss how, given plants’ high levels of morphological plasticity, crop phenomics presents distinct challenges compared with studies in animals. Next, we present strategies for multi-scale phenomics and describe how major improvements in imaging, sensor technologies and data analysis are now making high-throughput root, shoot, whole-plant and canopy phenomic studies possible. We then suggest that research in this area is entering a new stage of development, in which phenomic pipelines can help researchers transform large numbers of images and sensor data into knowledge, necessitating novel methods of data handling and modelling. Collectively, these innovations are helping to accelerate the selection of a next generation of crops that are more sustainable and resilient to climate change, and whose benefits promise to scale from physiology to breeding and to deliver real-world impact for ongoing global food security efforts.

    Genetic studies of incubation behaviour and morphological traits in chickens

    Finding the genes that underlie variation in production and developmental traits has important economic applications. Incubation behaviour represents a loss of production in conventional chicken breeds adapted to local conditions, and it was the motivation for this thesis. The Mendelian traits of comb type, crest, Silkie versus normal feathers, feathered leg, fibromelanosis, comb colour, skin and shank colour, and feather colour and pattern are of interest because of the insight they give into genes and development, and they were also investigated in this thesis. We used White Leghorn and Silkie lines of chicken to detect the genetic loci controlling incubation behaviour and Mendelian traits using linkage-based analysis in an F2 cross. The evidence for a QTL on chromosome 5 affecting incubation status over the whole period was strong (P < 0.05). After the addition of 218 new informative SNP markers across the genome, including chromosome 5, the 95% confidence interval spanned a region of around 45 cM, down from 95 cM previously. Three other suggestive QTL for incubation status were found after the addition of SNP markers, on chromosomes 1, 18, 19 and E22C19W28 at 70, 0, 1 and 13 cM, respectively. The mode of action of the incubation status QTL indicates that either the White Leghorn allele promotes incubation behaviour or heterozygotes outperform both homozygotes, except for the QTL on chromosome 1, where the Silkie allele promotes incubation behaviour, as might be expected. A highly significant QTL (P < 0.01) for early incubation behaviour (25-30 weeks) was found on chromosome 8 at 18 cM. This QTL has an additive effect, with possession of a Silkie allele increasing the likelihood of incubation behaviour.
Other suggestive QTL for early incubation behaviour were found on chromosomes 26 and 1 at 0 and 66 cM, respectively. For Mendelian traits, genome-wide significant (P < 0.01) loci for comb type, crest type and feather type were found on chromosome 7 at 77 cM, linkage group E22C19W28 at 7 cM and chromosome 3 at 169 cM, respectively. Significant loci (P < 0.01) for leg colour and skin colour were found on chromosome 20 at 56 cM and 60 cM, respectively. In the present study, loci for all feather patterns were found on E22C19W28 even after removing animals carrying the dominant white alleles, suggesting that dominant white or another allele at the locus was still influential. Comb type and incubation behaviour were investigated at the gene level. The thyroid stimulating hormone receptor (TSHR) gene is believed to be involved in the process of domestication and was found at the peak position of the most significant QTL for incubation behaviour on chromosome 5. Wnt genes were explored functionally as candidate genes for comb type by in situ hybridization in Silkie and White Leghorn embryos. The Wnt6 gene showed expression in the region of presumptive comb development in the embryos. In conclusion, genetic loci that explain maternal behaviour have been described for the first time. Given the coincidence of the incubation behaviour locus on chromosome 5 with the site of the strongest selective sweep in poultry, the TSHR, and the coincidence of QTL on chromosomes 1 and 8 with thyroid hormone activity, it would appear that the thyrotrophic axis may be critical to the loss of incubation behaviour and the improved reproductive performance that came with domestication. Further analysis of these loci should be able to produce markers that reduce the propensity of birds to incubate. A comb type marker might allow introgression of this trait to prevent comb damage in commercial hens.
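The abstract distinguishes QTL with an additive effect from those where heterozygotes exceed both homozygotes. In an F2 cross this is conventionally expressed through the additive effect a (half the difference between the homozygote means) and the dominance effect d (the heterozygote's deviation from their midpoint). The minimal sketch below assumes phenotype values (for example incubation status scored 0/1) have already been grouped by genotype at a QTL-linked marker. The function names, the tolerance and the class thresholds are hypothetical illustrations, and the sketch omits the interval mapping and significance testing used in the thesis.

```python
from statistics import mean
from typing import Dict, Sequence, Tuple


def f2_effects(phenotypes: Dict[str, Sequence[float]]) -> Tuple[float, float]:
    """Return the additive (a) and dominance (d) effects at one marker in an F2 cross.

    `phenotypes` maps the genotype classes 'AA', 'AB' and 'BB' to the
    phenotypic values of the individuals carrying each class.
    """
    m_aa = mean(phenotypes["AA"])
    m_ab = mean(phenotypes["AB"])
    m_bb = mean(phenotypes["BB"])
    a = (m_aa - m_bb) / 2.0         # half the difference between the homozygotes
    d = m_ab - (m_aa + m_bb) / 2.0  # heterozygote deviation from the homozygote midpoint
    return a, d


def mode_of_action(a: float, d: float, tol: float = 0.2) -> str:
    """Classify gene action by d/|a|; the tolerance is an arbitrary illustration."""
    if a == 0:
        return "no additive effect"
    r = d / abs(a)
    if abs(r) <= tol:
        return "additive"
    if abs(r) < 1 - tol:
        return "partial dominance"
    if abs(r) <= 1 + tol:
        return "complete dominance"
    # |d| clearly exceeds |a|: heterozygotes lie outside the homozygote range.
    return "overdominance" if r > 0 else "underdominance"
```

For invented class means AA = 10, AB = 12 and BB = 6, the sketch gives a = 2 and d = 4, so the heterozygote exceeds both homozygotes (overdominance). Class means of 10, 8 and 6 give d = 0, a purely additive QTL like the one reported on chromosome 8.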