5 research outputs found

    Detecting discriminatory risk through data annotation based on Bayesian inferences

    Thanks to the growth of computational power and data availability, machine learning research has advanced rapidly. Nowadays, the majority of automatic decision-making systems are based on data. However, it is well known that machine learning systems can produce problematic results if they are built on partial or incomplete data. In fact, in recent years several studies have found a convergence of issues related to the ethics and transparency of these systems in the process of data collection and in how the data are recorded. Although rigorous data collection and analysis are fundamental to model design, this step is still largely overlooked by the machine learning community. For this reason, we propose a method of data annotation based on Bayesian statistical inference that aims to warn about the risk of discriminatory results of a given data set. In particular, our method aims to deepen knowledge and promote awareness about the sampling practices employed to create the training set, highlighting that the probability of success or failure conditioned on minority membership is given by the structure of the data available. We empirically test our system on three datasets commonly accessed by the machine learning community and we investigate the risk of racial discrimination. Comment: 11 pages, 8 figures
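
The abstract does not spell out the inference itself, so the following is only a minimal sketch of the general idea: estimating the probability of a favourable outcome conditioned on group membership directly from the records in a data set. The Beta(1, 1) prior, the function name `outcome_rate_by_group`, and the toy records are all assumptions made for illustration, not the authors' method or data.

```python
from collections import Counter


def outcome_rate_by_group(records, prior_a=1.0, prior_b=1.0):
    """Posterior mean of P(favourable outcome | group) under a Beta prior.

    `records` is an iterable of (group, outcome) pairs with outcome in {0, 1}.
    The Beta(prior_a, prior_b) prior is an assumption of this sketch.
    """
    totals = Counter()
    positives = Counter()
    for group, outcome in records:
        totals[group] += 1
        positives[group] += int(outcome)
    # Beta-Binomial conjugacy: posterior mean = (k + a) / (n + a + b)
    return {
        group: (positives[group] + prior_a) / (totals[group] + prior_a + prior_b)
        for group in totals
    }


# Hypothetical toy records: (group label, 1 = favourable outcome)
records = (
    [("majority", 1)] * 60 + [("majority", 0)] * 40
    + [("minority", 1)] * 3 + [("minority", 0)] * 7
)
rates = outcome_rate_by_group(records)
# A large gap between groups is a warning sign about the data, not a verdict.
gap = rates["majority"] - rates["minority"]
```

With few minority records the prior has a visible influence on the estimate, which is one reason an annotation of this kind would normally report the group sample sizes alongside the rates.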

    Evaluation of the Potential for Genomic Selection to Improve Spring Wheat Resistance to Fusarium Head Blight in the Pacific Northwest

    Fusarium Head Blight (FHB) has emerged in spring wheat production in the Pacific Northwest during the last decade due to factors including climate change, crop rotations, and tillage practices. A breeding population with 170 spring wheat lines was established and screened over a 2-year period in multiple locations for FHB incidence (INC), severity (SEV), and deposition of the mycotoxin deoxynivalenol (DON). A genome-wide association study suggested that the number of detectable genetic loci and the size of their effects are too limited for marker-assisted selection. In conjunction with the success of breeding for FHB resistance in other programs, genomic selection (GS) was suggested as a better option. To evaluate the prediction accuracy of GS in the current breeding population, we conducted a variety of validations by varying the proportions of testing populations and the cohorts, based on both FHB resistance and market class, including soft white spring (SWS), hard white spring (HWS), and hard red spring (HRS). We found that INC had higher heritability, higher correlation across years and locations, and higher prediction accuracy than SEV and DON. Prediction accuracy varied among the scenarios that restricted the testing population to a certain cohort. For a small set of newly developed or introduced lines (<17), prediction accuracy will be about 60% if the lines have genetic relationships similar to those among the current 170-line training population. However, we expect a lower prediction accuracy if new lines are selected for a specific characteristic, such as FHB resistance or market class. With the exception of DON in the SWS lines, the current training population is capable of making reasonably accurate predictions for FHB-resistant lines in most of the major market classes. For SWS, adding more lines or further phenotyping is required to improve prediction accuracy. These results demonstrate the potential and challenges of GS, especially for developing FHB-resistant varieties in the SWS market class.
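
The cohort-restricted validation described above can be sketched roughly as follows: hold out every line from one market class, predict it from the remaining lines, and report prediction accuracy as the Pearson correlation between observed and predicted values. This is an illustrative sketch only. The nearest-line predictor is a deliberately simple stand-in and is not a genomic selection model; the data layout, function names, and toy values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def nearest_line_predictor(train, markers):
    """Stand-in predictor (NOT a GS model): phenotype of the training
    line with the fewest marker mismatches."""
    def mismatches(line):
        return sum(a != b for a, b in zip(line["markers"], markers))
    return min(train, key=mismatches)["phenotype"]


def cohort_holdout_accuracy(lines, test_cohort, predict):
    """Train on all lines outside `test_cohort`, test on the lines inside it."""
    train = [line for line in lines if line["cohort"] != test_cohort]
    test = [line for line in lines if line["cohort"] == test_cohort]
    predicted = [predict(train, line["markers"]) for line in test]
    observed = [line["phenotype"] for line in test]
    return pearson_r(observed, predicted)


# Hypothetical toy data: binary marker calls and an arbitrary phenotype score.
lines = [
    {"cohort": "HRS", "markers": [0, 0, 1, 1], "phenotype": 10.0},
    {"cohort": "HRS", "markers": [1, 1, 0, 0], "phenotype": 30.0},
    {"cohort": "HWS", "markers": [0, 1, 1, 1], "phenotype": 14.0},
    {"cohort": "SWS", "markers": [0, 0, 1, 0], "phenotype": 12.0},
    {"cohort": "SWS", "markers": [1, 1, 0, 1], "phenotype": 28.0},
    {"cohort": "SWS", "markers": [1, 0, 0, 0], "phenotype": 25.0},
]
accuracy = cohort_holdout_accuracy(lines, "SWS", nearest_line_predictor)
```

Passing the predictor in as a function keeps the validation scheme separate from the model, so a real genomic prediction model could be swapped in without changing the hold-out logic.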

    Advancing red clover breeding through genomic selection methods

    Red clover is a major forage legume and a highly valuable crop for Northern Europe due to its high protein value and multiple ecological services. It is an important crop for both the ruminant industry and ecological farming. As growing conditions change due to rapid climate change, the demand for red clover breeding has increased. In this thesis, the potential for accelerating genetic gain through improved red clover breeding was studied. First, since the response to selection is a function of genetic variation, the success of genomic selection depends on available genetic resources. Hence, red clover genetic resources available at the Nordic Genetic Resource Center (NordGen) gene bank and the Swedish seed company Lantmännen were used to evaluate the crop’s genetic diversity and population structure. Red clover accessions currently used for breeding have low values for measures of inbreeding, which suggests a lower risk of inbreeding depression. However, their genetic diversity was low relative to available wild populations and landraces, which can increase the risk of inbreeding depression. Hence, the progression of breeding could be limited by the gene pool. In this thesis, red clover populations with the potential to be used in red clover breeding to increase genetic diversity were identified. Second, genetic gain in red clover can be rapidly increased by the introduction of genomic prediction models that minimize the need for time-consuming field trials. Both genome-wide association study (GWAS) and genomic prediction (GP) were tested for dry matter yield and forage quality traits based on data generated through multi-environment field trials and genotyping-by-sequencing. The results showed that dry matter yield and forage quality are affected by genes regulating responses to environmental inputs and stresses. This thesis showed that, by increasing genetic diversity and implementing GP in red clover breeding, genetic gain can be accelerated.

    Breeding hard winter wheat (Triticum aestivum L.) for high grain yield and high grain protein concentration

    2021 Spring. Includes bibliographical references. High grain yield (GY) is the primary selection target in commercial hard winter wheat (Triticum aestivum L.) breeding programs, with milling and bread-making quality as important secondary selection targets. Grain protein concentration (GPRO) is strongly correlated with important dough rheology and bread-making characteristics. Simultaneous improvement is difficult given the strong negative relationship of GY and GPRO in cereal crops. Nitrogen use efficiency (NUE), defined as the amount of grain produced per unit of N supply, promotes high GY through the component traits N uptake (NUpE) and N utilization (NUtE) efficiencies. Grain protein accumulation relies on N uptake from the soil and remobilization from plant tissue reserves. One study was conducted to characterize variation for NUE among a set of 20 breeding lines and varieties adapted to the west central Great Plains of the United States. Path analysis was applied to characterize the NUE component structure during the 2010-2011 growing season and then for two newly released varieties in the 2011-2012 growing season. Nitrogen use efficiency ranged from 39.9 kg kg⁻¹ for 'RonL' to 46.7 kg kg⁻¹ for 'Byrd'. By path analysis, we determined that variation in NUE depended on NUpE under N sufficiency and on NUtE under limiting N. Additionally, strategies for simultaneous improvement of GY and GPRO were explored. Analysis of standardized residuals of the linear regression of GPRO on GY, or 'grain protein deviation', identified one cultivar ('Brawl CL Plus') that had 6.7 g kg⁻¹ higher GPRO than the average for all 20 genotypes. In a second study, selection strategies based on protein-yield selection indices for a set of 775 breeding lines and varieties representing the Colorado State University hard winter wheat breeding program were evaluated based on field data obtained during the 2012-2015 growing seasons. Selection based on high values for a particular index delivered a characteristic emphasis on GY or GPRO. Correlation analysis between index values and GY or GPRO showed that each simultaneous selection strategy focused to differing extents on the primary traits. Genomic selection applied to index values in univariate models provided forward prediction accuracy ranging from r = 0.21 to 0.44 for the 2013 validation set, but approached zero for the 2014 validation set. Index values were also calculated from genomic estimated breeding values obtained in bivariate genomic selection models. Prediction accuracy for individual trait values was not substantially improved in the bivariate model. Protein-yield indices calculated from bivariate genomic estimated breeding values showed similar relationships to GY and GPRO as for the genomic estimated breeding values for indices calculated in the univariate models. A set of selection strategies generates sufficient predictive ability in phenotypic or genomic selection to be effective tools for simultaneous selection for GY and GPRO.
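
The 'grain protein deviation' mentioned above (residuals from regressing GPRO on GY) can be illustrated with a short sketch. This version fits an ordinary least-squares line and scales each residual by the residual standard deviation, which is a simplification: it ignores the leverage correction used in fully studentized residuals, and it may differ from the exact procedure in the thesis. The function name and the toy values are hypothetical.

```python
import math


def grain_protein_deviation(gy, gpro):
    """Scaled residuals of the simple linear regression of GPRO on GY.

    Positive values mark genotypes with more protein than their yield
    level alone would predict. Simplified: no leverage correction.
    """
    n = len(gy)
    mean_gy = sum(gy) / n
    mean_gpro = sum(gpro) / n
    sxx = sum((x - mean_gy) ** 2 for x in gy)
    sxy = sum((x - mean_gy) * (y - mean_gpro) for x, y in zip(gy, gpro))
    slope = sxy / sxx
    intercept = mean_gpro - slope * mean_gy
    residuals = [y - (intercept + slope * x) for x, y in zip(gy, gpro)]
    # Residual standard deviation with n - 2 degrees of freedom
    residual_sd = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return [r / residual_sd for r in residuals]


# Hypothetical toy values: yield in t/ha, protein in g/kg.
gy = [3.0, 3.5, 4.0, 4.5, 5.0]
gpro = [140.0, 135.0, 133.0, 125.0, 120.0]
deviations = grain_protein_deviation(gy, gpro)
best = max(range(len(deviations)), key=lambda i: deviations[i])
```

In this toy set the third genotype stands out because its protein is above the fitted line at its yield level, which is the kind of line the deviation approach is meant to flag.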