108 research outputs found

    Array Variate Skew Normal Random Variables with Multiway Kronecker Delta Covariance Matrix Structure

    In this paper, we discuss the concept of an array variate random variable and introduce a class of skew normal array densities obtained through a selection model that uses the array variate normal density as the kernel and the cumulative distribution function of the univariate normal distribution as the selection function. Comment: A part of this paper was taken from the technical report "Array Variate Random Variables with Multiway Kronecker Delta Covariance Matrix Structure", published in 2011 by the Department of Mathematics and Statistics at Bowling Green State University.
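
    The selection construction in the abstract above can be illustrated in one dimension. The following is a minimal sketch, not the paper's array variate density: it assumes the standard univariate skew normal form 2·φ(x)·Φ(αx), where the normal density φ is the kernel and the normal distribution function Φ is the selection function. The parameter name `alpha` and the helper `integrate` are hypothetical and serve only this illustration.

```python
import math


def normal_pdf(x: float) -> float:
    """Standard normal density (the kernel)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution (the selection function)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def skew_normal_pdf(x: float, alpha: float) -> float:
    """Selection-model density 2 * phi(x) * Phi(alpha * x).

    `alpha` is an assumed name for the skewness parameter;
    alpha = 0 recovers the symmetric normal kernel.
    """
    return 2.0 * normal_pdf(x) * normal_cdf(alpha * x)


def integrate(f, lo: float, hi: float, steps: int = 20000) -> float:
    """Trapezoidal rule, used only to check that the density has unit mass."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * h)
    return total * h


if __name__ == "__main__":
    for alpha in (0.0, 2.0, -5.0):
        mass = integrate(lambda x: skew_normal_pdf(x, alpha), -10.0, 10.0)
        print(f"alpha={alpha:+.1f}  total mass ~ {mass:.6f}")
```

    The factor 2 is what keeps the selected density normalized in this univariate case, because Φ(αx) averages to 1/2 under a symmetric kernel.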

    Array Variate Elliptical Random Variables with Multiway Kronecker Delta Covariance Matrix Structure

    Standard statistical methods applied to matrix random variables often fail to describe the underlying structure in multiway data sets. In this paper, we discuss the concept of an array variate random variable and introduce a class of array densities with elliptical contours. Comment: A part of this paper appears in Technical Report No. 11-02, published by the Department of Mathematics and Statistics at Bowling Green State University.

    Slicing: Nonsingular Estimation of High Dimensional Covariance Matrices Using Multiway Kronecker Delta Covariance Structures

    Nonsingular estimation of high dimensional covariance matrices is an important step in many statistical procedures such as classification, clustering, variable selection and feature extraction. After a review of the essential background material, this paper introduces a technique we call slicing for obtaining a nonsingular covariance matrix of high dimensional data. Slicing essentially assumes that the data have a Kronecker delta covariance structure. Finally, we discuss the implications of the results in this paper and provide an example of classification for high dimensional gene expression data.
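
    To see why a Kronecker-type structure helps with nonsingularity, consider the two-way case. The sketch below is not the slicing estimator itself; it assumes the structure means that the full covariance is a Kronecker product of one small covariance matrix per mode, and shows that such a product is positive definite whenever its factors are, while needing far fewer parameters than an unstructured matrix. The matrices `ROW_COV` and `COL_COV` are made-up illustration values, and all function names are hypothetical.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [a_ij * b_kl for a_ij in a_row for b_kl in b_row]
        for a_row in a
        for b_row in b
    ]


def is_positive_definite(m, tol=1e-12):
    """Try a Cholesky factorization of a symmetric matrix.

    The factorization succeeds only if the matrix is positive definite,
    which in particular means it is nonsingular.
    """
    n = len(m)
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    return False
                chol[i][j] = s ** 0.5
            else:
                chol[i][j] = s / chol[j][j]
    return True


def full_param_count(p, q):
    """Free parameters of an unstructured (p*q) x (p*q) covariance matrix."""
    d = p * q
    return d * (d + 1) // 2


def kronecker_param_count(p, q):
    """Free parameters of the two factors (ignoring the shared scale)."""
    return p * (p + 1) // 2 + q * (q + 1) // 2


# Made-up factor covariances for a 2 x 3 matrix-variate observation.
ROW_COV = [[2.0, 0.5], [0.5, 1.0]]
COL_COV = [[1.0, 0.3, 0.0], [0.3, 1.0, 0.3], [0.0, 0.3, 1.0]]

if __name__ == "__main__":
    sigma = kron(ROW_COV, COL_COV)
    print("dimension:", len(sigma))
    print("positive definite:", is_positive_definite(sigma))
    print("parameters:", kronecker_param_count(2, 3), "vs", full_param_count(2, 3))
```

    The gap in parameter counts widens quickly with dimension, which is the practical reason a structured estimate can stay nonsingular when the sample size is much smaller than the number of variables.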

    Training population selection for (breeding value) prediction

    Training population selection for genomic selection (GS) has captured a great deal of interest in animal and plant breeding. In this article, we derive a computationally efficient statistic to measure the reliability of estimates of genetic breeding values for a fixed set of genotypes based on a given training set of genotypes and phenotypes. We adopt a genetic algorithm scheme to find a training set of a given size, drawn from a larger set of candidate genotypes, that optimizes this reliability measure. Our results show that, compared to a random sample of the same size, phenotyping individuals selected by our method results in models with better accuracies. We apply the proposed training selection methodology to four data sets, namely the arabidopsis, wheat, rice and maize data sets. Our results indicate that a dynamic model building process, which takes the genotypes of the individuals in the test sample into account while selecting the training individuals, improves the performance of GS models.

    Locally epistatic genomic relationship matrices for genomic association, prediction and selection

    As the amount and complexity of genetic information increase, we need efficient ways of handling these data. This study takes a "divide and conquer" approach to analyzing high dimensional genomic data. Our aims include reducing the dimensionality of the problem that has to be dealt with at any one time and improving the performance and interpretability of the models. We propose using the inherent structures in the genome to divide the bigger problem into manageable parts. In plant and animal breeding studies, a distinction is made between the commercial value (additive + epistatic genetic effects) and the breeding value (additive genetic effects) of an individual, since it is expected that some of the epistatic genetic effects will be lost due to recombination. In this paper, we argue that the breeder can take advantage of some of the epistatic marker effects in regions of low recombination. The models introduced here aim to estimate local epistatic line heritability by using the genetic map information and to combine the local additive and epistatic effects. To this end, we have used semi-parametric mixed models with multiple local genomic relationship matrices, with hierarchical testing designs and lasso post-processing for sparsity in the final model and for speed. Our models produce good predictive performance along with genetic association information.

    Genomic Prediction of Quantitative Traits using Sparse and Locally Epistatic Models

    In plant and animal breeding studies, a distinction is made between the genetic value (additive + epistatic genetic effects) and the breeding value (additive genetic effects) of an individual, since it is expected that some of the epistatic genetic effects will be lost due to recombination. In this paper, we argue that the breeder can take advantage of some of the epistatic marker effects in regions of low recombination. The models introduced here aim to estimate local epistatic line heritability by using the genetic map information and to combine the local additive and epistatic effects. To this end, we have used semi-parametric mixed models with multiple local genomic relationship matrices, with hierarchical designs and lasso post-processing for sparsity in the final model. Our models produce good predictive performance along with good explanatory information. Comment: arXiv admin note: substantial text overlap with arXiv:1302.346

    Soft Rule Ensembles for Statistical Learning

    In this article, supervised learning problems are solved using soft rule ensembles. We first review the importance sampling learning ensembles (ISLE) approach, which is useful for generating hard rules. The soft rules are then obtained from the corresponding hard rules by logistic regression. To deal with the perfect separation problem in logistic regression, Firth's bias-corrected likelihood is used. Various examples and simulation results show that soft rule ensembles can improve predictive performance over hard rule ensembles. Comment: arXiv admin note: text overlap with arXiv:1112.369

    Efficient Breeding by Genomic Mating

    In this article, we propose an approach to breeding that focuses on mating instead of truncation selection. Our method uses genome-wide marker information in a similar fashion to genomic selection, so we refer to it as genomic mating. Using the concepts of estimated breeding values, risk (usefulness) and inbreeding, an efficient mating approach is formulated for improving breeding values in the long run. We have used a genetic algorithm to find solutions to this optimization problem. Results from our simulations point to the efficiency of genomic mating for breeding complex traits compared to truncation selection.

    Locally Epistatic Models for Genome-wide Prediction and Association by Importance Sampling

    In statistical genetics, an important task is building predictive models for genotype-phenotype relationships and thereby attributing a proportion of the total phenotypic variance to the variation in genotypes. Numerous models have been proposed to incorporate additive genetic effects into models for prediction or association. However, there is a scarcity of models that can adequately account for gene by gene or other forms of genetic interactions. In addition, there is increased interest in using marker annotations in genome-wide prediction and association. In this paper, we discuss a hybrid modeling methodology that combines the parametric mixed modeling approach with non-parametric rule ensembles. This approach gives us a flexible class of models that can capture additive and locally epistatic genetic effects and gene x background interactions, and that allows us to incorporate one or more annotations into genomic selection or association models. We use benchmark data sets covering a range of organisms and traits, in addition to simulated data sets, to illustrate the strengths of this approach. The improvements in model accuracies and association results suggest that a part of the "missing heritability" in complex traits can be captured by modeling local epistasis. Comment: Corresponding author: Deniz Akdemir

    Selection of training populations (and other subset selection problems) with an accelerated genetic algorithm (STPGA: An R-package for selection of training populations with a genetic algorithm)

    Optimal subset selection is an important task with many application areas, and numerous algorithms have been designed for it. To search for solutions of generic optimal subset selection problems, STPGA contains a special genetic algorithm supplemented with two features: a tabu memory property, which keeps track of previously tried solutions and their fitness for a number of iterations, and a look ahead property, in which a regression of the fitness of the solutions on their coding is used to form the ideal estimated solution. I initially developed the programs for the specific problem of selecting training populations for genomic prediction or association problems. I therefore discuss the theory behind optimal design of experiments to explain the default optimization criteria in STPGA, and illustrate the use of the programs in this endeavor. Nevertheless, I have picked a few other areas of application: supervised and unsupervised variable selection based on kernel alignment, supervised variable selection with design criteria, influential observation identification for regression, solving mixed integer quadratic optimization problems, and balancing gains and inbreeding in a breeding population. Some of these illustrations pertain to new statistical approaches.
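
    The general idea of a genetic algorithm with a tabu memory for fixed-size subset selection can be sketched as follows. This is a hedged illustration in Python, not STPGA's implementation (STPGA is an R package) and not its API: the function `select_subset`, all of its parameters, and the toy criterion `coverage_fitness` are hypothetical, and the look ahead (regression) property is not implemented here.

```python
import random


def select_subset(n_candidates, subset_size, fitness, *, pop_size=30,
                  generations=40, mutation_rate=0.2, tabu_size=200, seed=0):
    """Genetic search for a `subset_size` subset of range(n_candidates)
    that maximizes `fitness`; returns (best subset, best score, history)."""
    rng = random.Random(seed)
    candidates = list(range(n_candidates))
    tabu = {}  # bounded memory: recently tried subset -> fitness

    def remember(subset):
        if subset not in tabu:
            tabu[subset] = fitness(subset)
            if len(tabu) > tabu_size:
                del tabu[next(iter(tabu))]  # forget the oldest entry
        return tabu[subset]

    def crossover(a, b):
        # Child draws its members from the union of both parents.
        return frozenset(rng.sample(sorted(a | b), subset_size))

    def mutate(subset):
        # With some probability, swap one member for an outside candidate.
        members = sorted(subset)
        outside = [c for c in candidates if c not in subset]
        if outside and rng.random() < mutation_rate:
            members[rng.randrange(subset_size)] = rng.choice(outside)
        return frozenset(members)

    start = [frozenset(rng.sample(candidates, subset_size))
             for _ in range(pop_size)]
    scored = sorted(((remember(s), s) for s in start),
                    key=lambda t: t[0], reverse=True)
    history = [scored[0][0]]
    n_elite = max(1, pop_size // 5)

    for _ in range(generations):
        elite = scored[:n_elite]  # elitism: the best solutions always survive
        parents = scored[:max(2, len(scored) // 2)]
        children, attempts = [], 0
        while len(children) < pop_size - n_elite and attempts < 20 * pop_size:
            attempts += 1
            child = mutate(crossover(rng.choice(parents)[1],
                                     rng.choice(parents)[1]))
            if child in tabu:  # tabu: skip recently tried solutions
                continue
            children.append((remember(child), child))
        scored = sorted(elite + children, key=lambda t: t[0], reverse=True)
        history.append(scored[0][0])

    best_score, best = scored[0]
    return sorted(best), best_score, history


# Toy stand-in for a design criterion: candidates are the integers 0..19 and
# a subset is better the closer its members lie to every target position.
TARGETS = [3, 8, 12, 17]


def coverage_fitness(subset):
    return -sum(min(abs(t - c) for c in subset) for t in TARGETS)


if __name__ == "__main__":
    best, score, history = select_subset(20, 4, coverage_fitness, seed=1)
    print("selected:", best, "fitness:", score)
```

    Because the elite solutions are carried over unchanged, the best fitness recorded in `history` never decreases from one generation to the next. The tabu check only prevents re-evaluating recently tried subsets; entries are dropped once the memory exceeds `tabu_size`.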