108 research outputs found
Array Variate Skew Normal Random Variables with Multiway Kronecker Delta Covariance Matrix Structure
In this paper, we will discuss the concept of an array variate random
variable and introduce a class of skew normal array densities that are obtained
through a selection model that uses the array variate normal density as the
kernel and the cumulative distribution of the univariate normal distribution as
the selection function.
Comment: A part of this paper was taken from the technical report "Array
Variate Random Variables with Multiway Kronecker Delta Covariance Matrix
Structure" that was published in 2011 by the Department of Mathematics and
Statistics at Bowling Green State University
Array Variate Elliptical Random Variables with Multiway Kronecker Delta Covariance Matrix Structure
Standard statistical methods applied to matrix random variables often fail to
describe the underlying structure in multiway data sets. In this paper we will
discuss the concept of an array variate random variable and introduce a class
of elliptical array densities which have elliptical contours.
Comment: A part of this paper appears in Technical Report No: 11-02
published by the Department of Mathematics and Statistics at Bowling Green
State University
Slicing: Nonsingular Estimation of High Dimensional Covariance Matrices Using Multiway Kronecker Delta Covariance Structures
Nonsingular estimation of high dimensional covariance matrices is an
important step in many statistical procedures like classification, clustering,
variable selection and feature extraction. After a review of the essential
background material, this paper introduces a technique we call slicing for
obtaining a nonsingular covariance matrix of high dimensional data. Slicing
essentially amounts to assuming that the data have a Kronecker delta covariance structure.
Finally, we discuss the implications of the results in this paper and provide
an example of classification for high dimensional gene expression data
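A minimal sketch of why a Kronecker-structured covariance can stay nonsingular in high dimensions, assuming the covariance of the vectorized array factors into a Kronecker product of small per-mode matrices. This is not the paper's slicing estimator; the function name `kron_covariance` and the factor matrices are hypothetical.

```python
import numpy as np

def kron_covariance(factors):
    """Combine per-mode covariance factors into one Kronecker-product covariance."""
    sigma = np.eye(1)
    for factor in factors:
        sigma = np.kron(sigma, factor)
    return sigma

# Hypothetical per-mode covariances for a 2 x 3 array observation.
rows = np.array([[2.0, 0.5],
                 [0.5, 1.0]])
cols = np.array([[1.0, 0.2, 0.0],
                 [0.2, 1.0, 0.2],
                 [0.0, 0.2, 1.0]])

# 6 x 6 covariance of the vectorized array, built from 2 x 2 and 3 x 3 factors.
sigma = kron_covariance([rows, cols])

# Eigenvalues of a Kronecker product are the pairwise products of the factor
# eigenvalues, so positive-definite factors give a positive-definite result.
eigvals = np.linalg.eigvalsh(sigma)
factor_products = np.outer(np.linalg.eigvalsh(rows), np.linalg.eigvalsh(cols)).ravel()
```

Only the small factors need to be estimated and inverted, which is the practical appeal when the full dimension exceeds the sample size.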
Training population selection for (breeding value) prediction
Training population selection for genomic selection has captured a great deal
of interest in animal and plant breeding. In this article, we derive a
computationally efficient statistic to measure the reliability of estimates of
genetic breeding values for a fixed set of genotypes based on a given training
set of genotypes and phenotypes. We adopt a genetic algorithm scheme to find a
training set of certain size from a larger set of candidate genotypes that
optimizes this reliability measure. Our results show that, compared to a random
sample of the same size, phenotyping individuals selected by our method results
in models with better accuracies. We implement the proposed training selection
methodology on four data sets, namely the Arabidopsis, wheat, rice, and
maize data sets. Our results indicate that a dynamic model-building process that
takes the genotypes of the individuals in the test sample into account while
selecting the training individuals improves the performance of genomic selection (GS) models
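The selection scheme described above can be sketched as a small genetic algorithm over fixed-size subsets. This is only an illustration under stated assumptions: the abstract does not give the reliability statistic, so `criterion` below is a stand-in (mean ridge prediction variance on the test genotypes, lower is better), and all function names, parameters, and data are hypothetical.

```python
import numpy as np

def criterion(train_idx, X, test_idx, lam=1.0):
    """Stand-in score (assumption): mean ridge prediction variance on test genotypes."""
    Xt = X[train_idx]
    A = Xt.T @ Xt + lam * np.eye(X.shape[1])
    Xs = X[test_idx]
    return float(np.trace(Xs @ np.linalg.solve(A, Xs.T)) / len(test_idx))

def select_training_set(X, candidates, test_idx, n_train,
                        pop_size=30, n_gen=40, seed=0):
    """Search for a training subset of size n_train that minimizes the criterion."""
    rng = np.random.default_rng(seed)
    candidates = np.asarray(candidates)
    pop = [rng.choice(candidates, n_train, replace=False) for _ in range(pop_size)]
    history = []
    for _ in range(n_gen):
        scores = np.array([criterion(s, X, test_idx) for s in pop])
        order = np.argsort(scores)
        history.append(float(scores[order[0]]))
        # Elitism: the best fifth of the population survives unchanged.
        elite = [pop[i] for i in order[: pop_size // 5]]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.choice(len(elite), 2, replace=False)
            # Crossover: draw the child from the union of two parents.
            pool = np.union1d(elite[a], elite[b])
            child = rng.choice(pool, n_train, replace=False)
            # Mutation: sometimes swap one member for an unused candidate.
            unused = np.setdiff1d(candidates, child)
            if unused.size and rng.random() < 0.5:
                child[rng.integers(n_train)] = rng.choice(unused)
            children.append(child)
        pop = elite + children
    scores = np.array([criterion(s, X, test_idx) for s in pop])
    best = int(np.argmin(scores))
    history.append(float(scores[best]))
    return np.sort(pop[best]), history

# Synthetic genotype-like features: 50 candidates, 10 test individuals.
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(60, 5))
test_idx = np.arange(50, 60)
best, history = select_training_set(X, np.arange(50), test_idx, n_train=10)
```

Because the criterion is evaluated on the test genotypes, the chosen subset depends on which individuals are to be predicted, which mirrors the dynamic selection idea in the abstract.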
Locally epistatic genomic relationship matrices for genomic association, prediction and selection
As the amount and complexity of genetic information increase, it is necessary
that we explore efficient ways of handling these data. This study takes
the "divide and conquer" approach to analyzing high dimensional genomic data.
Our aims include reducing the dimensionality of the problem that has to be
dealt with at one time and improving the performance and interpretability of the
models. We propose using the inherent structures in the genome to divide the
bigger problem into manageable parts. In plant and animal breeding studies a
distinction is made between the commercial value (additive + epistatic genetic
effects) and the breeding value (additive genetic effects) of an individual
since it is expected that some of the epistatic genetic effects will be lost
due to recombination. In this paper, we argue that the breeder can take
advantage of some of the epistatic marker effects in regions of low
recombination. The models introduced here aim to estimate local epistatic line
heritability by using the genetic map information and combine the local
additive and epistatic effects. To this end, we have used semi-parametric mixed
models with multiple local genomic relationship matrices with hierarchical
testing designs and lasso post-processing for sparsity in the final model and for
speed. Our models produce good predictive performance along with genetic
association information
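As a rough illustration of "multiple local genomic relationship matrices", the sketch below builds one additive and one epistatic kernel per genomic region. It rests on assumptions not stated in the abstract: markers are coded 0/1/2, the additive kernel is a centered cross-product, and the Hadamard square of the additive kernel stands in for additive-by-additive epistasis. The function names and the region definitions are hypothetical.

```python
import numpy as np

def additive_kernel(M):
    """Additive relationship matrix from a (lines x markers) block of centered markers."""
    Z = M - M.mean(axis=0)
    return Z @ Z.T / M.shape[1]

def local_kernels(M, regions):
    """Return {region name: (additive kernel, epistatic kernel)}.

    The epistatic kernel is the Hadamard square G * G, a common approximation
    to additive-by-additive epistasis (an assumption here, not the paper's kernel).
    """
    kernels = {}
    for name, cols in regions.items():
        G = additive_kernel(M[:, cols])
        kernels[name] = (G, G * G)
    return kernels

# Synthetic marker matrix: 8 lines, 12 markers coded 0/1/2.
rng = np.random.default_rng(0)
M = rng.integers(0, 3, size=(8, 12)).astype(float)

# Hypothetical regions, e.g. blocks of low recombination taken from a genetic map.
regions = {"region_1": slice(0, 6), "region_2": slice(6, 12)}
kernels = local_kernels(M, regions)
```

Each kernel could then enter a mixed model as the covariance of its own random effect, so that variance components are estimated region by region.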
Genomic Prediction of Quantitative Traits using Sparse and Locally Epistatic Models
In plant and animal breeding studies a distinction is made between the
genetic value (additive + epistatic genetic effects) and the breeding value
(additive genetic effects) of an individual since it is expected that some of
the epistatic genetic effects will be lost due to recombination. In this paper,
we argue that the breeder can take advantage of some of the epistatic marker
effects in regions of low recombination. The models introduced here aim to
estimate local epistatic line heritability by using the genetic map information
and combine the local additive and epistatic effects. To this end, we have used
semi-parametric mixed models with multiple local genomic relationship matrices
with hierarchical designs and lasso post-processing for sparsity in the final
model. Our models produce good predictive performance along with good
explanatory information.
Comment: arXiv admin note: substantial text overlap with arXiv:1302.346
Soft Rule Ensembles for Statistical Learning
In this article, supervised learning problems are solved using soft rule
ensembles. We first review the importance sampling learning ensembles (ISLE)
approach that is useful for generating hard rules. The soft rules are then
obtained with logistic regression from the corresponding hard rules. In order
to deal with the perfect separation problem related to the logistic regression,
Firth's bias-corrected likelihood is used. Various examples and simulation
results show that soft rule ensembles can improve predictive performance over
hard rule ensembles.
Comment: arXiv admin note: text overlap with arXiv:1112.369
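A minimal sketch of turning one hard rule into a soft rule, assuming the soft rule is the fitted probability from a logistic regression of the hard rule's 0/1 output on the variable it splits on. The hard rule is a deterministic function of that variable, so the data are perfectly separated and ordinary maximum likelihood diverges; the sketch uses Newton iterations on Firth's modified score instead. The function name `firth_logistic` and the toy data are hypothetical, and this is not the authors' implementation.

```python
import numpy as np

def firth_logistic(X, y, n_iter=50, tol=1e-8):
    """Logistic regression with Firth's bias-reducing penalty (Newton iterations)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        info = X.T @ (X * w[:, None])            # Fisher information X'WX
        info_inv = np.linalg.inv(info)
        # Diagonal of the hat matrix W^(1/2) X (X'WX)^(-1) X' W^(1/2).
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
        # Firth-modified score: the usual residual plus h * (1/2 - p).
        score = X.T @ (y - p + h * (0.5 - p))
        step = info_inv @ score
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Hard rule on one variable: r(x) = 1 if x > 0 else 0 (perfectly separable).
x = np.array([-2.0, -1.0, 1.0, 2.0])
hard = (x > 0).astype(float)

# Soft rule: smooth probability that the hard rule fires, with an intercept.
X = np.column_stack([np.ones_like(x), x])
beta = firth_logistic(X, hard)
soft = 1.0 / (1.0 + np.exp(-X @ beta))
```

The soft rule takes values strictly between 0 and 1 and changes gradually near the split point, rather than jumping at it.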
Efficient Breeding by Genomic Mating
In this article, we propose an approach to breeding that focuses on mating
instead of truncation selection. Our method uses genome-wide marker information
in a similar fashion to genomic selection, so we refer to it as genomic mating.
Using concepts of estimated breeding values, risk (usefulness) and inbreeding,
an efficient mating approach is formulated for improvement of breeding values
in the long run. We have used a genetic algorithm to find solutions to this
optimization problem. Results from our simulations point to the efficiency of
genomic mating for breeding complex traits compared to truncation selection
Locally Epistatic Models for Genome-wide Prediction and Association by Importance Sampling
In statistical genetics an important task involves building predictive models
for the genotype-phenotype relationships and thus attributing a proportion of the
total phenotypic variance to the variation in genotypes. Numerous models have
been proposed to incorporate additive genetic effects into models for
prediction or association. However, there is a scarcity of models that can
adequately account for gene by gene or other forms of genetic interactions.
In addition, there is an increased interest in using marker annotations in
genome-wide prediction and association. In this paper, we discuss a hybrid
modeling methodology which combines the parametric mixed modeling approach and
the non-parametric rule ensembles. This approach gives us a flexible class of
models that can be used to capture additive effects, locally epistatic genetic effects,
and gene x background interactions, and that allows us to incorporate one or more
annotations into the genomic selection or association models. We use benchmark
data sets covering a range of organisms and traits in addition to simulated
data sets to illustrate the strengths of this approach. The improvements in
model accuracies and association results suggest that a part of the "missing
heritability" in complex traits can be captured by modeling local epistasis.
Comment: *Corresponding Author: Deniz Akdemir
Selection of training populations (and other subset selection problems) with an accelerated genetic algorithm (STPGA: An R-package for selection of training populations with a genetic algorithm)
Optimal subset selection is an important task that has numerous algorithms
designed for it and has many application areas. STPGA contains a special
genetic algorithm supplemented with a tabu memory property (that keeps track of
previously tried solutions and their fitness for a number of iterations), and
with a regression of the fitness of the solutions on their coding that is used
to form the ideal estimated solution (look ahead property) to search for
solutions of generic optimal subset selection problems. I initially
developed the programs for the specific problem of selecting training
populations for genomic prediction or association problems; therefore, I give
a discussion of the theory behind optimal design of experiments to explain the
default optimization criteria in STPGA, and illustrate the use of the programs
in this endeavor. Nevertheless, I have picked a few other areas of application:
supervised and unsupervised variable selection based on kernel alignment,
supervised variable selection with design criteria, influential observation
identification for regression, solving mixed integer quadratic optimization
problems, and balancing gains and inbreeding in a breeding population. Some of
these illustrations pertain to new statistical approaches