Adaptive Reduced Rank Regression
We study the low rank regression problem $y = Mx + \epsilon$, where $x$ and $y$ are $d_1$- and $d_2$-dimensional
vectors respectively. We consider the extreme high-dimensional setting where
the number of observations $n$ is less than $d_1 + d_2$. Existing algorithms
are designed for settings where $n$ is typically as large as
$\mathrm{rank}(M)(d_1 + d_2)$. This work provides an efficient algorithm which
only involves two SVDs, and establishes statistical guarantees on its
performance. The algorithm decouples the problem by first estimating the
precision matrix of the features, and then solving the matrix denoising
problem. To complement the upper bound, we introduce new techniques for
establishing lower bounds on the performance of any algorithm for this problem.
Our preliminary experiments confirm that our algorithm often outperforms
existing baselines, and is always at least competitive.
Comment: 40 pages
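The two-stage idea in this abstract (estimate the feature precision matrix, then solve a matrix denoising problem, using two SVDs in total) can be illustrated with a minimal NumPy sketch. This is not the authors' algorithm: the model notation y ≈ Mx, the function name `two_step_low_rank_regression`, the truncation levels `k_features` and `k_output`, and the use of plain rank truncation for both steps are all assumptions made here for illustration.

```python
import numpy as np

def two_step_low_rank_regression(X, Y, k_features, k_output):
    """Illustrative sketch only (hypothetical helper, not the paper's method).

    X: (n, d1) features, Y: (n, d2) responses.
    Returns an estimate of M, shape (d2, d1), for the model y ≈ M x.
    """
    n = X.shape[0]

    # SVD 1: whiten the features using their top-k principal directions.
    # This plays the role of a (truncated) precision-matrix estimate.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    V_k, s_k = Vt[:k_features].T, s[:k_features]
    W = np.sqrt(n) * (V_k / s_k)      # (d1, k): whitening map
    Z = X @ W                         # whitened features, Z.T @ Z / n = I

    # Cross-covariance between responses and whitened features.
    N = Y.T @ Z / n                   # (d2, k)

    # SVD 2: denoise the cross-covariance by rank truncation.
    A, t, Bt = np.linalg.svd(N, full_matrices=False)
    N_r = (A[:, :k_output] * t[:k_output]) @ Bt[:k_output]

    # Map back from whitened coordinates to the original features.
    return N_r @ W.T
```

In a noiseless setting with more observations than features and both truncation levels set to the true values, this sketch recovers M exactly; the high-dimensional regime studied in the paper would require choosing the truncation levels adaptively, which is not attempted here.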
A Bayesian generalized random regression model for estimating heritability using overdispersed count data
Background:
Faecal egg counts are a common indicator of nematode infection and since it is a heritable trait, it provides a marker for selective breeding. However, since resistance to disease changes as the adaptive immune system develops, quantifying temporal changes in heritability could help improve selective breeding programs. Faecal egg counts can be extremely skewed and difficult to handle statistically. Therefore, previous heritability analyses have log transformed faecal egg counts to estimate heritability on a latent scale. However, such transformations may not always be appropriate. In addition, analyses of faecal egg counts have typically used univariate rather than multivariate analyses such as random regression that are appropriate when traits are correlated. We present a method for estimating the heritability of untransformed faecal egg counts over the grazing season using random regression.
Results:
Replicating standard univariate analyses, we showed the dependence of heritability estimates on choice of transformation. Then, using a multitrait model, we exposed temporal correlations, highlighting the need for a random regression approach. Since random regression can sometimes involve the estimation of more parameters than observations or result in computationally intractable problems, we chose to investigate reduced rank random regression. Using standard software (WOMBAT), we discuss the estimation of variance components for log-transformed data using both full and reduced rank analyses. Then, we modelled the untransformed data assuming it to be negative binomially distributed and used the Metropolis-Hastings algorithm to fit a generalized reduced rank random regression model with an additive genetic, permanent environmental and maternal effect. These three variance components explained more than 80% of the total phenotypic variation, whereas the variance components for the log-transformed data accounted for considerably less. The heritability, on a link scale, increased from around 0.25 at the beginning of the grazing season to around 0.4 at the end.
Conclusions:
Random regressions are a useful tool for quantifying sources of variation across time. Our MCMC (Markov chain Monte Carlo) algorithm provides a flexible approach to fitting random regression models to non-normal data. Here we applied the algorithm to negative binomially distributed faecal egg count data, but this method is readily applicable to other types of overdispersed data.
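As a rough illustration of how Metropolis-Hastings sampling applies to negative binomially distributed counts, the toy sketch below samples the log of a single population mean with a known dispersion. It is far simpler than the generalized reduced rank random regression model described in the abstract: the function names, the fixed dispersion, the normal prior and the random-walk proposal are all assumptions introduced here, not details taken from the paper.

```python
import numpy as np

def nb_log_likelihood(counts, log_mean, dispersion):
    """Negative binomial log-likelihood, up to a constant in log_mean.

    Parameterised by mean mu = exp(log_mean) and dispersion k, so that
    variance = mu + mu**2 / k. Terms that do not depend on log_mean are
    dropped because they cancel in the Metropolis-Hastings ratio.
    """
    mean = np.exp(log_mean)
    return np.sum(
        dispersion * np.log(dispersion / (dispersion + mean))
        + counts * np.log(mean / (dispersion + mean))
    )

def metropolis_hastings(counts, dispersion, n_iter=4000, step=0.1,
                        prior_sd=10.0, seed=0):
    """Random-walk Metropolis-Hastings for log_mean (hypothetical toy model)."""
    rng = np.random.default_rng(seed)

    def log_posterior(log_mean):
        # Assumed prior: log_mean ~ Normal(0, prior_sd**2).
        log_prior = -0.5 * (log_mean / prior_sd) ** 2
        return nb_log_likelihood(counts, log_mean, dispersion) + log_prior

    current = 0.0
    current_lp = log_posterior(current)
    draws = np.empty(n_iter)
    for i in range(n_iter):
        proposal = current + step * rng.normal()   # symmetric proposal
        proposal_lp = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        draws[i] = current
    return draws
```

After discarding an initial burn-in, `np.exp(draws)` gives posterior draws of the mean count on the original scale, so no log transformation of the data itself is needed.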
Uncertainty Quantification in Bayesian Reduced-Rank Sparse Regressions
Reduced-rank regression recognises the possibility of a rank-deficient matrix
of coefficients, which is particularly useful when the data is
high-dimensional. We propose a novel Bayesian model for estimating the rank of
the coefficient matrix, which obviates the need for post-processing
steps, and allows for uncertainty quantification. Our method employs a mixture
prior on the regression coefficient matrix along with a global-local shrinkage
prior on its low-rank decomposition. Then, we rely on the Signal Adaptive
Variable Selector to perform sparsification, and define two novel tools, the
Posterior Inclusion Probability uncertainty index and the Relevance Index. The
validity of the method is assessed in a simulation study, then its advantages
and usefulness are shown in real-data applications on the chemical composition
of tobacco and on the photometry of galaxies.