Bayesian Approximate Kernel Regression with Variable Selection
Nonlinear kernel regression models are often used in statistics and machine
learning because they are more accurate than linear models. Variable selection
for kernel regression models is a challenge partly because, unlike the linear
regression setting, there is no clear concept of an effect size for regression
coefficients. In this paper, we propose a novel framework that provides an
effect size analog of each explanatory variable for Bayesian kernel regression
models when the kernel is shift-invariant --- for example, the Gaussian kernel.
We use function analytic properties of shift-invariant reproducing kernel
Hilbert spaces (RKHS) to define a linear vector space that: (i) captures
nonlinear structure, and (ii) can be projected onto the original explanatory
variables. The projection onto the original explanatory variables serves as an
analog of effect sizes. The specific function analytic property we use is that
shift-invariant kernel functions can be approximated via random Fourier bases.
Based on the random Fourier expansion we propose a computationally efficient
class of Bayesian approximate kernel regression (BAKR) models for both
nonlinear regression and binary classification for which one can compute an
analog of effect sizes. We illustrate the utility of BAKR by examining two
important problems in statistical genetics: genomic selection (i.e. phenotypic
prediction) and association mapping (i.e. inference of significant variants or
loci). State-of-the-art methods for genomic selection and association mapping
are based on kernel regression and linear models, respectively. BAKR is the
first method that is competitive in both settings.
Comment: 22 pages, 3 figures, 3 tables; theory added; new simulations presented; references added
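The random Fourier expansion the abstract relies on can be sketched as follows. This is a minimal illustration of approximating a shift-invariant (Gaussian) kernel with random Fourier bases, not the paper's BAKR implementation; the bandwidth `sigma` and feature count are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_kernel(x, y, sigma=1.0):
    # Exact shift-invariant Gaussian kernel: k(x, y) = exp(-||x - y||^2 / (2 sigma^2)).
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

def random_fourier_features(X, n_features=2000, sigma=1.0, rng=rng):
    # Bochner's theorem gives k(x, y) ~= z(x)^T z(y) with
    # z(x) = sqrt(2/D) * cos(x W + b), W ~ N(0, sigma^-2), b ~ Uniform[0, 2 pi).
    d = X.shape[1]
    W = rng.normal(0.0, 1.0 / sigma, size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

X = rng.normal(size=(2, 5))
Z = random_fourier_features(X)
approx = Z[0] @ Z[1]                 # linear inner product in random feature space
exact = gaussian_kernel(X[0], X[1])  # the kernel value being approximated
```

Because the features are linear in a finite vector space, a Bayesian linear model fit on `Z` can be projected back onto the original variables, which is the mechanism behind the effect-size analogs described above.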
Sparse Signal Recovery under Poisson Statistics
We are motivated by problems that arise in a number of applications such as
Online Marketing and explosives detection, where the observations are usually
modeled using Poisson statistics. We model each observation as a Poisson random
variable whose mean is a sparse linear superposition of known patterns. Unlike
many conventional problems, the observations here are not identically distributed
since they are associated with different sensing modalities. We analyze the
performance of a Maximum Likelihood (ML) decoder, which for our Poisson setting
involves a non-linear optimization yet remains computationally tractable. We
derive fundamental sample complexity bounds for sparse recovery when the
measurements are contaminated with Poisson noise. In contrast to the
least-squares linear regression setting with Gaussian noise, we observe that in
addition to sparsity, the scale of the parameters also fundamentally impacts
sample complexity. We introduce a novel notion of Restricted Likelihood
Perturbation (RLP), to jointly account for scale and sparsity. We derive sample
complexity bounds for regularized ML estimators in terms of RLP and
further specialize these results for deterministic and random sensing matrix
designs.
Comment: 13 pages, 11 figures, 2 tables; submitted to IEEE Transactions on Signal Processing
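The measurement model above can be sketched in code. The problem sizes, the regularization weight, and the use of a multiplicative (Richardson-Lucy-style) MM update are illustrative assumptions, not the decoder analyzed in the paper; the sketch only shows ML-style sparse recovery under Poisson observations.

```python
import numpy as np

rng = np.random.default_rng(1)

# Ground truth: a sparse, nonnegative parameter vector (sizes are illustrative).
n, p, k = 500, 40, 3
theta_true = np.zeros(p)
theta_true[rng.choice(p, size=k, replace=False)] = rng.uniform(2.0, 5.0, size=k)

# Nonnegative sensing matrix; each row plays the role of a different sensing
# modality, so the observations are independent but not identically distributed.
A = rng.uniform(0.1, 1.0, size=(n, p))
y = rng.poisson(A @ theta_true)  # Poisson counts with sparse linear means

def poisson_nll(theta, A, y, eps=1e-12):
    # Poisson negative log-likelihood up to constants:
    # sum_i (A theta)_i - y_i * log((A theta)_i).
    mu = A @ theta + eps
    return np.sum(mu - y * np.log(mu))

# l1-regularized ML via a multiplicative MM update:
# theta_j <- theta_j * [A^T (y / mu)]_j / ([A^T 1]_j + lam).
lam = 1.0  # illustrative regularization weight
theta = np.full(p, 0.5)
col_sums = A.sum(axis=0)
for _ in range(2000):
    mu = A @ theta
    theta = theta * (A.T @ (y / mu)) / (col_sums + lam)

# The largest recovered coefficients should sit on the true support.
support_est = set(np.argsort(theta)[-k:])
```

Note that scaling `theta_true` up or down changes the Fisher information of each count, which is the sense in which parameter scale, and not only sparsity, drives sample complexity in this setting.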