Galaxy Modeling with Compound Elliptical Shapelets
Gauss-Hermite and Gauss-Laguerre ("shapelet") decompositions of images have
become important tools in galaxy modeling, particularly for the purpose of
extracting ellipticity and morphological information from astronomical data.
However, the standard shapelet basis functions cannot compactly represent
galaxies with high ellipticity or large Sersic index, and the resulting
underfitting bias has been shown to present a serious challenge for
weak-lensing methods based on shapelets. We present here a new convolution
relation and a compound "multi-scale" shapelet basis to address these problems,
and provide a proof-of-concept demonstration using a small sample of nearby
galaxies.
Comment: 14 pages, 7 figures
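
As a rough, hedged illustration of the machinery involved (the standard single-scale basis, not the compound multi-scale basis the paper introduces), the Python sketch below constructs 1-D Gauss-Hermite shapelet functions and decomposes a toy cuspy profile; the function name shapelet_basis_1d and the toy profile are illustrative assumptions.

```python
# Minimal sketch: standard 1-D Gauss-Hermite shapelet basis (not the
# paper's compound multi-scale basis). All names are illustrative.
import numpy as np
from math import factorial
from scipy.special import eval_hermite

def shapelet_basis_1d(k, x, beta=1.0):
    """Orthonormal 1-D shapelet: B_k(x; beta) =
    [2^k k! sqrt(pi) beta]^(-1/2) H_k(x/beta) exp(-x^2 / (2 beta^2))."""
    norm = 1.0 / np.sqrt(2.0**k * factorial(k) * np.sqrt(np.pi) * beta)
    return norm * eval_hermite(k, x / beta) * np.exp(-(x / beta) ** 2 / 2.0)

# Decompose a toy profile into shapelet coefficients by quadrature;
# orthonormality makes the coefficients simple inner products.
x = np.linspace(-10.0, 10.0, 2001)
dx = x[1] - x[0]
profile = np.exp(-np.abs(x))        # cuspier than a Gaussian
coeffs = [(profile * shapelet_basis_1d(k, x)).sum() * dx for k in range(8)]
recon = sum(c * shapelet_basis_1d(k, x) for k, c in enumerate(coeffs))
print(np.round(coeffs, 3))          # odd-order terms vanish by symmetry
```

The slow decay of the coefficients for this cuspy profile is a 1-D analog of the underfitting problem that motivates the compound basis.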
High-Dimensional Bayesian Geostatistics
With the growing capabilities of Geographic Information Systems (GIS) and
user-friendly software, statisticians today routinely encounter geographically
referenced data containing observations from a large number of spatial
locations and time points. Over the last decade, hierarchical spatiotemporal
process models have become widely deployed statistical tools for researchers to
better understand the complex nature of spatial and temporal variability.
However, fitting hierarchical spatiotemporal models often involves expensive
matrix computations with complexity increasing in cubic order for the number of
spatial locations and temporal points. This renders such models infeasible for
large data sets. This article offers a focused review of two methods for
constructing well-defined highly scalable spatiotemporal stochastic processes.
Both these processes can be used as "priors" for spatiotemporal random fields.
The first approach constructs a low-rank process operating on a
lower-dimensional subspace. The second approach constructs a Nearest-Neighbor
Gaussian Process (NNGP) that ensures sparse precision matrices for its finite
realizations. Both processes can be exploited as a scalable prior embedded
within a rich hierarchical modeling framework to deliver full Bayesian
inference. These approaches can be described as model-based solutions for big
spatiotemporal datasets. The models ensure that the algorithmic complexity has
~n floating point operations (flops), where n is the number of spatial
locations (per iteration). We compare these methods and provide some insight
into their methodological underpinnings.
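
As a hedged sketch of the second construction, the code below builds the sparse precision matrix of a Vecchia-type nearest-neighbor approximation, the mechanism that gives the NNGP its scalability; the exponential covariance, the coordinate ordering, and m = 10 neighbors are illustrative assumptions, not the review's specific choices.

```python
# Sketch of a nearest-neighbor (Vecchia-type) Gaussian process: each
# ordered location conditions on at most m previously ordered neighbors,
# yielding a sparse factor of the precision matrix.
import numpy as np
from scipy import sparse
from scipy.spatial.distance import cdist

def exp_cov(a, b, sigma2=1.0, phi=3.0):
    """Exponential covariance C(s, s') = sigma2 * exp(-phi * ||s - s'||)."""
    return sigma2 * np.exp(-phi * cdist(a, b))

def nngp_precision(locs, m=10):
    n = locs.shape[0]
    B = sparse.lil_matrix((n, n))          # conditional regression weights
    F = np.empty(n)                        # conditional variances
    F[0] = exp_cov(locs[:1], locs[:1])[0, 0]
    for i in range(1, n):
        # m nearest neighbors among previously ordered locations
        d = np.linalg.norm(locs[:i] - locs[i], axis=1)
        nb = np.argsort(d)[:m]
        C_nn = exp_cov(locs[nb], locs[nb])
        c_in = exp_cov(locs[i:i + 1], locs[nb]).ravel()
        b = np.linalg.solve(C_nn, c_in)
        B[i, nb] = b
        F[i] = exp_cov(locs[i:i + 1], locs[i:i + 1])[0, 0] - c_in @ b
    A = sparse.eye(n) - B.tocsr()          # (I - B), sparse lower triangular
    return A.T @ sparse.diags(1.0 / F) @ A # sparse precision matrix

rng = np.random.default_rng(0)
locs = rng.uniform(size=(500, 2))
Q = nngp_precision(locs, m=10)
```

For fixed m, each row requires solving one m x m system, so constructing the factor costs roughly n * m^3 flops and the precision matrix stays sparse, which is the linear-in-n scaling described above.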
Data-Driven Estimation in Equilibrium Using Inverse Optimization
Equilibrium modeling is common in a variety of fields such as game theory and
transportation science. The inputs for these models, however, are often
difficult to estimate, while their outputs, i.e., the equilibria they are meant
to describe, are often directly observable. By combining ideas from inverse
optimization with the theory of variational inequalities, we develop an
efficient, data-driven technique for estimating the parameters of these models
from observed equilibria. We use this technique to estimate the utility
functions of players in a game from their observed actions and to estimate the
congestion function on a road network from traffic count data. A distinguishing
feature of our approach is that it supports both parametric and
nonparametric estimation by leveraging ideas from statistical learning
(kernel methods and regularization operators). In computational experiments
involving Nash and Wardrop equilibria in a nonparametric setting, we find that
a) we effectively estimate the unknown demand or congestion function,
respectively, and b) our proposed regularization technique substantially
improves the out-of-sample performance of our estimators.
Comment: 36 pages, 5 figures. Additional theorems for generalization guarantees and statistical analysis added.
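
A toy instance of the idea, under strong simplifying assumptions (parallel links, a linear link cost with a single unknown slope theta, all links carrying flow): observed Wardrop equilibria must equalize costs across used links, and those equal-cost conditions are linear in theta, so a least-squares estimate exists in closed form. Everything here (link data, the cost form) is illustrative, not the paper's variational-inequality model.

```python
# Toy inverse-optimization sketch: recover a congestion parameter from
# observed Wardrop equilibria on parallel links. Illustrative only.
import numpy as np

rng = np.random.default_rng(1)
A = 3                                  # number of parallel links
t = np.array([10.0, 12.0, 15.0])       # known free-flow travel times
cap = np.array([40.0, 60.0, 80.0])     # known capacities
theta_true = 0.9                       # unknown congestion slope

def equilibrium_flows(demand, theta):
    """Wardrop equilibrium for costs c_a(x) = t_a * (1 + theta * x / cap_a):
    equal costs across used links plus flow conservation."""
    M = np.zeros((A, A))
    rhs = np.zeros(A)
    for a in range(1, A):              # cost of link 0 equals cost of link a
        M[a - 1, 0] = theta * t[0] / cap[0]
        M[a - 1, a] = -theta * t[a] / cap[a]
        rhs[a - 1] = t[a] - t[0]
    M[A - 1, :] = 1.0                  # total flow matches demand
    rhs[A - 1] = demand
    return np.linalg.solve(M, rhs)

# Simulate noisy observed equilibria, then estimate theta from the
# equal-cost (equilibrium optimality) residuals, linear in theta.
g, h = [], []
for d in rng.uniform(50, 150, size=20):
    x = equilibrium_flows(d, theta_true) + rng.normal(0, 0.5, A)
    for a in range(1, A):
        g.append(t[0] * x[0] / cap[0] - t[a] * x[a] / cap[a])
        h.append(t[a] - t[0])
g, h = np.array(g), np.array(h)
theta_hat = (g @ h) / (g @ g)          # closed-form scalar least squares
```

Stacking the residual equations as theta * g = h gives theta_hat = (g . h) / (g . g), which recovers theta_true up to observation noise; the paper's formulation generalizes this to nonparametric cost functions and approximately observed equilibria.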
Iterative Updating of Model Error for Bayesian Inversion
In computational inverse problems, it is common that a detailed and accurate
forward model is approximated by a computationally less challenging substitute.
The model reduction may be necessary to meet constraints in computing time when
optimization algorithms are used to find a single estimate, or to speed up
Markov chain Monte Carlo (MCMC) calculations in the Bayesian framework. The use
of an approximate model introduces a discrepancy, or modeling error, that may
have a detrimental effect on the solution of the ill-posed inverse problem, or
it may severely distort the estimate of the posterior distribution. In the
Bayesian paradigm, the modeling error can be considered as a random variable,
and by using an estimate of the probability distribution of the unknown, one
may estimate the probability distribution of the modeling error and incorporate
it into the inversion. We introduce an algorithm which iterates this idea to
update the distribution of the model error, leading to a sequence of posterior
distributions that are demonstrated empirically to capture the underlying truth
with increasing accuracy. Since the algorithm is not based on rejections, it
requires only a limited number of full-model evaluations.
We show analytically that, in the linear Gaussian case, the algorithm
converges geometrically fast with respect to the number of iterations. For more
general models, we introduce particle approximations of the iteratively
generated sequence of distributions; we also prove that each element of the
sequence converges in the large particle limit. We show numerically that, as in
the linear case, rapid convergence occurs with respect to the number of
iterations. Additionally, we show through computed examples that point
estimates obtained from this iterative algorithm are superior to those obtained
by neglecting the model error.
Comment: 39 pages, 9 figures
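
A minimal linear-Gaussian sketch of the iteration follows (variable names are illustrative): the modeling error eps = (A_fine - A_coarse) x is approximated as Gaussian, its mean and covariance are computed from the current posterior, and the posterior is then recomputed with the observation covariance inflated accordingly.

```python
# Sketch: iterative model-error updating in the linear Gaussian case.
import numpy as np

rng = np.random.default_rng(2)
n, d = 40, 20
A_fine = rng.normal(size=(n, d))                 # accurate forward model
A_coarse = A_fine + 0.3 * rng.normal(size=(n, d))# cheap approximate model
D = A_fine - A_coarse                            # discrepancy operator

x_true = rng.normal(size=d)
noise_cov = 0.05**2 * np.eye(n)
y = A_fine @ x_true + 0.05 * rng.normal(size=n)

m0, C0 = np.zeros(d), np.eye(d)                  # fixed Gaussian prior on x
eps_mean, eps_cov = np.zeros(n), np.zeros((n, n))
for _ in range(10):
    # Posterior under y = A_coarse x + eps + e with current eps statistics:
    Gamma = noise_cov + eps_cov                  # inflated noise covariance
    K = C0 @ A_coarse.T @ np.linalg.inv(A_coarse @ C0 @ A_coarse.T + Gamma)
    m = m0 + K @ (y - eps_mean - A_coarse @ m0)
    C = C0 - K @ A_coarse @ C0
    # Re-estimate the model-error moments from the current posterior:
    eps_mean, eps_cov = D @ m, D @ C @ D.T
```

Here the discrepancy operator D is formed explicitly for simplicity; in practice the error statistics would be estimated from a limited number of full-model evaluations, as the abstract notes, and this linear Gaussian setting is the regime in which geometric convergence is proved.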
Bayesian Approximate Kernel Regression with Variable Selection
Nonlinear kernel regression models are often used in statistics and machine
learning because they are more accurate than linear models. Variable selection
for kernel regression models is a challenge partly because, unlike the linear
regression setting, there is no clear concept of an effect size for regression
coefficients. In this paper, we propose a novel framework that provides an
effect size analog of each explanatory variable for Bayesian kernel regression
models when the kernel is shift-invariant (for example, the Gaussian kernel).
We use function analytic properties of shift-invariant reproducing kernel
Hilbert spaces (RKHS) to define a linear vector space that: (i) captures
nonlinear structure, and (ii) can be projected onto the original explanatory
variables. The projection onto the original explanatory variables serves as an
analog of effect sizes. The specific function analytic property we use is that
shift-invariant kernel functions can be approximated via random Fourier bases.
Based on the random Fourier expansion we propose a computationally efficient
class of Bayesian approximate kernel regression (BAKR) models for both
nonlinear regression and binary classification for which one can compute an
analog of effect sizes. We illustrate the utility of BAKR by examining two
important problems in statistical genetics: genomic selection (i.e. phenotypic
prediction) and association mapping (i.e. inference of significant variants or
loci). State-of-the-art methods for genomic selection and association mapping
are based on kernel regression and linear models, respectively. BAKR is the
first method that is competitive in both settings.
Comment: 22 pages, 3 figures, 3 tables; theory added; new simulations presented; references added.
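
The sketch below illustrates the mechanics under simplifying assumptions (conjugate Gaussian priors and a posterior mean rather than a full posterior): random Fourier features approximate the Gaussian kernel, Bayesian ridge regression is performed in feature space, and the fitted function is projected back onto the covariates to produce effect-size analogs. Bandwidth and prior settings are illustrative.

```python
# Sketch of the BAKR idea: random Fourier features + Bayesian linear
# regression in feature space + projection back onto the covariates.
import numpy as np

rng = np.random.default_rng(3)
n, p, D = 200, 10, 300                  # samples, covariates, Fourier features
X = rng.normal(size=(n, p))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.1 * rng.normal(size=n)

# Random Fourier features approximating exp(-||x - x'||^2 / (2 s^2)):
s = np.sqrt(p)                          # bandwidth (illustrative choice)
W = rng.normal(scale=1.0 / s, size=(p, D))
b = rng.uniform(0, 2 * np.pi, size=D)
Z = np.sqrt(2.0 / D) * np.cos(X @ W + b)

# Conjugate Bayesian ridge in feature space: theta ~ N(0, tau2 I).
tau2, sig2 = 1.0, 0.1
post_cov = np.linalg.inv(Z.T @ Z / sig2 + np.eye(D) / tau2)
theta_mean = post_cov @ Z.T @ y / sig2

# Effect-size analog: project fitted values onto the span of the covariates.
f_hat = Z @ theta_mean
beta = np.linalg.pinv(X) @ f_hat        # one "effect size" per covariate
```

Because the projection is linear in the fitted function, an approximate posterior for beta follows from the posterior over theta, which is what makes the effect-size analog usable for association mapping as well as prediction.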