351,870 research outputs found
Gaussian Process Regression for Estimating EM Ducting Within the Marine Atmospheric Boundary Layer
We show that Gaussian process regression (GPR) can be used to infer the
electromagnetic (EM) duct height within the marine atmospheric boundary layer
(MABL) from sparsely sampled propagation factors within the context of bistatic
radars. We use GPR to calculate the posterior predictive distribution on the
labels (i.e. duct height) from both noise-free and noise-contaminated array of
propagation factors. For duct height inference from noise-contaminated
propagation factors, we compare a naive approach, utilizing one random sample
from the input distribution (i.e. disregarding the input noise), with an
inverse-variance weighted approach, utilizing a few random samples to estimate
the true predictive distribution. The resulting posterior predictive
distributions from these two approaches are compared to a "ground truth"
distribution, which is approximated using a large number of Monte-Carlo
samples. The ability of GPR to yield accurate and fast duct height predictions
using a few training examples indicates the suitability of the proposed method
for real-time applications.Comment: 15 pages, 6 figure
Deconvolution Estimation in Measurement Error Models: The R Package decon
Data from many scientific areas often come with measurement error. Density or distribution function estimation from contaminated data and nonparametric regression with errors in variables are two important topics in measurement error models. In this paper, we present a new software package decon for R, which contains a collection of functions that use the deconvolution kernel methods to deal with the measurement error problems. The functions allow the errors to be either homoscedastic or heteroscedastic. To make the deconvolution estimators computationally more efficient in R, we adapt the fast Fourier transform algorithm for density estimation with error-free data to the deconvolution kernel estimation. We discuss the practical selection of the smoothing parameter in deconvolution methods and illustrate the use of the package through both simulated and real examples.
Simultaneous confidence tubes in multivariate linear regression
Simultaneous confidence bands have been shown in the statistical literature as powerful inferential tools in univariate linear regression. While the methodology of simultaneous confidence bands for univariate linear regression has been extensively researched and well developed, no published work seems available for multivariate linear regression. This paper fills this gap by studying one particular simultaneous confidence band for multivariate linear regression. Because of the shape of the band, the word ‘tube’ is more pertinent and so will be used to replace the word ‘band’. It is shown that the construction of the tube is related to the distribution of the largest eigenvalue. A simulation-based method is proposed to compute the 1 − α quantile of this eigenvalue. With the computation power of modern computers, the simultaneous confidence tube can be computed fast and accurately. A real-data example is used to illustrate the method, and many potential research problems have been pointed out
Speculative Approximations for Terascale Analytics
Model calibration is a major challenge faced by the plethora of statistical
analytics packages that are increasingly used in Big Data applications.
Identifying the optimal model parameters is a time-consuming process that has
to be executed from scratch for every dataset/model combination even by
experienced data scientists. We argue that the incapacity to evaluate multiple
parameter configurations simultaneously and the lack of support to quickly
identify sub-optimal configurations are the principal causes. In this paper, we
develop two database-inspired techniques for efficient model calibration.
Speculative parameter testing applies advanced parallel multi-query processing
methods to evaluate several configurations concurrently. The number of
configurations is determined adaptively at runtime, while the configurations
themselves are extracted from a distribution that is continuously learned
following a Bayesian process. Online aggregation is applied to identify
sub-optimal configurations early in the processing by incrementally sampling
the training dataset and estimating the objective function corresponding to
each configuration. We design concurrent online aggregation estimators and
define halting conditions to accurately and timely stop the execution. We apply
the proposed techniques to distributed gradient descent optimization -- batch
and incremental -- for support vector machines and logistic regression models.
We implement the resulting solutions in GLADE PF-OLA -- a state-of-the-art Big
Data analytics system -- and evaluate their performance over terascale-size
synthetic and real datasets. The results confirm that as many as 32
configurations can be evaluated concurrently almost as fast as one, while
sub-optimal configurations are detected accurately in as little as a
fraction of the time
Deconvolution Estimation in Measurement Error Models: The R Package decon
Data from many scientific areas often come with measurement error. Density or distribution function estimation from contaminated data and nonparametric regression with errors in variables are two important topics in measurement error models. In this paper, we present a new software package decon for R, which contains a collection of functions that use the deconvolution kernel methods to deal with the measurement error problems. The functions allow the errors to be either homoscedastic or heteroscedastic. To make the deconvolution estimators computationally more efficient in R, we adapt the fast Fourier transform algorithm for density estimation with error-free data to the deconvolution kernel estimation. We discuss the practical selection of the smoothing parameter in deconvolution methods and illustrate the use of the package through both simulated and real examples
- …