Statistical unfolding of elementary particle spectra: Empirical Bayes estimation and bias-corrected uncertainty quantification
We consider the high energy physics unfolding problem where the goal is to
estimate the spectrum of elementary particles given observations distorted by
the limited resolution of a particle detector. This important statistical
inverse problem arising in data analysis at the Large Hadron Collider at CERN
consists in estimating the intensity function of an indirectly observed Poisson
point process. Unfolding typically proceeds in two steps: one first produces a
regularized point estimate of the unknown intensity and then uses the
variability of this estimator to form frequentist confidence intervals that
quantify the uncertainty of the solution. In this paper, we propose forming the
point estimate using empirical Bayes estimation, which enables a data-driven
choice of the regularization strength through marginal maximum likelihood
estimation. Observing that neither Bayesian credible intervals nor standard
bootstrap confidence intervals succeed in achieving good frequentist coverage
in this problem due to the inherent bias of the regularized point estimate, we
introduce an iteratively bias-corrected bootstrap technique for constructing
improved confidence intervals. We show using simulations that this enables us
to achieve nearly nominal frequentist coverage with only a modest increase in
interval length. The proposed methodology is applied to unfolding the boson
invariant mass spectrum as measured in the CMS experiment at the Large Hadron
Collider.
Comment: Published at http://dx.doi.org/10.1214/15-AOAS857 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org). arXiv admin note:
substantial text overlap with arXiv:1401.827
SLO-aware Colocation of Data Center Tasks Based on Instantaneous Processor Requirements
In a cloud data center, a single physical machine simultaneously executes
dozens of highly heterogeneous tasks. Such colocation results in more efficient
utilization of machines, but, when tasks' requirements exceed available
resources, some of the tasks might be throttled down or preempted. We analyze
version 2.1 of the Google cluster trace that shows short-term (1 second) task
CPU usage. Contrary to the assumptions made by many theoretical studies, we
demonstrate that the empirical distributions do not follow any single
distribution. However, high percentiles of the total processor usage (summed
over at least 10 tasks) can be reasonably estimated by the Gaussian
distribution. We use this result for a probabilistic fit test, called the
Gaussian Percentile Approximation (GPA), for standard bin-packing algorithms.
To check whether a new task will fit into a machine, GPA checks whether the
resulting distribution's percentile corresponding to the requested service
level objective (SLO) is still below the machine's capacity. In our simulation
experiments, GPA resulted in colocations exceeding the machines' capacity with
a frequency similar to the requested SLO.
Comment: Author's version of a paper published in ACM SoCC'1
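The fit test described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name gpa_fits, the per-task mean and standard deviation inputs, the independence assumption used to add variances, and the SLO expressed as a fraction such as 0.99 are all assumptions made here for illustration.

```python
from math import sqrt
from statistics import NormalDist


def gpa_fits(task_means, task_stds, capacity, slo):
    """Hypothetical Gaussian percentile fit test.

    task_means, task_stds: per-task CPU usage mean and standard deviation,
        including the candidate task (assumed inputs).
    capacity: machine CPU capacity, in the same units as the means.
    slo: requested service level as a fraction in (0, 1), e.g. 0.99.
    """
    total_mean = sum(task_means)
    # Assumption: task usages are independent, so variances add.
    total_std = sqrt(sum(s * s for s in task_stds))
    if total_std == 0:
        # Degenerate case: usage is constant, compare it directly.
        return total_mean <= capacity
    # Percentile of the approximating Gaussian at the requested SLO.
    percentile = NormalDist(total_mean, total_std).inv_cdf(slo)
    return percentile <= capacity


if __name__ == "__main__":
    # Ten lightly loaded tasks on a machine with capacity 1.0.
    print(gpa_fits([0.05] * 10, [0.02] * 10, capacity=1.0, slo=0.99))
    # Ten heavier tasks whose 99th percentile exceeds the capacity.
    print(gpa_fits([0.09] * 10, [0.03] * 10, capacity=1.0, slo=0.99))
```

A bin-packing heuristic would call such a test once per candidate machine, with the new task's statistics appended to those of the tasks already placed there.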
Variable Selection and Model Averaging in Semiparametric Overdispersed Generalized Linear Models
We express the mean and variance terms in a double exponential regression
model as additive functions of the predictors and use Bayesian variable
selection to determine which predictors enter the model, and whether they enter
linearly or flexibly. When the variance term is null we obtain a generalized
additive model, which becomes a generalized linear model if the predictors
enter the mean linearly. The model is estimated using Markov chain Monte Carlo
simulation and the methodology is illustrated using real and simulated data
sets.
Comment: 8 graphs, 35 pages
Coordinate Transformation and Polynomial Chaos for the Bayesian Inference of a Gaussian Process with Parametrized Prior Covariance Function
This paper addresses model dimensionality reduction for Bayesian inference
based on prior Gaussian fields with uncertainty in the covariance function
hyper-parameters. The dimensionality reduction is traditionally achieved using
the Karhunen-Loève expansion of a prior Gaussian process assuming covariance
function with fixed hyper-parameters, despite the fact that these are uncertain
in nature. The posterior distribution of the Karhunen-Loève coordinates is
then inferred using available observations. The resulting inferred field is
therefore dependent on the assumed hyper-parameters. Here, we seek to
efficiently estimate both the field and covariance hyper-parameters using
Bayesian inference. To this end, a generalized Karhunen-Loève expansion is
derived using a coordinate transformation to account for the dependence with
respect to the covariance hyper-parameters. Polynomial Chaos expansions are
employed for the acceleration of the Bayesian inference using similar
coordinate transformations, enabling us to avoid expanding explicitly the
solution dependence on the uncertain hyper-parameters. We demonstrate the
feasibility of the proposed method on a transient diffusion equation by
inferring spatially-varying log-diffusivity fields from noisy data. The
inferred profiles were found to be closer to the true profiles when including the
hyper-parameters' uncertainty in the inference formulation.
Comment: 34 pages, 17 figures
Image Coaddition with Temporally Varying Kernels
Large, multi-frequency imaging surveys, such as the Large Synoptic Survey
Telescope (LSST), need to do near-real-time analysis of very large datasets.
This raises a host of statistical and computational problems where standard
methods do not work. In this paper, we study a proposed method for combining
stacks of images into a single summary image, sometimes referred to as a
template. This task is commonly referred to as image coaddition. In part, we
focus on a method proposed in previous work, which outlines a procedure for
combining stacks of images in an online fashion in the Fourier domain. We
evaluate this method by comparing it to two straightforward methods through the
use of various criteria and simulations. Note that the goal is not to propose
these comparison methods for use in their own right, but to ensure that
additional complexity also provides substantially improved performance
- …