7,600 research outputs found

    Statistical unfolding of elementary particle spectra: Empirical Bayes estimation and bias-corrected uncertainty quantification

    Full text link
    We consider the high energy physics unfolding problem where the goal is to estimate the spectrum of elementary particles given observations distorted by the limited resolution of a particle detector. This important statistical inverse problem arising in data analysis at the Large Hadron Collider at CERN consists in estimating the intensity function of an indirectly observed Poisson point process. Unfolding typically proceeds in two steps: one first produces a regularized point estimate of the unknown intensity and then uses the variability of this estimator to form frequentist confidence intervals that quantify the uncertainty of the solution. In this paper, we propose forming the point estimate using empirical Bayes estimation which enables a data-driven choice of the regularization strength through marginal maximum likelihood estimation. Observing that neither Bayesian credible intervals nor standard bootstrap confidence intervals succeed in achieving good frequentist coverage in this problem due to the inherent bias of the regularized point estimate, we introduce an iteratively bias-corrected bootstrap technique for constructing improved confidence intervals. We show using simulations that this enables us to achieve nearly nominal frequentist coverage with only a modest increase in interval length. The proposed methodology is applied to unfolding the ZZ boson invariant mass spectrum as measured in the CMS experiment at the Large Hadron Collider.Comment: Published at http://dx.doi.org/10.1214/15-AOAS857 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org). arXiv admin note: substantial text overlap with arXiv:1401.827

    Missing.... presumed at random: cost-analysis of incomplete data

    Get PDF
    When collecting patient-level resource use data for statistical analysis, for some patients and in some categories of resource use, the required count will not be observed. Although this problem must arise in most reported economic evaluations containing patient-level data, it is rare for authors to detail how the problem was overcome. Statistical packages may default to handling missing data through a so-called complete case analysis, while some recent cost-analyses have appeared to favour an available case approach. Both of these methods are problematic: complete case analysis is inefficient and is likely to be biased; available case analysis, by employing different numbers of observations for each resource use item, generates severe problems for standard statistical inference. Instead we explore imputation methods for generating replacement values for missing data that will permit complete case analysis using the whole data set and we illustrate these methods using two data sets that had incomplete resource use information
    corecore