Ignorability for categorical data
We study the problem of ignorability in likelihood-based inference from
incomplete categorical data. Two versions of the coarsened at random assumption
(car) are distinguished, their compatibility with the parameter distinctness
assumption is investigated and several conditions for ignorability that do not
require an extra parameter distinctness assumption are established. It is shown
that car assumptions have quite different implications depending on whether the
underlying complete-data model is saturated or parametric. In the latter case,
car assumptions can become inconsistent with observed data.
Comment: Published at http://dx.doi.org/10.1214/009053605000000363 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org
Score, Pseudo-Score and Residual Diagnostics for Spatial Point Process Models
We develop new tools for formal inference and informal model validation in
the analysis of spatial point pattern data. The score test is generalized to a
"pseudo-score" test derived from Besag's pseudo-likelihood, and to a class of
diagnostics based on point process residuals. The results lend theoretical
support to the established practice of using functional summary statistics,
such as Ripley's K-function, when testing for complete spatial randomness;
and they provide new tools such as the compensator of the K-function for
testing other fitted models. The results also support localization methods such
as the scan statistic and smoothed residual plots. Software for computing the
diagnostics is provided.
Comment: Published at http://dx.doi.org/10.1214/11-STS367 in Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org
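For readers unfamiliar with the K-function mentioned in the abstract above, here is a minimal sketch of a naive estimate. It is not the paper's software or its pseudo-score diagnostics: the function names `ripley_k` and `csr_reference` are hypothetical, edge correction is deliberately omitted, and the observation window is assumed to have a known area.

```python
import math


def ripley_k(points, r, area):
    """Naive estimate of Ripley's K at radius r (no edge correction).

    points: list of (x, y) tuples observed in a window of the given area.
    Assumption: this simplified form ignores boundary effects, which real
    analyses normally correct for.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    # Count ordered pairs (i, j), i != j, that lie within distance r.
    close_pairs = sum(
        1
        for i, (xi, yi) in enumerate(points)
        for j, (xj, yj) in enumerate(points)
        if i != j and math.hypot(xi - xj, yi - yj) <= r
    )
    return area * close_pairs / (n * (n - 1))


def csr_reference(r):
    """Theoretical K under complete spatial randomness in the plane."""
    return math.pi * r ** 2


# Four points on the corners of the unit square.
corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
k_at_1 = ripley_k(corners, 1.0, 1.0)
```

Comparing the estimate with `csr_reference(r)` over a range of radii is the informal check that the abstract says the score-test theory supports.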
Inference Aware Neural Optimization for Top Pair Cross-Section Measurements with CMS Open Data
In recent years novel inference techniques have been developed based on the construction of summary statistics with neural networks by minimizing inference-motivated losses via automatic differentiation. The inference-aware summary statistics aim to be optimal with respect to the statistical inference goal of high energy physics analysis by accounting for the effects of nuisance parameters during the model training.
One such technique is INFERNO (P. de Castro and T. Dorigo, Comp. Phys. Comm. 244 (2019) 170) which was shown on toy problems to outperform classical summary statistics for the problem of confidence interval estimation in the presence of nuisance parameters.
In this thesis the algorithm is extended to common high energy physics problems based on a differentiable interpolation technique. In order to test and benchmark the algorithm in a real-world application, a complete, systematics-dominated analysis of the CMS experiment, "Measurement of the top-quark pair production cross section in the tau+jets channel in pp collisions at sqrt(s) = 7 TeV" (CMS Collaboration, The European Physical Journal C, 2013) is reproduced with CMS Open Data. The application of the INFERNO-powered neural network architecture to this analysis demonstrates the potential to reduce the impact of systematic uncertainties in real LHC analysis
Variable selection for model-based clustering using the integrated complete-data likelihood
Variable selection in cluster analysis is important yet challenging. It can
be achieved by regularization methods, which realize a trade-off between the
clustering accuracy and the number of selected variables by using a lasso-type
penalty. However, the calibration of the penalty term is open to
criticism. Model selection methods are an efficient alternative, yet they
require a difficult optimization of an information criterion which involves
combinatorial problems. First, most of these optimization algorithms are based
on a suboptimal procedure (e.g. stepwise method). Second, the algorithms are
often greedy because they need multiple calls of EM algorithms. Here we propose
to use a new information criterion based on the integrated complete-data
likelihood. It does not require any estimate and its maximization is simple and
computationally efficient. The original contribution of our approach is to
perform the model selection without requiring any parameter estimation. Then,
parameter inference is needed only for the unique selected model. This approach
is used for the variable selection of a Gaussian mixture model with conditional
independence assumption. The numerical experiments on simulated and benchmark
datasets show that the proposed method often outperforms two classical
approaches for variable selection.
Comment: submitted to Statistics and Computin
Modeling social networks from sampled data
Network models are widely used to represent relational information among
interacting units and the structural implications of these relations. Recently,
social network studies have focused a great deal of attention on random graph
models of networks whose nodes represent individual social actors and whose
edges represent a specified relationship between the actors. Most inference for
social network models assumes that the presence or absence of all possible
links is observed, that the information is completely reliable, and that there
are no measurement (e.g., recording) errors. This is clearly not true in
practice, as much network data is collected through sample surveys. In addition,
even if a census of a population is attempted, individuals and links between
individuals are missed (i.e., do not appear in the recorded data). In this
paper we develop the conceptual and computational theory for inference based on
sampled network information. We first review forms of network sampling designs
used in practice. We consider inference from the likelihood framework, and
develop a typology of network data that reflects their treatment within this
frame. We then develop inference for social network models based on information
from adaptive network designs. We motivate and illustrate these ideas by
analyzing the effect of link-tracing sampling designs on a collaboration
network.
Comment: Published at http://dx.doi.org/10.1214/08-AOAS221 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
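To make the idea of a link-tracing design in the abstract above concrete, here is a minimal sketch of a wave-based (snowball-style) sample on a toy graph. It only illustrates how such a design reaches nodes; it is not the paper's inferential method, and the function name `link_trace_sample` and the adjacency-dictionary representation are assumptions made for this example.

```python
def link_trace_sample(adjacency, seeds, waves):
    """Return the set of nodes reached by tracing links from the seeds.

    adjacency: dict mapping each node to an iterable of its neighbours.
    seeds: initial sample (wave 0).
    waves: number of tracing waves; each wave adds the not-yet-sampled
           neighbours of the nodes added in the previous wave.
    """
    sampled = set(seeds)
    frontier = set(seeds)
    for _ in range(waves):
        next_frontier = set()
        for node in frontier:
            for neighbour in adjacency.get(node, ()):
                if neighbour not in sampled:
                    next_frontier.add(neighbour)
        sampled |= next_frontier
        frontier = next_frontier
    return sampled


# Toy collaboration network: a path 0 - 1 - 2 - 3 - 4.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
two_waves = link_trace_sample(path, seeds=[0], waves=2)
```

Because which nodes enter the sample depends on the ties observed in earlier waves, the design is adaptive, which is the setting the abstract addresses.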
Parameter Expansion and Efficient Inference
This EM review article focuses on parameter expansion, a simple technique
introduced in the PX-EM algorithm to make EM converge faster while maintaining
its simplicity and stability. The primary objective concerns the connection
between parameter expansion and efficient inference. It reviews the statistical
interpretation of the PX-EM algorithm, in terms of efficient inference via bias
reduction, and further unfolds the PX-EM mystery by looking at PX-EM from
different perspectives. In addition, it briefly discusses potential
applications of parameter expansion to statistical inference and the broader
impact of statistical thinking on understanding and developing other iterative
optimization algorithms.
Comment: Published at http://dx.doi.org/10.1214/10-STS348 in Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org