
    Ignorability for categorical data

    We study the problem of ignorability in likelihood-based inference from incomplete categorical data. Two versions of the coarsened at random assumption (car) are distinguished, their compatibility with the parameter distinctness assumption is investigated, and several conditions for ignorability that do not require an extra parameter distinctness assumption are established. It is shown that car assumptions have quite different implications depending on whether the underlying complete-data model is saturated or parametric. In the latter case, car assumptions can become inconsistent with the observed data. Comment: Published at http://dx.doi.org/10.1214/009053605000000363 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
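
When the coarsening mechanism is ignorable, likelihood inference for a saturated multinomial model uses only the total probability of each observed coarse set, and EM can maximize that likelihood. The sketch below assumes that car and parameter distinctness both hold, because that is exactly what justifies leaving the coarsening model out. It covers only the saturated case. The function name and the toy counts are hypothetical and do not come from the paper.

```python
import numpy as np

def em_coarsened_multinomial(exact_counts, coarse_obs, max_iter=1000, tol=1e-12):
    """EM for a saturated multinomial with coarsened observations.

    exact_counts: counts of observations whose category is known exactly.
    coarse_obs: list of (subset_of_category_indices, count) pairs for
        observations known only to fall somewhere in that subset.
    Assumes the coarsening is ignorable (car + distinctness), so the
    coarsening mechanism is not modelled at all.
    """
    exact = np.asarray(exact_counts, dtype=float)
    k = exact.size
    n = exact.sum() + sum(count for _, count in coarse_obs)
    p = np.full(k, 1.0 / k)  # start from the uniform distribution
    for _ in range(max_iter):
        # E-step: split each coarse count across its subset in proportion to p
        expected = exact.copy()
        for subset, count in coarse_obs:
            idx = np.asarray(subset)
            expected[idx] += count * p[idx] / p[idx].sum()
        # M-step: closed-form multinomial MLE from the expected complete counts
        p_new = expected / n
        if np.max(np.abs(p_new - p)) < tol:
            return p_new
        p = p_new
    return p
```

Take exact counts (30, 10, 20) plus 40 observations known only to lie in categories {0, 1}. The observed-data MLE is (0.6, 0.2, 0.2): category 2 gets 20/100, and the 0.8 left over is split in the 30:10 ratio of the exact counts. EM should converge to those values.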

    Score, Pseudo-Score and Residual Diagnostics for Spatial Point Process Models

    We develop new tools for formal inference and informal model validation in the analysis of spatial point pattern data. The score test is generalized to a "pseudo-score" test derived from Besag's pseudo-likelihood, and to a class of diagnostics based on point process residuals. The results lend theoretical support to the established practice of using functional summary statistics, such as Ripley's K-function, when testing for complete spatial randomness; they also provide new tools, such as the compensator of the K-function, for testing other fitted models. The results also support localization methods such as the scan statistic and smoothed residual plots. Software for computing the diagnostics is provided. Comment: Published at http://dx.doi.org/10.1214/11-STS367 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)
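
Ripley's K-function, the established summary for testing complete spatial randomness (CSR), has a simple naive estimator. For n points in a window of area A, K̂(r) = A / (n(n − 1)) · Σ_{i≠j} 1{d_ij ≤ r}, and under CSR this should be close to πr². The sketch below is illustrative only and is not the paper's software. It omits edge correction, which biases K̂ downward near the window boundary, and the function name is hypothetical.

```python
import numpy as np

def ripley_k(points, radii, area):
    """Naive Ripley's K estimate (no edge correction) at each radius."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    diff = pts[:, None, :] - pts[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))  # pairwise distances
    np.fill_diagonal(d, np.inf)            # exclude i == j pairs
    counts = np.array([(d <= r).sum() for r in radii], dtype=float)
    return area * counts / (n * (n - 1))   # ordered pairs i != j

# Compare with the CSR benchmark pi * r**2 on a uniform sample (illustrative)
rng = np.random.default_rng(0)
sample = rng.uniform(size=(200, 2))        # unit square, area 1
radii = np.array([0.05, 0.10, 0.15])
k_hat = ripley_k(sample, radii, area=1.0)
csr = np.pi * radii ** 2
```

For the four corners of the unit square, 8 of the 12 ordered pairs are at distance 1 and 4 are at distance √2. K̂(1.0) is therefore 8/12 and K̂(1.5) is 1.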

    Inference Aware Neural Optimization for Top Pair Cross-Section Measurements with CMS Open Data

    In recent years, novel inference techniques have been developed that construct summary statistics with neural networks by minimizing inference-motivated losses via automatic differentiation. These inference-aware summary statistics aim to be optimal with respect to the statistical inference goal of a high energy physics analysis by accounting for the effects of nuisance parameters during training. One such technique is INFERNO (P. de Castro and T. Dorigo, Comp. Phys. Comm. 244 (2019) 170), which was shown on toy problems to outperform classical summary statistics for confidence interval estimation in the presence of nuisance parameters. In this thesis, the algorithm is extended to common high energy physics problems using a differentiable interpolation technique. To test and benchmark the algorithm in a real-world application, a complete, systematics-dominated analysis of the CMS experiment, "Measurement of the top-quark pair production cross section in the tau+jets channel in pp collisions at sqrt(s) = 7 TeV" (CMS Collaboration, The European Physical Journal C, 2013), is reproduced with CMS Open Data. Applying the INFERNO-powered neural network architecture to this analysis demonstrates its potential to reduce the impact of systematic uncertainties in real LHC analyses.
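
INFERNO makes the analysis histogram differentiable by assigning events to bins with a softmax instead of a hard cut. It then trains on the expected variance of the parameter of interest, taken from the inverse Fisher information of a binned Poisson likelihood that includes nuisance parameters. The NumPy sketch below only evaluates such a loss for fixed network outputs, with no training and no automatic differentiation. It models one systematic as a linear shift between nominal and "up" background templates, with a unit Gaussian constraint. That interpolation, the temperature value and all function names are assumptions for illustration and are not the thesis's implementation.

```python
import numpy as np

def softmax(logits, temperature):
    """Row-wise softmax; a low temperature approaches hard (one-hot) binning."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def soft_histogram(logits, weights, temperature):
    """Expected bin counts: each event spreads its weight over all bins."""
    w = np.asarray(weights, dtype=float)
    return (w[:, None] * softmax(logits, temperature)).sum(axis=0)

def inferno_style_loss(sig_logits, bkg_logits, bkg_up_logits,
                       w_sig, w_bkg, temperature=0.1):
    """Approximate var(mu) from the inverse Fisher information at mu=1, nu=0.

    Hypothetical per-bin model: n_b(mu, nu) = mu*s_b + b_b + nu*(bup_b - b_b),
    with the nuisance nu constrained by a unit Gaussian.
    """
    s = soft_histogram(sig_logits, w_sig, temperature)
    b = soft_histogram(bkg_logits, w_bkg, temperature)
    b_up = soft_histogram(bkg_up_logits, w_bkg, temperature)
    n = s + b                          # expected counts at the nominal point
    grads = np.stack([s, b_up - b])    # dn_b/dmu and dn_b/dnu
    fisher = (grads / n) @ grads.T     # Poisson Fisher information (2x2)
    fisher[1, 1] += 1.0                # Gaussian constraint on nu
    return np.linalg.inv(fisher)[0, 0]
```

A systematic shift that moves background into the signal-rich bin raises this loss relative to the no-systematic case. Training would push the network to avoid that.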

    Variable selection for model-based clustering using the integrated complete-data likelihood

    Variable selection in cluster analysis is important yet challenging. It can be achieved by regularization methods, which trade off clustering accuracy against the number of selected variables by using a lasso-type penalty. However, the calibration of the penalty term is open to criticism. Model selection methods are an efficient alternative, yet they require a difficult optimization of an information criterion that involves combinatorial problems. First, most of these optimization algorithms are based on a suboptimal procedure (e.g. a stepwise method). Second, the algorithms are often computationally greedy because they require multiple runs of the EM algorithm. Here we propose a new information criterion based on the integrated complete-data likelihood. It does not require any parameter estimate, and its maximization is simple and computationally efficient. The original contribution of our approach is to perform model selection without requiring any parameter estimation. Parameter inference is then needed only for the single selected model. This approach is used for variable selection in a Gaussian mixture model with a conditional independence assumption. Numerical experiments on simulated and benchmark datasets show that the proposed method often outperforms two classical approaches to variable selection. Comment: submitted to Statistics and Computing
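
Under the conditional independence assumption, the integrated complete-data likelihood for a fixed partition splits into one term per variable (plus a term that depends only on the partition). Each variable's role can therefore be decided separately, in closed form under conjugate priors. The sketch below simplifies under stated assumptions. It fixes the partition z instead of optimizing it, uses a univariate Normal-Inverse-Gamma prior with arbitrary hyperparameters, and drops the partition-only term. It is not the authors' criterion or code, and the function names are hypothetical.

```python
import numpy as np
from math import lgamma, log, pi

def log_marginal_normal(x, m0=0.0, k0=0.01, a0=1.0, b0=1.0):
    """Log marginal likelihood of i.i.d. N(mu, s2) data under a
    Normal-Inverse-Gamma prior (hyperparameters are illustrative assumptions)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if n == 0:
        return 0.0
    xbar = float(x.mean())
    ss = float(((x - xbar) ** 2).sum())
    kn, an = k0 + n, a0 + n / 2
    bn = b0 + 0.5 * ss + k0 * n * (xbar - m0) ** 2 / (2 * kn)
    return (-0.5 * n * log(2 * pi) + 0.5 * log(k0 / kn)
            + a0 * log(b0) - an * log(bn) + lgamma(an) - lgamma(a0))

def select_variables(X, z):
    """For a fixed partition z, mark variable j relevant when its
    cluster-wise integrated likelihood beats the single-group one."""
    X = np.asarray(X, dtype=float)
    z = np.asarray(z)
    labels = np.unique(z)
    relevant = []
    for j in range(X.shape[1]):
        clustered = sum(log_marginal_normal(X[z == k, j]) for k in labels)
        pooled = log_marginal_normal(X[:, j])
        relevant.append(bool(clustered > pooled))
    return relevant
```

No mixture parameters are estimated here, only closed-form marginal likelihoods. This mirrors the abstract's point that the variable choice itself needs no parameter estimation.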

    Modeling social networks from sampled data

    Network models are widely used to represent relational information among interacting units and the structural implications of these relations. Recently, social network studies have focused a great deal of attention on random graph models of networks whose nodes represent individual social actors and whose edges represent a specified relationship between the actors. Most inference for social network models assumes that the presence or absence of all possible links is observed, that the information is completely reliable, and that there are no measurement (e.g., recording) errors. This is clearly not true in practice, as much network data is collected through sample surveys. In addition, even if a census of a population is attempted, individuals and links between individuals are missed (i.e., do not appear in the recorded data). In this paper we develop the conceptual and computational theory for inference based on sampled network information. We first review forms of network sampling designs used in practice. We consider inference from the likelihood framework, and develop a typology of network data that reflects their treatment within this framework. We then develop inference for social network models based on information from adaptive network designs. We motivate and illustrate these ideas by analyzing the effect of link-tracing sampling designs on a collaboration network. Comment: Published at http://dx.doi.org/10.1214/08-AOAS221 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)
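
A link-tracing design is adaptive because which actors enter the sample depends on the links already observed. The sketch below shows a simple k-wave snowball sample. It assumes an undirected network stored as a symmetric adjacency dict and records a link only when a sampled actor reports it. The function name and the design details are illustrative and are not taken from the paper.

```python
def snowball_sample(adjacency, seeds, waves):
    """k-wave snowball (link-tracing) sample from an undirected network.

    adjacency: dict mapping node -> iterable of neighbours (assumed symmetric).
    Returns the set of sampled nodes and the set of observed links.
    """
    sampled = set(seeds)
    frontier = list(seeds)
    observed_links = set()
    for _ in range(waves):
        next_frontier = []
        for u in frontier:
            # Actor u is interviewed and reports all of its links
            for v in adjacency.get(u, ()):
                observed_links.add(frozenset((u, v)))
                if v not in sampled:  # newly reached actors join the next wave
                    sampled.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return sampled, observed_links
```

On the path 0-1-2-3-4 with seed 0 and two waves, the sample is {0, 1, 2}. The link 2-3 stays unobserved because actor 2 was reached in the last wave and never interviewed. Which parts of the network are seen depends on the design in exactly this way.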

    Parameter Expansion and Efficient Inference

    This EM review article focuses on parameter expansion, a simple technique introduced in the PX-EM algorithm to make EM converge faster while maintaining its simplicity and stability. Its primary objective is the connection between parameter expansion and efficient inference. It reviews the statistical interpretation of the PX-EM algorithm in terms of efficient inference via bias reduction, and further unfolds the PX-EM mystery by looking at PX-EM from different perspectives. In addition, it briefly discusses potential applications of parameter expansion to statistical inference and the broader impact of statistical thinking on understanding and developing other iterative optimization algorithms. Comment: Published at http://dx.doi.org/10.1214/10-STS348 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)
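
A standard textbook illustration of parameter expansion, not taken from this review, is fitting a Student-t location and scale with known degrees of freedom ν. EM computes weights w_i = (ν + 1) / (ν + (x_i − μ)²/σ²), then updates μ as a weighted mean and σ² = Σ w_i (x_i − μ)² / n. The PX-EM version changes only the scale update, dividing by Σ w_i instead of n. The sketch below shows both. The function name, starting values and tolerance are assumptions.

```python
import numpy as np

def fit_t_location_scale(x, nu, px=False, tol=1e-10, max_iter=10000):
    """EM (px=False) or PX-EM (px=True) for a t location-scale model, known nu.

    Returns (mu, sigma2, iterations_used).
    """
    x = np.asarray(x, dtype=float)
    mu, sigma2 = float(np.median(x)), float(np.var(x))
    for it in range(1, max_iter + 1):
        # E-step: expected latent precisions given current parameters
        w = (nu + 1) / (nu + (x - mu) ** 2 / sigma2)
        # M-step: weighted mean, then scale update
        mu_new = np.sum(w * x) / np.sum(w)
        denom = np.sum(w) if px else x.size  # the only PX-EM change
        sigma2_new = np.sum(w * (x - mu_new) ** 2) / denom
        if abs(mu_new - mu) < tol and abs(sigma2_new - sigma2) < tol:
            return mu_new, sigma2_new, it
        mu, sigma2 = mu_new, sigma2_new
    return mu, sigma2, max_iter

rng = np.random.default_rng(1)
data = rng.standard_t(3, size=200) * 2.0 + 1.0
mu_em, s2_em, it_em = fit_t_location_scale(data, nu=3.0)
mu_px, s2_px, it_px = fit_t_location_scale(data, nu=3.0, px=True)
```

The two variants have the same fixed points. Setting the σ² score to zero gives Σ w_i δ_i² = n with δ_i² = (x_i − μ)²/σ², and since w_i δ_i² = (ν + 1) − ν w_i, this forces Σ w_i = n. At that point dividing by Σ w_i or by n gives the same update. Comparing it_em with it_px on your own data is one way to see the speed-up the review discusses.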