Model Assessment Tools for a Model False World
A standard goal of model evaluation and selection is to find a model that
approximates the truth well while being as parsimonious as possible. In this
paper we emphasize the point of view that the models under
consideration are almost always false, if viewed realistically, and so we
should analyze model adequacy from that point of view. We investigate this
issue in large samples by looking at a model credibility index, which is
designed to serve as a one-number summary measure of model adequacy. We define
the index to be the maximum sample size at which samples from the model and
those from the true data generating mechanism are nearly indistinguishable. We
use standard notions from hypothesis testing to make this definition precise.
We use data subsampling to estimate the index. We show that the definition
leads us to some new ways of viewing models as flawed but useful. The concept
is an extension of the work of Davies [Statist. Neerlandica 49 (1995)
185--245].
Comment: Published at http://dx.doi.org/10.1214/09-STS302 in Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org
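
The subsampling idea in the abstract above can be illustrated with a rough sketch. Everything specific here is an assumption made for illustration and not taken from the paper: the Kolmogorov-Smirnov test with its asymptotic 5% critical value, the 50% rejection-rate threshold, and the helper names `ks_rejects`, `rejection_rate` and `credibility_index`. The sketch returns the largest subsample size at which the test still rejects the model in fewer than half of the subsamples.

```python
import math
import random

def normal_cdf(x):
    """CDF of the standard normal, used here as the hypothesized model."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ks_rejects(sample, cdf, crit=1.358):
    """One-sample Kolmogorov-Smirnov test; 1.358 is the asymptotic 5% critical value."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)
    return math.sqrt(n) * d > crit

def rejection_rate(data, cdf, n, reps, rng):
    """Fraction of size-n subsamples (drawn without replacement) that reject the model."""
    hits = sum(ks_rejects(rng.sample(data, n), cdf) for _ in range(reps))
    return hits / reps

def credibility_index(data, cdf, sizes, reps=200, power=0.5, seed=0):
    """Largest size in `sizes` reached before the rejection rate hits `power`.

    Returns None if even the smallest size is rejected that often.
    The 50% default is an assumption of this sketch.
    """
    rng = random.Random(seed)
    index = None
    for n in sorted(sizes):
        if rejection_rate(data, cdf, n, reps, rng) >= power:
            break
        index = n
    return index

if __name__ == "__main__":
    gen = random.Random(42)
    # Data from N(0.3, 1) assessed against the (false) model N(0, 1).
    data = [gen.gauss(0.3, 1.0) for _ in range(5000)]
    print(credibility_index(data, normal_cdf, [25, 50, 100, 200, 400, 800]))
```

A small index says the model is distinguishable from the data even in small samples; a large index says the model is a usable approximation up to that sample size.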
Building and using semiparametric tolerance regions for parametric multinomial models
We introduce a semiparametric ``tubular neighborhood'' of a parametric model
in the multinomial setting. It consists of all multinomial distributions lying
in a distance-based neighborhood of the parametric model of interest. Fitting
such a tubular model allows one to use a parametric model while treating it as
an approximation to the true distribution. In this paper, the Kullback--Leibler
distance is used to build the tubular region. Based on this idea one can define
the distance between the true multinomial distribution and the parametric model
to be the index of fit. The paper develops a likelihood ratio test procedure
for testing the magnitude of the index. A semiparametric bootstrap method is
implemented to better approximate the distribution of the LRT statistic. The
approximation permits more accurate construction of a lower confidence limit
for the model fitting index.
Comment: Published at http://dx.doi.org/10.1214/08-AOS603 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org
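
As a small illustration of a distance-based index of fit, the sketch below computes the minimum Kullback-Leibler divergence from a multinomial distribution p to a binomial family on the cells 0..m. The choice of the binomial as the parametric model, the direction KL(p || model), and the function names are assumptions of this sketch, not details from the paper. For this family the minimizer has a closed form: setting the derivative of KL(p || Bin(m, theta)) to zero gives theta = mean(p) / m.

```python
import math

def binomial_pmf(m, theta):
    """Cell probabilities of a Binomial(m, theta) on cells 0..m."""
    return [math.comb(m, k) * theta**k * (1 - theta)**(m - k) for k in range(m + 1)]

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q); cells with p_k = 0 contribute 0."""
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0)

def index_of_fit(p):
    """Closest binomial to p in KL(p || model) and the resulting distance.

    The minimizing theta matches the mean of p (moment matching).
    """
    m = len(p) - 1
    theta = sum(k * pk for k, pk in enumerate(p)) / m
    return theta, kl(p, binomial_pmf(m, theta))

if __name__ == "__main__":
    # A U-shaped distribution that no binomial can approximate well.
    theta, dist = index_of_fit([0.4, 0.1, 0.0, 0.1, 0.4])
    print(theta, dist)
```

A tubular neighborhood of radius c then consists of all p whose index is at most c; a distance of zero means p lies in the parametric model itself.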
The topography of multivariate normal mixtures
Multivariate normal mixtures provide a flexible method of fitting
high-dimensional data. It is shown that their topography, in the sense of their
key features as a density, can be analyzed rigorously in lower dimensions by
use of a ridgeline manifold that contains all critical points, as well as the
ridges of the density. A plot of the elevations on the ridgeline shows the key
features of the mixed density. In addition, by use of the ridgeline, we uncover
a function that determines the number of modes of the mixed density when there
are two components being mixed. A followup analysis then gives a curvature
function that can be used to prove a set of modality theorems.
Comment: Published at http://dx.doi.org/10.1214/009053605000000417 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org
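
For two components, the ridgeline can be sketched numerically. The code below assumes the parametrization x*(a) = [(1-a) S1^-1 + a S2^-1]^-1 [(1-a) S1^-1 mu1 + a S2^-1 mu2] for a in [0, 1], restricts itself to two dimensions so that plain Python suffices, and counts modes as strict local maxima of the elevation on a grid. The grid search, the grid size and all function names are illustrative assumptions, not the paper's analytic method.

```python
import math

def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(m, v):
    """Product of a 2x2 matrix and a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

def normal_pdf2(x, mu, sigma):
    """Density of a bivariate normal with mean mu and covariance sigma."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    diff = [x[0] - mu[0], x[1] - mu[1]]
    s = matvec(inv2(sigma), diff)
    quad = diff[0] * s[0] + diff[1] * s[1]
    return math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(det))

def ridgeline_point(alpha, mu1, sigma1, mu2, sigma2):
    """Point x*(alpha) on the ridgeline; alpha = 0 gives mu1, alpha = 1 gives mu2."""
    p1, p2 = inv2(sigma1), inv2(sigma2)
    prec = [[(1 - alpha) * p1[i][j] + alpha * p2[i][j] for j in range(2)]
            for i in range(2)]
    v1, v2 = matvec(p1, mu1), matvec(p2, mu2)
    rhs = [(1 - alpha) * v1[i] + alpha * v2[i] for i in range(2)]
    return matvec(inv2(prec), rhs)

def elevation(alpha, w1, mu1, sigma1, mu2, sigma2):
    """Mixture density evaluated at the ridgeline point x*(alpha)."""
    x = ridgeline_point(alpha, mu1, sigma1, mu2, sigma2)
    return w1 * normal_pdf2(x, mu1, sigma1) + (1 - w1) * normal_pdf2(x, mu2, sigma2)

def count_modes(w1, mu1, sigma1, mu2, sigma2, grid=2001):
    """Number of strict local maxima of the elevation on an evenly spaced alpha grid."""
    h = [elevation(i / (grid - 1), w1, mu1, sigma1, mu2, sigma2) for i in range(grid)]
    modes = 0
    for i in range(grid):
        left = h[i - 1] if i > 0 else -math.inf
        right = h[i + 1] if i < grid - 1 else -math.inf
        if h[i] > left and h[i] > right:
            modes += 1
    return modes

if __name__ == "__main__":
    eye = [[1.0, 0.0], [0.0, 1.0]]
    print(count_modes(0.5, [-1.5, 0.0], eye, [1.5, 0.0], eye))
```

The point of the construction is dimension reduction: whatever the dimension of the data, the search for critical points of a two-component mixture becomes a one-dimensional search over a.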
Estimating the number of classes
Estimating the unknown number of classes in a population has numerous
important applications. In a Poisson mixture model, the problem is reduced to
estimating the odds that a class is undetected in a sample. The discontinuity
of the odds prevents the existence of locally unbiased and informative
estimators and restricts confidence intervals to be one-sided. Confidence
intervals for the number of classes are also necessarily one-sided. A sequence
of lower bounds to the odds is developed and used to define pseudo maximum
likelihood estimators for the number of classes.
Comment: Published at http://dx.doi.org/10.1214/009053606000001280 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org
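
The link between the odds and the number of classes is simple arithmetic: if each class goes undetected with probability p0, the expected number of observed classes is n = N(1 - p0), so N = n / (1 - p0) = n(1 + theta), where theta = p0 / (1 - p0) is the odds. The sketch below plugs in a lower bound for theta. It uses Chao's classical bound f1^2 / (2 f2) on the number of unseen classes as a stand-in; this is an assumption for illustration and is not claimed to be the sequence of bounds developed in the paper. Function names are hypothetical.

```python
from collections import Counter

def frequency_counts(abundances):
    """Map j -> number of observed classes seen exactly j times (j >= 1)."""
    return Counter(a for a in abundances if a > 0)

def odds_lower_bound(abundances):
    """Lower bound on the odds that a class is undetected.

    Uses Chao's bound f1^2 / (2 * f2) for the number of unseen classes,
    divided by the number of observed classes (stand-in, see lead-in).
    """
    f = frequency_counts(abundances)
    n_observed = sum(f.values())
    f1, f2 = f.get(1, 0), f.get(2, 0)
    if f2 == 0:
        raise ValueError("bound needs at least one class observed exactly twice")
    return (f1 * f1 / (2 * f2)) / n_observed

def classes_lower_bound(abundances):
    """Lower bound on the total number of classes via N = n * (1 + odds)."""
    n_observed = sum(frequency_counts(abundances).values())
    return n_observed * (1 + odds_lower_bound(abundances))

if __name__ == "__main__":
    # Eight observed classes: four singletons, two doubletons, two more abundant.
    print(classes_lower_bound([1, 1, 1, 1, 2, 2, 3, 5]))
```

Because only a lower bound on the odds is used, the result is a lower bound on N, which is in keeping with the one-sidedness described in the abstract.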