Population Intervention Models in Causal Inference
Marginal structural models (MSMs) provide a powerful tool for estimating the causal effect of a treatment or risk variable on the distribution of a disease in a population. These models, as originally introduced by Robins (e.g., Robins (2000a), Robins (2000b), van der Laan and Robins (2002)), model the marginal distributions of treatment-specific counterfactual outcomes, possibly conditional on a subset of the baseline covariates, and their dependence on treatment. Marginal structural models are particularly useful for longitudinal data structures, in which each subject's treatment and covariate history is measured over time and an outcome is recorded at a final time point. In addition to the simpler weighted regression approaches (inverse probability of treatment weighted estimators), more general (and robust) estimators have been developed and studied in detail for standard MSMs (Robins (2000b), Neugebauer and van der Laan (2004), Yu and van der Laan (2003), van der Laan and Robins (2002)). In this paper we argue that in many applications one is interested in modeling the difference between a treatment-specific counterfactual population distribution and the actual distribution of the target population of interest. The relevant parameters describe the effect of a hypothetical intervention on such a population, and we therefore refer to these models as intervention models. We focus on intervention models measuring the effect of an intervention as a difference in means, a ratio of means (e.g., relative risk if the outcome is binary), a so-called switch relative risk for binary outcomes, or a difference between entire distributions as measured by the quantile-quantile function. In addition, we provide a class of inverse probability of treatment weighted estimators, and double robust estimators, of the causal parameters in these models. We illustrate the finite sample performance of these new estimators in a simulation study.
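As a concrete illustration of the simplest estimator class mentioned in this abstract, the following is a minimal sketch of an inverse-probability-of-treatment-weighted (IPTW) estimate of the difference-in-means intervention parameter E[Y_1] - E[Y]. The data-generating process, the single binary covariate, and all effect sizes are invented for the example; they are not from the paper.

```python
import numpy as np

# Illustrative simulation: one binary baseline covariate W confounds
# treatment A and outcome Y (all numbers are made up for this sketch).
rng = np.random.default_rng(1)
n = 5000
W = rng.integers(0, 2, n)
pA = np.where(W == 1, 0.7, 0.3)          # true treatment mechanism g(1|W)
A = rng.binomial(1, pA)
Y = 2.0 * A + 1.0 * W + rng.normal(0.0, 1.0, n)

# Estimate g(1|W) with a saturated (nonparametric) model: the empirical
# treatment proportion within each level of W.
g1 = np.array([A[W == 0].mean(), A[W == 1].mean()])[W]

# IPTW estimate of E[Y_1] (mean outcome had everyone received A = 1),
# contrasted with the actual population mean E[Y].
EY1_hat = np.mean(A * Y / g1)
psi_hat = EY1_hat - Y.mean()             # analytic truth under this DGP: 1.0
```

A double robust estimator would additionally model E[Y | A, W] and remain consistent if either that model or the treatment mechanism is correct; the sketch above relies on the treatment mechanism alone.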
Nonlinear shrinkage estimation of large-dimensional covariance matrices
Many statistical applications require an estimate of a covariance matrix
and/or its inverse. When the matrix dimension is large compared to the sample
size, which happens frequently, the sample covariance matrix is known to
perform poorly and may suffer from ill-conditioning. There already exists an
extensive literature concerning improved estimators in such situations. In the
absence of further knowledge about the structure of the true covariance matrix,
the most successful approach so far, arguably, has been shrinkage estimation.
Shrinking the sample covariance matrix to a multiple of the identity, by taking
a weighted average of the two, turns out to be equivalent to linearly shrinking
the sample eigenvalues to their grand mean, while retaining the sample
eigenvectors. Our paper extends this approach by considering nonlinear
transformations of the sample eigenvalues. We show how to construct an
estimator that is asymptotically equivalent to an oracle estimator suggested in
previous work. As demonstrated in extensive Monte Carlo simulations, the
resulting bona fide estimator can result in sizeable improvements over the
sample covariance matrix and also over linear shrinkage.
Comment: Published at http://dx.doi.org/10.1214/12-AOS989 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
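The equivalence noted in this abstract, that shrinking the sample covariance toward a multiple of the identity is the same as pulling the sample eigenvalues linearly toward their grand mean while keeping the sample eigenvectors, can be sketched as follows. The shrinkage intensity `rho` is left as a free parameter here; choosing it optimally from the data is the subject of the linear-shrinkage literature.

```python
import numpy as np

def linear_shrinkage(X, rho):
    """Shrink the sample covariance of X (n observations x p variables)
    toward mu * I, where mu is the grand mean of the sample eigenvalues
    and rho in [0, 1] is the shrinkage intensity."""
    S = np.cov(X, rowvar=False)
    p = S.shape[0]
    mu = np.trace(S) / p                  # grand mean of the eigenvalues
    return (1.0 - rho) * S + rho * mu * np.eye(p)
```

Each sample eigenvalue lambda_i is thereby mapped to (1 - rho) * lambda_i + rho * mu, with eigenvectors unchanged; the paper's nonlinear shrinkage replaces this single affine map with a general transformation of the eigenvalues.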
Nonparametric Additive Model-assisted Estimation for Survey Data
An additive model-assisted nonparametric method is investigated to estimate
the finite population totals of massive survey data with the aid of auxiliary
information. A class of estimators is proposed to improve the precision of the
well-known Horvitz-Thompson estimators by combining the spline and local
polynomial smoothing methods. These estimators are calibrated, asymptotically
design-unbiased, consistent, normal and robust in the sense of asymptotically
attaining the Godambe-Joshi lower bound to the anticipated variance. A
consistent model selection procedure is further developed to select the
significant auxiliary variables. The proposed method is sufficiently fast to
analyze large survey data of high dimension within seconds. The performance of
the proposed method is assessed empirically via simulation studies.
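A minimal sketch of the model-assisted idea behind such estimators, in its simplest difference-estimator form: the paper's actual fits use spline and local polynomial smoothing, whereas the mean function `m` below is just a placeholder supplied by the caller.

```python
import numpy as np

def horvitz_thompson(y_s, pi_s):
    """Horvitz-Thompson estimator of the finite population total:
    sample values inversely weighted by their inclusion probabilities."""
    return np.sum(np.asarray(y_s) / np.asarray(pi_s))

def model_assisted_total(y_s, x_s, pi_s, x_pop, m):
    """Difference estimator: predict for the whole population with a
    fitted mean function m of the auxiliary variable, then correct with
    the HT-weighted sample residuals. Reduces to plain HT when m is
    identically zero."""
    resid = np.asarray(y_s) - m(np.asarray(x_s))
    return np.sum(m(np.asarray(x_pop))) + np.sum(resid / np.asarray(pi_s))
```

The better `m` predicts the study variable from the auxiliary information, the smaller the residual correction and its variance, which is how auxiliary data improve the precision of the plain Horvitz-Thompson estimator.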
Empirical likelihood confidence intervals for complex sampling designs
We define an empirical likelihood approach that gives consistent design-based confidence intervals which can be computed without variance estimates, design effects, resampling, joint inclusion probabilities, or linearization, even when the point estimator is not linear. It can be used to construct confidence intervals for a large class of sampling designs and estimators that are solutions of estimating equations. It applies to means, regression coefficients, quantiles, totals, and counts, even when the population size is unknown. It can be used with large sampling fractions and naturally accommodates calibration constraints. It can be viewed as an extension of the empirical likelihood approach to complex survey data. This approach is computationally simpler than the pseudoempirical likelihood and bootstrap approaches. Our simulation study shows that the proposed confidence intervals can give better coverage than confidence intervals based on linearization, the bootstrap, or the pseudoempirical likelihood. It also shows that, under complex sampling designs, standard confidence intervals based on normality may have poor coverage, because point estimators need not follow a normal sampling distribution and their variance estimators may be biased.
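For readers unfamiliar with the underlying machinery, here is a minimal sketch of the empirical likelihood ratio for a mean in the classical i.i.d. case, which the approach above generalizes to design-based inference under complex sampling. The bisection solver and its tolerances are illustrative choices, not the paper's algorithm.

```python
import numpy as np

def el_log_ratio(x, mu, tol=1e-10):
    """-2 log empirical likelihood ratio for the mean mu, based on weights
    p_i = 1 / (n * (1 + lam * (x_i - mu))); asymptotically chi^2 with 1 df."""
    d = np.asarray(x) - mu
    if d.max() <= 0 or d.min() >= 0:
        return np.inf                     # mu outside the convex hull: EL is zero
    # lam must keep every weight denominator 1 + lam * d_i positive
    lo = (-1.0 + 1e-8) / d.max()
    hi = (1.0 - 1e-8) / (-d.min())
    g = lambda lam: np.sum(d / (1.0 + lam * d))   # decreasing in lam
    while hi - lo > tol:                  # bisection for the root of g
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * np.sum(np.log1p(lam * d))
```

Inverting the statistic, i.e. collecting all mu with el_log_ratio(x, mu) below the chi-squared critical value, yields a confidence interval without any variance estimate, which is the feature the design-based extension above preserves.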
Spectrum Estimation: A Unified Framework for Covariance Matrix Estimation and PCA in Large Dimensions
Covariance matrix estimation and principal component analysis (PCA) are two
cornerstones of multivariate analysis. Classic textbook solutions perform
poorly when the dimension of the data is of a magnitude similar to the sample
size, or even larger. In such settings, there is a common remedy for both
statistical problems: nonlinear shrinkage of the eigenvalues of the sample
covariance matrix. The optimal nonlinear shrinkage formula depends on unknown
population quantities and is thus not available. It is, however, possible to
consistently estimate an oracle nonlinear shrinkage, which is motivated on
asymptotic grounds. A key tool to this end is consistent estimation of the set
of eigenvalues of the population covariance matrix (also known as the
spectrum), an interesting and challenging problem in its own right. Extensive
Monte Carlo simulations demonstrate that our methods have desirable
finite-sample properties and outperform previous proposals.
Comment: 40 pages, 8 figures, 5 tables. University of Zurich, Department of Economics, Working Paper No. 105, revised version, July 201