Inference on Counterfactual Distributions
Counterfactual distributions are important ingredients for policy analysis
and decomposition analysis in empirical economics. In this article we develop
modeling and inference tools for counterfactual distributions based on
regression methods. The counterfactual scenarios that we consider consist of
ceteris paribus changes in either the distribution of covariates related to the
outcome of interest or the conditional distribution of the outcome given
covariates. For either of these scenarios we derive joint functional central
limit theorems and bootstrap validity results for regression-based estimators
of the status quo and counterfactual outcome distributions. These results allow
us to construct simultaneous confidence sets for function-valued effects of the
counterfactual changes, including the effects on the entire distribution and
quantile functions of the outcome as well as on related functionals. These
confidence sets can be used to test functional hypotheses such as no-effect,
positive effect, or stochastic dominance. Our theory applies to general
counterfactual changes and covers the main regression methods including
classical, quantile, duration, and distribution regressions. We illustrate the
results with an empirical application to wage decompositions using data for the
United States.
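The first counterfactual scenario can be written compactly. The notation below is our own illustrative assumption rather than a quotation from the article. It combines the conditional outcome distribution of group j with the covariate distribution of group k, inverts the result to get quantiles, and replaces the integral by a sample average to obtain a plug-in estimator:

```latex
\begin{aligned}
F_{Y\langle j|k\rangle}(y) &= \int F_{Y_j \mid X_j}(y \mid x)\, dF_{X_k}(x), \\
Q_{Y\langle j|k\rangle}(u) &= \inf\{\, y : F_{Y\langle j|k\rangle}(y) \ge u \,\}, \\
\widehat F_{Y\langle j|k\rangle}(y) &= \frac{1}{n_k} \sum_{i=1}^{n_k} \widehat F_{Y_j \mid X_j}(y \mid X_{k,i}), \\
\Delta(y) &= F_{Y\langle j|k\rangle}(y) - F_{Y\langle j|j\rangle}(y).
\end{aligned}
```

Here $\Delta(y)$ is the distributional effect of moving from the status quo ($k = j$) to the counterfactual covariate distribution. $\widehat F_{Y_j \mid X_j}$ can come from any of the regression methods listed above.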
As a part of developing the main results, we introduce distribution
regression as a comprehensive and flexible tool for modeling and estimating the
entire conditional distribution. We show that distribution regression
encompasses the Cox duration regression and represents a useful alternative to
quantile regression. We establish functional central limit theorems and
bootstrap validity results for the empirical distribution regression process
and various related functionals.
Comment: 55 pages, 1 table, 3 figures; supplementary appendix with additional
results available from the authors' web site.
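Below is a minimal sketch of distribution regression and the plug-in counterfactual distribution. It makes several assumptions: a logit link, a fixed grid of thresholds, and simulated data. The function names are hypothetical. For each threshold t it fits a binary regression of 1{Y <= t} on X, then averages the fitted conditional CDF over a different covariate sample. Sorting the averaged values is a simple rearrangement that makes the estimate monotone. The sketch omits the functional central limit theorems and bootstrap bands that are the article's main contribution.

```python
import numpy as np


def sigmoid(v):
    # Clip the index so exp() cannot overflow for extreme fitted values.
    return 1.0 / (1.0 + np.exp(-np.clip(v, -35.0, 35.0)))


def fit_logit(X, z, n_iter=25):
    """Logit regression of a binary outcome z on design X via Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        grad = X.T @ (z - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        # Tiny ridge term keeps the Hessian invertible in near-degenerate cases.
        beta += np.linalg.solve(hess + 1e-8 * np.eye(X.shape[1]), grad)
    return beta


def distribution_regression(X, y, thresholds):
    """One binary regression of 1{y <= t} on X per threshold; rows are coefficients."""
    return np.array([fit_logit(X, (y <= t).astype(float)) for t in thresholds])


def counterfactual_cdf(coefs, X_new):
    """Average the fitted conditional CDF over the rows of X_new (a covariate sample)."""
    cdf = sigmoid(X_new @ coefs.T).mean(axis=0)  # one value per threshold
    return np.sort(cdf)  # rearrangement: enforce monotonicity in the threshold


def quantile_from_cdf(thresholds, cdf, u):
    """Left inverse Q(u) = inf{t : F(t) >= u}, restricted to the threshold grid."""
    idx = min(int(np.searchsorted(cdf, u, side="left")), len(thresholds) - 1)
    return thresholds[idx]


# Simulated example: the outcome model is estimated in the status quo sample,
# and the covariate distribution is shifted to the right in the counterfactual.
rng = np.random.default_rng(0)
n = 2000
x0 = rng.normal(0.0, 1.0, n)  # status quo covariates
x1 = rng.normal(1.0, 1.0, n)  # counterfactual covariates
y0 = 1.0 + 2.0 * x0 + rng.normal(size=n)
X0 = np.column_stack([np.ones(n), x0])
X1 = np.column_stack([np.ones(n), x1])

thresholds = np.quantile(y0, np.linspace(0.05, 0.95, 19))
coefs = distribution_regression(X0, y0, thresholds)
F_status_quo = counterfactual_cdf(coefs, X0)
F_counterfactual = counterfactual_cdf(coefs, X1)
median_cf = quantile_from_cdf(thresholds, F_counterfactual, 0.5)
```

Each threshold uses an ordinary logit, so any binary-response estimator could be substituted. The grid restricts quantiles to the chosen thresholds, which is a coarse approximation.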
Block-Conditional Missing at Random Models for Missing Data
Two major ideas in the analysis of missing data are (a) the EM algorithm
[Dempster, Laird and Rubin, J. Roy. Statist. Soc. Ser. B 39 (1977) 1–38] for
maximum likelihood (ML) estimation, and (b) the formulation of models for the
joint distribution of the data and the missing-data indicators, together with
the associated "missing at random" (MAR) condition under which a model for the
missing-data indicators is unnecessary [Rubin, Biometrika 63 (1976) 581–592].
Most previous work has treated the data and the missing-data indicators as
single blocks, yielding selection or pattern-mixture models depending on how
their joint distribution is factorized. This paper explores "block-sequential"
models that interleave subsets of the variables and their missing-data
indicators, and then impose parameter restrictions based on assumptions in each
block. These include models that are not MAR. We examine
a subclass of block-sequential models we call block-conditional MAR (BCMAR)
models, and an associated block-monotone reduced likelihood strategy that
typically yields consistent estimates by selectively discarding some data.
Alternatively, full ML estimation can often be achieved via the EM algorithm.
We examine in some detail BCMAR models for the case of two multinomially
distributed categorical variables, and for a two-block structure where the first
block is categorical and the second block arises from a (possibly multivariate)
exponential family distribution.
Comment: Published at http://dx.doi.org/10.1214/10-STS344 in Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
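The BCMAR models and the block-monotone strategy do not fit in a short snippet. As a hedged illustration of idea (a) for two categorical variables, here is the textbook EM algorithm for ML estimation of the cell probabilities of an I x J table. Some units have only A observed and others only B, under an ordinary single-block MAR assumption and a saturated multinomial model. This is not the paper's BCMAR procedure. The function name `em_two_way_table` is hypothetical.

```python
import numpy as np


def em_two_way_table(complete, a_only, b_only, n_iter=500, tol=1e-10):
    """ML cell probabilities for an I x J table with supplementary margins, under MAR.

    complete : (I, J) counts of units with both A and B observed
    a_only   : (I,) counts of units with only A observed (B missing)
    b_only   : (J,) counts of units with only B observed (A missing)
    """
    complete = np.asarray(complete, dtype=float)
    a_only = np.asarray(a_only, dtype=float)
    b_only = np.asarray(b_only, dtype=float)
    total = complete.sum() + a_only.sum() + b_only.sum()
    pi = np.full(complete.shape, 1.0 / complete.size)  # uniform starting value
    for _ in range(n_iter):
        # E-step: spread each partially classified count over the missing variable
        # in proportion to the current conditional cell probabilities.
        b_given_a = pi / pi.sum(axis=1, keepdims=True)
        a_given_b = pi / pi.sum(axis=0, keepdims=True)
        expected = (complete
                    + a_only[:, None] * b_given_a
                    + b_only[None, :] * a_given_b)
        # M-step: complete-data ML estimate of the multinomial probabilities.
        new_pi = expected / total
        if np.max(np.abs(new_pi - pi)) < tol:
            return new_pi
        pi = new_pi
    return pi


pi_hat = em_two_way_table(
    complete=[[30, 10], [20, 40]],  # fully classified units
    a_only=[20, 10],                # B missing
    b_only=[15, 5],                 # A missing
)
print(np.round(pi_hat, 3))
```

When only one margin is supplementary (a monotone pattern), EM converges to the closed-form factored ML estimate. This is a convenient check on the implementation.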
Regression Discontinuity Designs Using Covariates
We study regression discontinuity designs when covariates are included in the
estimation. We examine local polynomial estimators that include discrete or
continuous covariates in an additively separable way, but without imposing any
parametric restrictions on the underlying population regression functions. We
recommend a covariate-adjustment approach that retains consistency under
intuitive conditions, and characterize the potential for estimation and
inference improvements. We also present new covariate-adjusted mean squared
error expansions and robust bias-corrected inference procedures, with
heteroskedasticity-consistent and cluster-robust standard errors. An empirical
illustration and an extensive simulation study are presented. All methods are
implemented in R and Stata software packages.
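The following is a bare-bones sketch of a covariate-adjusted local linear estimator in a sharp design. It assumes a triangular kernel and a fixed bandwidth `h`, with covariates entering additively with a common coefficient on both sides of the cutoff. It provides no bias correction, no standard errors, and no bandwidth selection, all of which the R and Stata packages mentioned above handle. The function name `rd_local_linear` and the simulated data are hypothetical.

```python
import numpy as np


def rd_local_linear(x, y, z, cutoff=0.0, h=1.0):
    """Sharp RD jump estimate: local linear in x on each side plus additive covariates z.

    Weighted least squares with triangular kernel weights; the covariate slope is
    shared by both sides. Returns the coefficient on the treatment dummy.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    Z = np.asarray(z, dtype=float).reshape(len(x), -1)
    xc = x - cutoff
    w = np.clip(1.0 - np.abs(xc / h), 0.0, None)  # triangular kernel weights
    keep = w > 0
    t = (xc >= 0).astype(float)  # treatment indicator at the cutoff
    design = np.column_stack([np.ones_like(xc), t, xc, t * xc, Z])[keep]
    sw = np.sqrt(w[keep])
    coef, *_ = np.linalg.lstsq(design * sw[:, None], y[keep] * sw, rcond=None)
    return coef[1]


rng = np.random.default_rng(1)
n = 5000
x = rng.uniform(-1.0, 1.0, n)  # running variable
z = rng.normal(size=n)         # pre-treatment covariate
y = 0.5 + 2.0 * (x >= 0) + x + 1.5 * z + rng.normal(scale=0.5, size=n)
tau_hat = rd_local_linear(x, y, z, cutoff=0.0, h=0.5)
```

Here the covariate is unrelated to the treatment, so including it leaves the estimand unchanged. Its role is to absorb outcome variance, which is the kind of precision gain the abstract refers to.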
A sparse conditional Gaussian graphical model for analysis of genetical genomics data
Genetical genomics experiments are now routinely conducted to measure
both the genetic markers and gene expression data on the same subjects. The
gene expression levels are often treated as quantitative traits and are subject
to standard genetic analysis in order to identify expression quantitative trait
loci (eQTL). However, the genetic architecture for many gene
expressions may be complex, and poorly estimated genetic architecture may
compromise inference on the dependency structure of the genes at the
transcriptional level. In this paper we introduce a sparse conditional Gaussian
graphical model for studying the conditional independence relationships among a
set of gene expressions adjusting for possible genetic effects where the gene
expressions are modeled with seemingly unrelated regressions. We present an
efficient coordinate descent algorithm to obtain penalized estimates of
both the regression coefficients and the sparse concentration matrix. The
corresponding graph can be used to determine the conditional independence among
a group of genes while adjusting for shared genetic effects. We justify the
proposed methods with simulation experiments and with theoretical results on
asymptotic convergence rates and sparsistency. By sparsistency, we mean the property that all
parameters that are zero are actually estimated as zero with probability
tending to one. We apply our methods to the analysis of a yeast eQTL data set
and demonstrate that the conditional Gaussian graphical model leads to a more
interpretable gene network than a standard Gaussian graphical model based on
gene expression data alone.
Comment: Published at http://dx.doi.org/10.1214/11-AOAS494 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
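The article estimates the regression coefficients and the sparse concentration matrix jointly by penalized likelihood with coordinate descent. The sketch below deliberately swaps in a simpler two-stage approximation to show the idea of a gene network conditional on genetic effects. First, each gene is regressed on the markers with the lasso, standing in for the sparse regression coefficients. Second, Meinshausen-Bühlmann neighborhood selection is run on the residuals, standing in for the sparse concentration matrix. Both stages use cyclic coordinate descent. The function names, penalty values, and simulated chain network are hypothetical.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=200):
    """Lasso by cyclic coordinate descent on (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = np.asarray(y, dtype=float).copy()  # current residual y - Xb
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]  # partial residual excluding feature j
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft-threshold
            r -= X[:, j] * b[j]
    return b


def conditional_gene_network(genotypes, expressions, lam_reg, lam_graph):
    """Two-stage approximation: lasso on markers, then neighborhood selection on residuals."""
    G = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)
    E = expressions - expressions.mean(axis=0)
    # Stage 1: regress each gene on all markers (coefficients on the standardized scale).
    B = np.column_stack([lasso_cd(G, E[:, k], lam_reg) for k in range(E.shape[1])])
    R = E - G @ B
    R = R / R.std(axis=0)
    # Stage 2: lasso each residual on the others; a nonzero coefficient marks an edge.
    q = R.shape[1]
    adj = np.zeros((q, q), dtype=bool)
    for j in range(q):
        others = [k for k in range(q) if k != j]
        adj[j, others] = lasso_cd(R[:, others], R[:, j], lam_graph) != 0
    return B, adj | adj.T  # "OR" rule makes the graph symmetric


rng = np.random.default_rng(2)
n, n_markers, n_genes = 2000, 5, 4
genotypes = rng.binomial(2, 0.3, size=(n, n_markers)).astype(float)
B_true = np.zeros((n_markers, n_genes))
B_true[np.arange(n_genes), np.arange(n_genes)] = 1.0  # gene k depends on marker k
noise = np.zeros((n, n_genes))
noise[:, 0] = rng.normal(size=n)
for k in range(1, n_genes):
    noise[:, k] = 0.6 * noise[:, k - 1] + rng.normal(size=n)  # chain 0-1-2-3
expressions = genotypes @ B_true + noise
B_hat, adj = conditional_gene_network(genotypes, expressions, lam_reg=0.01, lam_graph=0.2)
```

In this example the genetic effects differ across genes, so the residual network should recover the chain. A joint estimator like the article's is designed to handle the harder case in which estimation errors in the genetic effects and the gene dependencies interact.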