Fitting multilevel multivariate models with missing data in responses and covariates that may include interactions and non-linear terms
The paper extends existing models for multilevel multivariate data with mixed response types to handle quite general types and patterns of missing data values in a wide range of multilevel generalized linear models. It proposes an efficient Bayesian modelling approach that allows missing values in covariates, including models where there are interactions or other functions of covariates such as polynomials. The procedure can also be used to produce multiply imputed complete data sets. A simulation study is presented as well as the analysis of a longitudinal data set. The paper also shows how existing multiprocess models for handling endogeneity can be extended by the proposed framework.
Introducing COZIGAM: An R Package for Unconstrained and Constrained Zero-Inflated Generalized Additive Model Analysis
The zero-inflation problem is very common in ecological studies as well as other areas. Nonparametric regression with zero-inflated data may be studied via the zero-inflated generalized additive model (ZIGAM), which assumes that the zero-inflated responses come from a probabilistic mixture of zero and a regular component whose distribution belongs to the 1-parameter exponential family. With the further assumption that the probability of non-zero-inflation is some monotonic function of the mean of the regular component, we propose the constrained zero-inflated generalized additive model (COZIGAM) for analyzing zero-inflated data. When the hypothesized constraint holds, the new approach provides a unified framework for modeling zero-inflated data, which is more parsimonious and efficient than the unconstrained ZIGAM. We have developed an R package COZIGAM which contains functions that implement an iterative algorithm for fitting ZIGAMs and COZIGAMs to zero-inflated data based on the penalized likelihood approach. Other functions included in the package are useful for model prediction and model selection. We demonstrate the use of the COZIGAM package via some simulation studies and a real application.
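The mixture structure described in this abstract can be illustrated with a minimal Python sketch. It assumes a Poisson regular component and a logistic constraint of the form logit(p) = alpha + delta * log(mu); both choices, and the names `zip_pmf`, `constrained_p`, `alpha` and `delta`, are illustrative assumptions and are not taken from the COZIGAM package.

```python
import math


def zip_pmf(k, mu, p):
    """P(Y = k) under a zero-inflated Poisson mixture.

    p is the probability of non-zero-inflation: with probability 1 - p the
    response is a structural zero, otherwise it is drawn from Poisson(mu).
    """
    poisson = math.exp(-mu) * mu**k / math.factorial(k)
    structural_zero = (1.0 - p) if k == 0 else 0.0
    return structural_zero + p * poisson


def constrained_p(mu, alpha, delta):
    """Assumed constraint: logit(p) = alpha + delta * log(mu).

    For delta > 0 this makes p a monotonically increasing function of the
    mean of the regular component.
    """
    eta = alpha + delta * math.log(mu)
    return 1.0 / (1.0 + math.exp(-eta))


# Example: probability of observing a zero when the regular mean is 2.5.
mu = 2.5
p = constrained_p(mu, alpha=0.0, delta=1.0)
prob_zero = zip_pmf(0, mu, p)
```

Under such a constraint the zero-inflation probability needs only the two extra parameters `alpha` and `delta` rather than a separate smooth model, which is one way to read the parsimony claim in the abstract.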
Estimating linear functionals in nonlinear regression with responses missing at random
We consider regression models with parametric (linear or nonlinear)
regression function and allow responses to be "missing at random." We assume
that the errors have mean zero and are independent of the covariates. In order
to estimate expectations of functions of covariate and response we use a fully
imputed estimator, namely an empirical estimator based on estimators of
conditional expectations given the covariate. We exploit the independence of
covariates and errors by writing the conditional expectations as unconditional
expectations, which can now be estimated by empirical plug-in estimators. The
mean zero constraint on the error distribution is exploited by adding suitable
residual-based weights. We prove that the estimator is efficient (in the sense
of Hájek and Le Cam) if an efficient estimator of the parameter is used.
Our results give rise to new efficient estimators of smooth transformations of
expectations. Estimation of the mean response is discussed as a special
(degenerate) case. Comment: Published in at http://dx.doi.org/10.1214/08-AOS642 the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org
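The fully imputed estimator described in this abstract can be sketched in Python under simplifying assumptions: a straight-line regression function fitted by least squares on the complete cases, and no residual-based weights (the abstract's weighting step is omitted here). The function names `fit_line` and `fully_imputed_mean` are hypothetical and chosen for illustration only.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def fully_imputed_mean(xs, ys, h):
    """Estimate E[h(X, Y)] when some responses are missing (ys[i] is None).

    Every case, observed or not, contributes an estimated conditional
    expectation of h given its covariate. Because errors are assumed
    independent of covariates, that conditional expectation is estimated by
    averaging h(x, fitted(x) + residual) over the complete-case residuals.
    """
    observed = [(x, y) for x, y in zip(xs, ys) if y is not None]
    a, b = fit_line([x for x, _ in observed], [y for _, y in observed])
    residuals = [y - (a + b * x) for x, y in observed]
    total = 0.0
    for x in xs:
        fitted = a + b * x
        total += sum(h(x, fitted + e) for e in residuals) / len(residuals)
    return total / len(xs)


# Example: estimate the mean response, the special case h(x, y) = y.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0, 5.0, None, 11.0, None]
mean_response = fully_imputed_mean(xs, ys, lambda x, y: y)
```

With h(x, y) = y and a least squares fit that includes an intercept, the residuals average to zero, so the estimate reduces to the mean of the fitted values over all covariates.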
Estimating distributions of potential outcomes using local instrumental variables with an application to changes in college enrollment and wage inequality
This paper extends the method of local instrumental variables developed by Heckman and Vytlacil
(1999, 2001, 2005) to the estimation of not only means, but also distributions of potential
outcomes. The newly developed method is illustrated by applying it to changes in college enrollment
and wage inequality using data from the National Longitudinal Survey of Youth of 1979.
Increases in college enrollment cause changes in the distribution of ability among college and high
school graduates. This paper estimates a semiparametric selection model of schooling and wages to
show that, for fixed skill prices, a 14% increase in college participation (analogous to the increase
observed in the 1980s) reduces the college premium by 12% and increases the 90-10 percentile ratio
among college graduates by 2
Block-Conditional Missing at Random Models for Missing Data
Two major ideas in the analysis of missing data are (a) the EM algorithm
[Dempster, Laird and Rubin, J. Roy. Statist. Soc. Ser. B 39 (1977) 1--38] for
maximum likelihood (ML) estimation, and (b) the formulation of models for the
joint distribution of the data and the missing data indicators, and the
associated "missing at random" (MAR) condition under which a model for the
indicators is unnecessary [Rubin, Biometrika 63 (1976) 581--592]. Most previous work has
treated the data and the indicators as single blocks, yielding selection or pattern-mixture
models depending on how their joint distribution is factorized. This paper
explores "block-sequential" models that interleave subsets of the variables
and their missing data indicators, and then make parameter restrictions based
on assumptions in each block. These include models that are not MAR. We examine
a subclass of block-sequential models we call block-conditional MAR (BCMAR)
models, and an associated block-monotone reduced likelihood strategy that
typically yields consistent estimates by selectively discarding some data.
Alternatively, full ML estimation can often be achieved via the EM algorithm.
We examine in some detail BCMAR models for the case of two multinomially
distributed categorical variables, and a two block structure where the first
block is categorical and the second block arises from a (possibly multivariate)
exponential family distribution. Comment: Published in at http://dx.doi.org/10.1214/10-STS344 the Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org
Probabilistic Inference from Arbitrary Uncertainty using Mixtures of Factorized Generalized Gaussians
This paper presents a general and efficient framework for probabilistic
inference and learning from arbitrary uncertain information. It exploits the
calculation properties of finite mixture models, conjugate families and
factorization. Both the joint probability density of the variables and the
likelihood function of the (objective or subjective) observation are
approximated by a special mixture model, in such a way that any desired
conditional distribution can be directly obtained without numerical
integration. We have developed an extended version of the expectation
maximization (EM) algorithm to estimate the parameters of mixture models from
uncertain training examples (indirect observations). As a consequence, any
piece of exact or uncertain information about both input and output values is
consistently handled in the inference and learning stages. This ability,
extremely useful in certain situations, is not found in most alternative
methods. The proposed framework is formally justified from standard
probabilistic principles and illustrative examples are provided in the fields
of nonparametric pattern classification, nonlinear regression and pattern
completion. Finally, experiments on a real application and comparative results
over standard databases provide empirical evidence of the utility of the method
in a wide range of applications.