    Fitting multilevel multivariate models with missing data in responses and covariates that may include interactions and non-linear terms

    The paper extends existing models for multilevel multivariate data with mixed response types to handle general types and patterns of missing data values in a wide range of multilevel generalized linear models. It proposes an efficient Bayesian modelling approach that allows missing values in covariates, including models where there are interactions or other functions of covariates such as polynomials. The procedure can also be used to produce multiply imputed complete data sets. A simulation study is presented as well as the analysis of a longitudinal data set. The paper also shows how existing multiprocess models for handling endogeneity can be extended by the proposed framework.

    Introducing COZIGAM: An R Package for Unconstrained and Constrained Zero-Inflated Generalized Additive Model Analysis

    The zero-inflation problem is very common in ecological studies as well as in other areas. Nonparametric regression with zero-inflated data may be studied via the zero-inflated generalized additive model (ZIGAM), which assumes that the zero-inflated responses come from a probabilistic mixture of zero and a regular component whose distribution belongs to the 1-parameter exponential family. With the further assumption that the probability of non-zero-inflation is some monotonic function of the mean of the regular component, we propose the constrained zero-inflated generalized additive model (COZIGAM) for analyzing zero-inflated data. When the hypothesized constraint holds, the new approach provides a unified framework for modeling zero-inflated data, which is more parsimonious and efficient than the unconstrained ZIGAM. We have developed an R package, COZIGAM, which contains functions that implement an iterative algorithm for fitting ZIGAMs and COZIGAMs to zero-inflated data based on the penalized likelihood approach. Other functions included in the package are useful for model prediction and model selection. We demonstrate the use of the COZIGAM package via some simulation studies and a real application.

    Estimating linear functionals in nonlinear regression with responses missing at random

    We consider regression models with parametric (linear or nonlinear) regression function and allow responses to be "missing at random." We assume that the errors have mean zero and are independent of the covariates. In order to estimate expectations of functions of covariate and response we use a fully imputed estimator, namely an empirical estimator based on estimators of conditional expectations given the covariate. We exploit the independence of covariates and errors by writing the conditional expectations as unconditional expectations, which can now be estimated by empirical plug-in estimators. The mean zero constraint on the error distribution is exploited by adding suitable residual-based weights. We prove that the estimator is efficient (in the sense of Hájek and Le Cam) if an efficient estimator of the parameter is used. Our results give rise to new efficient estimators of smooth transformations of expectations. Estimation of the mean response is discussed as a special (degenerate) case. Comment: Published at http://dx.doi.org/10.1214/08-AOS642 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Estimating distributions of potential outcomes using local instrumental variables with an application to changes in college enrollment and wage inequality

    This paper extends the method of local instrumental variables developed by Heckman and Vytlacil (1999, 2001, 2005) to the estimation of not only means, but also distributions of potential outcomes. The newly developed method is illustrated by applying it to changes in college enrollment and wage inequality using data from the National Longitudinal Survey of Youth of 1979. Increases in college enrollment cause changes in the distribution of ability among college and high school graduates. This paper estimates a semiparametric selection model of schooling and wages to show that, for fixed skill prices, a 14% increase in college participation (analogous to the increase observed in the 1980s) reduces the college premium by 12% and increases the 90-10 percentile ratio among college graduates by 2

    Block-Conditional Missing at Random Models for Missing Data

    Two major ideas in the analysis of missing data are (a) the EM algorithm [Dempster, Laird and Rubin, J. Roy. Statist. Soc. Ser. B 39 (1977) 1--38] for maximum likelihood (ML) estimation, and (b) the formulation of models for the joint distribution of the data Z and missing data indicators M, and the associated "missing at random" (MAR) condition under which a model for M is unnecessary [Rubin, Biometrika 63 (1976) 581--592]. Most previous work has treated Z and M as single blocks, yielding selection or pattern-mixture models depending on how their joint distribution is factorized. This paper explores "block-sequential" models that interleave subsets of the variables and their missing data indicators, and then make parameter restrictions based on assumptions in each block. These include models that are not MAR. We examine a subclass of block-sequential models we call block-conditional MAR (BCMAR) models, and an associated block-monotone reduced likelihood strategy that typically yields consistent estimates by selectively discarding some data. Alternatively, full ML estimation can often be achieved via the EM algorithm. We examine in some detail BCMAR models for the case of two multinomially distributed categorical variables, and a two-block structure where the first block is categorical and the second block arises from a (possibly multivariate) exponential family distribution. Comment: Published at http://dx.doi.org/10.1214/10-STS344 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Probabilistic Inference from Arbitrary Uncertainty using Mixtures of Factorized Generalized Gaussians

    This paper presents a general and efficient framework for probabilistic inference and learning from arbitrary uncertain information. It exploits the calculation properties of finite mixture models, conjugate families and factorization. Both the joint probability density of the variables and the likelihood function of the (objective or subjective) observation are approximated by a special mixture model, in such a way that any desired conditional distribution can be directly obtained without numerical integration. We have developed an extended version of the expectation maximization (EM) algorithm to estimate the parameters of mixture models from uncertain training examples (indirect observations). As a consequence, any piece of exact or uncertain information about both input and output values is consistently handled in the inference and learning stages. This ability, extremely useful in certain situations, is not found in most alternative methods. The proposed framework is formally justified from standard probabilistic principles, and illustrative examples are provided in the fields of nonparametric pattern classification, nonlinear regression and pattern completion. Finally, experiments on a real application and comparative results over standard databases provide empirical evidence of the utility of the method in a wide range of applications.