5,554 research outputs found

    Estimation of a regression spline sample selection model

    Get PDF
    It is often the case that an outcome of interest is observed for a restricted non-randomly selected sample of the population. In such a situation, standard statistical analysis yields biased results. This issue can be addressed using sample selection models which are based on the estimation of two regressions: a binary selection equation determining whether a particular statistical unit will be available in the outcome equation. Classic sample selection models assume a priori that continuous regressors have a pre-specified linear or non-linear relationship to the outcome, which can lead to erroneous conclusions. In the case of continuous response, methods in which covariate effects are modeled flexibly have been previously proposed, the most recent being based on a Bayesian Markov chain Monte Carlo approach. A frequentist counterpart which has the advantage of being computationally fast is introduced. The proposed algorithm is based on the penalized likelihood estimation framework. The construction of confidence intervals is also discussed. The empirical properties of the existing and proposed methods are studied through a simulation study. The approaches are finally illustrated by analyzing data from the RAND Health Insurance Experiment on annual health expenditures

    parallelMCMCcombine: An R Package for Bayesian Methods for Big Data and Analytics

    Full text link
    Recent advances in big data and analytics research have provided a wealth of large data sets that are too big to be analyzed in their entirety, due to restrictions on computer memory or storage size. New Bayesian methods have been developed for large data sets that are only large due to large sample sizes; these methods partition big data sets into subsets, and perform independent Bayesian Markov chain Monte Carlo analyses on the subsets. The methods then combine the independent subset posterior samples to estimate a posterior density given the full data set. These approaches were shown to be effective for Bayesian models including logistic regression models, Gaussian mixture models and hierarchical models. Here, we introduce the R package parallelMCMCcombine which carries out four of these techniques for combining independent subset posterior samples. We illustrate each of the methods using a Bayesian logistic regression model for simulation data and a Bayesian Gamma model for real data; we also demonstrate features and capabilities of the R package. The package assumes the user has carried out the Bayesian analysis and has produced the independent subposterior samples outside of the package. The methods are primarily suited to models with unknown parameters of fixed dimension that exist in continuous parameter spaces. We envision this tool will allow researchers to explore the various methods for their specific applications, and will assist future progress in this rapidly developing field.Comment: for published version see: http://www.plosone.org/article/fetchObject.action?uri=info%3Adoi%2F10.1371%2Fjournal.pone.0108425&representation=PD

    Generalized structured additive regression based on Bayesian P-splines

    Get PDF
    Generalized additive models (GAM) for modelling nonlinear effects of continuous covariates are now well established tools for the applied statistician. In this paper we develop Bayesian GAM's and extensions to generalized structured additive regression based on one or two dimensional P-splines as the main building block. The approach extends previous work by Lang und Brezger (2003) for Gaussian responses. Inference relies on Markov chain Monte Carlo (MCMC) simulation techniques, and is either based on iteratively weighted least squares (IWLS) proposals or on latent utility representations of (multi)categorical regression models. Our approach covers the most common univariate response distributions, e.g. the Binomial, Poisson or Gamma distribution, as well as multicategorical responses. For the first time, we present Bayesian semiparametric inference for the widely used multinomial logit models. As we will demonstrate through two applications on the forest health status of trees and a space-time analysis of health insurance data, the approach allows realistic modelling of complex problems. We consider the enormous flexibility and extendability of our approach as a main advantage of Bayesian inference based on MCMC techniques compared to more traditional approaches. Software for the methodology presented in the paper is provided within the public domain package BayesX

    Bayesian Geoadditive Seemingly Unrelated Regression

    Get PDF
    Parametric seemingly unrelated regression (SUR) models are a common tool for multivariate regression analysis when error variables are reasonably correlated, so that separate univariate analysis may result in inefficient estimates of covariate effects. A weakness of parametric models is that they require strong assumptions on the functional form of possibly nonlinear effects of metrical covariates. In this paper, we develop a Bayesian semiparametric SUR model, where the usual linear predictors are replaced by more flexible additive predictors allowing for simultaneous nonparametric estimation of such covariate effects and of spatial effects. The approach is based on appropriate smoothness priors which allow different forms and degrees of smoothness in a general framework. Inference is fully Bayesian and uses recent Markov chain Monte Carlo techniques

    Monotonic regression based on Bayesian P-splines: an application to estimating price response functions from store-level scanner data

    Get PDF
    Generalized additive models have become a widely used instrument for flexible regression analysis. In many practical situations, however, it is desirable to restrict the flexibility of nonparametric estimation in order to accommodate a presumed monotonic relationship between a covariate and the response variable. For example, consumers usually will buy less of a brand if its price increases, and therefore one expects a brand's unit sales to be a decreasing function in own price. We follow a Bayesian approach using penalized B-splines and incorporate the assumption of monotonicity in a natural way by an appropriate specification of the respective prior distributions. We illustrate the methodology in an empirical application modeling demand for a brand of orange juice and show that imposing monotonicity constraints for own- and cross-item price effects improves the predictive validity of the estimated sales response function considerably

    A Bayesian semiparametric latent variable model for mixed responses

    Get PDF
    In this article we introduce a latent variable model (LVM) for mixed ordinal and continuous responses, where covariate effects on the continuous latent variables are modelled through a flexible semiparametric predictor. We extend existing LVM with simple linear covariate effects by including nonparametric components for nonlinear effects of continuous covariates and interactions with other covariates as well as spatial effects. Full Bayesian modelling is based on penalized spline and Markov random field priors and is performed by computationally efficient Markov chain Monte Carlo (MCMC) methods. We apply our approach to a large German social science survey which motivated our methodological development

    A semiparametric bivariate probit model for joint modeling of outcomes in STEMI patients

    Get PDF
    In this work we analyse the relationship among in-hospital mortality and a treatment effectiveness outcome in patients affected by ST-Elevation myocardial infarction. The main idea is to carry out a joint modeling of the two outcomes applying a Semiparametric Bivariate Probit Model to data arising from a clinical registry called STEMI Archive. A realistic quantification of the relationship between outcomes can be problematic for several reasons. First, latent factors associated with hospitals organization can affect the treatment efficacy and/or interact with patient’s condition at admission time. Moreover, they can also directly influence the mortality outcome. Such factors can be hardly measurable. Thus, the use of classical estimation methods will clearly result in inconsistent or biased parameter estimates. Secondly, covariate-outcomes relationships can exhibit nonlinear patterns. Provided that proper statistical methods for model fitting in such framework are available, it is possible to employ a simultaneous estimation approach to account for unobservable confounders. Such a framework can also provide flexible covariate structures and model the whole conditional distribution of the response

    Functional Regression

    Full text link
    Functional data analysis (FDA) involves the analysis of data whose ideal units of observation are functions defined on some continuous domain, and the observed data consist of a sample of functions taken from some population, sampled on a discrete grid. Ramsay and Silverman's 1997 textbook sparked the development of this field, which has accelerated in the past 10 years to become one of the fastest growing areas of statistics, fueled by the growing number of applications yielding this type of data. One unique characteristic of FDA is the need to combine information both across and within functions, which Ramsay and Silverman called replication and regularization, respectively. This article will focus on functional regression, the area of FDA that has received the most attention in applications and methodological development. First will be an introduction to basis functions, key building blocks for regularization in functional regression methods, followed by an overview of functional regression methods, split into three types: [1] functional predictor regression (scalar-on-function), [2] functional response regression (function-on-scalar) and [3] function-on-function regression. For each, the role of replication and regularization will be discussed and the methodological development described in a roughly chronological manner, at times deviating from the historical timeline to group together similar methods. The primary focus is on modeling and methodology, highlighting the modeling structures that have been developed and the various regularization approaches employed. At the end is a brief discussion describing potential areas of future development in this field
    corecore