
    Bayesian Semiparametric Multivariate Density Deconvolution

    We consider the problem of multivariate density deconvolution, in which the interest lies in estimating the distribution of a vector-valued random variable when precise measurements of the variable are not available, the observations being contaminated with additive measurement errors. The existing sparse literature on the problem assumes the density of the measurement errors to be completely known. We propose robust Bayesian semiparametric multivariate deconvolution approaches for the case where the measurement error density is unknown but replicated proxies are available for each unobserved value of the random vector. Additionally, we allow the variability of the measurement errors to depend on the associated unobserved value of the vector of interest through unknown relationships, which automatically includes the case of multivariate multiplicative measurement errors. Basic properties of finite mixture models, multivariate normal kernels and exchangeable priors are exploited in many novel ways to meet the modeling and computational challenges. Theoretical results showing the flexibility of the proposed methods are provided. We illustrate the efficiency of the proposed methods in recovering the true density of interest through simulation experiments. The methodology is applied to estimate the joint consumption pattern of different dietary components from contaminated 24-hour recalls.
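The key identification device in this abstract is the availability of replicated proxies. A minimal moment-based sketch (not the paper's Bayesian sampler; all variable names are our own) shows how replicates identify the error variance from within-subject variation, which in turn lets the latent variance be deconvolved:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each subject i has an unobserved value X_i; we observe
# m replicated proxies W_ij = X_i + U_ij with additive mean-zero error U_ij
# whose variance is unknown to the analyst.
n, m = 2000, 3
X = rng.normal(2.0, 1.0, size=n)          # latent variable of interest
U = rng.normal(0.0, 0.7, size=(n, m))     # measurement errors
W = X[:, None] + U                        # observed replicated proxies

# Replicates identify the error variance from within-subject variation ...
var_u_hat = W.var(axis=1, ddof=1).mean()

# ... which lets us "deconvolve" the variance of the latent X:
# Var(Wbar) = Var(X) + Var(U)/m  =>  Var(X) = Var(Wbar) - Var(U)/m
wbar = W.mean(axis=1)
var_x_hat = wbar.var(ddof=1) - var_u_hat / m

print(var_u_hat, var_x_hat)  # estimates of Var(U) and Var(X)
```

The paper's semiparametric mixtures generalize this idea from a single variance to a whole unknown (and covariate-dependent) error distribution.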

    Quantile Regression in the Presence of Sample Selection

    Most sample selection models assume that the errors are independent of the regressors. Under this assumption, all quantile and mean functions are parallel, which implies that quantile estimators cannot reveal any (by definition non-existent) heterogeneity. However, quantile estimators are useful for testing the independence assumption, because they are consistent under the null hypothesis. We propose tests for this crucial restriction that are based on the entire conditional quantile regression process after correcting for sample selection bias. Monte Carlo simulations demonstrate that the tests are powerful, and two empirical illustrations indicate that violations of this assumption are likely to be ubiquitous in labor economics.

    Keywords: sample selection, quantile regression, independence, test
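The parallelism logic can be illustrated without the selection correction: under independence, slope estimates at different quantiles coincide; under heteroscedasticity they fan out. A self-contained sketch (our own toy data and pinball-loss fitter, not the authors' test statistic):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Data where the error scale depends on the regressor, violating independence
n = 3000
x = rng.uniform(0.0, 2.0, size=n)
y = 1.0 + 2.0 * x + (1.0 + x) * rng.normal(size=n)

def fit_quantile(x, y, q):
    """Fit a linear conditional quantile by minimizing the pinball (check) loss."""
    def loss(beta):
        r = y - beta[0] - beta[1] * x
        return np.mean(np.where(r >= 0, q * r, (q - 1.0) * r))
    return minimize(loss, x0=[0.0, 1.0], method="Nelder-Mead").x

b10 = fit_quantile(x, y, 0.10)
b90 = fit_quantile(x, y, 0.90)
# Non-parallel slopes across quantiles flag dependence between errors and x
print(b10[1], b90[1])
```

The paper's contribution is to make such comparisons valid after correcting each quantile fit for sample selection, and to base the test on the entire quantile process rather than two fixed quantiles.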

    Bayesian Measurement Error Correction in Structured Additive Distributional Regression with an Application to the Analysis of Sensor Data on Soil-Plant Variability

    The flexibility of the Bayesian approach to accounting for covariates with measurement error is combined with semiparametric regression models for a class of continuous, discrete and mixed univariate response distributions, with potentially all parameters depending on a structured additive predictor. Markov chain Monte Carlo enables a modular and numerically efficient implementation of Bayesian measurement error correction based on the imputation of unobserved error-free covariate values. We allow for very general measurement errors, including correlated replicates with heterogeneous variances. The proposal is first assessed in a simulation study and then applied to the assessment of a soil-plant relationship crucial for implementing efficient agricultural management practices. Observations on multi-depth soil information and forage ground cover for a seven-hectare alfalfa stand in southern Italy were obtained using sensors with very fine spatial resolution. Estimating a functional relation between ground cover and soil with these data involves addressing issues linked to spatial and temporal misalignment and the large data size. We propose a preliminary spatial interpolation on a lattice covering the field and a subsequent analysis by a structured additive distributional regression model accounting for measurement error in the soil covariate. Results are interpreted and discussed in connection with possible alfalfa management strategies.

    Variational inference for heteroscedastic and longitudinal regression models

    University of Technology Sydney, Faculty of Science. The focus of this thesis is the development and assessment of mean field variational Bayes (MFVB), a fast, deterministic tool for inference in Bayesian hierarchical models. We assess the performance of MFVB via comprehensive comparisons against a Markov chain Monte Carlo (MCMC) benchmark. Each of the models considered is a special case of semiparametric regression. In particular, we focus on the development and assessment of MFVB for heteroscedastic and longitudinal semiparametric regression models. Generally, the new MFVB methodology performs well in terms of accuracy against MCMC for the semiparametric and nonparametric regression models considered in this thesis. It is also much faster and is shown to be applicable to real-time analyses. Several real data illustrations are provided. Altogether, MFVB proves to be a credible inference tool and a good alternative to MCMC, especially when analysis is hindered by time constraints.
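To make the MFVB idea concrete, here is the standard textbook coordinate-ascent example (a normal model with conjugate normal-gamma priors), not the thesis's own heteroscedastic models; hyperparameter names are our own:

```python
import numpy as np

rng = np.random.default_rng(2)

# Model: x_i ~ N(mu, 1/tau), mu | tau ~ N(mu0, 1/(lam0*tau)), tau ~ Gamma(a0, b0).
# MFVB posits a factorized posterior q(mu) q(tau) and cycles closed-form updates.
x = rng.normal(2.0, 0.5, size=1000)
n, xbar, sx2 = x.size, x.mean(), np.sum(x**2)
mu0, lam0, a0, b0 = 0.0, 1e-3, 1e-3, 1e-3

# q(mu) = N(mu_n, 1/lam_n), q(tau) = Gamma(a_n, b_n)
mu_n = (lam0 * mu0 + n * xbar) / (lam0 + n)   # closed form, needs no iteration
a_n = a0 + (n + 1) / 2.0
e_tau = 1.0                                    # initial guess for E_q[tau]
for _ in range(50):                            # coordinate ascent
    lam_n = (lam0 + n) * e_tau
    b_n = b0 + 0.5 * (sx2 - 2 * mu_n * n * xbar
                      + (n + lam0) * (mu_n**2 + 1.0 / lam_n)
                      + lam0 * (mu0**2 - 2 * mu_n * mu0))
    e_tau = a_n / b_n

print(mu_n, a_n / b_n)  # approximate posterior means of mu and tau
```

Each update is a cheap deterministic expression, which is why MFVB runs orders of magnitude faster than an MCMC sampler for the same model; the thesis's contribution is assessing how much posterior accuracy this factorization sacrifices in richer semiparametric settings.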

    Instrumental Regression in Partially Linear Models

    We consider the semiparametric regression Xᵗβ + φ(Z), where β and φ(·) are an unknown slope coefficient vector and function, and where the variables (X, Z) are endogenous. We propose necessary and sufficient conditions for the identification of the parameters in the presence of instrumental variables. We also focus on the estimation of β. An incorrect parameterization of φ generally leads to an inconsistent estimator of β, whereas even consistent nonparametric estimators of φ imply a slow rate of convergence of the estimator of β. An additional complication is that the solution of the estimating equation requires the inversion of a compact operator that has to be estimated nonparametrically. In general this inversion is not stable, so the estimation of β is ill-posed. In this paper, a √n-consistent estimator of β is derived under mild assumptions. One of these assumptions is the so-called source condition, which is explicitly interpreted in the paper. Finally, we show that the estimator achieves the semiparametric efficiency bound, even if the model is heteroscedastic. Monte Carlo simulations demonstrate the reasonable performance of the estimation procedure in finite samples.

    Functional Regression

    Functional data analysis (FDA) involves the analysis of data whose ideal units of observation are functions defined on some continuous domain, with the observed data consisting of a sample of functions taken from some population, sampled on a discrete grid. Ramsay and Silverman's 1997 textbook sparked the development of this field, which has accelerated in the past 10 years to become one of the fastest growing areas of statistics, fueled by the growing number of applications yielding this type of data. One unique characteristic of FDA is the need to combine information both across and within functions, which Ramsay and Silverman called replication and regularization, respectively. This article focuses on functional regression, the area of FDA that has received the most attention in applications and methodological development. First comes an introduction to basis functions, key building blocks for regularization in functional regression methods, followed by an overview of functional regression methods, split into three types: [1] functional predictor regression (scalar-on-function), [2] functional response regression (function-on-scalar) and [3] function-on-function regression. For each, the role of replication and regularization is discussed and the methodological development described in roughly chronological order, at times deviating from the historical timeline to group together similar methods. The primary focus is on modeling and methodology, highlighting the modeling structures that have been developed and the various regularization approaches employed. The article closes with a brief discussion of potential areas of future development in this field.

    An exact corrected log-likelihood function for Cox's proportional hazards model under measurement error and some extensions

    This paper studies Cox's proportional hazards model under covariate measurement error. Nakamura's (1990) methodology of corrected log-likelihood is applied to the so-called Breslow likelihood, which, in the absence of measurement error, is equivalent to the partial likelihood. For a general error model with possibly heteroscedastic and non-normal additive measurement error, corrected estimators of the regression parameter as well as of the baseline hazard rate are obtained. The estimators proposed by Nakamura (1992), Kong, Huang and Li (1998) and Kong and Gu (1999) are re-established in the special cases considered there. This sheds new light on these estimators and justifies them as exact corrected score estimators. Finally, the method is extended to some variants of the Cox model.

    Quantile regression in partially linear varying coefficient models

    Semiparametric models are often considered for analyzing longitudinal data for a good balance between flexibility and parsimony. In this paper, we study a class of marginal partially linear quantile models with possibly varying coefficients. The functional coefficients are estimated by basis function approximations. The estimation procedure is easy to implement, and it requires no specification of the error distributions. The asymptotic properties of the proposed estimators are established for the varying coefficients as well as for the constant coefficients. We develop rank score tests for hypotheses on the coefficients, including hypotheses on the constancy of a subset of the varying coefficients. Hypothesis testing of this type is theoretically challenging, as the dimensions of the parameter spaces under both the null and the alternative hypotheses grow with the sample size. We assess the finite sample performance of the proposed method by Monte Carlo simulation studies, and demonstrate its value by the analysis of an AIDS data set, where the modeling of quantiles provides more comprehensive information than the usual least squares approach.

    Published in the Annals of Statistics (http://dx.doi.org/10.1214/09-AOS695) by the Institute of Mathematical Statistics (http://www.imstat.org/aos/).

    Survival Analysis of Microarray Data With Microarray Measurement Subject to Measurement Error

    Microarray technology is essentially a tool for measuring the expression of genes, and this measurement is subject to measurement error. Gene expressions can be employed as predictors of patient survival, yet the measurement error involved in gene expression is often ignored in the analysis of microarray data in the literature. Efforts are needed to establish statistical methods for analyzing microarray data without ignoring the error in gene expression. A typical microarray data set has a number of genes far exceeding the sample size, and proper selection of survival-relevant genes contributes to an accurate prediction model. We study the effect of measurement error on survival-relevant gene selection under the accelerated failure time (AFT) model by regularizing a weighted least squares estimator with the adaptive LASSO penalty. The simulation results and real data analysis show that ignoring measurement error affects survival-relevant gene selection. The simulation-extrapolation (SIMEX) method is investigated to adjust for the impact of measurement error on gene selection; the resulting model after adjustment is more accurate than the model selected while ignoring measurement error. Microarray experiments are often performed over a long period of time, and samples can be prepared and collected under different conditions. Moreover, different protocols or methodologies may be applied in the experiments. All these factors contribute to the possibility of heteroscedastic measurement error in a microarray data set. It is of practical importance to combine microarray data from different labs or platforms. We construct a predictive AFT model using data with heterogeneous covariate measurement error. Two variations of the SIMEX algorithm are investigated to adjust for the effect of the mismeasured covariates. Simulation results show that the proposed method achieves better prediction accuracy than the naive method.
    In this dissertation, the SIMEX method is used to adjust for the effects of covariate measurement error. This method is superior to other conventional methods in that it is not only more robust to distributional assumptions for error-prone covariates but also offers marked simplicity and flexibility for practical use. To implement this method, we developed an R package for general users.
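The SIMEX idea itself is simple enough to sketch on a linear toy model (this is our own illustration, not the dissertation's AFT setting or its R package): deliberately add extra simulated error at increasing levels, watch the estimate degrade, then extrapolate the trend back to the no-error level lambda = -1.

```python
import numpy as np

rng = np.random.default_rng(4)

n = 5000
x = rng.normal(0.0, 1.0, size=n)            # true covariate (unobserved)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, n) # outcome; true slope = 2
s = 1.0                                     # known measurement error sd
w = x + rng.normal(0.0, s, size=n)          # observed error-prone covariate

def ols_slope(u, y):
    return np.cov(u, y, bias=True)[0, 1] / np.var(u)

naive = ols_slope(w, y)                     # attenuated toward zero

# Simulation step: add extra error at levels lam, averaging over draws
lams = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
slopes = [np.mean([ols_slope(w + np.sqrt(lam) * s * rng.normal(size=n), y)
                   for _ in range(50)])
          for lam in lams]

# Extrapolation step: fit slope(lam) and evaluate at lam = -1 (no error)
simex = np.polyval(np.polyfit(lams, slopes, 2), -1.0)
print(naive, simex)  # the SIMEX estimate is closer to the true slope 2
```

The quadratic extrapolant only approximates the exact attenuation curve, so SIMEX reduces rather than eliminates the bias; its appeal, as the dissertation notes, is that it needs no distributional model for the error-prone covariate.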