Bayesian Semiparametric Multivariate Density Deconvolution
We consider the problem of multivariate density deconvolution when the
interest lies in estimating the distribution of a vector-valued random variable
but precise measurements of the variable of interest are not available,
observations being contaminated with additive measurement errors. The existing
sparse literature on the problem assumes the density of the measurement errors
to be completely known. We propose robust Bayesian semiparametric multivariate
deconvolution approaches when the measurement error density is not known but
replicated proxies are available for each unobserved value of the random
vector. Additionally, we allow the variability of the measurement errors to
depend on the associated unobserved value of the vector of interest through
unknown relationships; this formulation automatically includes the case of
multivariate multiplicative measurement errors. Basic properties of finite
mixture models, multivariate normal kernels and exchangeable priors are
exploited in many novel ways to meet the modeling and computational challenges.
Theoretical results that show the flexibility of the proposed methods are
provided. We illustrate the efficiency of the proposed methods in recovering
the true density of interest through simulation experiments. The methodology is
applied to estimate the joint consumption pattern of different dietary
components from contaminated 24-hour recalls.
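The abstract's Bayesian semiparametric machinery is beyond a short snippet, but the key identification idea, that replicated proxies let one learn about the measurement error without knowing its density, can be illustrated with a simple moment-based univariate toy (all quantities here are simulated assumptions, not the authors' model):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
x = rng.normal(2.0, 1.0, n)          # unobserved variable of interest
u = rng.normal(0.0, 0.7, (n, 2))     # additive measurement errors
w = x[:, None] + u                   # two replicated proxies per subject

# Within-subject differences isolate the error: W_i1 - W_i2 = U_i1 - U_i2,
# so E[(W_i1 - W_i2)^2] = 2 Var(U).
var_u = np.mean((w[:, 0] - w[:, 1]) ** 2) / 2.0

# Moment correction: Var(mean of two replicates) = Var(X) + Var(U)/2.
w_bar = w.mean(axis=1)
var_x = w_bar.var() - var_u / 2.0
```

This recovers the error variance (0.49 here) and the variance of the latent variable (1.0) from the contaminated replicates alone; the paper's contribution is to do the analogous deconvolution for the full multivariate density, with unknown and covariate-dependent error distributions.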
Quantile Regression in the Presence of Sample Selection
Most sample selection models assume that the errors are independent of the regressors. Under this assumption, all quantile and mean functions are parallel, which implies that quantile estimators cannot reveal any (by definition non-existent) heterogeneity. However, quantile estimators are useful for testing the independence assumption, because they are consistent under the null hypothesis. We propose tests for this crucial restriction that are based on the entire conditional quantile regression process after correcting for sample selection bias. Monte Carlo simulations demonstrate that they are powerful, and two empirical illustrations indicate that violations of this assumption are likely to be ubiquitous in labor economics.
Keywords: sample selection, quantile regression, independence, test
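The "parallel quantile functions" implication of the independence assumption is easy to see numerically. The following toy (binned empirical quantiles in place of a formal quantile regression process, no selection correction) checks that under independent errors the conditional quantile curves at different levels τ share the same slope:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200000
x = rng.uniform(0, 1, n)
y = 1.0 + 2.0 * x + rng.normal(0, 1, n)   # errors independent of x

# Conditional quantiles within bins of x; under independence the
# quantile "lines" are parallel: same slope at every tau.
bins = np.linspace(0, 1, 11)
centers = (bins[:-1] + bins[1:]) / 2
idx = np.digitize(x, bins[1:-1])

def quantile_slope(tau):
    q = np.array([np.quantile(y[idx == k], tau) for k in range(10)])
    return np.polyfit(centers, q, 1)[0]

slopes = {tau: quantile_slope(tau) for tau in (0.25, 0.5, 0.75)}
```

All three slopes come out near the common value 2; heteroscedastic errors (e.g. noise scaled by x) would make them fan out, which is exactly the departure the proposed tests are designed to detect after the sample selection correction.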
Bayesian Measurement Error Correction in Structured Additive Distributional Regression with an Application to the Analysis of Sensor Data on Soil-Plant Variability
The flexibility of the Bayesian approach to account for covariates with
measurement error is combined with semiparametric regression models for a class
of continuous, discrete and mixed univariate response distributions with
potentially all parameters depending on a structured additive predictor. Markov
chain Monte Carlo enables a modular and numerically efficient implementation of
Bayesian measurement error correction based on the imputation of unobserved
error-free covariate values. We allow for very general measurement errors,
including correlated replicates with heterogeneous variances. The proposal is
first assessed by a simulation trial, then it is applied to the assessment of a
soil-plant relationship crucial for implementing efficient agricultural
management practices. Observations on multi-depth soil information and forage
ground-cover for a seven-hectare alfalfa stand in southern Italy were obtained
using sensors with very refined spatial resolution. Estimating a functional
relation between ground-cover and soil with these data involves addressing
issues linked to the spatial and temporal misalignment and the large data size.
We propose a preliminary spatial interpolation on a lattice covering the field
and subsequent analysis by a structured additive distributional regression
model accounting for measurement error in the soil covariate. Results are
interpreted and discussed in connection with possible alfalfa management
strategies.
Variational inference for heteroscedastic and longitudinal regression models
University of Technology Sydney, Faculty of Science.
The focus of this thesis is on the development and assessment of mean field variational Bayes (MFVB), a fast, deterministic tool for inference in a Bayesian hierarchical model setting. We assess the performance of MFVB via comprehensive comparisons against a Markov chain Monte Carlo (MCMC) benchmark. Each of the models considered is a special case of semiparametric regression. In particular, we focus on the development and assessment of the performance of MFVB for heteroscedastic and longitudinal semiparametric regression models. Generally, the new MFVB methodology performs well in its assessment of accuracy against MCMC for the semiparametric and nonparametric regression models considered in this thesis. It is also much faster and is shown to be applicable to real-time analyses. Several real data illustrations are provided. Altogether, MFVB proves to be a credible inference tool and a good alternative to MCMC, especially when analysis is hindered by time constraints.
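The flavor of MFVB can be conveyed by the textbook coordinate-ascent (CAVI) updates for a normal model with unknown mean and precision, y_i ~ N(mu, 1/tau) with conjugate priors; this is a standard conjugate example, not one of the thesis's semiparametric models:

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.normal(3.0, 2.0, 5000)
n, ybar = y.size, y.mean()

# Priors: mu ~ N(mu0, 1/(lam0*tau)), tau ~ Gamma(a0, b0), vague settings
mu0, lam0, a0, b0 = 0.0, 1e-3, 1e-3, 1e-3

e_tau = 1.0                          # initial guess for E_q[tau]
for _ in range(50):                  # CAVI iterations
    # Update q(mu) = N(mu_n, 1/lam_n) given current E_q[tau]
    mu_n = (lam0 * mu0 + n * ybar) / (lam0 + n)
    lam_n = (lam0 + n) * e_tau
    # Update q(tau) = Gamma(a_n, b_n) using
    # E_q[(y_i - mu)^2] = (y_i - mu_n)^2 + 1/lam_n
    a_n = a0 + (n + 1) / 2.0
    b_n = b0 + 0.5 * (lam0 * ((mu_n - mu0) ** 2 + 1 / lam_n)
                      + np.sum((y - mu_n) ** 2) + n / lam_n)
    e_tau = a_n / b_n
```

Each update is a closed-form expectation, which is why MFVB is deterministic and fast; the thesis's contribution is extending and assessing this style of approximation for heteroscedastic and longitudinal semiparametric models against an MCMC gold standard.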
Instrumental Regression in Partially Linear Models
We consider the semiparametric regression model Y = Xᵗβ + φ(Z), where β and φ(·) are an unknown slope coefficient vector and function, and where the variables (X, Z) are endogenous. We propose necessary and sufficient conditions for the identification of the parameters in the presence of instrumental variables. We also focus on the estimation of β. An incorrect parameterization of φ may generally lead to an inconsistent estimator of β, whereas even consistent nonparametric estimators for φ imply a slow rate of convergence of the estimator of β. An additional complication is that the solution of the equation necessitates the inversion of a compact operator that has to be estimated nonparametrically. In general this inversion is not stable, so the estimation of β is ill-posed. In this paper, a √n-consistent estimator for β is derived under mild assumptions. One of these assumptions is given by the so-called source condition, which is explicitly interpreted in the paper. Finally, we show that the estimator achieves the semiparametric efficiency bound, even if the model is heteroscedastic. Monte Carlo simulations demonstrate the reasonable performance of the estimation procedure in finite samples.
Functional Regression
Functional data analysis (FDA) involves the analysis of data whose ideal
units of observation are functions defined on some continuous domain, and the
observed data consist of a sample of functions taken from some population,
sampled on a discrete grid. Ramsay and Silverman's 1997 textbook sparked the
development of this field, which has accelerated in the past 10 years to become
one of the fastest growing areas of statistics, fueled by the growing number of
applications yielding this type of data. One unique characteristic of FDA is
the need to combine information both across and within functions, which Ramsay
and Silverman called replication and regularization, respectively. This article
will focus on functional regression, the area of FDA that has received the most
attention in applications and methodological development. First will be an
introduction to basis functions, key building blocks for regularization in
functional regression methods, followed by an overview of functional regression
methods, split into three types: [1] functional predictor regression
(scalar-on-function), [2] functional response regression (function-on-scalar)
and [3] function-on-function regression. For each, the role of replication and
regularization will be discussed and the methodological development described
in a roughly chronological manner, at times deviating from the historical
timeline to group together similar methods. The primary focus is on modeling
and methodology, highlighting the modeling structures that have been developed
and the various regularization approaches employed. At the end is a brief
discussion describing potential areas of future development in this field.
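The role of basis functions in functional predictor (scalar-on-function) regression can be sketched in a few lines: expanding the coefficient function in a small basis turns the functional model y_i = ∫ x_i(t) β(t) dt + ε_i into an ordinary regression on basis scores. This toy uses simulated curves and a Fourier basis (choices of basis and sizes are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
grid = np.linspace(0.0, 1.0, 101)       # common sampling grid for the curves
dt = grid[1] - grid[0]
n, K = 500, 7

# Small Fourier basis: 1, sin(2*pi*k*t), cos(2*pi*k*t) for k = 1..3
funcs = [np.ones_like(grid)]
for k in range(1, 4):
    funcs += [np.sin(2 * np.pi * k * grid), np.cos(2 * np.pi * k * grid)]
basis = np.vstack(funcs)                # K x len(grid)

# Smooth functional predictors built from the same basis
scores = rng.normal(size=(n, K))
X = scores @ basis

beta_true = np.sin(2 * np.pi * grid)    # true coefficient function
y = X @ beta_true * dt + rng.normal(0, 0.05, n)   # int x_i(t) beta(t) dt + noise

# Basis expansion of beta(t) reduces the functional regression to
# least squares on the K inner products Z_ik = int x_i(t) phi_k(t) dt.
Z = X @ basis.T * dt
c, *_ = np.linalg.lstsq(Z, y, rcond=None)
beta_hat = basis.T @ c
```

Truncating the basis at K terms is exactly the "regularization" the article describes: it borrows strength across the whole curve and keeps the inverse problem well-posed, at the cost of some approximation bias when β(t) is not in the span of the basis.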
An exact corrected log-likelihood function for Cox's proportional hazards model under measurement error and some extensions
This paper studies Cox's proportional hazards model under covariate measurement error. Nakamura's (1990) methodology of corrected log-likelihood is applied to the so-called Breslow likelihood, which is, in the absence of measurement error, equivalent to the partial likelihood. For a general error model with possibly heteroscedastic and non-normal additive measurement error, corrected estimators of the regression parameter as well as of the baseline hazard rate are obtained. The estimators proposed by Nakamura (1992), Kong, Huang and Li (1998) and Kong and Gu (1999) are re-established in the special cases considered there. This sheds new light on these estimators and justifies them as exact corrected score estimators. Finally, the method is extended to some variants of the Cox model.
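The corrected-score idea is easiest to see in the linear model rather than the Cox likelihood treated in the paper: with additive error W = X + U, the naive normal equations are biased because E[WᵗW] = XᵗX + n·Var(U), and subtracting the known error variance yields an exactly unbiased estimating equation. A univariate sketch (simulated data, error variance assumed known):

```python
import numpy as np

rng = np.random.default_rng(4)
n, beta = 50000, 1.5
x = rng.normal(0, 1, n)               # true covariate, unobserved
w = x + rng.normal(0, 0.6, n)         # observed with additive error
y = beta * x + rng.normal(0, 0.5, n)

naive = (w @ y) / (w @ w)             # attenuated toward zero

# Corrected score: replace w@w by its unbiased estimate of x@x,
# i.e. subtract n * Var(U) from the denominator.
var_u = 0.36                          # assumed known (or estimated from replicates)
corrected = (w @ y) / (w @ w - n * var_u)
```

The naive slope converges to β/(1 + Var(U)) = 1.5/1.36 ≈ 1.10, while the corrected score recovers β ≈ 1.5. Nakamura's contribution, and this paper's, is constructing the analogous correction for the (Breslow) likelihood of the Cox model, where the fix is no longer a simple variance subtraction.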
Quantile regression in partially linear varying coefficient models
Semiparametric models are often considered for analyzing longitudinal data
for a good balance between flexibility and parsimony. In this paper, we study a
class of marginal partially linear quantile models with possibly varying
coefficients. The functional coefficients are estimated by basis function
approximations. The estimation procedure is easy to implement, and it requires
no specification of the error distributions. The asymptotic properties of the
proposed estimators are established for the varying coefficients as well as for
the constant coefficients. We develop rank score tests for hypotheses on the
coefficients, including the hypotheses on the constancy of a subset of the
varying coefficients. Hypothesis testing of this type is theoretically
challenging, as the dimensions of the parameter spaces under both the null and
the alternative hypotheses are growing with the sample size. We assess the
finite sample performance of the proposed method by Monte Carlo simulation
studies, and demonstrate its value by the analysis of an AIDS data set, where
the modeling of quantiles provides more comprehensive information than the
usual least squares approach.
Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org); DOI: http://dx.doi.org/10.1214/09-AOS695
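The basis-function device in the abstract, approximating each varying coefficient by a finite expansion so that the semiparametric model becomes a finite-dimensional regression, can be sketched as follows. For brevity this toy uses least squares in place of the quantile check loss, and a polynomial basis in place of splines; both substitutions are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5000
u = rng.uniform(0, 1, n)              # index variable (e.g. time)
x1, x2 = rng.normal(size=(2, n))
beta = np.sin(np.pi * u)              # varying coefficient beta(u)
gamma = 0.8                           # constant coefficient
y = beta * x1 + gamma * x2 + rng.normal(0, 0.3, n)

# Basis approximation beta(u) ~ sum_k c_k u^k turns the partially linear
# varying coefficient model into an ordinary regression with an
# expanded design matrix [x1 * u^k, x2].
P = np.vander(u, 6, increasing=True)  # polynomial basis 1, u, ..., u^5
D = np.column_stack([P * x1[:, None], x2])
coef, *_ = np.linalg.lstsq(D, y, rcond=None)

eval_grid = np.linspace(0, 1, 50)
beta_hat = np.vander(eval_grid, 6, increasing=True) @ coef[:6]
gamma_hat = coef[6]
```

The fitted curve tracks sin(πu) and the constant coefficient is recovered; the paper's harder problems, valid inference when the basis dimension grows with n and rank score tests for constancy of a subset of coefficients, sit on top of exactly this expanded-design construction.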
Survival Analysis of Microarray Data With Microarray Measurement Subject to Measurement Error
Microarray technology is essentially a measurement tool for measuring the expression of genes, and this measurement is subject to measurement error. Gene expressions can be employed as predictors of patient survival, yet the measurement error involved in gene expression is often ignored in the analysis of microarray data in the literature. Efforts are needed to establish statistical methods for analyzing microarray data without ignoring the error in gene expression. A typical microarray data set has a number of genes far exceeding the sample size, so proper selection of survival-relevant genes contributes to an accurate prediction model. We study the effect of measurement error on survival-relevant gene selection under the accelerated failure time (AFT) model by regularizing the weighted least squares estimator with the adaptive LASSO penalty. The simulation results and real data analysis show that ignoring measurement error affects survival-relevant gene selection. The simulation-extrapolation (SIMEX) method is investigated to adjust for the impact of measurement error on gene selection, and the resulting model after adjustment is more accurate than the model selected by ignoring measurement error.
Microarray experiments are often performed over a long period of time, and samples can be prepared and collected under different conditions. Moreover, different protocols or methodologies may be applied in the experiment. All these factors contribute to the possibility of heteroscedastic measurement error in a microarray data set, and it is of practical importance to combine microarray data from different labs or platforms. We construct a prediction AFT model using data with heterogeneous covariate measurement error. Two variations of the SIMEX algorithm are investigated to adjust for the effect of the mis-measured covariates. Simulation results show that the proposed method achieves better prediction accuracy than the naive method.
In this dissertation, the SIMEX method is used to adjust for the effects of covariate measurement error. This method is superior to other conventional methods in that not only is it more robust to distributional assumptions for error-prone covariates, but it also offers marked simplicity and flexibility for practical use. To implement this method, we developed an R package for general users.
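The two steps of SIMEX, deliberately adding extra error at increasing levels λ and then extrapolating the resulting naive estimates back to λ = −1 (no error), can be sketched for a simple linear slope (simulated data; the dissertation applies the same idea to AFT models and gene selection):

```python
import numpy as np

rng = np.random.default_rng(6)
n, beta, var_u = 20000, 2.0, 0.5
x = rng.normal(0, 1, n)
w = x + rng.normal(0, np.sqrt(var_u), n)   # error-prone covariate
y = beta * x + rng.normal(0, 0.5, n)

def naive_slope(wv):
    return (wv @ y) / (wv @ wv)

# Simulation step: add extra error so the total error variance is
# (1 + lambda) * var_u, and average the naive fit over B remeasurements.
lambdas = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
B = 50
est = [np.mean([naive_slope(w + rng.normal(0, np.sqrt(lam * var_u), n))
                for _ in range(B)]) if lam > 0 else naive_slope(w)
       for lam in lambdas]

# Extrapolation step: fit a quadratic in lambda, evaluate at lambda = -1.
quad = np.polyfit(lambdas, est, 2)
simex = np.polyval(quad, -1.0)
```

Here the naive slope is attenuated to about β/(1 + var_u) ≈ 1.33, while the quadratic extrapolant lands near 1.8, much closer to the true β = 2. The quadratic extrapolant does not remove the bias exactly (the true attenuation curve is a ratio, not a polynomial), which is one reason variants of the extrapolation step, like those studied in the dissertation, matter in practice.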