Covariate selection for multilevel models with missing data
Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/135979/1/sta4133_am.pdf
http://deepblue.lib.umich.edu/bitstream/2027.42/135979/2/sta4133.pd
Multiple imputation and selection of ordinal level 2 predictors in multilevel models. An analysis of the relationship between student ratings and teacher beliefs and practices
The paper is motivated by the analysis of the relationship between student ratings
and teacher practices and beliefs. These are measured through a set of binary and
ordinal items collected by a specific survey in which nearly half of the
respondents are missing. The analysis is based on a two-level random effects model
and must face two issues concerning the items that measure teacher practices and beliefs:
(i) these items are level 2 predictors severely affected by missingness; (ii) there is
redundancy in both the number of items and the number of categories of their
measurement scale. We tackle the first issue with a multiple imputation
strategy based on information at both level 1 and level 2. For the second
issue, we consider regularization techniques for ordinal predictors that also
account for the multilevel data structure. The proposed solution combines
existing methods in an original way to solve the specific problem at hand, but it
is generally applicable to settings that require selecting predictors affected by
missing values. The results obtained with the final model indicate that some teacher
practices and beliefs are significantly related to ratings of the teacher's
ability to motivate students.
Comment: Presented at the 12th International Multilevel Conference, held
April 9-10, 2019, Utrecht
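The regularization of ordinal level 2 predictors described above can be illustrated with a quadratic penalty on the differences between adjacent category effects. This penalty pulls neighbouring categories together and so reduces redundancy in the measurement scale. The following is a minimal sketch under several assumptions: a single dummy-coded ordinal item, a Gaussian response, a ridge-type difference penalty with a closed-form solution, and simulated data. It ignores the multilevel structure and the imputation step. The function names `dummy_code`, `difference_matrix` and `fit_ordinal_ridge` are hypothetical and do not come from the paper.

```python
import numpy as np


def dummy_code(x, n_levels):
    """Dummy-code ordinal levels 0..n_levels-1, with level 0 as the reference."""
    X = np.zeros((len(x), n_levels - 1))
    for i, level in enumerate(x):
        if level > 0:
            X[i, level - 1] = 1.0
    return X


def difference_matrix(k):
    """Rows give beta_1 - beta_0 (beta_0 = 0), beta_2 - beta_1, ..., beta_k - beta_{k-1}."""
    D = np.eye(k)
    D[np.arange(1, k), np.arange(k - 1)] = -1.0
    return D


def fit_ordinal_ridge(x, y, n_levels, lam):
    """Minimise ||y - b0 - X beta||^2 + lam * ||D beta||^2 with b0 unpenalized."""
    X = dummy_code(x, n_levels)
    Xc, yc = X - X.mean(axis=0), y - y.mean()   # centring profiles out the intercept
    D = difference_matrix(X.shape[1])
    return np.linalg.solve(Xc.T @ Xc + lam * D.T @ D, Xc.T @ yc)


# Simulated 5-category item whose true effects have two equal adjacent categories.
rng = np.random.default_rng(0)
x = rng.integers(0, 5, size=200)
y = np.array([0.0, 0.5, 1.0, 1.0, 1.5])[x] + rng.normal(scale=0.5, size=200)
for lam in (0.1, 10.0, 1000.0):
    print(lam, np.round(fit_ordinal_ridge(x, y, 5, lam), 2))
```

As `lam` grows, adjacent coefficients are pulled toward each other and toward the reference level. An L1 version of the same difference penalty would instead fuse categories exactly, which is closer to outright selection.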
Penalized regressions for variable selection model, single index model and an analysis of mass spectrometry data.
The focus of this dissertation is to develop statistical methods, within the framework of penalized regression, for three different problems. The first research topic addresses the missing data problem for variable selection models, including the elastic net (ENet) method and sparse partial least squares (SPLS). I proposed a multiple imputation (MI) based weighted ENet (MI-WENet) method built on the stacked MI data and a weighting scheme for each observation. Numerical simulations were implemented to examine the performance of the MI-WENet method and to compare it with competing alternatives. I then applied the MI-WENet method to examine the predictors of endothelial function, characterized by the median effective dose and the maximum effect in an ex-vivo experiment. The second topic is the development of monotonic single-index models for assessing drug interactions. In single-index models, the link function f is not necessarily monotonic. However, in combination drug studies a monotonic link function f is desired. I proposed to estimate f using penalized splines with an I-spline basis. An algorithm for estimating f and the parameter a in the index was developed. Simulation studies were conducted to examine the performance of the proposed models in terms of accuracy in estimating f and a. Moreover, I applied the proposed method to examine the interaction of two drugs in a real case study. The third topic focused on SPLS- and ENet-based accelerated failure time (AFT) models for predicting patient survival time with mass spectrometry (MS) data. A typical MS data set contains a limited number of spectra, while each spectrum contains tens of thousands of intensity measurements representing an unknown number of peptide peaks, which are the key features of interest. Because of the high dimensionality and the high correlations among features, traditional linear regression modeling is not applicable.
Semi-parametric AFT models with an unspecified error distribution are a well-accepted approach in survival analysis. To reduce the bias introduced in the denoising step, we proposed a nonparametric imputation approach based on the Kaplan-Meier estimator. Numerical simulations and a real case study were conducted to evaluate the proposed method.
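The MI-WENet idea of fitting one elastic net to the stacked imputed data sets, with a weight on each row, can be sketched with weighted coordinate descent. This is a minimal pure-Python sketch that rests on several assumptions. The weight of subject i in each of the M stacked copies is taken to be the fraction of that subject's variables that were observed, divided by M. The penalty is parametrized glmnet-style by `lam` and `alpha`. The helpers `stack_imputations` and `weighted_enet` are hypothetical names. None of these choices is confirmed as the dissertation's exact specification.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator used in lasso / elastic-net coordinate descent."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def stack_imputations(imputed_sets, observed_fraction):
    """Stack M completed data sets, given as (X, y) pairs, and attach row weights.

    Assumed (hypothetical) weighting rule: weight = observed_fraction[i] / M, so
    subjects with more observed data count more and no subject's total weight exceeds 1.
    """
    m = len(imputed_sets)
    X_all, y_all, w_all = [], [], []
    for X, y in imputed_sets:
        X_all.extend(X)
        y_all.extend(y)
        w_all.extend(f / m for f in observed_fraction)
    return X_all, y_all, w_all


def weighted_enet(X, y, w, lam, alpha, n_sweeps=500):
    """Minimise (1 / (2 sum w)) * sum_i w_i (y_i - b0 - x_i . beta)^2
    + lam * (alpha * ||beta||_1 + (1 - alpha) / 2 * ||beta||_2^2) by coordinate descent."""
    p = len(X[0])
    W = sum(w)
    b0, beta = 0.0, [0.0] * p
    r = list(y)  # current residuals y - b0 - X beta
    for _ in range(n_sweeps):
        shift = sum(wi * ri for wi, ri in zip(w, r)) / W  # exact update of the free intercept
        b0 += shift
        r = [ri - shift for ri in r]
        for j in range(p):
            xj = [row[j] for row in X]
            v = sum(wi * xij * xij for wi, xij in zip(w, xj)) / W
            z = sum(wi * xij * (ri + xij * beta[j]) for wi, xij, ri in zip(w, xj, r)) / W
            new = soft_threshold(z, lam * alpha) / (v + lam * (1 - alpha))
            r = [ri - xij * (new - beta[j]) for ri, xij in zip(r, xj)]
            beta[j] = new
    return b0, beta


# Toy example: two imputations that differ only in subject 3's missing x2 value.
imp1 = ([[1, 0], [0, 1], [1, 1.2], [2, 1]], [1, 0, 2, 4])
imp2 = ([[1, 0], [0, 1], [1, 0.8], [2, 1]], [1, 0, 2, 4])
X, y, w = stack_imputations([imp1, imp2], observed_fraction=[1.0, 1.0, 0.5, 1.0])
print(weighted_enet(X, y, w, lam=0.1, alpha=0.5))
```

Stacking keeps a single fitted model, so the selected set is automatically the same across imputations. Fitting M separate models would instead require reconciling M selected sets.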
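The point of an I-spline basis is that every basis function is nondecreasing, so any combination with nonnegative coefficients is a nondecreasing link f. This is a minimal sketch under stated assumptions. The index values u = x'a are taken as already computed, and the alternating estimation of a is omitted. The basis uses degree-1 I-splines (piecewise-linear ramps, the integrals of piecewise-constant M-splines) rather than higher-order ones. A small ridge penalty on the coefficients stands in for the dissertation's penalty. `ramp_basis` and `fit_monotone` are hypothetical names.

```python
import random


def ramp_basis(u, knots):
    """Degree-1 I-spline basis: B_k(u) rises linearly from 0 to 1 on [knots[k-1], knots[k]]."""
    return [min(1.0, max(0.0, (u - lo) / (hi - lo))) for lo, hi in zip(knots[:-1], knots[1:])]


def fit_monotone(u, y, knots, lam=1e-3, n_sweeps=500):
    """Fit f(u) = c + sum_k b_k B_k(u) with b_k >= 0 by projected coordinate descent on
    sum_i (y_i - f(u_i))^2 + lam * sum_k b_k^2; the fitted f is nondecreasing in u."""
    B = [ramp_basis(ui, knots) for ui in u]
    n, K = len(u), len(knots) - 1
    c, b = 0.0, [0.0] * K
    r = list(y)  # residuals y - c - B b
    for _ in range(n_sweeps):
        shift = sum(r) / n                    # intercept is left unconstrained
        c += shift
        r = [ri - shift for ri in r]
        for k in range(K):
            col = [row[k] for row in B]
            num = sum(v * (ri + v * b[k]) for v, ri in zip(col, r))
            den = sum(v * v for v in col) + lam
            new = max(0.0, num / den)         # exact coordinate minimiser, clipped at 0
            r = [ri - v * (new - b[k]) for ri, v in zip(r, col)]
            b[k] = new
    return lambda t: c + sum(bk * Bk for bk, Bk in zip(b, ramp_basis(t, knots)))


# Simulated index values with a noisy response that has a local dip.
random.seed(1)
u = [i / 50 for i in range(51)]
y = [ui ** 2 + random.gauss(0, 0.05) - (0.3 if 0.4 < ui < 0.5 else 0.0) for ui in u]
f = fit_monotone(u, y, knots=[0.0, 0.25, 0.5, 0.75, 1.0])
print([round(f(t), 3) for t in (0.0, 0.25, 0.5, 0.75, 1.0)])
```

Monotonicity is guaranteed by construction rather than checked after fitting. The local dip in the data therefore flattens the fit instead of bending it downward.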
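A common way to build a nonparametric, Kaplan-Meier-based imputation for censored responses in an AFT model is to replace each censored (log) time c by its estimated conditional mean E[T | T > c]. This is a minimal sketch of that construction, and it is not necessarily the dissertation's exact rule. It assumes there are no tied times. It also follows the usual convention of treating the largest observation as an event, so that the KM curve reaches zero. `km_masses` and `km_conditional_mean_impute` are hypothetical names.

```python
import math


def km_masses(times, events):
    """Kaplan-Meier probability mass at each event time (assumes no tied times)."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    n, surv, masses = len(times), 1.0, {}
    for rank, i in enumerate(order):
        if events[i]:
            at_risk = n - rank                # subjects still under observation
            new_surv = surv * (1.0 - 1.0 / at_risk)
            masses[times[i]] = surv - new_surv
            surv = new_surv
    return masses


def km_conditional_mean_impute(times, events):
    """Replace each censored time c by the KM estimate of E[T | T > c]."""
    events = list(events)
    last = max(range(len(times)), key=lambda i: times[i])
    events[last] = 1  # convention: treat the largest time as an event
    masses = km_masses(times, events)
    imputed = []
    for t, d in zip(times, events):
        if d:
            imputed.append(t)
        else:
            later = [(s, m) for s, m in masses.items() if s > t]
            imputed.append(sum(s * m for s, m in later) / sum(m for _, m in later))
    return imputed


# Toy data: impute on the log scale, as an AFT model is fitted to log survival times.
times = [5.0, 8.0, 12.0, 20.0, 31.0]
events = [1, 0, 1, 0, 0]
log_y = km_conditional_mean_impute([math.log(t) for t in times], events)
print([round(v, 3) for v in log_y])
```

The imputed log times can then serve as a complete response for a penalized (for example ENet or SPLS) regression on the MS features.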
Multiple imputation with compatibility for high-dimensional data
Multiple imputation (MI) is challenging in high-dimensional settings. An imputation model that uses only a selected subset of predictors can be incompatible with the analysis model, which leads to inconsistent and biased estimates. Although compatibility may not be achievable in such cases, one can still obtain consistent and unbiased estimates with a semi-compatible imputation model. We propose to relax the lasso penalty so that it selects a large set of variables (at most n). The substantive model, which also uses some formal variable selection procedure in high-dimensional structures, is then expected to be nested in this imputation model, so the resulting imputation model will be semi-compatible with high probability. The likelihood estimates can be unstable and can run into convergence issues as the number of variables approaches the sample size. To address these issues, we further propose to use a ridge penalty to obtain the posterior distribution of the parameters given the observed data. The proposed technique is compared with standard MI software and with MI techniques available for high-dimensional data, in simulation studies and on a real-life dataset. Our results show the superiority of the proposed approach over existing MI approaches while addressing the compatibility issue.
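The ridge-penalized posterior step can be sketched as a single Bayesian-regression imputation draw in which the ridge penalty acts as a normal prior on the slopes. This is a minimal numpy sketch, and the abstract does not confirm these choices as its exact algorithm. It assumes a continuous variable with missing entries, regressed on fully observed predictors that the relaxed-lasso step (omitted here) has already selected. It also assumes a fixed ridge parameter `lam`, a flat prior on the intercept and a 1/sigma^2 prior on the residual variance. `ridge_posterior_impute` is a hypothetical name.

```python
import numpy as np


def ridge_posterior_impute(X, y, lam, rng):
    """Return a copy of y whose NaN entries are replaced by one posterior-predictive
    draw from a normal linear model with a ridge (normal) prior on the slopes."""
    miss = np.isnan(y)
    Xo, yo, Xm = X[~miss], y[~miss], X[miss]
    n_o, p = Xo.shape
    x_bar, y_bar = Xo.mean(axis=0), yo.mean()
    Xc, yc = Xo - x_bar, yo - y_bar
    A = Xc.T @ Xc + lam * np.eye(p)            # positive definite even when p >= n_o
    beta_hat = np.linalg.solve(A, Xc.T @ yc)
    ssr = np.sum((yc - Xc @ beta_hat) ** 2) + lam * beta_hat @ beta_hat
    sigma2 = ssr / rng.chisquare(n_o - 1)      # draw of the residual variance
    L = np.linalg.cholesky(A)                  # A = L L^T, so solve(L^T, z) ~ N(0, A^-1)
    beta = beta_hat + np.sqrt(sigma2) * np.linalg.solve(L.T, rng.standard_normal(p))
    alpha = y_bar + np.sqrt(sigma2 / n_o) * rng.standard_normal()
    noise = np.sqrt(sigma2) * rng.standard_normal(int(miss.sum()))
    y_imp = y.copy()
    y_imp[miss] = alpha + (Xm - x_bar) @ beta + noise
    return y_imp


# Simulated data with p close to n; 10 missing outcomes leave n_o = 30 < p = 35.
rng = np.random.default_rng(42)
n, p = 40, 35
X = rng.standard_normal((n, p))
y = X[:, 0] - 2 * X[:, 1] + rng.standard_normal(n)
y[:10] = np.nan
completed = [ridge_posterior_impute(X, y, lam=1.0, rng=rng) for _ in range(5)]
```

Without the ridge term, `Xc.T @ Xc` would be singular here because p exceeds the number of observed rows. This is the instability the abstract describes.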
Calibrated Bayes, for Statistics in General, and Missing Data in Particular
It is argued that the Calibrated Bayesian (CB) approach to statistical
inference capitalizes on the strengths of the Bayesian and frequentist approaches to
statistical inference. In the CB approach, inferences under a particular model
are Bayesian, but frequentist methods are useful for model development and
model checking. In this article the CB approach is outlined. Bayesian methods
for missing data are then reviewed from a CB perspective. The basic theory of
the Bayesian approach, and the closely related technique of multiple
imputation, is described. Then applications of the Bayesian approach to normal
models are described, both for monotone and nonmonotone missing data patterns.
Sequential Regression Multivariate Imputation and Penalized Spline of
Propensity Models are presented as two useful approaches for relaxing
distributional assumptions.
Comment: Published at http://dx.doi.org/10.1214/10-STS318 in Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
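Multiple imputation, reviewed above alongside the Bayesian approach, ends by combining the M completed-data analyses. Below is a minimal sketch of Rubin's combining rules for a scalar estimand. It assumes that each completed data set yields a point estimate and its squared standard error, that M is at least 2 and that the within-imputation variance is positive. `rubin_combine` is a hypothetical name.

```python
import math


def rubin_combine(estimates, variances):
    """Pool M completed-data estimates of a scalar estimand with Rubin's rules."""
    m = len(estimates)
    q_bar = sum(estimates) / m                               # pooled point estimate
    u_bar = sum(variances) / m                               # mean within-imputation variance
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)   # between-imputation variance
    total = u_bar + (1 + 1 / m) * b                          # total variance of q_bar
    if b == 0:
        return q_bar, total, math.inf                        # no imputation uncertainty
    r = (1 + 1 / m) * b / u_bar                              # relative increase in variance
    df = (m - 1) * (1 + 1 / r) ** 2                          # Rubin's degrees of freedom
    return q_bar, total, df


# Toy example: three completed-data estimates, each with standard error 0.2.
print(rubin_combine([1.0, 1.2, 0.8], [0.04, 0.04, 0.04]))
```

The (1 + 1/m) factor accounts for using a finite number of imputations. With few imputations, the degrees of freedom can be small, which widens the resulting t intervals.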
Missing covariates in logistic regression, estimation and distribution selection.
We derive explicit formulae for estimation in logistic regression models where some of the covariates are missing. Our approach allows the distribution of the missing covariates to be modeled either as a multivariate normal or as a multivariate t-distribution. A main advantage of this method is that it is fast and does not require iterative procedures. We also derive a model selection method that allows one to choose among these distributions. In addition, we consider versions of AIC based on the EM algorithm and on multiple imputation methods, which are widely applicable to model selection in likelihood models in general.
Keywords: Akaike information criterion; Likelihood model; Logistic regression; Missing covariates; Model selection; Multiple imputation; t-distribution
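Choosing between a normal and a t model for a covariate with an information criterion can be illustrated in the simplest case of complete, univariate data. This is a minimal sketch under several assumptions. The covariate values are fully observed. The t-distribution has a fixed number of degrees of freedom `nu`, so both models have two free parameters (location and scale). The t fit uses the standard EM updates for a location-scale t. The sketch compares plain AIC values. It does not reproduce the paper's explicit missing-covariate formulae or its EM- and MI-based AIC versions. The function names are hypothetical.

```python
import math


def normal_loglik(x):
    """Maximised log-likelihood of an i.i.d. normal model (MLE mean and variance)."""
    n = len(x)
    mu = sum(x) / n
    s2 = sum((xi - mu) ** 2 for xi in x) / n
    return -0.5 * n * (math.log(2 * math.pi * s2) + 1)


def t_loglik(x, nu, n_iter=200):
    """Log-likelihood of a t location-scale model with fixed nu, fitted by EM."""
    n = len(x)
    mu = sum(x) / n
    s2 = sum((xi - mu) ** 2 for xi in x) / n
    for _ in range(n_iter):
        w = [(nu + 1) / (nu + (xi - mu) ** 2 / s2) for xi in x]  # E-step weights
        mu = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)         # M-step: location
        s2 = sum(wi * (xi - mu) ** 2 for wi, xi in zip(w, x)) / n  # M-step: squared scale
    const = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi * s2)
    return sum(const - (nu + 1) / 2 * math.log1p((xi - mu) ** 2 / (nu * s2)) for xi in x)


def aic(loglik, k):
    """Akaike information criterion for a model with k free parameters."""
    return -2 * loglik + 2 * k


# Toy heavy-tailed sample: a tight cluster plus two outlying values.
x = [-0.5, -0.3, -0.1, 0.0, 0.1, 0.2, 0.3, 0.4, -0.2, 0.05, 8.0, -7.0]
scores = {"normal": aic(normal_loglik(x), 2), "t(3)": aic(t_loglik(x, 3), 2)}
print(min(scores, key=scores.get), scores)
```

If `nu` were also estimated, the t model would have three free parameters and its AIC would carry an extra penalty of 2.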