Estimation of a regression spline sample selection model
It is often the case that an outcome of interest is observed for a restricted, non-randomly selected sample of the population. In such a situation, standard statistical analysis yields biased results. This issue can be addressed using sample selection models, which are based on the estimation of two regressions: a binary selection equation determining whether a particular statistical unit will be available in the outcome equation, and the outcome equation itself. Classic sample selection models assume a priori that continuous regressors have a pre-specified linear or non-linear relationship to the outcome, which can lead to erroneous conclusions. In the case of a continuous response, methods in which covariate effects are modeled flexibly have been previously proposed, the most recent being based on a Bayesian Markov chain Monte Carlo approach. A frequentist counterpart, which has the advantage of being computationally fast, is introduced. The proposed algorithm is based on the penalized likelihood estimation framework. The construction of confidence intervals is also discussed. The empirical properties of the existing and proposed methods are studied through a simulation study. The approaches are finally illustrated by analyzing data from the RAND Health Insurance Experiment on annual health expenditures.
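To make the two-equation structure concrete, the following is a minimal sketch of the classic parametric two-step correction (a probit selection equation followed by an outcome regression augmented with the inverse Mills ratio). It is not the penalized regression spline estimator described in the abstract. The simulated data, the coefficient values, the exclusion variable `z`, and all function names are assumptions made purely for illustration.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def norm_pdf(z):
    """Standard normal density."""
    return np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))

def fit_probit(W, s, n_iter=25):
    """Fisher scoring for the selection equation P(s = 1 | W) = Phi(W @ gamma)."""
    gamma = np.zeros(W.shape[1])
    for _ in range(n_iter):
        eta = W @ gamma
        p = np.clip(norm_cdf(eta), 1e-10, 1 - 1e-10)
        phi = norm_pdf(eta)
        score = W.T @ (phi * (s - p) / (p * (1 - p)))
        weights = phi**2 / (p * (1 - p))
        info = (W * weights[:, None]).T @ W
        gamma = gamma + np.linalg.solve(info, score)
    return gamma

def ols(X, y):
    """Ordinary least squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def simulate_and_compare(n=20000, rho=0.7, seed=0):
    """Simulate a selected sample; return naive and two-step outcome estimates."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    z = rng.normal(size=n)  # hypothetical variable that affects selection only
    # Correlated errors (u: selection, e: outcome) make selection non-ignorable.
    u, e = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n).T
    s = (0.5 + 1.0 * x + 1.0 * z + u > 0).astype(float)  # selection equation
    y = 1.0 + 2.0 * x + e                                # outcome equation
    sel = s == 1                                         # y is used only where sel is True

    X = np.column_stack([np.ones(n), x])
    W = np.column_stack([np.ones(n), x, z])

    # Naive approach: regress y on x using the selected units only.
    naive = ols(X[sel], y[sel])

    # Step 1: probit for selection. Step 2: add the inverse Mills ratio.
    gamma = fit_probit(W, s)
    eta = W[sel] @ gamma
    mills = norm_pdf(eta) / norm_cdf(eta)
    corrected = ols(np.column_stack([X[sel], mills]), y[sel])
    return naive, corrected

if __name__ == "__main__":
    naive, corrected = simulate_and_compare()
    print("naive     (intercept, slope):", naive)
    print("corrected (intercept, slope, Mills coefficient):", corrected)
```

In this simulated setting the naive fit on the selected units drifts away from the true intercept and slope (1 and 2), whereas the two-step fit should recover them approximately. The linear form imposed on `x` in both equations is exactly the kind of pre-specified relationship that the abstract argues can be too restrictive.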
Principal causal effect identification and surrogate endpoint evaluation by multiple trials
Principal stratification is a causal framework to analyze randomized
experiments with a post-treatment variable between the treatment and endpoint
variables. Because the principal strata defined by the potential outcomes of
the post-treatment variable are not observable, we generally cannot identify
the causal effects within principal strata. Motivated by a real data set of
phase III adjuvant colon clinical trials, we propose approaches to identifying
and estimating the principal causal effects via multiple trials. For the
identifiability, we remove the commonly-used exclusion restriction assumption
by stipulating that the principal causal effects are homogeneous across these
trials. To remove another commonly-used monotonicity assumption, we give a
necessary condition for the local identifiability, which requires at least
three trials. Applying our approaches to the data from adjuvant colon clinical
trials, we find that the commonly-used monotonicity assumption is untenable,
and disease-free survival with three-year follow-up is a valid surrogate
endpoint for overall survival with five-year follow-up, which satisfies both
the causal necessity and the causal sufficiency. We also propose a sensitivity
analysis approach based on Bayesian hierarchical models to investigate the
impact of the deviation from the homogeneity assumption.
Comparing principal stratification and selection models in parametric causal inference with nonignorable missingness
Two approaches for dealing with "endogenous selection" problems when estimating causal
effects are considered. They are principal stratification and selection models. The main goal
is to highlight similarities and differences between the two approaches, by investigating
the different nature of their parametric hypotheses. The principal stratification approach
focuses on information contained in specific subgroups of units. The aim is to produce valid
inference conditional on such subgroups, without an a priori extension of the results to the
whole population. Selection models, on the contrary, aim at estimating parameters that
should be valid for the whole population, as if the data come from random sampling. A
simulation study is conducted to show their different performances, with data generating
processes coming from either approach. It is also argued that principal stratification is
able to suggest alternative identification strategies not always easily translatable into
assumptions of a selection model.