Fast Penalized Regression and Cross Validation for Tall Data with the oem Package
A large body of research has focused on theory and computation for variable selection techniques for high-dimensional data. There has been substantially less work in the big "tall" data paradigm, where the number of variables may be large but the number of observations is much larger. The orthogonalizing expectation maximization (OEM) algorithm is one approach for computation of penalized models that excels in the big tall data regime. The oem package is an efficient implementation of the OEM algorithm that provides a multitude of computation routines with a focus on big tall data, such as a function for out-of-memory computation and routines for large-scale parallel computation of penalized regression models. Furthermore, in this paper we propose a specialized implementation of the OEM algorithm for cross validation, dramatically reducing the computing time for cross validation over a naive implementation.
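The core OEM iteration is simple: the design is (implicitly) augmented so that its crossproduct becomes orthogonal, after which the penalized M-step reduces to elementwise soft-thresholding. Below is a minimal Python sketch of this update for the lasso; the oem package itself is an R package, so this illustrates the algorithm rather than the package's implementation.

```python
import numpy as np

def soft_threshold(u, t):
    """Elementwise soft-thresholding operator."""
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

def oem_lasso(X, y, lam, n_iter=5000):
    """Lasso via the orthogonalizing EM (OEM) update.

    With d bounding the largest eigenvalue of X'X, each iteration
    computes the 'orthogonalized' target u and soft-thresholds it.
    """
    n, p = X.shape
    XtX = X.T @ X
    Xty = X.T @ y
    d = np.linalg.eigvalsh(XtX).max()   # upper bound on eigenvalues of X'X
    beta = np.zeros(p)
    for _ in range(n_iter):
        u = beta + (Xty - XtX @ beta) / d
        beta = soft_threshold(u, lam / d)
    return beta
```

Note that after X'y and X'X are formed (one pass over the rows), every iteration works only with p-dimensional quantities, which is why this style of update is attractive for tall and out-of-memory data.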
Subgroup Identification Using the personalized Package
A plethora of disparate statistical methods have been proposed for subgroup identification to help tailor treatment decisions for patients. However, a majority of them do not have corresponding R packages, and the few that do are tied to particular statistical methods or provide little means of evaluating whether meaningful subgroups have been found. Recently, the work of Chen, Tian, Cai, and Yu (2017) unified many of these subgroup identification methods into one general, consistent framework. The goal of the personalized package is to provide a corresponding unified software framework for subgroup identification analyses that provides not only estimation of subgroups but also evaluation of treatment effects within estimated subgroups. The personalized package allows for a variety of subgroup identification methods for many types of outcomes commonly encountered in medical settings. The package is built to incorporate the entire subgroup identification analysis pipeline, including propensity score diagnostics, subgroup estimation, analysis of the treatment effects within subgroups, and evaluation of identified subgroups. In this framework, different methods can be accessed with little change in the analysis code. Similarly, new methods can easily be incorporated into the package. Besides familiar statistical models, the package also allows flexible machine learning tools to be leveraged in subgroup identification. Further estimation improvements can be obtained via efficiency augmentation.
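To make the unified framework concrete, here is a minimal Python sketch of its weighting estimator with a squared-error loss and a linear benefit score: the outcome is regressed on the sign-flipped covariates with inverse probability-of-received-treatment weights, and the sign of the resulting benefit score defines the subgroup. The function name and setup are illustrative only and do not mirror the package's R interface.

```python
import numpy as np

def fit_benefit_score(X, y, trt, prop):
    """Weighting estimator for a linear benefit score f(x) = x @ gamma.

    trt is a 0/1 treatment indicator and prop the propensity P(trt=1|x).
    Weighted least squares of y on (2*trt - 1) * x, in the spirit of the
    general weighting framework of Chen, Tian, Cai, and Yu (2017).
    """
    a = 2 * trt - 1                      # treatment recoded as +/- 1
    w = 1.0 / (trt * prop + (1 - trt) * (1 - prop))
    Z = a[:, None] * X                   # sign-flipped design
    Zw = Z * w[:, None]                  # apply weights without a dense diag
    gamma = np.linalg.solve(Z.T @ Zw, Zw.T @ y)
    return gamma
```

Treatment would then be recommended to a patient with covariates x when the estimated benefit score x @ gamma is positive; in the package, subgroup evaluation and treatment-effect estimation within the identified subgroups follow this step.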
Enhancing modified treatment policy effect estimation with weighted energy distance
The effects of continuous treatments are often characterized through the average dose-response function, which is challenging to estimate from observational data due to confounding and positivity violations. Modified treatment policies (MTPs) are an alternative approach that aims to assess the effect of a modification to observed treatment values and works under relaxed assumptions. Estimators for MTPs generally focus on estimating the conditional density of treatment given covariates and using it to construct weights. However, weighting using conditional density models has well-documented challenges. Further, MTPs with larger treatment modifications induce stronger confounding, and no tools exist to help choose an appropriate modification magnitude. This paper investigates the role of weights for MTPs, showing that, to control confounding, weights should balance the weighted data to an unobserved hypothetical target population that can nonetheless be characterized with observed data. Leveraging this insight, we present a versatile set of tools to enhance estimation for MTPs. We introduce a distance that measures imbalance of covariate distributions under the MTP and use it to develop new weighting methods and tools to aid in the estimation of MTPs. We illustrate our methods through an example studying the effect of mechanical power of ventilation on in-hospital mortality.
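As an illustration of the kind of imbalance measure involved, the weighted energy distance between two samples can be computed directly. The sketch below is the generic weighted energy distance between two weighted empirical distributions; in the paper's setting, the second sample would stand in for the hypothetical target population characterized from observed data.

```python
import numpy as np

def pairwise_dists(A, B):
    """Euclidean distance matrix between the rows of A and the rows of B."""
    return np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1))

def weighted_energy_distance(X, wx, Y, wy):
    """Energy distance between the wx-weighted sample X and the
    wy-weighted sample Y: 2*E||X-Y|| - E||X-X'|| - E||Y-Y'||.

    It is zero when the two weighted empirical distributions coincide,
    so it serves as a measure of covariate imbalance.
    """
    wx = wx / wx.sum()
    wy = wy / wy.sum()
    return (2 * wx @ pairwise_dists(X, Y) @ wy
            - wx @ pairwise_dists(X, X) @ wx
            - wy @ pairwise_dists(Y, Y) @ wy)
```

Minimizing such a distance over the weights is one way to target balance directly rather than through an estimated conditional density, which is the difficulty with density-based MTP weights noted above.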
Counterfactual fairness for small subgroups
While methods for measuring and correcting differential performance in risk prediction models have proliferated in recent years, most existing techniques can only be used to assess fairness across relatively large subgroups. The purpose of algorithmic fairness efforts is often to redress discrimination against groups that are both marginalized and small, so this sample-size limitation often prevents existing techniques from accomplishing their main aim. We take a three-pronged approach to the problem of quantifying fairness for small subgroups. First, we propose new estimands built on the "counterfactual fairness" framework that leverage information across groups. Second, we estimate these quantities using a larger volume of data than existing techniques. Finally, we propose a novel data-borrowing approach to incorporate "external data" that lacks outcomes and predictions but contains covariate and group membership information. This less stringent requirement on the external data opens up more possibilities for external data sources. We demonstrate practical application of our estimators to a risk prediction model used by a major Midwestern health system during the COVID-19 pandemic.
A reluctant additive model framework for interpretable nonlinear individualized treatment rules
Individualized treatment rules (ITRs) for treatment recommendation are an important topic in precision medicine, as not all beneficial treatments work well for all individuals. Interpretability is a desirable property of ITRs, as it helps practitioners make sense of treatment decisions, yet ITRs must also be flexible enough to model complex biomedical data for treatment decision making. Many ITR approaches either focus on linear ITRs, which may perform poorly when the true optimal ITR is nonlinear, or on black-box nonlinear ITRs, which may be hard to interpret and overly complex. This dilemma reflects a tension between interpretability and accuracy of treatment decisions. Here we propose an additive model-based nonlinear ITR learning method that balances interpretability and flexibility of the ITR. Our approach aims to strike this balance by allowing both linear and nonlinear terms of the covariates in the final ITR. The approach is parsimonious in that a nonlinear term is included in the final ITR only when it substantially improves ITR performance. To prevent overfitting, we combine cross-fitting with a specialized information criterion for model selection. Through extensive simulations, we show that our methods adapt to the degree of nonlinearity in the data and favorably balance ITR interpretability and flexibility. We further demonstrate the robust performance of our methods with an application to a cancer drug sensitivity study.
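The "reluctant" selection principle, keeping the simple model unless the nonlinear term buys a substantial improvement, can be sketched with an ordinary information criterion. The paper's actual criterion is specialized to treatment-rule performance and combined with cross-fitting, so the Python sketch below is only a toy analogue: AIC on a single regression, with an illustrative margin parameter standing in for "substantially improves".

```python
import numpy as np

def aic(y, yhat, n_params):
    """Akaike information criterion for a Gaussian regression fit."""
    n = len(y)
    rss = ((y - yhat) ** 2).sum()
    return n * np.log(rss / n) + 2 * n_params

def reluctant_fit(x, y, margin=10.0):
    """Fit y ~ x, adding a quadratic term only if it improves the AIC by
    more than `margin` -- a toy analogue of including a nonlinear term
    in the ITR only when it substantially helps."""
    X_lin = np.column_stack([np.ones_like(x), x])
    X_non = np.column_stack([X_lin, x ** 2])
    b_lin, *_ = np.linalg.lstsq(X_lin, y, rcond=None)
    b_non, *_ = np.linalg.lstsq(X_non, y, rcond=None)
    if aic(y, X_non @ b_non, 3) < aic(y, X_lin @ b_lin, 2) - margin:
        return "nonlinear", b_non
    return "linear", b_lin
```

The margin is what makes the fit "reluctant": a small chance improvement in fit is not enough to abandon the interpretable linear form.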
Causally-interpretable meta-analysis: clearly-defined causal effects and two case studies
Meta-analysis is commonly used to combine results from multiple clinical trials, but traditional meta-analysis methods do not refer explicitly to a population of individuals to whom the results apply, and it is not clear how to use their results to assess a treatment's effect for a population of interest. We describe recently introduced causally-interpretable meta-analysis methods and apply their treatment effect estimators to two individual-participant data sets. These estimators transport estimated treatment effects from the studies in the meta-analysis to a specified target population using individuals' potentially effect-modifying covariates. We consider different regression and weighting methods within this approach and compare the results to traditional aggregated-data meta-analysis methods. In our applications, certain versions of the causally-interpretable methods performed somewhat better than the traditional methods, but the latter generally did well. The causally-interpretable methods offer the most promise when covariates modify treatment effects, and our results suggest that traditional methods work well when there is little effect heterogeneity. The causally-interpretable approach gives meta-analysis an appealing theoretical framework by relating an estimator directly to a specific population and lays a solid foundation for future developments. (31 pages, 2 figures; submitted to Research Synthesis Methods.)
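A minimal Python sketch of the weighting idea: trial participants are reweighted by the estimated odds that someone with their covariates belongs to the target population (inverse-odds-of-sampling weights), which transports the trial's effect estimate to the target. The function below assumes the membership probabilities have already been estimated, e.g. from pooled trial and target-population data; it illustrates the general transportability approach, not the paper's exact estimators.

```python
import numpy as np

def transported_effect(y, trt, s):
    """Inverse-odds-of-sampling weighted treatment effect estimate.

    y, trt: outcomes and randomized 0/1 treatment for trial participants.
    s:      estimated P(in trial | X) for each participant, from a model
            fit on combined trial + target-population covariate data.
    Weighting each trial unit by the odds of target membership,
    (1 - s)/s, reweights the trial sample to the covariate distribution
    of the target population before taking the difference in means.
    """
    w = (1 - s) / s
    w1 = w * trt
    w0 = w * (1 - trt)
    return (w1 @ y) / w1.sum() - (w0 @ y) / w0.sum()
```

When covariates modify the treatment effect, this weighted contrast can differ markedly from the unweighted trial estimate, which is exactly the situation in which the causally-interpretable methods are said to offer the most promise.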
A method for comparing multiple imputation techniques: A case study on the U.S. National COVID Cohort Collaborative
Healthcare datasets obtained from electronic health records have proven to be extremely useful for assessing associations between patients’ predictors and outcomes of interest. However, these datasets often suffer from missing values in a high proportion of cases, whose removal may introduce severe bias. Several multiple imputation algorithms have been proposed to attempt to recover the missing information under an assumed missingness mechanism. Each algorithm presents strengths and weaknesses, and there is currently no consensus on which multiple imputation algorithm works best in a given scenario. Furthermore, the selection of each algorithm’s parameters and data-related modeling choices are also both crucial and challenging.
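One common way to benchmark imputation methods is to mask entries whose true values are known, impute them, and score the reconstruction. The Python sketch below uses this generic mask-and-score scheme with a simple column-mean imputer as a baseline; the masking here is completely at random, whereas a faithful comparison would mimic the assumed missingness mechanism, and both function names are illustrative.

```python
import numpy as np

def evaluate_imputer(X_complete, impute_fn, frac_missing=0.2, seed=0):
    """Score an imputation method by masking observed entries at random
    and measuring reconstruction RMSE on the held-out true values."""
    rng = np.random.default_rng(seed)
    mask = rng.random(X_complete.shape) < frac_missing
    X_obs = X_complete.copy()
    X_obs[mask] = np.nan
    X_imp = impute_fn(X_obs)
    return np.sqrt(np.mean((X_imp[mask] - X_complete[mask]) ** 2))

def mean_impute(X):
    """Baseline imputer: replace each NaN with its column mean."""
    X = X.copy()
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X
```

Running `evaluate_imputer` with each candidate algorithm (and each setting of its parameters) on the same masked data gives a like-for-like comparison of the kind the case study calls for.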