
    Graphical displays for assessing covariate balance in matching studies

Rationale, aims and objectives: An essential requirement for ensuring the validity of outcomes in matching studies is that study groups are comparable on observed pre‐intervention characteristics. Investigators typically use numerical diagnostics, such as t‐tests, to assess comparability (referred to as 'balance'). However, such diagnostics only test equality along one dimension (e.g. means, in the case of t‐tests) and therefore do not adequately capture imbalances that may exist elsewhere in the distribution. Furthermore, these tests are generally sensitive to sample size, raising the concern that a reduction in power may be mistaken for an improvement in covariate balance. In this paper, we demonstrate the shortcomings of numerical diagnostics and show how visual displays provide a complete representation of the data to more robustly assess balance.

Methods: We generate artificial datasets specifically designed to demonstrate how widely used equality tests capture only a single dimension of the data and are sensitive to sample size. We then plot the covariate distributions using several graphical displays.

Results: As expected, tests showing perfect covariate balance in means failed to reflect imbalances at higher moments (variances). However, these discrepancies were easily detected upon inspection of the graphical displays. Additionally, smaller sample sizes led to the appearance of covariate balance when it was in fact a result of lower statistical power.

Conclusions: Given the limitations of numerical diagnostics, we advocate using graphical displays for assessing covariate balance and encourage investigators to provide such graphs when reporting balance statistics in their matching studies.

Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/110864/1/jep12297.pd
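To make the paper's central point concrete, here is a minimal sketch (not the authors' code): two simulated groups share a mean but differ in variance, so a t‐test reports balance while a plot of the empirical CDFs exposes the imbalance. Group sizes and distribution parameters are hypothetical.

```python
# Minimal sketch: a mean-only diagnostic misses a variance imbalance
# that a graphical display reveals. All values are hypothetical.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(42)
treated = rng.normal(loc=0.0, scale=1.0, size=500)   # mean 0, sd 1
control = rng.normal(loc=0.0, scale=2.0, size=500)   # mean 0, sd 2

# Numerical diagnostic: the t-test compares means only, so it reports "balance".
t, p = stats.ttest_ind(treated, control, equal_var=False)
print(f"t-test p-value: {p:.3f}  (no mean imbalance detected)")

# Graphical diagnostic: empirical CDFs expose the variance imbalance.
for sample, label in [(treated, "treated"), (control, "control")]:
    x = np.sort(sample)
    plt.step(x, np.arange(1, x.size + 1) / x.size, label=label)
plt.xlabel("covariate value")
plt.ylabel("empirical CDF")
plt.legend()
plt.show()
```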

    Challenges to validity in single‐group interrupted time series analysis

Rationale, aims and objectives: Single‐group interrupted time series analysis (ITSA) is a popular evaluation methodology in which a single unit of observation is studied; the outcome variable is serially ordered as a time series, and the intervention is expected to "interrupt" the level and/or trend of the time series subsequent to its introduction. The most common threat to validity is history: the possibility that some other event caused the observed effect in the time series. Although history limits the ability to draw causal inferences from single ITSA models, it can be controlled for by using a comparable control group to serve as the counterfactual.

Method: Time series data from 2 natural experiments (the effect of Florida's 2000 repeal of its motorcycle helmet law on motorcycle fatalities, and California's 1988 Proposition 99 to reduce cigarette sales) are used to illustrate how history biases single‐group ITSA results, as opposed to when that group's results are contrasted with those of a comparable control group.

Results: In the first example, an external event occurring at the same time as the helmet repeal appeared to be the cause of a rise in motorcycle deaths, but this was only revealed when Florida was contrasted with comparable control states. Conversely, in the second example, a decreasing trend in cigarette sales prior to the intervention raised questions about the treatment effect attributed to Proposition 99, but the effect was reinforced when California was contrasted with comparable control states.

Conclusions: Results of single‐group ITSA should be considered preliminary, and interpreted with caution, until a more robust study design can be implemented.

Peer Reviewed
https://deepblue.lib.umich.edu/bitstream/2027.42/136442/1/jep12638_am.pdf
https://deepblue.lib.umich.edu/bitstream/2027.42/136442/2/jep12638.pd
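For readers unfamiliar with the model, a minimal single‐group ITSA can be fit as a segmented regression with a level‐change term and a trend‐change term. The sketch below uses simulated data with an intervention at a known period; it illustrates the general specification, not the paper's analysis of the Florida or California series.

```python
# Minimal single-group ITSA (segmented regression) on simulated data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n, t0 = 60, 30                        # 60 periods, intervention at t = 30
t = np.arange(n)
d = (t >= t0).astype(int)             # post-intervention indicator
y = 10 + 0.2 * t + 3.0 * d + 0.5 * d * (t - t0) + rng.normal(0, 1, n)

df = pd.DataFrame({"y": y, "t": t, "d": d, "post_t": d * (t - t0)})
# y = b0 + b1*t (baseline trend) + b2*d (level change) + b3*post_t (trend change)
fit = smf.ols("y ~ t + d + post_t", data=df).fit()
print(fit.params)
# Caveat from the paper: without a comparable control series, b2 and b3
# may reflect history (co-occurring events) rather than the intervention.
```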

    Improving causal inference with a doubly robust estimator that combines propensity score stratification and weighting

Rationale, aims and objectives: When a randomized controlled trial is not feasible, health researchers typically use observational data and rely on statistical methods to adjust for confounding when estimating treatment effects. These methods generally fall into 3 categories: (1) estimators based on a model for the outcome, using conventional regression adjustment; (2) weighted estimators based on the propensity score (ie, a model for treatment assignment); and (3) "doubly robust" (DR) estimators that model both the outcome and the propensity score within the same framework. In this paper, we introduce a new DR estimator that utilizes marginal mean weighting through stratification (MMWS) as the basis for weighted adjustment. This estimator may prove more accurate than other treatment effect estimators because MMWS has been shown to be more accurate than other weighting models when the propensity score is misspecified. We therefore compare the performance of this new estimator to other commonly used treatment effect estimators.

Method: Monte Carlo simulation is used to compare the DR‐MMWS estimator to regression adjustment, 2 weighted estimators based on the propensity score, and 2 other DR methods. To assess performance under varied conditions, we vary the level of misspecification of the propensity score model and also misspecify the outcome model.

Results: Overall, DR estimators generally outperform methods that model only one of the two components (eg, the propensity score or the outcome). The DR‐MMWS estimator outperforms all other estimators when both the propensity score and outcome models are misspecified, and performs as well as other DR estimators when only the propensity score is misspecified.

Conclusions: Health researchers should consider using DR‐MMWS as the principal evaluation strategy in observational studies, as this estimator appears to outperform other estimators in its class.

Peer Reviewed
https://deepblue.lib.umich.edu/bitstream/2027.42/137762/1/jep12714_am.pdf
https://deepblue.lib.umich.edu/bitstream/2027.42/137762/2/jep12714.pd
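The MMWS building block can be sketched as follows, assuming the commonly cited weight w = n_s * Pr(Z = z) / n_sz (stratum size times the marginal probability of receiving treatment level z, divided by the stratum-by-treatment cell count) and a weighted outcome regression as the doubly robust step. This is an illustration on simulated data, not the paper's DR‐MMWS implementation.

```python
# Sketch of MMWS weighting followed by a weighted outcome regression.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=n)
z = rng.binomial(1, 1 / (1 + np.exp(-0.5 * x)))   # confounded assignment
y = 1.0 * z + 0.8 * x + rng.normal(size=n)        # true effect = 1.0

# Step 1: estimate the propensity score (logistic regression on x).
ps = sm.Logit(z, sm.add_constant(x)).fit(disp=0).predict(sm.add_constant(x))

# Step 2: stratify on the PS and compute MMWS weights.
df = pd.DataFrame({"y": y, "z": z, "x": x, "ps": ps})
df["stratum"] = pd.qcut(df["ps"], q=5, labels=False)
pr_z = df["z"].mean()
n_s = df.groupby("stratum")["z"].transform("size")
n_sz = df.groupby(["stratum", "z"])["z"].transform("size")
df["w"] = n_s * np.where(df["z"] == 1, pr_z, 1 - pr_z) / n_sz

# Step 3: doubly robust step, a weighted outcome regression on z and x.
wls = sm.WLS(df["y"], sm.add_constant(df[["z", "x"]]), weights=df["w"]).fit()
print(wls.params["z"])   # treatment effect estimate (true value: 1.0)
```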

    Assessing regression to the mean effects in health care initiatives

BACKGROUND: Interventions targeting individuals classified as "high-risk" have become commonplace in health care. High risk may represent outlier values on utilization, cost, or clinical measures. Typically, such individuals are invited to participate in an intervention intended to reduce their level of risk, and after a period of time a follow-up measurement is taken. However, individuals initially identified by their outlier values will likely have lower values on re-measurement even in the absence of an intervention. This statistical phenomenon is known as "regression to the mean" (RTM) and often leads to the inaccurate conclusion that the intervention caused the effect. Concerns about RTM are rarely raised in connection with most health care interventions, and it is uncommon to find evaluators who estimate its effect. This may be due to lack of awareness, to cognitive biases that cause people to systematically misinterpret RTM effects by creating (erroneous) explanations to account for them, or to design.

METHODS: In this paper, the author fully describes the RTM phenomenon and tests the accuracy of the traditional approach to calculating RTM under an assumption of normality, using normally distributed data from a Monte Carlo simulation and skewed data from a control group in a pre-post evaluation of a health intervention. Confidence intervals are generated around the traditional RTM calculation to provide more insight into the potential magnitude of the bias introduced by RTM. Finally, suggestions are offered for designing interventions and evaluations that mitigate the effects of RTM.

RESULTS: On multivariate normal data, the calculated RTM estimates are identical to the true estimates. As expected, on skewed data the traditional calculation underestimated the true RTM effect. Confidence intervals provide helpful guidance on the magnitude of the RTM effect.

CONCLUSION: Decision-makers should always consider RTM as a viable explanation for the observed change in an outcome in a pre-post study, and evaluators of health care initiatives should take the appropriate steps to estimate the magnitude of the effect and control for it when possible. Regardless of the cause, failure to address RTM may result in wasteful pursuit of ineffective interventions, at both the organizational and the policy level
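Under bivariate normality, the traditional RTM calculation referenced above has a closed form: for subjects selected because their baseline exceeds a cutoff c, the expected decline at follow-up is sigma * (1 - rho) * phi(z) / (1 - Phi(z)), with z = (c - mu) / sigma. The sketch below computes this and checks it against a Monte Carlo estimate; all parameter values are hypothetical.

```python
# Traditional RTM formula under normality, checked by simulation.
import numpy as np
from scipy import stats

mu, sigma, rho, cutoff = 50.0, 10.0, 0.7, 65.0   # hypothetical parameters

# Formula: expected drop at follow-up for subjects selected above the cutoff.
z = (cutoff - mu) / sigma
rtm_formula = sigma * (1 - rho) * stats.norm.pdf(z) / stats.norm.sf(z)

# Monte Carlo check: correlated baseline/follow-up pairs, no intervention.
rng = np.random.default_rng(7)
cov = [[sigma**2, rho * sigma**2], [rho * sigma**2, sigma**2]]
baseline, followup = rng.multivariate_normal([mu, mu], cov, size=200_000).T
sel = baseline > cutoff
rtm_mc = (baseline[sel] - followup[sel]).mean()

print(f"formula: {rtm_formula:.2f}   simulation: {rtm_mc:.2f}")
```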

    A comparison of approaches for stratifying on the propensity score to reduce bias

Rationale, aims, and objectives: Stratification is a popular propensity score (PS) adjustment technique. It has been shown that stratifying the PS into 5 quantiles can remove over 90% of the bias due to the covariates used to generate the PS. Because of this finding, many investigators partition their data into 5 quantiles of the PS without examining whether a more robust solution (one that increases covariate balance while potentially reducing bias in the outcome analysis) can be found for their data. Two approaches (referred to herein as PSCORE and PSTRATA) obtain the optimal stratification solution by repeatedly dividing the data into strata until balance is achieved between treatment and control groups on the PS. These algorithms differ in how they partition the data, and it is not known which is better, or whether either is better than the 5‐quantile default, at reducing bias in treatment effect estimates.

Method: Monte Carlo simulations and empirical data are used to assess whether PS strata defined by PSCORE, PSTRATA, or 5 quantiles are best at reducing bias in treatment effect estimates when used within a marginal mean weighting through stratification (MMWS) framework. These estimates are further compared to results derived using inverse probability of treatment weights (IPTW).

Results: PSTRATA was slightly better than PSCORE at balancing covariates and reducing bias, and both approaches outperformed the 5‐quantile approach. Overall, MMWS using any stratification method outperformed IPTW.

Conclusions: Investigators should routinely use stratification approaches that obtain the optimal stratification solution, rather than simply partitioning the data into 5 quantiles of the PS. Moreover, MMWS (in conjunction with an optimal stratification approach) should be considered as an alternative to IPTW in studies that use PS weights.

Peer Reviewed
https://deepblue.lib.umich.edu/bitstream/2027.42/137736/1/jep12701.pdf
https://deepblue.lib.umich.edu/bitstream/2027.42/137736/2/jep12701_am.pd
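Neither PSCORE nor PSTRATA is reproduced here, but the shared idea of iterative subdivision can be caricatured in a few lines: keep splitting a stratum while the treated and control PS distributions still differ and the stratum remains large enough to split. Treat this as intuition only; the published algorithms differ in their splitting and stopping rules.

```python
# Caricature of iterative PS stratification (not PSCORE or PSTRATA itself).
import numpy as np
from scipy import stats

def split_until_balanced(ps, z, lo=0.0, hi=1.0, min_n=50, alpha=0.05):
    """Return a list of half-open (lo, hi) stratum boundaries on the PS scale."""
    mask = (ps >= lo) & (ps < hi)
    ps_t, ps_c = ps[mask & (z == 1)], ps[mask & (z == 0)]
    if min(ps_t.size, ps_c.size) < min_n:
        return [(lo, hi)]                 # too small to split further
    _, p = stats.ttest_ind(ps_t, ps_c, equal_var=False)
    if p >= alpha:
        return [(lo, hi)]                 # PS balanced within this stratum
    mid = float(np.median(ps[mask]))      # split at the stratum median
    if not (lo < mid < hi):
        return [(lo, hi)]                 # degenerate split; stop
    return (split_until_balanced(ps, z, lo, mid, min_n, alpha)
            + split_until_balanced(ps, z, mid, hi, min_n, alpha))

rng = np.random.default_rng(6)
ps = rng.beta(2, 2, 4000)
z = rng.binomial(1, ps)
strata = split_until_balanced(ps, z)
print(f"{len(strata)} strata: {[(round(a, 2), round(b, 2)) for a, b in strata]}")
```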

    Using data mining techniques to characterize participation in observational studies

Data mining techniques are gaining in popularity among health researchers for an array of purposes, such as improving diagnostic accuracy, identifying high‐risk patients and extracting concepts from unstructured data. In this paper, we describe how these techniques can be applied to another area in the health research domain: identifying characteristics of individuals who do and do not choose to participate in observational studies. In contrast to randomized studies, where individuals have no control over their treatment assignment, participants in observational studies self‐select into the treatment arm and therefore have the potential to differ in their characteristics from those who elect not to participate. These differences may explain part, or all, of the difference in the observed outcome, making it crucial to assess whether there is differential participation based on observed characteristics. Compared with traditional approaches to this assessment, data mining offers a more precise understanding of these differences. To describe and illustrate the application of data mining in this domain, we use data from a primary care‐based medical home pilot programme and compare the performance of commonly used classification approaches – logistic regression, support vector machines, random forests and classification tree analysis (CTA) – in correctly classifying participants and non‐participants. We find that CTA is substantially more accurate than the other models. Moreover, unlike the other models, CTA offers transparency in its computational approach and ease of interpretation via the decision rules produced, and it provides statistical results familiar to health researchers. Beyond their application to research, data mining techniques could help administrators to identify new candidates for participation who may most benefit from the intervention.

Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/134951/1/jep12515.pd
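A rough version of this model comparison can be assembled with scikit‐learn stand‐ins, using a single decision tree as a loose analogue of CTA (CTA itself is a distinct algorithm with permutation tests and pruning not shown here). Data and features are simulated placeholders, not the pilot-programme data.

```python
# Sketch: comparing classifiers of study participation on simulated data.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(1000, 5))                    # observed characteristics
participate = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=1000)) > 0

models = {
    "logistic": LogisticRegression(max_iter=1000),
    "svm": SVC(),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "tree (CTA-like)": DecisionTreeClassifier(max_depth=3, random_state=0),
}
for name, model in models.items():
    acc = cross_val_score(model, X, participate, cv=5, scoring="accuracy")
    print(f"{name:16s} mean CV accuracy: {acc.mean():.3f}")
```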

    Combining machine learning and propensity score weighting to estimate causal effects in multivalued treatments

Rationale, aims and objectives: Interventions with multivalued treatments are common in medical and health research; examples include comparing the efficacy of competing interventions and contrasting various doses of a drug. In recent years, there has been growing interest in the development of methods that estimate multivalued treatment effects using observational data. This paper extends a previously described analytic framework for evaluating binary treatments to studies involving multivalued treatments, utilizing a machine learning algorithm called optimal discriminant analysis (ODA).

Method: We describe the differences between regression‐based treatment effect estimators and effects estimated using the ODA framework. We then present an empirical example using data from an intervention with three study groups to compare the corresponding effects.

Results: The regression‐based estimators produced statistically significant mean differences between the two intervention groups, and between one of the treatment groups and controls. In contrast, ODA was unable to discriminate between the distributions of any of the three study groups.

Conclusions: Optimal discriminant analysis offers an appealing alternative to conventional regression‐based models for estimating effects in multivalued treatment studies because of its insensitivity to skewed data and its use of accuracy measures applicable to all prognostic analyses. If these analytic approaches produce consistent treatment effect P values, this bolsters confidence in the validity of the results. If the approaches produce conflicting treatment effect P values, as they do in our empirical example, the investigator should consider the ODA‐derived estimates to be the most robust, given that ODA uses permutation P values that require no distributional assumptions and are thus always valid.

Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/135025/1/jep12610.pdf
http://deepblue.lib.umich.edu/bitstream/2027.42/135025/2/jep12610_am.pd
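The core of ODA can be conveyed with a two‐group sketch: find the cut‐point that maximizes classification accuracy, then obtain a permutation P value by re‐running the search on shuffled group labels. Production ODA software additionally handles multiple groups, analytic weights, and cross‐generalizability; none of that is shown here.

```python
# Two-group sketch of the max-accuracy cut-point idea behind ODA.
import numpy as np

def max_accuracy(y, g):
    """Best accuracy of any single cut-point rule separating two groups."""
    order = np.argsort(y)
    g = np.asarray(g, dtype=int)[order]
    total1 = g.sum()
    # acc[k]: accuracy of "predict group 1 above a cut with k values below it"
    above = np.concatenate(([total1], total1 - np.cumsum(g)))
    below = np.concatenate(([0], np.cumsum(1 - g)))
    acc = (above + below) / g.size
    return max(acc.max(), (1 - acc).max())   # either direction of the rule

def oda_permutation_p(y, g, n_perm=2000, seed=0):
    """Observed max accuracy and its permutation P value."""
    rng = np.random.default_rng(seed)
    observed = max_accuracy(y, g)
    perm = np.array([max_accuracy(y, rng.permutation(g)) for _ in range(n_perm)])
    return observed, (np.sum(perm >= observed) + 1) / (n_perm + 1)

rng = np.random.default_rng(4)
y = np.concatenate([rng.exponential(1.0, 80), rng.exponential(1.6, 80)])
g = np.repeat([0, 1], 80)
acc, p = oda_permutation_p(y, g)
print(f"max-accuracy cut-point: acc = {acc:.3f}, permutation P = {p:.3f}")
```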

    Using classification tree analysis to generate propensity score weights

Rationale, aims and objectives: In evaluating non‐randomized interventions, propensity scores (PS) estimate the probability of assignment to the treatment group given observed characteristics. Machine learning algorithms have been proposed as an alternative to conventional logistic regression for modelling PS, in order to avoid the limitations of linear methods. We introduce classification tree analysis (CTA), a "decision‐tree"‐like classification model, to generate PS; CTA provides accurate, parsimonious decision rules that are easy to display and interpret, reports P values derived via permutation tests, and evaluates cross‐generalizability.

Method: Using empirical data, we identify all statistically valid CTA PS models and then use them to compute strata‐specific, observation‐level PS weights that are subsequently applied in outcomes analyses. We compare findings obtained using this framework to logistic regression and boosted regression, evaluating covariate balance using standardized differences, model predictive accuracy, and treatment effect estimates obtained using median regression and a weighted CTA outcomes model.

Results: While all models had some imbalanced covariates, main‐effects logistic regression yielded the lowest average standardized difference, whereas CTA yielded the greatest predictive accuracy. Nevertheless, treatment effect estimates were generally consistent across all models.

Conclusions: Assessing standardized differences in means as a test of covariate balance is inappropriate for machine learning algorithms that segment the sample into two or more strata. Because the CTA algorithm identifies all statistically valid PS models for a sample, it is the most likely to identify a correctly specified PS model, and it should be considered as an alternative approach to modelling the PS.

Peer Reviewed
https://deepblue.lib.umich.edu/bitstream/2027.42/137726/1/jep12744.pdf
https://deepblue.lib.umich.edu/bitstream/2027.42/137726/2/jep12744_am.pd
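A simplified version of the tree‐to‐weights pipeline can be sketched with a scikit‐learn decision tree standing in for CTA: each terminal node defines a PS stratum, the stratum's treated proportion serves as the propensity score, and observation‐level weights follow. The inverse‐probability weighting shown is one common choice, not necessarily the weighting used in the paper.

```python
# Sketch: tree-derived PS strata and observation-level weights.
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(5)
X = rng.normal(size=(1500, 4))                    # observed covariates
z = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))   # treatment assignment

# Fit a shallow tree for treatment; each leaf becomes a PS stratum.
# Large leaves (min_samples_leaf) make pure strata (PS of 0 or 1) unlikely.
tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=100, random_state=0)
tree.fit(X, z)

df = pd.DataFrame({"z": z, "stratum": tree.apply(X)})
df["ps"] = df.groupby("stratum")["z"].transform("mean")  # stratum-level PS
# One common weighting choice (ATE-style inverse probability within strata):
df["w"] = np.where(df["z"] == 1, 1 / df["ps"], 1 / (1 - df["ps"]))
print(f"{df['stratum'].nunique()} PS strata derived from tree leaves")
```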

    Using machine learning to assess covariate balance in matching studies

In order to assess the effectiveness of matching approaches in observational studies, investigators typically present summary statistics for each observed pre‐intervention covariate, with the objective of showing that matching reduces the difference in means (or proportions) between groups to as close to zero as possible. In this paper, we introduce a new approach to distinguishing between study groups based on their distributions of the covariates, using a machine‐learning algorithm called optimal discriminant analysis (ODA). Assessing covariate balance using ODA has several key advantages over the conventional method: the ability to ascertain how individuals self‐select based on optimal (maximum‐accuracy) cut‐points on the covariates; applicability to any variable metric and any number of groups; insensitivity to skewed data or outliers; and the use of accuracy measures that can be widely applied to all analyses. Moreover, ODA accepts analytic weights, thereby extending the assessment of covariate balance to any study design where weights are used for covariate adjustment. By comparing the two approaches using empirical data, we demonstrate that using measures of classification accuracy as balance diagnostics produces results highly consistent with those obtained via the conventional approach (in our matched‐pairs example, ODA revealed a weak statistically significant relationship not detected by the conventional approach). Thus, investigators should consider ODA as a robust complement, or perhaps an alternative, to the conventional approach for assessing covariate balance in matching studies.

Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/135124/1/jep12538_am.pdf
http://deepblue.lib.umich.edu/bitstream/2027.42/135124/2/jep12538.pd
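The contrast between the two diagnostics can be sketched as follows: the standardized difference in means (the conventional diagnostic) sits near zero for two simulated groups with equal means but different shapes, while the best single cut‐point classifies group membership well above chance, which is the accuracy‐based signal ODA formalizes. The data are simulated; this is not the paper's matched‐pairs example.

```python
# Sketch: mean-based vs accuracy-based balance diagnostics.
import numpy as np

def std_diff(x_t, x_c):
    """Standardized difference in means (the conventional diagnostic)."""
    pooled_sd = np.sqrt((x_t.var(ddof=1) + x_c.var(ddof=1)) / 2)
    return (x_t.mean() - x_c.mean()) / pooled_sd

def best_cutpoint_accuracy(x, g):
    """Best accuracy of any single-threshold rule for group membership."""
    best = 0.5
    for c in np.unique(x):
        acc = np.mean((x > c) == g)
        best = max(best, acc, 1 - acc)
    return best

rng = np.random.default_rng(2)
x_t = rng.exponential(1.0, 300)      # treated: skewed, mean 1
x_c = rng.normal(1.0, 0.3, 300)      # control: symmetric, mean 1
x = np.concatenate([x_t, x_c])
g = np.concatenate([np.ones(300, bool), np.zeros(300, bool)])

print(f"standardized difference: {std_diff(x_t, x_c):+.3f}")          # near zero
print(f"best cut-point accuracy: {best_cutpoint_accuracy(x, g):.3f}") # > 0.5
```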

    Combining machine learning and matching techniques to improve causal inference in program evaluation

Rationale, aims and objectives: Program evaluations often utilize various matching approaches to emulate the randomization process for group assignment in experimental studies. Typically, the matching strategy is implemented, and then covariate balance is assessed before estimating treatment effects. This paper introduces a novel analytic framework utilizing a machine learning algorithm called optimal discriminant analysis (ODA) for assessing covariate balance and estimating treatment effects once the matching strategy has been implemented. This framework holds several key advantages over the conventional approach: application to any variable metric and number of groups; insensitivity to skewed data or outliers; and use of accuracy measures applicable to all prognostic analyses. Moreover, ODA accepts analytic weights, thereby extending the methodology to any study design where weights are used for covariate adjustment or more precise (differential) outcome measurement.

Method: One‐to‐one matching on the propensity score was used as the matching strategy. Covariate balance was assessed using the standardized difference in means (conventional approach) and measures of classification accuracy (ODA). Treatment effects were estimated using ordinary least squares regression and ODA.

Results: Using empirical data, ODA produced results highly consistent with those obtained via the conventional methodology for assessing covariate balance and estimating treatment effects.

Conclusions: When ODA is combined with matching techniques within a treatment effects framework, the results are consistent with those of conventional approaches. However, given that it adds dimensions and robustness to the analysis beyond what can currently be achieved using conventional approaches, ODA offers an appealing alternative.

Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/135101/1/jep12592.pdf
http://deepblue.lib.umich.edu/bitstream/2027.42/135101/2/jep12592_am.pd
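The matching step named in the Method can be sketched as greedy one‐to‐one nearest‐neighbour matching on the logit of a logistic‐regression PS with a caliper, a common implementation choice rather than a detail confirmed by the abstract. The ODA balance checks and effect estimates are not reproduced here.

```python
# Sketch: greedy 1:1 nearest-neighbour PS matching with a caliper.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
n = 1000
x = rng.normal(size=(n, 3))
z = rng.binomial(1, 1 / (1 + np.exp(-x[:, 0])))

# Estimate the PS with logistic regression and match on its logit.
ps = sm.Logit(z, sm.add_constant(x)).fit(disp=0).predict(sm.add_constant(x))
logit_ps = np.log(ps / (1 - ps))
caliper = 0.2 * logit_ps.std()            # a common caliper choice

treated = np.flatnonzero(z == 1)
controls = list(np.flatnonzero(z == 0))
pairs = []
for t in treated:                          # matching without replacement
    if not controls:
        break
    dists = np.abs(logit_ps[controls] - logit_ps[t])
    j = int(np.argmin(dists))
    if dists[j] <= caliper:
        pairs.append((t, controls.pop(j)))
print(f"matched pairs: {len(pairs)} of {treated.size} treated")
```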