Alternative causal inference methods in population health research: Evaluating tradeoffs and triangulating evidence.
Population health researchers from different fields often address similar substantive questions but rely on different study designs, reflecting their home disciplines. This is especially true in studies involving causal inference, for which semantic and substantive differences inhibit interdisciplinary dialogue and collaboration. In this paper, we group nonrandomized study designs into two categories: those that use confounder-control (such as regression adjustment or propensity score matching) and those that rely on an instrument (such as instrumental variables, regression discontinuity, or differences-in-differences approaches). Using the Shadish, Cook, and Campbell framework for evaluating threats to validity, we contrast the assumptions, strengths, and limitations of these two approaches and illustrate differences with examples from the literature on education and health. Across disciplines, all methods to test a hypothesized causal relationship involve unverifiable assumptions, and rarely is there clear justification for exclusive reliance on one method. Each method entails trade-offs between statistical power, internal validity, measurement quality, and generalizability. The choice between confounder-control and instrument-based methods should be guided by these trade-offs and by consideration of the most important limitations of previous work in the area. Our goals are to foster common understanding of the methods available for causal inference in population health research and the trade-offs between them; to encourage researchers to objectively evaluate what can be learned from methods outside one's home discipline; and to facilitate the selection of methods that best answer the investigator's scientific questions.
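The contrast between the two design families can be made concrete with a toy simulation (our own illustration, not from the paper; all variable names and parameter values are hypothetical): when a confounder is unmeasured, a regression of outcome on exposure alone is biased, while a valid instrument recovers the causal effect.

```python
import random

# Illustrative simulation: exposure X, outcome Y, unmeasured confounder U,
# and instrument Z that affects Y only through X. True causal effect = 1.0.
random.seed(42)
n = 20000
data = []
for _ in range(n):
    u = random.gauss(0, 1)                  # unmeasured confounder
    z = random.gauss(0, 1)                  # instrument
    x = 0.5 * z + u + random.gauss(0, 0.5)  # exposure
    y = 1.0 * x + u + random.gauss(0, 0.5)  # outcome
    data.append((z, x, y))

def cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

zs, xs, ys = zip(*data)
naive = cov(xs, ys) / cov(xs, xs)  # confounder-control without U: biased upward
iv = cov(zs, ys) / cov(zs, xs)     # instrument-based (Wald) estimate
print(f"naive OLS slope: {naive:.2f}, IV estimate: {iv:.2f}")
```

Each estimator fails under different assumptions: the naive regression needs all confounders measured, while the IV estimate needs the instrument to be unconfounded and excluded from the outcome equation, mirroring the trade-offs the paper discusses.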
On the uses and abuses of regression models: a call for reform of statistical practice and teaching
When students and users of statistical methods first learn about regression analysis there is an emphasis on the technical details of models and estimation methods that invariably runs ahead of the purposes for which these models might be used. More broadly, statistics is widely understood to provide a body of techniques for "modelling data", underpinned by what we describe as the "true model myth", according to which the task of the statistician/data analyst is to build a model that closely approximates the true data generating process. By way of our own historical examples and a brief review of mainstream clinical research journals, we describe how this perspective leads to a range of problems in the application of regression methods, including misguided "adjustment" for covariates, misinterpretation of regression coefficients and the widespread fitting of regression models without a clear purpose. We then outline an alternative approach to the teaching and application of regression methods, which begins by focussing on clear definition of the substantive research question within one of three distinct types: descriptive, predictive, or causal. The simple univariable regression model may be introduced as a tool for description, while the development and application of multivariable regression models should proceed differently according to the type of question. Regression methods will no doubt remain central to statistical practice as they provide a powerful tool for representing variation in a response or outcome variable as a function of "input" variables, but their conceptualisation and usage should follow from the purpose at hand.

Comment: 24 pages main document including 3 figures, plus 15 pages supplementary material. Based on plenary lecture (President's Invited Speaker) delivered to ISCB43, Newcastle, UK, August 2022. Submitted for publication 12-Sep-2
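The descriptive/predictive distinction in the abstract can be sketched with a minimal example (hypothetical data, our own illustration): the same fitted line serves a descriptive question when its slope summarises the association in the data at hand, and a predictive question when it is judged purely by out-of-sample error.

```python
import random

# Hypothetical data: y depends linearly on x with noise (true slope 2.0).
random.seed(1)
pairs = [(x := random.gauss(0, 1), 2.0 * x + random.gauss(0, 1))
         for _ in range(2000)]

def fit_line(data):
    """Ordinary least squares for a univariable regression."""
    xs, ys = zip(*data)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in data)
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Descriptive use: the slope summarises the association in this sample.
slope_all, _ = fit_line(pairs)

# Predictive use: the model is evaluated by held-out error, not by its form.
train, test = pairs[:1000], pairs[1000:]
b, a = fit_line(train)
mse = sum((y - (a + b * x)) ** 2 for x, y in test) / len(test)
print(f"descriptive slope: {slope_all:.2f}, held-out MSE: {mse:.2f}")
```

A causal use of the same model would additionally require assumptions (e.g., no unmeasured confounding) that no amount of fitting can verify, which is the paper's point about letting the question type drive the workflow.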
Challenges of translating epidemiologic research: An application to rheumatic and musculoskeletal disorders
Translation of research into public health policy is featured in common definitions of epidemiology, as an end result of scientific discovery on disease occurrence and causes. This dual nature of epidemiology, which brings together discovery and its use, seems to imply two main dimensions by which to judge epidemiologic research: technical or field-specific quality and societal value. This paper uses our research on the epidemiology of rheumatic and musculoskeletal disorders as a starting point to discuss the interface between these dimensions, exploring a number of conceptual, practical and ethical challenges that epidemiologists increasingly need to address when aiming for research translation. Those include not only the appraisal of the technical quality of research, which is familiar to researchers, but also the judgement on the usefulness and actual use of knowledge, as well as the assessment of the legitimacy of research based on translation potential. Several challenges lie ahead, but interdisciplinary conceptual and technical developments have the potential to guide future epidemiologic research of consequence. 
Approaches that recognize complexity and formalize the involvement of stakeholders in the research process within transparent frameworks open promising avenues for an effective translation of epidemiologic research projected into the future.

Research that led to this paper was funded by the European Regional Development Fund (ERDF), through COMPETE 2020 Operational Programme ‘Competitiveness and Internationalization’ together with national funding from the Foundation for Science and Technology (FCT) - Portuguese Ministry of Science, Technology and Higher Education - through the project “STEPACHE - The pediatric roots of amplified pain: from contextual influences to risk stratification” (POCI-01-0145-FEDER-029087, info:eu-repo/grantAgreement/FCT/9471 - RIDTI/PTDC/SAU-EPI/29087/2017/PT) and by the Epidemiology Research Unit - Instituto de Saúde Pública, Universidade do Porto (EPIUnit) (POCI-01-0145-FEDER-006862; UID/DTP/04750/2019), Administração Regional de Saúde Norte (Regional Department of the Portuguese Ministry of Health) and Calouste Gulbenkian Foundation. This work was also supported by a research grant from FOREUM Foundation for Research in Rheumatology (Career Research Grant)
Risk of Bias Assessments and Evidence Syntheses for Observational Epidemiologic Studies of Environmental and Occupational Exposures: Strengths and Limitations.
BACKGROUND: Increasingly, risk of bias tools are used to evaluate epidemiologic studies as part of evidence synthesis (evidence integration), often involving meta-analyses. Some of these tools consider hypothetical randomized controlled trials (RCTs) as gold standards. METHODS: We review the strengths and limitations of risk of bias assessments, in particular, for reviews of observational studies of environmental exposures, and we also comment more generally on methods of evidence synthesis. RESULTS: Although RCTs may provide a useful starting point to think about bias, they do not provide a gold standard for environmental studies. Observational studies should not be considered inherently biased vs. a hypothetical RCT. Rather than a checklist approach when evaluating individual studies using risk of bias tools, we call for identifying and quantifying possible biases, their direction, and their impacts on parameter estimates. As is recognized in many guidelines, evidence synthesis requires a broader approach than simply evaluating risk of bias in individual studies followed by synthesis of studies judged unbiased, or with studies given more weight if judged less biased. It should include the use of classical considerations for judging causality in human studies, as well as triangulation and integration of animal and mechanistic data. CONCLUSIONS: Bias assessments are important in evidence synthesis, but we argue they can and should be improved to address the concerns we raise here. Simplistic, mechanical approaches to risk of bias assessments, which may particularly occur when these tools are used by nonexperts, can result in erroneous conclusions and sometimes may be used to dismiss important evidence. Evidence synthesis requires a broad approach that goes beyond assessing bias in individual human studies and then including a narrow range of human studies judged to be unbiased in evidence synthesis. https://doi.org/10.1289/EHP6980
Development of a prognostic model for Macrophage Activation Syndrome in Systemic Juvenile Idiopathic Arthritis
Introduction:
Macrophage activation syndrome (MAS) is a potentially life-threatening complication of systemic juvenile idiopathic arthritis (SJIA) characterized by heterogeneous organ involvement and severity. Early identification of patients at high risk of complicated clinical course may improve outcome by helping initiate prompt, appropriate immunosuppressive and supportive treatments. Yet, despite recent progress in clarifying the underlying immunological mechanisms, factors driving organ damage and severe outcome are not entirely understood, nor has the prognostic value of routinely gathered clinical and laboratory factors been fully explored.
Objectives:
To develop a prognostic model for SJIA-MAS based on routinely available parameters at disease onset, accounting for patient heterogeneity, possible latent factors, non-linear relationships and confounders.
Methods:
We examined a retrospective multinational cohort of 362 patients diagnosed with SJIA-MAS. The relationships between demographic features, laboratory features at MAS onset (such as hemoglobin, white blood cells, platelets, ESR, CRP, AST, ALT, bilirubin, fibrinogen, d-dimer, ferritin and creatinine), therapeutic interventions and outcomes were analyzed. Outcomes of interest included a “severe course” (defined as ICU admission or death), occurrence of organ failure and CNS dysfunction. To identify potential phenotypes related to clinical features and outcome, we explored laboratory parameter patterns at MAS onset through latent class modeling, which detects multiple unobserved clusters in heterogeneous populations. A structural causal approach was then used to investigate causal pathways leading to severe outcomes. Directed acyclic graphs (DAGs) were employed to depict possible causal relationships between the candidate biomarkers, potential confounding variables, and the outcomes, and to inform the choice of adjustment sets in multivariate regression models. We assessed the possible relationships between variables and outcomes by penalized likelihood logistic regression and identified optimal cut-off points for prognostic factors using Multivariate Adaptive Regression Splines (MARS) and Classification and Regression Trees (CART). To account for possible treatment confounding, the effect of cyclosporine and etoposide use on outcomes was estimated using augmented inverse probability weighting (IPW) with doubly robust methods. Finally, results from previous analyses were incorporated in a probabilistic framework through a Bayesian network (BN) model, which provides risk estimates for specific clinical scenarios and quantifies the amount of information contributed by the identified prognostic variables.
Results:
The latent class model revealed six clusters based on biomarkers at MAS onset, characterized by the following features: mild alterations of white blood cell, platelet, fibrinogen, d-dimer and ferritin values, considered the baseline type (cluster 1, n = 115); hyperferritinemia with low organ involvement (cluster 2, n = 101); elevation of inflammatory markers (cluster 3, n = 51); hepatobiliary involvement (cluster 4, n = 41); severe pancytopenia and liver and kidney failure with higher elevation of LDH, d-dimer and ferritin (cluster 5, n = 30); and biliary and renal dysfunction (cluster 6, n = 24). Clusters 2 and 3 presented lower age and shorter SJIA duration at MAS onset compared to other subgroups. Cluster membership was predictive of severe course (p < 0.001), CNS involvement (p < 0.001), hemorrhagic complications (p < 0.001) and heart failure (p < 0.001), with patients in cluster 5 showing the highest risk of severe course and heart failure, and increased occurrence of CNS and hemorrhagic manifestations in both clusters 5 and 6. In multivariate regression models, parameters at onset associated with risk of severe course were creatinine (OR 1.6 [95% CI 1.13–2.3]; p = 0.008) and albumin levels (OR 0.65 [95% CI 0.44–0.98]; p = 0.044). Higher risk of CNS involvement was found for patients younger at MAS onset (OR 0.62 [95% CI 0.42–0.92]; p = 0.018). Sodium (OR 0.89 [95% CI 0.82–0.96]; p = 0.006) and creatinine values (OR 1.69 [95% CI 1.14–2.5]; p = 0.009) were identified as independent predictors of mortality. There was no evidence for an effect of etoposide (OR 1.03 [95% CI 0.91–1.12]) or cyclosporine (OR 1.04 [95% CI 0.92–1.19]) on severe course. BNs defined distinct groups with different probabilities of severe outcomes, achieving a c-index of 0.76 for mortality, 0.81 for severe course and 0.81 for CNS involvement. Adding the obtained latent clusters to the BN model increased the prediction accuracy for severe course up to a c-index of 0.83.
Based on information theory metrics (mutual information) from the BN model, decision algorithms for each outcome and a web-based decision support tool for external users were implemented.
Conclusions:
We developed a probabilistic prognostic model of SJIA-MAS based on routinely available data. This stratification tool may facilitate informed decision-making about the clinical management of these patients. The probabilistic and information-theoretic approach offers a framework for further validation, expansion and integration of the model with emerging molecular biomarkers.
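The IPW step described in the methods can be sketched with simulated data (our own minimal illustration; the study's augmented, doubly robust estimator is more involved, and the effect size and propensity model here are hypothetical): confounded treatment assignment biases a naive comparison, and weighting by the inverse propensity corrects it.

```python
import math
import random

# Simulated data: confounder C drives both treatment T and outcome Y.
# True treatment effect = 2.0. The propensity score is known here only
# because we simulate it; in practice it would be estimated.
random.seed(7)
n = 20000
num_t = den_t = num_c = den_c = 0.0
y_treated, y_control = [], []
for _ in range(n):
    c = random.gauss(0, 1)
    p = 1.0 / (1.0 + math.exp(-c))          # propensity score P(T=1 | C)
    t = 1 if random.random() < p else 0     # confounded treatment assignment
    y = 2.0 * t + c + random.gauss(0, 0.5)  # outcome
    if t:
        num_t += y / p; den_t += 1.0 / p
        y_treated.append(y)
    else:
        num_c += y / (1 - p); den_c += 1.0 / (1 - p)
        y_control.append(y)

naive = (sum(y_treated) / len(y_treated)
         - sum(y_control) / len(y_control))  # biased by confounding
ipw = num_t / den_t - num_c / den_c          # Hajek-style IPW contrast
print(f"naive difference: {naive:.2f}, IPW estimate: {ipw:.2f}")
```

The augmentation in the study's doubly robust version additionally uses an outcome model, so the estimate stays consistent if either the propensity or the outcome model is correct.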
Generic Machine Learning Inference on Heterogeneous Treatment Effects in Randomized Experiments
We propose strategies to estimate and make inference on key features of heterogeneous effects in randomized experiments. These key features include best linear predictors of the effects using machine learning proxies, average effects sorted by impact groups, and average characteristics of the most and least impacted units. The approach is valid in high-dimensional settings, where the effects are proxied by machine learning methods. We post-process these proxies into estimates of the key features. Our approach is generic: it can be used in conjunction with penalized methods, deep and shallow neural networks, canonical and new random forests, boosted trees, and ensemble methods. It does not rely on strong assumptions. In particular, we do not require conditions for consistency of the machine learning methods. Estimation and inference rely on repeated data splitting to avoid overfitting and achieve validity. For inference, we take medians of p-values and medians of confidence intervals, resulting from many different data splits, and then adjust their nominal level to guarantee uniform validity. This variational inference method is shown to be uniformly valid and quantifies the uncertainty coming from both parameter estimation and data splitting. We illustrate the use of the approach with two randomized experiments in development on the effects of microcredit and nudges to stimulate immunization demand.

Comment: 53 pages, 6 figures, 15 tables
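The split-and-aggregate idea can be sketched in miniature (our own hedged illustration with a plain sample mean rather than the paper's ML proxies; the tightened per-split level below is an assumption for illustration, not the paper's exact adjustment): compute a confidence interval on each random half-split, then report medians of the endpoints across splits.

```python
import math
import random

# Hypothetical data with true mean 5.0.
random.seed(3)
data = [5.0 + random.gauss(0, 1) for _ in range(10000)]

def half_split_ci(xs, z=2.24):
    """CI for the mean from one random half-split.

    z = 2.24 corresponds to a tightened per-split level (roughly 97.5%),
    standing in for the nominal-level adjustment the paper describes.
    """
    sample = random.sample(xs, len(xs) // 2)  # half used for inference
    m = sum(sample) / len(sample)
    var = sum((x - m) ** 2 for x in sample) / (len(sample) - 1)
    half = z * math.sqrt(var / len(sample))
    return m - half, m + half

# Aggregate over many splits by taking medians of the CI endpoints.
los, his = zip(*(half_split_ci(data) for _ in range(25)))
lo, hi = sorted(los)[12], sorted(his)[12]  # medians of 25 values
print(f"median-aggregated CI: ({lo:.3f}, {hi:.3f})")
```

Taking medians makes the reported interval stable across the randomness of splitting, which is the uncertainty source the paper's uniform-validity argument accounts for.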