
    How to conduct a systematic review and meta-analysis of prognostic model studies

    Background: Prognostic models are typically developed to estimate the risk that an individual in a particular health state will develop a particular health outcome, to support (shared) decision making. Systematic reviews of prognostic model studies can help identify prognostic models that need to be further validated or are ready to be implemented in healthcare. Objectives: To provide step-by-step guidance on how to conduct and read a systematic review of prognostic model studies and to provide an overview of the methodology and guidance available for every step of the review process. Sources: Published, peer-reviewed guidance articles. Content: We describe the following steps for conducting a systematic review of prognosis studies: 1) Developing the review question using the Population, Index model, Comparator model, Outcome(s), Timing, Setting format, 2) Searching for and selecting articles, 3) Data extraction using the Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies (CHARMS) checklist, 4) Quality and risk of bias assessment using the Prediction model Risk Of Bias ASsessment (PROBAST) tool, 5) Analysing data and undertaking quantitative meta-analysis, and 6) Presenting the summary of findings, interpreting results, and drawing conclusions. Guidance for each step is described and illustrated using a case study on prognostic models for patients with COVID-19. Implications: Guidance for conducting a systematic review of prognosis studies is available, but the implications of these reviews for clinical practice and further research depend heavily on complete reporting of the primary studies.
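    To make step 5 concrete, the sketch below shows one common approach to the quantitative meta-analysis of prognostic model performance: pooling logit-transformed c-statistics across validation studies with a DerSimonian-Laird random-effects model. The study values, standard errors, and function name are illustrative assumptions, not data or code from the review.

```python
import numpy as np

def pool_logit_cstat(c, se_c):
    """Random-effects (DerSimonian-Laird) pooling of c-statistics on the logit scale.

    c    : c-statistics reported by the validation studies (hypothetical values)
    se_c : their standard errors on the original scale
    """
    c, se_c = np.asarray(c, float), np.asarray(se_c, float)
    # Transform to the logit scale; delta-method standard errors
    y = np.log(c / (1 - c))
    se_y = se_c / (c * (1 - c))
    v = se_y ** 2

    # DerSimonian-Laird estimate of the between-study variance tau^2
    w = 1 / v
    y_fixed = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - y_fixed) ** 2)
    df = len(y) - 1
    c_q = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - df) / c_q)

    # Random-effects pooled estimate and 95% CI, back-transformed to the 0-1 scale
    w_re = 1 / (v + tau2)
    mu = np.sum(w_re * y) / np.sum(w_re)
    se_mu = np.sqrt(1 / np.sum(w_re))
    lo, hi = mu - 1.96 * se_mu, mu + 1.96 * se_mu
    inv_logit = lambda x: 1 / (1 + np.exp(-x))
    return inv_logit(mu), (inv_logit(lo), inv_logit(hi)), tau2

# Illustrative c-statistics and standard errors from five hypothetical validations
pooled, ci, tau2 = pool_logit_cstat([0.72, 0.78, 0.81, 0.69, 0.75],
                                    [0.03, 0.02, 0.04, 0.05, 0.03])
print(f"Pooled c-statistic {pooled:.2f} (95% CI {ci[0]:.2f} to {ci[1]:.2f}), tau^2 = {tau2:.3f}")
```

    Pooling on the logit scale and back-transforming keeps the summary estimate and its confidence interval within the admissible 0-1 range of the c-statistic.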

    Assessing the quality of prediction models in health care using the Prediction model Risk Of Bias ASsessment Tool (PROBAST): an evaluation of its use and practical application

    BACKGROUND AND OBJECTIVES: Since 2019, the Prediction model Risk Of Bias ASsessment Tool (PROBAST; www.probast.org) has supported methodological quality assessments of prediction model studies. Most prediction model studies are rated with a "High" risk of bias (ROB) and researchers report low interrater reliability (IRR) using PROBAST. We aimed to (1) assess the IRR of PROBAST ratings between assessors of the same study and understand reasons for discrepancies, (2) determine which items contribute most to domain-level ROB ratings, and (3) explore the impact of consensus meetings. STUDY DESIGN AND SETTING: We used PROBAST assessments from a systematic review of diagnostic and prognostic COVID-19 prediction models as a case study. Assessors included international experts in prediction model studies or their reviews. We assessed IRR using prevalence-adjusted bias-adjusted kappa (PABAK) before consensus meetings, examined bias ratings per domain-level ROB judgments, and evaluated the impact of consensus meetings by identifying rating changes after discussion. RESULTS: We analyzed 2167 PROBAST assessments from 27 assessor pairs covering 760 prediction models: 384 developments, 242 validations, and 134 mixed assessments (including both). The IRR using PABAK was higher for overall ROB judgments (development: 0.82 [0.76; 0.89]; validation: 0.78 [0.68; 0.88]) compared to domain- and item-level judgments. Some PROBAST items frequently contributed to domain-level ROB judgments, eg, 3.5 Outcome blinding and 4.1 Sample size. Consensus discussions mainly led to item-level and never to overall ROB rating changes. CONCLUSION: Within this case study, PROBAST assessments showed high IRR at the overall ROB level, with some variation at item- and domain-level. To reduce variability, PROBAST assessors should standardize item- and domain-level judgments and hold well-structured consensus meetings between assessors of the same study. PLAIN LANGUAGE SUMMARY: The Prediction model Risk Of Bias ASsessment Tool (PROBAST; www.probast.org) provides a set of items to assess the quality of medical studies on so-called prediction tools that calculate an individual's probability of having or developing a certain disease or health outcome. Previous research found low interrater reliability (IRR; ie, how consistently two assessors rate aspects of the same study) when using PROBAST. To understand why this is the case, we conducted a large study involving more than 30 experts from around the world, all of whom applied PROBAST to the same set of prediction tool studies. Based on more than 2150 PROBAST assessments, we identified which PROBAST items led to the most disagreements between raters, explored reasons for these disagreements, and examined whether the use of so-called consensus meetings (ie, different assessors of the same study discuss their ratings and decide on a finalized rating) impacted PROBAST ratings. Our study found that the IRR between different assessors of the same study was higher than previously reported. One explanation for the better agreement compared to previous research may be the preplanning of how to assess certain PROBAST aspects before starting the assessments, as well as holding well-structured consensus meetings. These improvements lead to more effective use of PROBAST in evaluating the trustworthiness and quality of prediction tools in the health-care domain.
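    For readers unfamiliar with the agreement measure used in this study, PABAK depends only on the observed proportion of agreement and the number of rating categories. The minimal sketch below, using invented ratings from a hypothetical assessor pair, illustrates the calculation.

```python
def pabak(ratings_a, ratings_b, categories):
    """Prevalence-adjusted bias-adjusted kappa for two assessors.

    PABAK = (k * Po - 1) / (k - 1), where Po is the observed proportion of
    agreement and k is the number of rating categories (e.g. low / high / unclear).
    """
    assert len(ratings_a) == len(ratings_b), "both assessors must rate the same studies"
    agree = sum(a == b for a, b in zip(ratings_a, ratings_b))
    po = agree / len(ratings_a)
    k = len(categories)
    return (k * po - 1) / (k - 1)

# Hypothetical overall ROB judgments from two assessors of the same ten models
a = ["high", "high", "low", "high", "unclear", "low", "high", "high", "low", "high"]
b = ["high", "high", "low", "low",  "unclear", "low", "high", "high", "high", "high"]
print(round(pabak(a, b, categories=["low", "high", "unclear"]), 2))  # 8/10 agreement -> 0.7
```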

    Validation of prognostic models predicting mortality or ICU admission in patients with COVID-19 in low- and middle-income countries: a global individual participant data meta-analysis

    BACKGROUND: We evaluated the performance of prognostic models for predicting mortality or ICU admission in hospitalized patients with COVID-19 in the World Health Organization (WHO) Global Clinical Platform, a repository of individual-level clinical data of patients hospitalized with COVID-19, including in low- and middle-income countries (LMICs). METHODS: We identified eligible multivariable prognostic models for predicting overall mortality and ICU admission during hospital stay in patients with confirmed or suspected COVID-19 from a living review of COVID-19 prediction models. These models were evaluated using data contributed to the WHO Global Clinical Platform for COVID-19 from nine LMICs (Burkina Faso, Cameroon, Democratic Republic of Congo, Guinea, India, Niger, Nigeria, Zambia, and Zimbabwe). Model performance was assessed in terms of discrimination and calibration. RESULTS: Out of 144 eligible models, 140 were excluded due to a high risk of bias, predictors unavailable in LMICs, or insufficient model description. Among 11,338 participants, the three remaining models for in-hospital mortality showed good discrimination, with areas under the curve (AUCs) ranging between 0.76 (95% CI 0.71-0.81) and 0.84 (95% CI 0.77-0.89); the one remaining model for ICU admission had an AUC of 0.74 (95% CI 0.70-0.78). All models showed signs of miscalibration and overfitting, with extensive heterogeneity between countries. CONCLUSIONS: Among the available COVID-19 prognostic models, only a few could be validated on data collected from LMICs, mainly due to limited predictor availability. Despite their discriminative ability, the selected models for mortality prediction or ICU admission showed varying and suboptimal calibration.
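    As a rough illustration of the two performance aspects assessed above, the sketch below computes discrimination (AUC) and calibration (calibration slope and calibration-in-the-large) for an externally validated risk model. The outcomes and predicted risks are simulated for the example, not drawn from the WHO Global Clinical Platform.

```python
import numpy as np
import statsmodels.api as sm
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Simulated external-validation data: an observed binary outcome (e.g. in-hospital
# death) and predicted risks from a previously published model that is deliberately
# made overconfident, so the calibration slope should come out below 1.
n = 2000
lp_true = rng.normal(-1.5, 1.0, n)                  # "true" linear predictor
y = rng.binomial(1, 1 / (1 + np.exp(-lp_true)))     # observed outcomes
p_hat = 1 / (1 + np.exp(-(-0.3 + 1.6 * lp_true)))   # overconfident model predictions

# Discrimination: area under the ROC curve (c-statistic)
auc = roc_auc_score(y, p_hat)

# Calibration: regress the outcome on the logit of the predicted risk.
# A slope of 1 and an intercept (calibration-in-the-large) of 0 indicate good calibration.
logit_p = np.log(p_hat / (1 - p_hat))
slope_fit = sm.GLM(y, sm.add_constant(logit_p), family=sm.families.Binomial()).fit()
citl_fit = sm.GLM(y, np.ones((n, 1)), family=sm.families.Binomial(), offset=logit_p).fit()

print(f"AUC                      : {auc:.2f}")
print(f"Calibration slope        : {slope_fit.params[1]:.2f}")
print(f"Calibration-in-the-large : {citl_fit.params[0]:.2f}")
```

    A calibration slope well below 1 is the typical signature of the overfitting reported in the abstract, even when discrimination looks acceptable.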

    Risk of bias assessments in individual participant data meta-analyses of test accuracy and prediction models: a review shows improvements are needed

    OBJECTIVES: Risk of bias assessments are important in meta-analyses of both aggregate and individual participant data (IPD). There is limited evidence on whether and how risk of bias of included studies or datasets in IPD meta-analyses (IPDMAs) is assessed. We review how risk of bias is currently assessed, reported, and incorporated in IPDMAs of test accuracy and clinical prediction model studies and provide recommendations for improvement. STUDY DESIGN AND SETTING: We searched PubMed (January 2018-May 2020) to identify IPDMAs of test accuracy and prediction models, then elicited whether each IPDMA assessed risk of bias of included studies and, if so, how assessments were reported and subsequently incorporated into the IPDMAs. RESULTS: Forty-nine IPDMAs were included. Nineteen of 27 (70%) test accuracy IPDMAs assessed risk of bias, compared to 5 of 22 (23%) prediction model IPDMAs. Seventeen of 19 (89%) test accuracy IPDMAs used Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2), but no tool was used consistently among prediction model IPDMAs. Of IPDMAs assessing risk of bias, 7 (37%) test accuracy IPDMAs and 1 (20%) prediction model IPDMA provided details on the information sources (e.g., the original manuscript, IPD, primary investigators) used to inform judgments, and 4 (21%) test accuracy IPDMAs and 1 (20%) prediction model IPDMA provided information on whether assessments were done before or after obtaining the IPD of the included studies or datasets. Of all included IPDMAs, only seven test accuracy IPDMAs (26%) and one prediction model IPDMA (5%) incorporated risk of bias assessments into their meta-analyses. For future IPDMA projects, we provide guidance on how to adapt tools such as the Prediction model Risk Of Bias ASsessment Tool (PROBAST, for prediction models) and QUADAS-2 (for test accuracy) to assess risk of bias of included primary studies and their IPD. CONCLUSION: Risk of bias assessments and their reporting need to be improved in IPDMAs of test accuracy and, especially, prediction model studies. Using the recommended tools, both before and after IPD are obtained, will address this.
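    One simple way to act on the recommendation to incorporate risk of bias into the meta-analysis is a sensitivity analysis that repeats the pooling after excluding high risk-of-bias datasets. The sketch below uses invented study estimates, invented risk-of-bias judgments, and a deliberately minimal fixed-effect pool; it is not code from any of the reviewed IPDMAs.

```python
import numpy as np

def iv_pool(estimates, ses):
    """Fixed-effect inverse-variance pooling (kept minimal for illustration)."""
    w = 1 / np.square(ses)
    est = np.sum(w * estimates) / np.sum(w)
    se = np.sqrt(1 / np.sum(w))
    return est, (est - 1.96 * se, est + 1.96 * se)

# Hypothetical per-dataset estimates (e.g. log diagnostic odds ratios), their
# standard errors, and an overall PROBAST/QUADAS-2-style risk-of-bias judgment
studies = [
    {"est": 1.9, "se": 0.30, "rob": "low"},
    {"est": 2.4, "se": 0.25, "rob": "high"},
    {"est": 1.7, "se": 0.40, "rob": "low"},
    {"est": 2.8, "se": 0.35, "rob": "high"},
    {"est": 1.8, "se": 0.28, "rob": "low"},
]

for label, subset in [("all datasets", studies),
                      ("low risk of bias only", [s for s in studies if s["rob"] == "low"])]:
    est, ci = iv_pool(np.array([s["est"] for s in subset]),
                      np.array([s["se"] for s in subset]))
    print(f"{label:25s}: {est:.2f} (95% CI {ci[0]:.2f} to {ci[1]:.2f})")
```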

    Erratum to: Methods for evaluating medical tests and biomarkers

    [This corrects the article DOI: 10.1186/s41512-016-0001-y.]

    Completeness of reporting of clinical prediction models developed using supervised machine learning: A systematic review

    Objective: While many studies have consistently found incomplete reporting of regression-based prediction model studies, evidence is lacking for machine learning-based prediction model studies. We aim to systematically review the adherence of machine learning (ML)-based prediction model studies to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) Statement. Study design and setting: We included articles reporting on the development or external validation of a multivariable prediction model (either diagnostic or prognostic) developed using supervised ML for individualized predictions across all medical fields (PROSPERO, CRD42019161764). We searched PubMed from 1 January 2018 to 31 December 2019. Data extraction was performed using the 22-item checklist for reporting of prediction model studies (www.TRIPOD-statement.org). We measured the overall adherence per article and per TRIPOD item. Results: Our search identified 24,814 articles, of which 152 were included: 94 (61.8%) prognostic and 58 (38.2%) diagnostic prediction model studies. Overall, articles adhered to a median of 38.7% (IQR 31.0-46.4) of TRIPOD items. No articles fully adhered to complete reporting of the abstract, and very few reported the flow of participants (3.9%, 95% CI 1.8 to 8.3), an appropriate title (4.6%, 95% CI 2.2 to 9.2), blinding of predictors (4.6%, 95% CI 2.2 to 9.2), model specification (5.2%, 95% CI 2.4 to 10.8), and the model’s predictive performance (5.9%, 95% CI 3.1 to 10.9). There was often complete reporting of the source of data (98.0%, 95% CI 94.4 to 99.3) and interpretation of the results (94.7%, 95% CI 90.0 to 97.3). Conclusion: Similar to prediction model studies developed using conventional regression-based techniques, the completeness of reporting is poor. Essential information needed to decide whether to use the model (i.e. model specification and its performance) is rarely reported. However, some items and sub-items of TRIPOD might be less suitable for ML-based prediction model studies, and thus TRIPOD requires extensions. Overall, there is an urgent need to improve the reporting quality and usability of research to avoid research waste. What is new? Key findings: Similar to prediction model studies developed using regression techniques, machine learning (ML)-based prediction model studies adhered poorly to the TRIPOD statement, the current standard reporting guideline. What this adds to what is known: In addition to efforts to improve the completeness of reporting in ML-based prediction model studies, an extension of TRIPOD for these types of studies is needed. What is the implication, what should change now? While TRIPOD-AI is under development, we urge authors to follow the recommendations of the TRIPOD statement to improve the completeness of reporting and reduce potential research waste of ML-based prediction model studies.
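    The adherence metrics reported above (median adherence per article, and per-item adherence with 95% confidence intervals) can be computed from a simple article-by-item extraction grid. The sketch below uses randomly generated data purely to show the calculation; the use of a Wilson interval is an assumption about how the CIs were obtained, not a detail stated in the abstract.

```python
import numpy as np
from statsmodels.stats.proportion import proportion_confint

# Hypothetical extraction grid: rows = articles, columns = TRIPOD items,
# True = item completely reported, False = not (or incompletely) reported
rng = np.random.default_rng(1)
grid = rng.random((152, 22)) < 0.4   # 152 articles, 22 main TRIPOD items

# Overall adherence per article: share of items completely reported
per_article = grid.mean(axis=1) * 100
print(f"Median adherence per article: {np.median(per_article):.1f}% "
      f"(IQR {np.percentile(per_article, 25):.1f}-{np.percentile(per_article, 75):.1f})")

# Adherence per TRIPOD item with a Wilson 95% confidence interval
for item in range(grid.shape[1]):
    k, n = grid[:, item].sum(), grid.shape[0]
    lo, hi = proportion_confint(k, n, method="wilson")
    print(f"Item {item + 1:2d}: {100 * k / n:5.1f}% (95% CI {100 * lo:.1f} to {100 * hi:.1f})")
```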

    Prediction models for diagnosis and prognosis of covid-19: systematic review and critical appraisal

    Readers’ note: This article is a living systematic review that will be updated to reflect emerging evidence. Updates may occur for up to two years from the date of original publication. This version is update 3 of the original article published on 7 April 2020 (BMJ 2020;369:m1328). Previous updates can be found as data supplements (https://www.bmj.com/content/369/bmj.m1328/related#datasupp). When citing this paper please consider adding the update number and date of access for clarity. Funding: LW, BVC, LH, and MDV acknowledge specific funding for this work from Internal Funds KU Leuven, KOOR, and the COVID-19 Fund. LW is a postdoctoral fellow of Research Foundation-Flanders (FWO) and receives support from ZonMw (grant 10430012010001). BVC received support from FWO (grant G0B4716N) and Internal Funds KU Leuven (grant C24/15/037). TPAD acknowledges financial support from the Netherlands Organisation for Health Research and Development (grant 91617050). VMTdJ was supported by the European Union Horizon 2020 Research and Innovation Programme under ReCoDID grant agreement 825746. KGMM and JAAD acknowledge financial support from the Cochrane Collaboration (SMF 2018). KIES is funded by the National Institute for Health Research (NIHR) School for Primary Care Research. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR, or the Department of Health and Social Care. GSC was supported by the NIHR Biomedical Research Centre, Oxford, and Cancer Research UK (programme grant C49297/A27294). JM was supported by Cancer Research UK (programme grant C49297/A27294). PD was supported by the NIHR Biomedical Research Centre, Oxford. MOH is supported by the National Heart, Lung, and Blood Institute of the United States National Institutes of Health (grant R00 HL141678). ICCvDH and BCTvB received funding from Euregio Meuse-Rhine (grant Covid Data Platform (coDaP) Interreg EMR187). The funders played no role in study design, data collection, data analysis, data interpretation, or reporting.


    Evidence synthesis to inform model-based cost-effectiveness evaluations of diagnostic tests: a methodological systematic review of health technology assessments

    Background: Evaluations of diagnostic tests are challenging because of the indirect nature of their impact on patient outcomes. Model-based health economic evaluations of tests allow different types of evidence from various sources to be incorporated and enable cost-effectiveness estimates to be made beyond the duration of available study data. To parameterize a health-economic model fully, all the ways a test impacts on patient health must be quantified, including but not limited to diagnostic test accuracy. Methods: We assessed all UK NIHR HTA reports published May 2009-July 2015. Reports were included if they evaluated a diagnostic test, included a model-based health economic evaluation, and included a systematic review and meta-analysis of test accuracy. From each eligible report we extracted information on the following topics: 1) what evidence aside from test accuracy was searched for and synthesised, 2) which methods were used to synthesise test accuracy evidence and how the results informed the economic model, 3) how/whether threshold effects were explored, 4) how the potential dependency between multiple tests in a pathway was accounted for, and 5) for evaluations of tests targeted at the primary care setting, how evidence from differing healthcare settings was incorporated. Results: The bivariate or hierarchical summary ROC (HSROC) model was implemented in 20/22 reports that met all inclusion criteria. Test accuracy data for health economic modelling were obtained from meta-analyses completely in four reports, partially in fourteen reports, and not at all in four reports. Only 2/7 reports that used a quantitative test gave clear threshold recommendations. All 22 reports explored the effect of uncertainty in accuracy parameters, but most of those that used multiple tests did not allow for dependence between test results. In 7/22 reports the tests were potentially suitable for primary care, but the majority found limited evidence on test accuracy in primary care settings. Conclusions: The uptake of appropriate meta-analysis methods for synthesising evidence on diagnostic test accuracy in UK NIHR HTAs has improved in recent years. Future research should focus on other evidence requirements for cost-effectiveness assessment, threshold effects for quantitative tests, and the impact of multiple diagnostic tests.
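    For reference, the bivariate random-effects model mentioned in the results jointly pools logit sensitivity and logit specificity while allowing for their between-study correlation. A standard formulation is sketched below in generic notation (the symbols are not taken from the reports).

```latex
% Bivariate random-effects model for joint meta-analysis of sensitivity and
% specificity (generic notation: D = diseased, \bar{D} = non-diseased;
% mu, sigma and rho are the pooled logit means, between-study SDs and correlation).
\begin{align*}
  y_i^{\mathrm{TP}} &\sim \mathrm{Binomial}\!\left(n_i^{D},\ \mathrm{sens}_i\right), &
  y_i^{\mathrm{TN}} &\sim \mathrm{Binomial}\!\left(n_i^{\bar{D}},\ \mathrm{spec}_i\right), \\
  \begin{pmatrix} \operatorname{logit}(\mathrm{sens}_i) \\ \operatorname{logit}(\mathrm{spec}_i) \end{pmatrix}
  &\sim \mathcal{N}\!\left(
    \begin{pmatrix} \mu_{\mathrm{sens}} \\ \mu_{\mathrm{spec}} \end{pmatrix},\;
    \begin{pmatrix} \sigma_{\mathrm{sens}}^{2} & \rho\,\sigma_{\mathrm{sens}}\sigma_{\mathrm{spec}} \\
                    \rho\,\sigma_{\mathrm{sens}}\sigma_{\mathrm{spec}} & \sigma_{\mathrm{spec}}^{2} \end{pmatrix}
  \right).
\end{align*}
% Summary sensitivity and specificity are recovered as expit(mu_sens) and expit(mu_spec).
```

    Modelling the two logits jointly, rather than pooling sensitivity and specificity separately, is what captures the negative correlation induced by threshold variation across studies.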