18 research outputs found
Dementia prediction in the general population using clinically accessible variables: a proof-of-concept study using machine learning. The AGES-Reykjavik study
BACKGROUND: Early identification of dementia is crucial for prompt intervention for high-risk individuals in the general population. External validation studies on prognostic models for dementia have highlighted the need for updated models. The use of machine learning in dementia prediction is in its infancy and may improve predictive performance. The current study aimed to compare the performance of machine learning algorithms with that of traditional statistical techniques, such as logistic and Cox regression, for the prediction of all-cause dementia. Our secondary aim was to assess the feasibility of using only clinically accessible predictors rather than MRI predictors. METHODS: Data are from 4,793 participants in the population-based AGES-Reykjavik Study without dementia or mild cognitive impairment at baseline (mean age 76 years; 59% female). Cognitive, biometric, and MRI assessments (59 variables in total) were collected at baseline, with follow-up of incident dementia diagnoses for a maximum of 12 years. Machine learning algorithms included elastic net regression, random forest, support vector machine, and elastic net Cox regression. Traditional statistical methods for comparison were logistic and Cox regression. Model 1 was fit using all variables, and model 2 was fit after feature selection using the Boruta package. A third model explored performance when leaving out neuroimaging markers (clinically accessible model). Ten-fold cross-validation, repeated ten times, was implemented during training. Upsampling was used to account for imbalanced data. Tuning parameters were optimized automatically for recalibration using the caret package in R. RESULTS: Overall, 19% of participants developed all-cause dementia. Machine learning algorithms were comparable in performance to logistic regression in all three models. However, slightly better performance was observed for elastic net Cox regression in the third model (c = 0.78, 95% CI: 0.78-0.78) than for traditional Cox regression (c = 0.75, 95% CI: 0.74-0.77). CONCLUSIONS: Supervised machine learning showed added benefit only when using survival techniques. Removing MRI markers did not significantly worsen the model's performance. Further, we presented a nomogram based on the machine learning models, illustrating the transportability of machine learning models to clinical practice. External validation is needed to assess the use of this model in other populations. Identifying high-risk individuals will amplify prevention efforts and selection for clinical trials.
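The training setup described above (ten-fold cross-validation repeated ten times, upsampling for class imbalance, and automatic tuning via the caret package) can be sketched in a few lines of R. This is a minimal illustration under stated assumptions, not the authors' code: the data frame `ages` and the binary factor outcome `dementia` are hypothetical placeholders.

```r
library(caret)    # resampling and automatic hyperparameter tuning
library(glmnet)   # elastic net backend for caret's method = "glmnet"

# Ten-fold cross-validation, repeated ten times, with upsampling of the
# minority class within each resample, as described in the abstract.
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 10,
                     sampling = "up", classProbs = TRUE,
                     summaryFunction = twoClassSummary)

# Elastic net logistic regression; caret tunes alpha and lambda automatically.
# `ages` (predictors) and `dementia` (binary factor outcome) are hypothetical.
fit <- train(dementia ~ ., data = ages, method = "glmnet",
             metric = "ROC", trControl = ctrl)

fit$bestTune   # tuning parameters selected by cross-validation
```

The same `trainControl` object could be reused for the other learners mentioned in the abstract (for example, caret's "rf" for random forest or "svmRadial" for a support vector machine), which keeps the resampling scheme comparable across algorithms.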
Systematic review finds "spin" practices and poor reporting standards in studies on machine learning-based prediction models
Objectives
We evaluated the presence and frequency of spin practices and poor reporting standards in studies that developed and/or validated clinical prediction models using supervised machine learning techniques.
Study Design and Setting
We systematically searched PubMed from 01/2018 to 12/2019 to identify diagnostic and prognostic prediction model studies using supervised machine learning. No restrictions were placed on data source, outcome, or clinical specialty.
Results
We included 152 studies: 38% reported diagnostic models and 62% prognostic models. When reported, discrimination was described without precision estimates in 53/71 abstracts (74.6% [95% CI 63.4–83.3]) and 53/81 main texts (65.4% [95% CI 54.6–74.9]). Of the 21 abstracts that recommended the model to be used in daily practice, 20 (95.2% [95% CI 77.3–99.8]) lacked any external validation of the developed models. Likewise, 74/133 (55.6% [95% CI 47.2–63.8]) studies made recommendations for clinical use in their main text without any external validation. Reporting guidelines were cited in 13/152 (8.6% [95% CI 5.1–14.1]) studies.
Conclusion
Spin practices and poor reporting standards are also present in studies on prediction models using machine learning techniques. A tailored framework for the identification of spin will enhance the sound reporting of prediction model studies
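A recurring issue noted above is that discrimination was reported without precision estimates. As a hedged illustration only, the following R sketch shows one common way to accompany a c-statistic (AUC) with a 95% confidence interval using the pROC package; the vectors `y` (observed outcomes) and `p` (predicted probabilities) are hypothetical placeholders rather than data from any reviewed study.

```r
library(pROC)

# y: observed binary outcomes (0/1); p: predicted probabilities from a model.
roc_obj <- roc(response = y, predictor = p, quiet = TRUE)

auc(roc_obj)                           # point estimate of the c-statistic
ci.auc(roc_obj)                        # 95% CI (DeLong method by default)
ci.auc(roc_obj, method = "bootstrap")  # alternative: bootstrap 95% CI
```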
Completeness of reporting of clinical prediction models developed using supervised machine learning: A systematic review
Objective
While many studies have consistently found incomplete reporting of regression-based prediction model studies, evidence is lacking for machine learning-based prediction model studies. We aimed to systematically review the adherence of machine learning (ML)-based prediction model studies to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) Statement.
Study Design and Setting
We included articles reporting on the development or external validation of a multivariable prediction model (either diagnostic or prognostic) developed using supervised ML for individualized predictions across all medical fields (PROSPERO, CRD42019161764). We searched PubMed from 1 January 2018 to 31 December 2019. Data extraction was performed using the 22-item checklist for reporting of prediction model studies (www.TRIPOD-statement.org). We measured the overall adherence per article and per TRIPOD item.
Results
Our search identified 24,814 articles, of which 152 were included: 94 (61.8%) prognostic and 58 (38.2%) diagnostic prediction model studies. Overall, articles adhered to a median of 38.7% (IQR 31.0-46.4) of TRIPOD items. No articles fully adhered to complete reporting of the abstract, and very few reported the flow of participants (3.9%, 95% CI 1.8 to 8.3), an appropriate title (4.6%, 95% CI 2.2 to 9.2), blinding of predictors (4.6%, 95% CI 2.2 to 9.2), model specification (5.2%, 95% CI 2.4 to 10.8), and the model's predictive performance (5.9%, 95% CI 3.1 to 10.9). There was often complete reporting of the source of data (98.0%, 95% CI 94.4 to 99.3) and interpretation of the results (94.7%, 95% CI 90.0 to 97.3).
Conclusion
Similar to prediction model studies developed using conventional regression-based techniques, the completeness of reporting is poor. Essential information needed to decide whether to use the model (i.e., model specification and its performance) is rarely reported. However, some items and sub-items of TRIPOD might be less suitable for ML-based prediction model studies; thus, TRIPOD requires extensions. Overall, there is an urgent need to improve the reporting quality and usability of research to avoid research waste.
What is new?
Key findings: Similar to prediction model studies developed using regression techniques, machine learning (ML)-based prediction model studies adhered poorly to the TRIPOD statement, the current standard reporting guideline.
What this adds to what is known: In addition to efforts to improve the completeness of reporting in ML-based prediction model studies, an extension of TRIPOD for these types of studies is needed.
What is the implication, what should change now? While TRIPOD-AI is under development, we urge authors to follow the recommendations of the TRIPOD statement to improve the completeness of reporting and reduce potential research waste of ML-based prediction model studies.
Overinterpretation of findings in machine learning prediction model studies in oncology: a systematic review
Objectives
In biomedical research, spin is the overinterpretation of findings, and it is a growing concern. To date, the presence of spin has not been evaluated in prognostic model research in oncology, including studies developing and validating models for individualized risk prediction.
Study Design and Setting
We conducted a systematic review, searching MEDLINE and EMBASE for oncology-related studies that developed and validated a prognostic model using machine learning, published between 1 January 2019 and 5 September 2019. We used existing spin frameworks and described areas of highly suggestive spin practices.
Results
We included 62 publications (reporting 152 developed models and 37 validated models). Reporting was inconsistent between the methods and results sections in 27% of studies, owing to additional analyses and selective reporting. Thirty-two studies (of 36 applicable studies) compared the models they developed in their discussion and predominantly used discrimination measures to support their claims (78%). Thirty-five studies (56%) used an overly strong or leading word in their title, abstract, results, discussion, or conclusion.
Conclusion
The potential for spin needs to be considered when reading, interpreting, and using studies that developed and validated prognostic models in oncology. Researchers should carefully report their prognostic model research using words that reflect their actual results and strength of evidence
Systematic review identifies the design and methodological conduct of studies on machine learning-based prediction models
Background and Objectives
We sought to summarize the study design, modelling strategies, and performance measures reported in studies on clinical prediction models developed using machine learning techniques.
Methods
We searched PubMed for articles published between 01/01/2018 and 31/12/2019 describing the development, or the development with external validation, of a multivariable prediction model using any supervised machine learning technique. No restrictions were made based on study design, data source, or predicted patient-related health outcomes.
Results
We included 152 studies: 58 (38.2% [95% CI 30.8–46.1]) were diagnostic and 94 (61.8% [95% CI 53.9–69.2]) were prognostic studies. Most studies reported only the development of prediction models (n = 133, 87.5% [95% CI 81.3–91.8]), focused on binary outcomes (n = 131, 86.2% [95% CI 79.8–90.8]), and did not report a sample size calculation (n = 125, 82.2% [95% CI 75.4–87.5]). The most common algorithms used were support vector machine (n = 86/522, 16.5% [95% CI 13.5–19.9]) and random forest (n = 73/522, 14% [95% CI 11.3–17.2]). Values for area under the Receiver Operating Characteristic curve ranged from 0.45 to 1.00. Calibration metrics were often missing (n = 494/522, 94.6% [95% CI 92.4–96.3]).
Conclusion
Our review revealed that focus is required on handling of missing values, methods for internal validation, and reporting of calibration to improve the methodological conduct of studies on machine learning–based prediction models.
Systematic review registration
PROSPERO, CRD42019161764
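The review above identifies calibration as the performance measure most often left unreported. For orientation only, the R sketch below shows one standard way to estimate calibration-in-the-large and the calibration slope on validation data using plain logistic regression; `y` and `p` are hypothetical observed outcomes and predicted probabilities, not code or data from the reviewed studies.

```r
# Linear predictor: logit of the predicted risks.
lp <- qlogis(p)

# Calibration slope: coefficient of lp when the outcome is regressed on it.
# A slope close to 1 suggests predictions are neither over- nor underfitted.
slope_fit <- glm(y ~ lp, family = binomial)
coef(slope_fit)["lp"]

# Calibration-in-the-large: intercept when lp enters as a fixed offset.
# A value close to 0 suggests predicted risks are not systematically off.
citl_fit <- glm(y ~ 1, offset = lp, family = binomial)
coef(citl_fit)["(Intercept)"]
```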
Risk of bias assessments in individual participant data meta-analyses of test accuracy and prediction models: a review shows improvements are needed
Objectives: Risk of bias assessments are important in meta-analyses of both aggregate and individual participant data (IPD). There is limited evidence on whether and how risk of bias of included studies or datasets in IPD meta-analyses (IPDMAs) is assessed. We review how risk of bias is currently assessed, reported, and incorporated in IPDMAs of test accuracy and clinical prediction model studies and provide recommendations for improvement.
Study Design and Setting: We searched PubMed (January 2018–May 2020) to identify IPDMAs of test accuracy and prediction models, then elicited whether each IPDMA assessed risk of bias of included studies and, if so, how assessments were reported and subsequently incorporated into the IPDMAs.
Results: Forty-nine IPDMAs were included. Nineteen of 27 (70%) test accuracy IPDMAs assessed risk of bias, compared to 5 of 22 (23%) prediction model IPDMAs. Seventeen of 19 (89%) test accuracy IPDMAs used Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2), but no tool was used consistently among prediction model IPDMAs. Of IPDMAs assessing risk of bias, 7 (37%) test accuracy IPDMAs and 1 (20%) prediction model IPDMA provided details on the information sources (e.g., the original manuscript, IPD, primary investigators) used to inform judgments, and 4 (21%) test accuracy IPDMAs and 1 (20%) prediction model IPDMA provided information on whether assessments were done before or after obtaining the IPD of the included studies or datasets. Of all included IPDMAs, only seven test accuracy IPDMAs (26%) and one prediction model IPDMA (5%) incorporated risk of bias assessments into their meta-analyses. For future IPDMA projects, we provide guidance on how to adapt tools such as the Prediction model Risk Of Bias ASsessment Tool (for prediction models) and QUADAS-2 (for test accuracy) to assess risk of bias of included primary studies and their IPD.
Conclusion: Risk of bias assessments and their reporting need to be improved in IPDMAs of test accuracy and, especially, prediction model studies. Using recommended tools, both before and after IPD are obtained, will address this
SPIN-PM: A consensus framework to evaluate the presence of spin in studies on prediction models
Objectives: To develop a framework to identify and evaluate spin practices and their facilitators in studies on clinical prediction models, regardless of the modeling technique. Study Design and Setting: We followed a three-phase consensus process: (1) a premeeting literature review to generate items to be included; (2) a series of structured meetings with a panel of experienced researchers, in which comments were discussed and viewpoints on the items to be included were exchanged; and (3) a postmeeting review of the final list of items and examples to be included. Through this iterative consensus process, a framework was derived once all of the panel's researchers agreed. Results: This consensus process involved a panel of eight researchers and resulted in SPIN-PM (SPIN-Prediction Models), which consists of two categories of spin (misleading interpretation and misleading transportability) and, within these categories, two forms of spin (spin practices and facilitators of spin). We provide criteria and examples. Conclusion: We propose this guidance to facilitate not only accurate reporting but also accurate interpretation and extrapolation of clinical prediction models, which will likely improve the reporting quality of subsequent research and reduce research waste.