9 research outputs found

Modelling for At-Risk MASH.

AUC for every classifier predicting At-Risk MASH, by feature set and model composition.

Aims: Outcomes of Metabolic dysfunction-Associated Steatotic Liver Disease (MASLD), such as metabolic dysfunction-associated steatohepatitis (MASH), fibrosis and cirrhosis, are ordinarily determined by resource-intensive and invasive biopsies. We aim to show that routine clinical tests offer sufficient information to predict these endpoints.

Methods: Using the LITMUS Metacohort derived from the European NAFLD Registry, the largest MASLD dataset in Europe, we create three feature sets that differ in how difficult the variables are to obtain, including a 19-variable set that can be collected through a routine clinical appointment or blood test. These data were used to train predictive models with the supervised machine learning (ML) algorithm XGBoost, alongside the missing-data imputation technique MICE and the class-balancing algorithm SMOTE. Shapley Additive exPlanations (SHAP) were used to determine the relative importance of each clinical variable.

Results: Analysing nine biopsy-derived MASLD outcomes, with cohort sizes ranging between 5385 and 6673 subjects, we were able to predict outcomes with training set AUCs ranging from 0.719 to 0.994, including classifying individuals with At-Risk MASH with an AUC of 0.899. Using two further feature sets of 26 and 35 variables, which added composite scores known to be good indicators of MASLD endpoints and advanced specialist tests, we found that predictive performance did not improve sufficiently. We are also able to present local and global explanations for each ML model, offering clinicians interpretability without sacrificing predictive performance.

Conclusions: This study developed a series of ML models with AUCs ranging from 0.719 to 0.994 for predicting MASLD outcomes that are usually determined through highly invasive means, using only easily extractable and readily available information.
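The abstract reports performance as AUC, which is a ranking measure and not the same as classification accuracy. As a minimal sketch of the difference, the snippet below computes AUC as the fraction of positive/negative pairs in which the positive case receives the higher score, and compares it with accuracy at a 0.5 threshold. The labels, scores and function names are hypothetical illustrations, not data or code from the study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of cases classified correctly at a fixed probability threshold."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


# Hypothetical toy data: 1 = has the outcome, 0 = does not.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]

auc = roc_auc(labels, scores)
acc = accuracy(labels, scores)
print(f"AUC = {auc:.3f}, accuracy at 0.5 = {acc:.3f}")
```

On this toy data the two measures differ, which is why an AUC of 0.899 should not be read as 89.9% of individuals being classified correctly.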

FigShare

ML/Linear approach comparison for predicting At-Risk MASH.

Error bars denote ± S.D. from k = 5-fold cross-validation.
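Error bars of this kind can be derived from per-fold scores as sketched below. This is an illustration only: the fold AUCs are made-up numbers, `summarise_folds` is a hypothetical helper, and the sample standard deviation (n - 1 denominator) is an assumption, since the record does not say which estimator was used.

```python
import statistics


def summarise_folds(fold_scores):
    """Return the mean, standard deviation and (mean - SD, mean + SD) interval."""
    mean = statistics.mean(fold_scores)
    # Assumption: sample SD with an n - 1 denominator.
    sd = statistics.stdev(fold_scores)
    return mean, sd, (mean - sd, mean + sd)


# Hypothetical AUCs from k = 5 cross-validation folds.
fold_aucs = [0.80, 0.82, 0.78, 0.81, 0.79]

mean_auc, sd_auc, (lower, upper) = summarise_folds(fold_aucs)
print(f"AUC = {mean_auc:.3f} +/- {sd_auc:.3f} (bar from {lower:.3f} to {upper:.3f})")
```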

FigShare

Clinical variables belonging to the three feature sets used within this analysis.

FigShare

SHAP summary plots.

Ranking of Core variables by their influence on predicting At-Risk MASH for the XGBoost model with MICE and SMOTE.

FigShare

Evaluation metrics for 'Core' dataset performance when predicting all responses using XGBoost with MICE and SMOTE.

FigShare

SHAP force plots.

Force plots illustrating the impact of each feature on the predicted probability of At-Risk MASH for four randomly selected individuals. Top Left: A non-diabetic, 49-year-old man of low fibrosis stage. Top Right: A diabetic, 69-year-old woman of low fibrosis stage. Bottom Left: A non-diabetic, 76-year-old woman of high fibrosis stage. Bottom Right: A diabetic, 55-year-old man of high fibrosis stage.

FigShare

Training/Test set comparison.

Training and Test AUCs and ROC curves for the XGB + MICE + SMOTE model using Core variables when predicting At-Risk MASH.

FigShare

Class distribution of the MASLD target conditions.

FigShare