Search CORE

19 research outputs found

Handling Overlapping Asymmetric Data Sets—A Twice Penalized P-Spline Approach

Author: Anstee Quentin M.
Henderson Robin
McTeer Matthew
Missier Paolo
Publication date: 01/03/2024

Aims: Overlapping asymmetric data sets arise when a large cohort of observations has only a small amount of information recorded and, within this group, a smaller cohort has extensive further information available. Missing-data imputation is unwise if cohort sizes differ substantially; therefore, we aim to develop a way of modelling the smaller cohort whilst considering the larger. Methods: Starting from traditional once penalized P-Spline approximations, we create a second penalty term from discrepancies in the marginal values of covariates that exist in both cohorts. Our twice penalized P-Spline is designed firstly to prevent over/under-fitting of the smaller cohort and secondly to take the larger cohort into account. Results: Through a series of data simulations, penalty parameter tunings, and model adaptations, our twice penalized model offers up to a 58% and 46% improvement in model fit for a continuous and binary response, respectively, compared with existing B-Spline and once penalized P-Spline methods. Applying our model to an individual’s risk of developing steatohepatitis, we report an over 65% improvement over existing methods. Conclusions: We propose a twice penalized P-Spline method that can vastly improve the model fit of overlapping asymmetric data sets for a common predictive endpoint, without the need for missing-data imputation.
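The abstract does not give the exact form of the second penalty, so the following is only a sketch under stated assumptions. It builds a standard P-spline (an equally spaced cubic B-spline basis plus a second-order difference penalty weighted by `lam1`) and adds a HYPOTHETICAL second quadratic penalty weighted by `lam2` that pulls the small-cohort curve towards a reference curve `f_ref` at points `x_ref`, imagined as a marginal curve estimated from the larger cohort. The names `bspline_basis`, `twice_penalized_fit`, `x_ref` and `f_ref`, and all data, are illustrative inventions, not the authors' code.

```python
import numpy as np

def bspline_basis(x, xl, xr, nseg, deg=3):
    """B-spline basis on nseg equal segments of [xl, xr]; shape (len(x), nseg + deg)."""
    x = np.asarray(x, dtype=float)
    dx = (xr - xl) / nseg
    knots = xl + dx * np.arange(-deg, nseg + deg + 1)
    # Degree 0: indicator of the half-open knot interval containing each x.
    B = ((x[:, None] >= knots[None, :-1]) & (x[:, None] < knots[None, 1:])).astype(float)
    # Cox-de Boor recursion; with equal knot spacing every denominator is d * dx.
    for d in range(1, deg + 1):
        left = (x[:, None] - knots[None, :-(d + 1)]) / (d * dx) * B[:, :-1]
        right = (knots[None, d + 1:] - x[:, None]) / (d * dx) * B[:, 1:]
        B = left + right
    return B

def twice_penalized_fit(x, y, x_ref, f_ref, lam1, lam2, xl=0.0, xr=1.0, nseg=20, order=2):
    """Minimise ||y - B a||^2 + lam1 ||D a||^2 + lam2 ||G a - f_ref||^2.

    The lam2 term is a HYPOTHETICAL stand-in for the paper's marginal penalty.
    """
    B = bspline_basis(x, xl, xr, nseg)                  # small-cohort design matrix
    G = bspline_basis(x_ref, xl, xr, nseg)              # basis at reference points
    D = np.diff(np.eye(B.shape[1]), n=order, axis=0)    # difference penalty matrix
    lhs = B.T @ B + lam1 * D.T @ D + lam2 * G.T @ G
    rhs = B.T @ y + lam2 * G.T @ f_ref
    return np.linalg.solve(lhs, rhs)

rng = np.random.default_rng(0)
x_small = rng.uniform(0, 1, 40)                                  # small, detailed cohort
y_small = np.sin(2 * np.pi * x_small) + rng.normal(0, 0.3, 40)
x_ref = np.linspace(0, 1, 50)
f_ref = np.sin(2 * np.pi * x_ref)                                # hypothetical large-cohort curve

G_ref = bspline_basis(x_ref, 0.0, 1.0, 20)
errors = {}
for lam2 in (0.0, 100.0):
    a = twice_penalized_fit(x_small, y_small, x_ref, f_ref, lam1=1.0, lam2=lam2)
    errors[lam2] = float(np.abs(G_ref @ a - f_ref).max())        # max deviation from f_ref
    print(lam2, round(errors[lam2], 4))
```

Setting `lam2 = 0` recovers an ordinary once penalized P-spline; increasing it trades fidelity to the small cohort for agreement with the reference curve, which is the kind of balance the tuning of two penalty parameters has to strike.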

University of Birmingham Research Portal

Directory of Open Access Journals

Handling Overlapping Asymmetric Datasets -- A Twice Penalized P-Spline Approach

Author: Anstee Quentin M
Henderson Robin
McTeer Matthew
Missier Paolo
Publication date: 20/11/2023

Overlapping asymmetric datasets are common in data science and raise the question of how they can be combined in a predictive analysis. In healthcare, a small amount of information, such as an electronic health record, is often available for a large number of patients, whereas a small number of patients may have undergone extensive further testing. Common solutions such as missing-data imputation can be unwise if the smaller cohort differs substantially in size from the larger sample; therefore, the aim of this research is to develop a new method that models the smaller cohort against a particular response while also considering the larger cohort. Motivated by non-parametric models, and specifically by flexible smoothing techniques via generalized additive models, we develop a twice penalized P-Spline approximation method that firstly prevents over/under-fitting of the smaller cohort and secondly takes the larger cohort into account. The second penalty is created from discrepancies in the marginal values of covariates that exist in both the smaller and larger cohorts. Through data simulations, parameter tunings, and model adaptations to consider a continuous and a binary response, we find that our twice penalized approach offers an enhanced fit over a linear B-Spline and a once penalized P-Spline approximation. Applying the method to a real-life dataset relating to a person's risk of developing Non-Alcoholic Steatohepatitis, we see an improvement in model fit of over 65%. Areas for future work include adapting our method so that it does not require dimensionality reduction, and considering parametric modelling methods. However, to our knowledge this is the first work to propose additional marginal penalties in a flexible regression, and we report a vastly improved model fit that can handle asymmetric datasets without the need for missing-data imputation. Comment: 52 pages, 17 figures, 8 tables, 34 references

arXiv.org e-Print Archive

Machine learning approaches to enhance diagnosis and staging of patients with MASLD using routinely available clinical information

Aims: Metabolic Dysfunction-Associated Steatotic Liver Disease (MASLD) outcomes such as MASH (metabolic dysfunction-associated steatohepatitis), fibrosis and cirrhosis are ordinarily determined by resource-intensive and invasive biopsies. We aim to show that routine clinical tests offer sufficient information to predict these endpoints. Methods: Using the LITMUS Metacohort derived from the European NAFLD Registry, the largest MASLD dataset in Europe, we create three feature combinations that vary in ease of procurement, including a 19-variable feature set obtainable through a routine clinical appointment or blood test. These data were used to train predictive models with the supervised machine learning (ML) algorithm XGBoost, alongside the missing-data imputation technique MICE and the class-balancing algorithm SMOTE. Shapley Additive exPlanations (SHAP) were added to determine the relative importance of each clinical variable. Results: Analysing nine biopsy-derived MASLD outcomes with cohort sizes ranging from 5385 to 6673 subjects, we predicted these outcomes with training-set AUCs ranging from 0.719 to 0.994, including classifying individuals who are At-Risk MASH with an AUC of 0.899. Using two further feature combinations of 26 and 35 variables, which included composite scores known to be good indicators of MASLD endpoints and advanced specialist tests, we found that predictive performance did not sufficiently improve. We also present local and global explanations for each ML model, offering clinicians interpretability without worsening predictive performance. Conclusions: This study developed a series of ML models with accuracy ranging from 71.9% to 99.4%, using only easily extractable and readily available information, to predict MASLD outcomes that are usually determined through highly invasive means.
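The abstract names SMOTE for class balancing but does not say which implementation was used. As a hedged illustration of the technique only (not the authors' pipeline, and not the imbalanced-learn API), the core SMOTE idea fits in a few lines of NumPy: each synthetic minority row is placed at a random point on the segment between a minority sample and one of its `k` nearest minority-class neighbours. The function name `smote_sketch` and the data are hypothetical, and the sketch assumes the rows have already been imputed (for example by MICE), since distances cannot be computed on missing values.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic minority-class rows by SMOTE-style interpolation.

    Each synthetic row lies on the segment between a minority sample and one of
    its k nearest minority-class neighbours (Euclidean distance).
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    # Pairwise Euclidean distances within the minority class.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)                 # a point is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]   # indices of the k nearest rows
    base = rng.integers(0, len(X_min), size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                   # interpolation weight in [0, 1)
    return X_min[base] + gap * (X_min[picked] - X_min[base])

rng = np.random.default_rng(42)
X_minority = rng.normal(loc=2.0, size=(20, 4))     # hypothetical minority-class rows
synthetic = smote_sketch(X_minority, n_new=30, k=5)
print(synthetic.shape)
```

Because every synthetic row is a convex combination of two real minority rows, it always stays inside the range spanned by the minority class; oversampling of this kind is normally applied to training data only.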

University of Birmingham Research Portal

Oxford University Research Archive

Bern Open Repository and Information System (BORIS)

Helsingin yliopiston digitaalinen arkisto

Performance of non-invasive tests and histology for the prediction of clinical outcomes in patients with non-alcoholic fatty liver disease: an individual participant data meta-analysis

Author: Aithal Guruprasad P
Aithal Guruprasad Padur
Akbari Camilla
Akhtar Salma
Alexander Leigh
Allison Mike
Alonso Cristina
Alzoubi Osama
Ampuero Javier
Andersson Anneli
Anstee Quentin M
Applegate Douglas
Armandi Angelo
Arola Johanna
Balp Maria-Magdalena
Banerjee Rajarshi
Bedossa Pierre
Bernardo Barbara
Berzigotti Annalisa
Biegel Hannah
Billin Andrew
Bossuyt Patrick M
Boursier Jérôme
Bradley Christopher
Brass Clifford
Breckons Matt
Bugianesi Elisabetta
Burt Alastair
Campbell Mark
Cassinotto Christophe
Castell Javier
Chan Wah-Kheong
Chen Yan
Chen Yu
Chng Elaine
Clément Karine
Cobbold Jeremy
Cockell Simon
Cordell Heather J
Cortez-Pinto Helena
Coxson Harvey
Daly Ann K
Davidsen Peter
Davies Susan
Day Christopher P
de Lédinghen Victor
de Saint Loup Marc
Dennis Andrea
Derdak Zoltan
Doward Lynda
Driessen Ann
Duffin Kevin
Dufour Jean-François
Ehman Richard
Ekstedt Mattias
Ellegaard Jens
Ertle Judith
Eslam Mohammed
Fernández Isabel
Fournier-Poizat Céline
Francis Susan
Francque Sven
Gaia Silvia
Gallego-Durán Rocío
Gastaldelli Amalia
Geier Andreas
George Jacob
Gouw Annette
Govaere Olivier
Gómez-González Emilio
Hagström Hannes
Harder Lea Mørch
Harrington Magdalena Alicia
Harrison Stephen
Harrison Stephen A
Hirooka Masashi
Ho Gideon
Hockings Paul
Holleboom Adriaan G
Horan Gerald
Hytiroglou Prodromos
Hyötyläinen Tuulia
Hübscher Stefan
Jennings Lori
Kalutkiewicz Michael
Kamzolas Ioannis
Karlas Thomas
Karsdal Morten
Kechagias Stergios
Kelly Matt
Kjær Mette Skalshøi
Lackner Carolin
Landgren Henrik
Lee Dae Ho
Lee Jenny
Lee Jenny A
Leeming Diana Julie
Lelliott Chris J
Li Guanlin
Liguori Antonio
Lindén Daniel
Llorca Anne
Lupșor-Platon Monica
Löffler Jürgen
Mahadeva Sanjiv
Mak Anne Linde
Marra Fabio
Martic Miljen
Masoodi Mojgan
Mato Jose M
Mayo Rebeca
McGlinchey Aiden
McLeod Euan James
McTeer Matthew
Mendoza Yuly P
Miele Luca
Miller Melissa
Millet Óscar
Missier Paolo
Montero-Vallejo Rocío
Mozes Ferenc
Musa Kishwar
Myneni Sudha
Mózes Ferenc E
Nakajima Atsushi
Nasr Patrik
Neubauer Stefan
Newsome Philip
Nieuwdorp Max
Noureddin Mazen
Oakley Fiona
Oldenburger Anouk
Olodo-Atitebi Seliat
Oluboyede Yemi
Orešič Matej
Ostroff Rachel
Pais Raluca
Palaniyappan Naaventhan
Papatheodoridis George
Paradis Valerie
Paternostro Rafael
Patino-Navarrete Rafael
Patterson Scott D
Pavlides Michael
Pelusi Serena
Pennisi Grazia
Pepin Kay
Perfield James W
Petsalaki Evangelia
Petta Salvatore
Porthan Kimmo
Rajaram Ruveena
Rasmussen Daniel Guldager Kring
Ratziu Vlad
Reißing Johanna
Ridolfo Sofia
Rodrigues Cecilia M P
Rodrigues-Cuenca Sergio
Romero-Gómez Manuel
Rosenquist Christian
Ross Trenton
Rosso Chiara
Sandt Estelle
Schattenberg Jörn M
Schneider Moritz
Schuppan Detlef
Schölch Corinna
Sebastiani Giada
Shalimar
Shankar Sudha
Shima Toshihide
Shumbayawonda Elizabeth
Sinisi Antonia
Stauber Rudolf E
Staufer Katharina
Straub Beate K
Surabattula Rambabu
Svegliati Gianluca
Tai Dean
Thakker Paresh
Tiniakos Dina
Tonini Manuela
Torstenson Richard
Trauner Michael
Trautwein Christian
Truong Emily
Trylesinski Aldo
Tsochatzis Emmanuel
Tuthill Theresa
Twiss James
Vacca Michele
Vale Luke
Valenti Luca
Vali Yasaman
van Dijk Anne-Marieke
van Mil Saskia
Verheij Joanne
Vidal-Puig Toni
Viganò Mauro
Vonghia Luisa
Wenn David
Wiegand Johannes
Wigley Ioan
Wonders Kristy
Wong Grace Lai-Hung
Wong Vincent Wai-Sun
Yki-Järvinen Hannele
Yoneda Masato
Younes Ramy
Yunis Carla
Zafarmand Hadi
Zheng Ming-Hua
Publication venue: Elsevier BV
Publication date: 05/06/2023

Background: Histologically assessed liver fibrosis stage has prognostic significance in patients with non-alcoholic fatty liver disease (NAFLD) and is accepted as a surrogate endpoint in clinical trials for non-cirrhotic NAFLD. Our aim was to compare the prognostic performance of non-invasive tests with liver histology in patients with NAFLD. Methods: This was an individual participant data meta-analysis of the prognostic performance of histologically assessed fibrosis stage (F0–4), liver stiffness measured by vibration-controlled transient elastography (LSM-VCTE), fibrosis-4 index (FIB-4), and NAFLD fibrosis score (NFS) in patients with NAFLD. The literature search from a previously published systematic review on the diagnostic accuracy of imaging and simple non-invasive tests was updated to Jan 12, 2022, for this study. Studies were identified through PubMed/MEDLINE, EMBASE, and CENTRAL, and authors were contacted for individual participant data, including outcome data, with a minimum of 12 months of follow-up. The primary outcome was a composite endpoint of all-cause mortality, hepatocellular carcinoma, liver transplantation, or cirrhosis complications (ie, ascites, variceal bleeding, hepatic encephalopathy, or progression to a MELD score ≥15). We calculated aggregated survival curves for trichotomised groups and compared them using stratified log-rank tests (histology: F0–2 vs F3 vs F4; LSM: 2·67; NFS: 0·676), calculated areas under the time-dependent receiver operating characteristic curves (tAUC), and performed Cox proportional-hazards regression to adjust for confounding. This study was registered with PROSPERO, CRD42022312226. Findings: Of 65 eligible studies, we included data on 2518 patients with biopsy-proven NAFLD from 25 studies (1126 [44·7%] were female, median age was 54 years [IQR 44–63], and 1161 [46·1%] had type 2 diabetes). After a median follow-up of 57 months [IQR 33–91], the composite endpoint was observed in 145 (5·8%) patients. Stratified log-rank tests showed significant differences between the trichotomised patient groups (p<0·0001 for all comparisons). The tAUCs at 5 years were 0·72 (95% CI 0·62–0·81) for histology, 0·76 (0·70–0·83) for LSM-VCTE, 0·74 (0·64–0·82) for FIB-4, and 0·70 (0·63–0·80) for NFS. All index tests were significant predictors of the primary outcome after adjustment for confounders in the Cox regression. Interpretation: Simple non-invasive tests performed as well as histologically assessed fibrosis in predicting clinical outcomes in patients with NAFLD and could be considered as alternatives to liver biopsy in some cases.
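To make the 5-year tAUC figures more concrete, here is a deliberately simplified sketch of a cumulative/dynamic time-dependent AUC: at horizon `t`, cases are patients with an observed event by `t`, controls are patients still event-free after `t`, and the AUC is the proportion of case-control pairs in which the case has the higher test value (ties count one half). This naive version simply drops patients censored before `t` instead of applying censoring weights, so it is not the estimator used in the meta-analysis, which the abstract does not specify. The function name and all data are hypothetical.

```python
import numpy as np

def naive_cumulative_dynamic_auc(time, event, marker, t):
    """Cumulative/dynamic AUC at horizon t WITHOUT censoring weights.

    Cases: event observed at or before t. Controls: still event-free after t.
    Subjects censored at or before t are dropped, which can bias the estimate.
    """
    time, event, marker = (np.asarray(a, dtype=float) for a in (time, event, marker))
    cases = (time <= t) & (event == 1)
    controls = time > t
    mc = marker[cases][:, None]       # case markers as a column
    mk = marker[controls][None, :]    # control markers as a row
    concordant = (mc > mk).sum() + 0.5 * (mc == mk).sum()
    return concordant / (cases.sum() * controls.sum())

# Hypothetical follow-up times (months), event indicators and test scores.
time = [12, 30, 45, 58, 70, 80, 95, 100]
event = [1, 1, 0, 1, 0, 1, 0, 0]
marker = [3.1, 2.8, 1.0, 2.2, 1.5, 2.5, 0.9, 1.2]

auc_5y = naive_cumulative_dynamic_auc(time, event, marker, t=60)
print(round(auc_5y, 4))  # 11 of the 12 case-control pairs are concordant
```

Note that the patient with an event at 80 months counts as a control at the 60-month horizon, which is what distinguishes a time-dependent AUC from an ordinary one.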

Repository@Nottingham

Handling Overlapping Asymmetric Data Sets—A Twice Penalized P-Spline Approach

Author: Matthew McTeer
Paolo Missier
Quentin M. Anstee
Robin Henderson
Publication venue: MDPI AG
Publication date: 01/03/2024

Directory of Open Access Journals

Machine learning approaches to enhance diagnosis and staging of patients with MASLD using routinely available clinical information

Abstract: Aims: Metabolic Dysfunction-Associated Steatotic Liver Disease (MASLD) outcomes such as MASH (metabolic dysfunction-associated steatohepatitis), fibrosis and cirrhosis are ordinarily determined by resource-intensive and invasive biopsies. We aim to show that routine clinical tests offer sufficient information to predict these endpoints. Methods: Using the LITMUS Metacohort derived from the European NAFLD Registry, the largest MASLD dataset in Europe, we create three feature combinations that vary in ease of procurement, including a 19-variable feature set obtainable through a routine clinical appointment or blood test. These data were used to train predictive models with the supervised machine learning (ML) algorithm XGBoost, alongside the missing-data imputation technique MICE and the class-balancing algorithm SMOTE. Shapley Additive exPlanations (SHAP) were added to determine the relative importance of each clinical variable. Results: Analysing nine biopsy-derived MASLD outcomes with cohort sizes ranging from 5385 to 6673 subjects, we predicted these outcomes with training-set AUCs ranging from 0.719 to 0.994, including classifying individuals who are At-Risk MASH with an AUC of 0.899. Using two further feature combinations of 26 and 35 variables, which included composite scores known to be good indicators of MASLD endpoints and advanced specialist tests, we found that predictive performance did not sufficiently improve. We also present local and global explanations for each ML model, offering clinicians interpretability without worsening predictive performance. Conclusions: This study developed a series of ML models with accuracy ranging from 71.9% to 99.4%, using only easily extractable and readily available information, to predict MASLD outcomes that are usually determined through highly invasive means.

Institutional Repository Universiteit Antwerpen

Machine learning approaches to enhance diagnosis and staging of patients with MASLD using routinely available clinical information

Author: Adriaan G. Holleboom
Andreas Geier
Clifford Brass
Dina Tiniakos
Douglas Applegate
Elisabetta Bugianesi
Georgios Papatheodoridis
Hannele Yki-Jarvinen
Jean-Francois Dufour
Jeremy Cobbold
Jörn M. Schattenberg
Luca Miele
Luca Valenti
Manuel Romero Gomez
Matthew McTeer
Mattias Ekstedt
Michael Allison
Michael Pavlides
Paolo Missier
Peter Mesenbrink
Quentin M. Anstee
Sven Francque
Vlad Ratziu
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2024

Directory of Open Access Journals

Recommended from our members

Machine learning approaches to enhance diagnosis and staging of patients with MASLD using routinely available clinical information.

Author: Allison Michael
Anstee Quentin M
Applegate Douglas
Brass Clifford
Bugianesi Elisabetta
Cobbold Jeremy
Dufour Jean-Francois
Ekstedt Mattias
Francque Sven
Geier Andreas
Holleboom Adriaan G
LITMUS Consortium investigators
McTeer Matthew
Mesenbrink Peter
Miele Luca
Missier Paolo
Papatheodoridis Georgios
Pavlides Michael
Ratziu Vlad
Romero Gomez Manuel
Schattenberg Jörn M
Tiniakos Dina
Valenti Luca
Yki-Jarvinen Hannele
Publication venue: PLoS One
Publication date: 29/02/2024

AIMS: Metabolic Dysfunction-Associated Steatotic Liver Disease (MASLD) outcomes such as MASH (metabolic dysfunction-associated steatohepatitis), fibrosis and cirrhosis are ordinarily determined by resource-intensive and invasive biopsies. We aim to show that routine clinical tests offer sufficient information to predict these endpoints. METHODS: Using the LITMUS Metacohort derived from the European NAFLD Registry, the largest MASLD dataset in Europe, we create three feature combinations that vary in ease of procurement, including a 19-variable feature set obtainable through a routine clinical appointment or blood test. These data were used to train predictive models with the supervised machine learning (ML) algorithm XGBoost, alongside the missing-data imputation technique MICE and the class-balancing algorithm SMOTE. Shapley Additive exPlanations (SHAP) were added to determine the relative importance of each clinical variable. RESULTS: Analysing nine biopsy-derived MASLD outcomes with cohort sizes ranging from 5385 to 6673 subjects, we predicted these outcomes with training-set AUCs ranging from 0.719 to 0.994, including classifying individuals who are At-Risk MASH with an AUC of 0.899. Using two further feature combinations of 26 and 35 variables, which included composite scores known to be good indicators of MASLD endpoints and advanced specialist tests, we found that predictive performance did not sufficiently improve. We also present local and global explanations for each ML model, offering clinicians interpretability without worsening predictive performance. CONCLUSIONS: This study developed a series of ML models with accuracy ranging from 71.9% to 99.4%, using only easily extractable and readily available information, to predict MASLD outcomes that are usually determined through highly invasive means.

Apollo (Cambridge)

Evaluation metrics for the 'Core' dataset when predicting all responses using XGBoost with MICE and SMOTE.

The Francis Crick Institute

SHAP force plots.

Force plots illustrating the impact of each feature on the predicted probability of At-Risk MASH for four randomly selected individuals. Top left: a non-diabetic, 49-year-old man with a low fibrosis stage. Top right: a diabetic, 69-year-old woman with a low fibrosis stage. Bottom left: a non-diabetic, 76-year-old woman with a high fibrosis stage. Bottom right: a diabetic, 55-year-old man with a high fibrosis stage.
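A force plot draws SHAP's additivity property: the model output for one individual equals a base value (the average output over background data) plus one contribution per feature, pushing the prediction up or down. The listed work applied SHAP to XGBoost models; the sketch below instead uses a linear model, an assumption chosen because its SHAP values have a closed form when features are treated as independent, namely `phi_j = w_j * (x_j - mean_j)`. The weights, background sample and individual are all hypothetical, and the output is on the linear (for example log-odds) scale rather than a probability.

```python
import numpy as np

def linear_shap(weights, bias, x, X_background):
    """Exact SHAP values for a linear model f(x) = w.x + b with independent features.

    phi_j = w_j * (x_j - E[x_j]); the base value is E[f(X)] over the background data.
    """
    mean = X_background.mean(axis=0)
    phi = weights * (x - mean)
    base_value = float(weights @ mean + bias)
    return base_value, phi

rng = np.random.default_rng(1)
X_bg = rng.normal(size=(200, 3))            # hypothetical background sample
w = np.array([0.8, -1.2, 0.3])              # hypothetical log-odds weights
b = -0.5
x_new = np.array([1.0, 0.2, -2.0])          # one hypothetical individual

base, phi = linear_shap(w, b, x_new, X_bg)
f_x = float(w @ x_new + b)
# Additivity: base value plus contributions recovers the model output exactly.
print(f"base={base:.3f}, contributions={np.round(phi, 3)}, f(x)={f_x:.3f}")
```

For tree ensembles such as XGBoost the contributions come from a different algorithm, but the same additive decomposition is what each panel of a force plot displays.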

The Francis Crick Institute