Explainability of Traditional and Deep Learning Models on Longitudinal Healthcare Records
Recent advances in deep learning have led to interest in training deep
learning models on longitudinal healthcare records to predict a range of
medical events, with models demonstrating high predictive performance.
Predictive performance is necessary but not sufficient, however: models must
also provide explanations and reasoning that convince clinicians to adopt them
for sustained use. Rigorous evaluation of explainability is often missing, as
comparisons between models (traditional versus deep) and various explainability
methods have not been well-studied. Furthermore, ground truths needed to
evaluate explainability can be highly subjective depending on the clinician's
perspective. Our work is one of the first to evaluate explainability
performance between and within traditional (XGBoost) and deep learning (LSTM
with Attention) models at both a global and an individual per-prediction level on
longitudinal healthcare data. We compared explainability using three popular
methods: 1) SHapley Additive exPlanations (SHAP), 2) Layer-Wise Relevance
Propagation (LRP), and 3) Attention. These implementations were applied to
synthetically generated datasets with designed ground truths and a real-world
Medicare claims dataset. We showed that, overall, LSTMs with SHAP or LRP
provide superior explainability compared to XGBoost at both the global and
local level, while the LSTM with dot-product attention failed to produce reasonable
explanations. As the volume of healthcare data grows and deep learning
progresses, rigorous evaluation of explainability will be pivotal to the
successful adoption of deep learning models in healthcare settings.
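
As a concrete illustration (not taken from the paper), the following minimal
Python sketch shows the kind of global and per-prediction SHAP attribution
the abstract describes for the XGBoost model, evaluated against a designed
ground truth. The synthetic dataset, the choice of informative features, and
the model hyperparameters are all illustrative assumptions.

import numpy as np
import shap
import xgboost as xgb

# Synthetic dataset with a designed ground truth: only features 0 and 3
# drive the label, so a faithful explainer should rank them highest.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = (X[:, 0] + 0.5 * X[:, 3] > 0).astype(int)

model = xgb.XGBClassifier(n_estimators=100, max_depth=3)
model.fit(X, y)

# TreeExplainer computes exact SHAP values for tree ensembles such as XGBoost.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Global explainability: mean absolute SHAP value per feature.
global_importance = np.abs(shap_values).mean(axis=0)
print("feature ranking:", np.argsort(global_importance)[::-1])

# Local (per-prediction) explainability: attributions for a single patient.
print("patient 0 attributions:", shap_values[0])

With a designed ground truth like this, comparing the recovered feature
ranking against the known generative features gives a quantitative handle on
explanation quality, which is the style of evaluation the abstract describes.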