Progress of machine learning in critical care has been difficult to track, in
part due to absence of public benchmarks. Other fields of research (such as
computer vision and natural language processing) have established various
competitions and public benchmarks. Recent availability of large clinical
datasets has enabled the possibility of establishing public benchmarks. Taking
advantage of this opportunity, we propose a public benchmark suite to address
four areas of critical care, namely mortality prediction, estimation of length
of stay, patient phenotyping and risk of decompensation. We define each task
and compare the performance of both clinical models as well as baseline and
deep learning models using eICU critical care dataset of around 73,000
patients. This is the first public benchmark on a multi-centre critical care
dataset, comparing the performance of clinical gold standard with our
predictive model. We also investigate the impact of numerical variables as well
as handling of categorical variables on each of the defined tasks. The source
code, detailing our methods and experiments is publicly available such that
anyone can replicate our results and build upon our work.Comment: Source code to replicate the results
https://github.com/mostafaalishahi/eICU_Benchmar