Large amounts of electronic medical records collected by hospitals across the
developed world offer unprecedented possibilities for knowledge discovery using
computer-based data mining and machine learning. Notwithstanding significant
research efforts, the use of these data to predict disease development
has largely been disappointing. In this paper we examine in detail a recently
proposed method that has demonstrated highly promising results on real-world
data in preliminary experiments. We scrutinize the authors' claims that
the proposed model is scalable and investigate whether the tradeoff between
prediction specificity (i.e., the ability of the model to predict a wide range
of different ailments) and accuracy (i.e., the ability of the model to make the
correct prediction) is practically viable. Our experiments conducted on a data
corpus of nearly 3,000,000 admissions support the authors' expectations and
demonstrate that the high prediction accuracy is maintained well even when the
number of admission types explicitly included in the model is increased to
account for 98% of all admissions in the corpus. Thus several promising
directions for future work are highlighted.

Comment: In Proc. International Conference on Bioinformatics and Computational
Biology, April 201