It appears to be a commonly held belief that Machine Learning (ML) classification algorithms
should achieve substantially higher predictive performance than manually specified Random
Utility Models (RUMs) for choice modelling. This belief is supported by several papers in
the mode choice literature, which highlight stand-out performance of non-linear ML classifiers
compared with linear models. However, many studies that compare ML classifiers with linear
models contain a fundamental flaw in how models are validated on out-of-sample data. This paper
investigates the implications of this issue by repeating the experiments of three past papers using
two different sampling methods for panel data.
The results indicate that using trip-wise sampling with travel diary data causes significant data
leakage. Furthermore, the results demonstrate that this data leakage introduces substantial
bias in model performance estimates, particularly for flexible non-linear classifiers. Grouped
sampling is found to address the issues associated with trip-wise sampling and provides reliable
estimates of true Out-Of-Sample (OOS) predictive performance. Whilst the results from this
study indicate a slight predictive performance advantage for non-linear classifiers
over linear Logistic Regression (LR) models, this advantage is much more modest than previous
investigations have suggested.
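The distinction between the two sampling schemes can be illustrated with a minimal sketch. The snippet below uses scikit-learn on a small hypothetical panel (the traveller IDs, feature array, and split sizes are illustrative assumptions, not data from this study): a trip-wise random split lets trips from the same traveller appear in both train and test sets, whereas a grouped split keeps each traveller's trips entirely on one side.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit, train_test_split

# Hypothetical travel-diary panel: 5 travellers, 4 trips each (20 trips).
rng = np.random.default_rng(0)
traveller_ids = np.repeat(np.arange(5), 4)  # group label per trip
X = rng.normal(size=(20, 3))                # trip-level features (illustrative)

# Trip-wise sampling: trips are split at random, so the same traveller
# can appear in both train and test -- the source of the data leakage.
train_idx, test_idx = train_test_split(
    np.arange(20), test_size=0.25, random_state=0
)
leaked = set(traveller_ids[train_idx]) & set(traveller_ids[test_idx])

# Grouped sampling: whole travellers are assigned to train or test,
# so no traveller's trips straddle the split.
gss = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
g_train, g_test = next(gss.split(X, groups=traveller_ids))
overlap = set(traveller_ids[g_train]) & set(traveller_ids[g_test])

print(len(leaked) > 0)  # True: trip-wise split mixes travellers across sets
print(len(overlap))     # 0: grouped split keeps each traveller in one set
```

Evaluating a flexible classifier on the grouped test set then measures genuine OOS performance on unseen travellers, rather than memorisation of individuals seen during training.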