The decision tree is one of the most popular classical machine learning
models, dating back to the 1980s. However, in many practical applications,
decision trees tend to generate decision paths of excessive depth. Long
decision paths often cause overfitting and make models difficult to
interpret. With longer
decision paths, inference is also more likely to fail when the data contain
missing values. In this work, we propose a new tree model called Cascading
Decision Trees to alleviate this problem. The key insight of Cascading Decision
Trees is to separate the decision path from the explanation path. Our
experiments show that on average, Cascading Decision Trees generate 63.38%
shorter explanation paths, avoiding overfitting and thus achieving higher test
accuracy. We also empirically demonstrate that Cascading Decision Trees are
more robust to missing values.