This paper presents DONUT (DO Not UTilize human beliefs), an ensemble
forecasting method that achieves strong results on the M4 Competition dataset
by reducing the assumptions made in feature and model selection. These
reductions, chiefly auto-generated features and a more diverse model pool for
the ensemble, allow DONUT to significantly outperform FFORMA, the statistical
feature-based ensemble method of Montero-Manso et al. (2020). We also
investigate feature extraction with a Long Short-Term Memory (LSTM) autoencoder
and find that the resulting features contain crucial information not captured
by standard statistical feature approaches. The ensemble weighting model uses
both LSTM and statistical features to combine the models accurately. An
analysis of feature importance and interaction shows a slight advantage for the
LSTM features over the statistical features alone. A clustering analysis shows
that the most important LSTM features differ both from most statistical
features and from one another.
We also find that the weighting model learns to exploit the larger solution
space created by augmenting the ensemble with new models, which explains part
of the accuracy gains. Moreover, we present a formal ex post facto analysis of
optimal combination and selection for ensembles, quantifying the differences
through linear optimization on the M4 dataset. Our findings indicate that
classical statistical time series features, such as trend and seasonality, do
not on their own capture all information relevant to forecasting a time
series. By contrast, our novel LSTM features carry significantly more
predictive power than the statistical features alone, but combining the two
feature sets performed best in practice.
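The ex post facto comparison of optimal selection and optimal combination can be illustrated with a small linear program. The sketch below is an illustration under stated assumptions, not the paper's formulation: it scores a single series with mean absolute error instead of the M4 metrics (sMAPE, MASE, OWA), restricts combinations to convex weights, and solves the program with SciPy's `linprog` using the HiGHS solver. The function names are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def oracle_selection_mae(y, forecasts):
    """MAE of the single best model, chosen ex post (after seeing y).

    y: shape (H,) actual values; forecasts: shape (M, H), one row per model.
    """
    errors = np.abs(forecasts - y).mean(axis=1)  # one MAE per model
    best = int(np.argmin(errors))
    return best, float(errors[best])


def optimal_combination_mae(y, forecasts):
    """Ex post optimal convex weights minimizing MAE, solved as an LP.

    Decision variables: x = [w_1..w_M, t_1..t_H], where t_h >= |error_h|.
    """
    m, h = forecasts.shape
    # Objective: minimize (1/H) * sum_h t_h; the weights carry no cost.
    c = np.concatenate([np.zeros(m), np.full(h, 1.0 / h)])
    # |y_h - sum_m w_m f_mh| <= t_h is linearized as two inequalities:
    #   sum_m w_m f_mh - t_h <= y_h  and  -sum_m w_m f_mh - t_h <= -y_h
    eye = np.eye(h)
    a_ub = np.vstack([
        np.hstack([forecasts.T, -eye]),
        np.hstack([-forecasts.T, -eye]),
    ])
    b_ub = np.concatenate([y, -y])
    # Weights must sum to one (convex combination); all variables >= 0.
    a_eq = np.concatenate([np.ones(m), np.zeros(h)])[None, :]
    b_eq = np.array([1.0])
    bounds = [(0, None)] * (m + h)
    res = linprog(c, A_ub=a_ub, b_ub=b_ub, A_eq=a_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    return res.x[:m], float(res.fun)
```

Because any single-model selection is also a feasible convex combination (one weight set to 1), the optimal combination error can never exceed the oracle selection error. For example, with `y = [10, 12, 14]` and two models that are off by -1 and +1 at every step, each model alone has an MAE of 1, while equal weights recover `y` exactly.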