Recent studies have demonstrated the great power of deep learning methods,
particularly Transformer and MLP, for time series forecasting. Despite the
Transformer's success in NLP and CV, many studies have found it less effective
than MLP for time series forecasting. In this work, we design a special
Transformer, i.e., channel-aligned robust dual Transformer (CARD for short),
that addresses key shortcomings of Transformer in time series forecasting.
First, CARD introduces a dual Transformer structure that allows it to capture
both temporal correlations among signals and dynamical dependence among
multiple variables over time. Second, we introduce a robust loss function for
time series forecasting to alleviate the potential overfitting issue. This new
loss function weights the importance of forecasting over a finite horizon based
on prediction uncertainties. Our evaluation on multiple long-term and
short-term forecasting datasets demonstrates that CARD significantly
outperforms state-of-the-art time series forecasting methods, including both
Transformer-based and MLP-based models.
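
To make the idea of an uncertainty-weighted loss concrete, the following is a minimal sketch, not the loss defined in this work. It assumes that prediction uncertainty grows with the forecasting step t, so that the absolute error at step t is down-weighted by t^(-1/2). The function name `horizon_weighted_mae`, the default decay schedule, and the weight normalization are all illustrative assumptions.

```python
import numpy as np


def horizon_weighted_mae(y_true, y_pred, weights=None):
    """Mean absolute error with per-step weights over the forecast horizon.

    y_true, y_pred: arrays of shape (batch, horizon).
    weights: optional array of shape (horizon,). If omitted, an assumed
             t^(-1/2) decay is used (illustrative, not taken from the source).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    horizon = y_true.shape[-1]

    if weights is None:
        # Assumption: later steps are more uncertain, so they count less.
        steps = np.arange(1, horizon + 1)
        weights = 1.0 / np.sqrt(steps)

    weights = np.asarray(weights, dtype=float)
    # Normalize so the weights sum to 1 and the loss stays on the MAE scale.
    weights = weights / weights.sum()

    abs_err = np.abs(y_pred - y_true)           # (batch, horizon)
    per_series = (abs_err * weights).sum(axis=-1)  # weighted sum per series
    return float(per_series.mean())


if __name__ == "__main__":
    target = np.zeros((1, 4))
    early_miss = np.array([[1.0, 0.0, 0.0, 0.0]])  # error at the first step
    late_miss = np.array([[0.0, 0.0, 0.0, 1.0]])   # error at the last step
    print(horizon_weighted_mae(target, early_miss))
    print(horizon_weighted_mae(target, late_miss))
```

Under this assumed schedule, an error of the same size is penalized more at an early step than at a late one, which is one way a loss can reduce the influence of the noisier far-horizon targets.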