Introduction to the M5 forecasting competition Special Issue
The M5 competition follows the previous four M competitions, whose purpose is to learn from empirical evidence how to improve forecasting performance and advance the theory and practice of forecasting. M5 focused on a retail sales forecasting application with the objective of producing the most accurate point forecasts for 42,840 time series that represent the hierarchical unit sales of the largest retail company in the world, Walmart, as well as to provide the most accurate estimates of the uncertainty of these forecasts. Hence, the competition consisted of two parallel challenges, namely the Accuracy and Uncertainty forecasting competitions. M5 extended the results of the previous M competitions by: (a) significantly expanding the number of participating methods, especially those in the category of machine learning; (b) evaluating the performance of the uncertainty distribution along with point forecast accuracy; (c) including exogenous/explanatory variables in addition to the time series data; (d) using grouped, correlated time series; and (e) focusing on series that display intermittency. This paper describes the background, organization, and implementation of the competition, and it presents the data used and their characteristics. Consequently, it serves as introductory material to the results of the two forecasting challenges to facilitate their understanding.
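As a toy illustration of the hierarchical, grouped structure of unit-sales series mentioned in this abstract (not the competition's actual schema or code), the sketch below aggregates hypothetical bottom-level item-store sales to coarser levels of a hierarchy; the column names and values are invented for the example.

```python
import pandas as pd

# Hypothetical bottom-level data: unit sales per item and store.
# Column names and values are illustrative, not the official M5 schema.
sales = pd.DataFrame({
    "item_id":  ["A1", "A1", "B2", "B2"],
    "store_id": ["CA_1", "TX_1", "CA_1", "TX_1"],
    "state_id": ["CA", "TX", "CA", "TX"],
    "sales":    [3, 0, 5, 2],
})

# Aggregate the bottom-level (item x store) series to coarser levels:
# item x state, state, and the grand total across all series.
by_item_state = sales.groupby(["item_id", "state_id"])["sales"].sum()
by_state = sales.groupby("state_id")["sales"].sum()
total = sales["sales"].sum()

print(by_item_state)
print(by_state)
print(total)
```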
The impact of imperfect weather forecasts on wind power forecasting performance: Evidence from two wind farms in Greece
Weather variables are an important driver of power generation from renewable energy sources. However, accurately predicting such variables is a challenging task, which has a significant impact on the accuracy of the power generation forecasts. In this study, we explore the impact of imperfect weather forecasts on two classes of forecasting methods (statistical and machine learning) for the case of wind power generation. We perform a stress test analysis to measure the robustness of different methods to the imperfect weather input, focusing on both the point forecasts and the 95% prediction intervals. The results indicate that different methods should be considered according to the uncertainty characterizing the weather forecasts.
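A minimal sketch of the kind of stress test described above, assuming a toy linear power curve and Gaussian noise added to the wind-speed input; the power curve, noise model, and all numbers are assumptions for illustration, not the authors' setup or data.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical wind farm: toy linear power curve driven by wind speed.
wind_speed = rng.uniform(3.0, 15.0, size=5000)        # "true" wind speed, m/s
power = 0.5 * wind_speed - 1.5                        # MW, toy power curve
power += rng.normal(0.0, 0.3, size=power.size)        # observation noise

# 95% interval half-width calibrated under the assumption of a perfect weather input.
half_width = 1.96 * 0.3

def stress_test(noise_sd):
    """Degrade the weather input with Gaussian noise and score the power forecasts."""
    noisy_speed = wind_speed + rng.normal(0.0, noise_sd, size=wind_speed.size)
    point = 0.5 * noisy_speed - 1.5                   # point forecast from the noisy input
    mae = np.mean(np.abs(power - point))
    coverage = np.mean(np.abs(power - point) <= half_width)
    return mae, coverage

# Increasing error in the weather input: point accuracy and interval coverage
# both deteriorate when the intervals are not widened accordingly.
for sd in [0.0, 0.5, 1.0, 2.0]:
    mae, cov = stress_test(sd)
    print(f"weather noise sd={sd:.1f} m/s  MAE={mae:.2f} MW  95% PI coverage={cov:.1%}")
```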
Model combinations through revised base-rates
Standard selection criteria for forecasting models focus on information that is calculated for each series independently, disregarding the general tendencies and performance of the candidate models. In this paper, we propose a new way to perform statistical model selection and model combination that incorporates the base rates of the candidate forecasting models, which are then revised so that the per-series information is taken into account. We examine two schemes that are based on the precision and sensitivity information from the contingency table of the base rates. We apply our approach on pools of either exponential smoothing or ARMA models, considering both simulated and real time series, and show that our schemes work better than standard statistical benchmarks. We test the significance and sensitivity of our results, discuss the connection of our approach to other cross-learning approaches, and offer insights regarding implications for theory and practice.
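The sketch below conveys the general flavour of using a contingency table of base rates to revise a per-series model selection, not the paper's exact schemes; the table values, model pool, and forecasts are invented for the example.

```python
import numpy as np

# Hypothetical contingency table over a reference set of series.
# Rows: model selected per series by an in-sample criterion (e.g. an information criterion).
# Columns: model that turned out to be best out of sample.
models = ["ETS", "ARMA", "Naive"]
table = np.array([
    [60, 25, 15],   # criterion selected ETS
    [20, 50, 10],   # criterion selected ARMA
    [10, 15, 45],   # criterion selected Naive
])

# Row-normalizing gives P(model j is truly best | criterion selected model i),
# i.e. base rates revised by the per-series selection information.
revised = table / table.sum(axis=1, keepdims=True)

# For a new series whose criterion selects ETS (row 0), combine the candidate
# forecasts with the revised weights instead of trusting the selection fully.
selected = 0
weights = revised[selected]
forecasts = {"ETS": 102.0, "ARMA": 98.0, "Naive": 95.0}   # illustrative forecasts
combined = sum(w * forecasts[m] for w, m in zip(weights, models))
print(dict(zip(models, weights.round(2))), round(combined, 2))
```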
On the selection of forecasting accuracy measures
A lot of controversy exists around the choice of the most appropriate error measure for assessing the performance of forecasting methods. While statisticians argue for the use of measures with good statistical properties, practitioners prefer measures that are easy to communicate and understand. Moreover, researchers argue that the loss-function for parameterizing a model should be aligned with how performance is subsequently measured. In this paper we ask: Does it matter? Will the relative ranking of the forecasting methods change significantly if we choose one measure over another? Will a mismatch of the in-sample loss-function and the out-of-sample performance measure decrease the performance of the forecasting models? Focusing on the average ranked point forecast accuracy, we review the most commonly used measures in both academia and practice and perform a large-scale empirical study to understand the importance of the choice between measures. Our results suggest that there are only small discrepancies between the different error measures, especially within each measure category (percentage, relative, or scaled).
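As an illustration of the three measure categories named in this abstract (percentage, relative, and scaled), the sketch below computes MAPE, MAE relative to a naive benchmark, and MASE for two hypothetical forecasts and prints the values side by side; the data and method names are made up for the example.

```python
import numpy as np

# Hypothetical in-sample history, out-of-sample actuals, and two competing forecasts.
history = np.array([100, 102, 101, 105, 107, 110, 108, 112], dtype=float)
actual = np.array([115, 113, 118], dtype=float)
forecasts = {
    "MethodA": np.array([112.0, 114.0, 116.0]),
    "MethodB": np.array([118.0, 110.0, 120.0]),
}
naive = np.repeat(history[-1], actual.size)        # last-value benchmark forecast

def mape(a, f):                                    # percentage measure
    return np.mean(np.abs(a - f) / np.abs(a)) * 100

def rel_mae(a, f):                                 # relative measure (vs. the naive benchmark)
    return np.mean(np.abs(a - f)) / np.mean(np.abs(a - naive))

def mase(a, f, hist):                              # scaled measure
    scale = np.mean(np.abs(np.diff(hist)))         # in-sample one-step naive MAE
    return np.mean(np.abs(a - f)) / scale

for name, f in forecasts.items():
    print(name,
          f"MAPE={mape(actual, f):.2f}%",
          f"RelMAE={rel_mae(actual, f):.3f}",
          f"MASE={mase(actual, f, history):.3f}")
```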
The future of forecasting competitions: Design attributes and principles
Forecasting competitions are the equivalent of laboratory experimentation widely used in physical and life sciences. They provide useful, objective information to improve the theory and practice of forecasting, advancing the field, expanding its usage, and enhancing its value to decision and policymakers. We describe 10 design attributes to be considered when organizing forecasting competitions, taking into account trade-offs between optimal choices and practical concerns, such as costs, as well as the time and effort required to participate in them. Consequently, we map all major past competitions with respect to their design attributes, identifying similarities, differences, and design gaps, and we make suggestions about the principles to be included in future competitions, with particular emphasis on learning as much as possible from their implementation in order to help improve forecasting accuracy and uncertainty estimation. We discuss how the task of forecasting often presents a multitude of challenges that can be difficult to capture in a single forecasting contest. To assess the caliber of a forecaster, we therefore propose that organizers of future competitions consider a multicontest approach. We suggest the idea of a forecasting-“athlon” in which different challenges of varying characteristics take place.