
    Statistical post-processing of heat index ensemble forecasts: is there a royal road?

    We investigate the effect of statistical post-processing on the probabilistic skill of discomfort index (DI) and indoor wet-bulb globe temperature (WBGTid) ensemble forecasts, both calculated from the corresponding forecasts of temperature and dew point temperature. Two different methodological approaches to calibration are compared. In the first case, we start with joint post-processing of the temperature and dew point forecasts and then create calibrated samples of DI and WBGTid using samples from the obtained bivariate predictive distributions. This approach is compared with direct post-processing of the heat index ensemble forecasts. For this purpose, a novel ensemble model output statistics model based on a generalized extreme value distribution is proposed. The predictive performance of both methods is tested on the operational temperature and dew point ensemble forecasts of the European Centre for Medium-Range Weather Forecasts and the corresponding forecasts of DI and WBGTid. For short lead times (up to day 6), both approaches significantly improve the forecast skill. Among the competing post-processing methods, direct calibration of heat indices exhibits the best predictive performance, very closely followed by the more general approach based on joint calibration of temperature and dew point temperature. Additionally, a machine learning approach is tested and shows comparable performance for the case when one is interested only in forecasting heat index warning level categories. Comment: 29 pages, 12 figures.
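    The direct route, an ensemble model output statistics (EMOS) model with a generalized extreme value (GEV) predictive distribution, can be sketched as follows. This is a minimal illustration on synthetic data: the linear location link, the log scale link, and all numerical values are assumptions for the sketch, not the model proposed in the paper.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import genextreme

rng = np.random.default_rng(0)

# Synthetic training pairs: ensemble-mean heat index and verifying observation
ens_mean = rng.normal(28.0, 3.0, size=500)
obs = ens_mean + genextreme.rvs(c=-0.1, loc=0.5, scale=1.2, size=500,
                                random_state=1)

def neg_log_lik(theta):
    """Negative log-likelihood of a GEV whose location is linear in the
    ensemble mean; note scipy's shape parameter c corresponds to -xi."""
    a, b, log_sigma, xi = theta
    mu = a + b * ens_mean
    sigma = np.exp(log_sigma)          # log link keeps the scale positive
    return -genextreme.logpdf(obs, c=-xi, loc=mu, scale=sigma).sum()

res = minimize(neg_log_lik, x0=[0.0, 1.0, 0.0, 0.1], method="Nelder-Mead")
a, b, log_sigma, xi = res.x

# Predictive probability that the heat index exceeds a (purely illustrative)
# warning threshold of 32, given an ensemble mean of 30
p_exceed = genextreme.sf(32.0, c=-xi, loc=a + b * 30.0,
                         scale=np.exp(log_sigma))
```

    In practice EMOS parameters are often fitted by minimum CRPS rather than maximum likelihood; the likelihood fit above keeps the sketch short.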

    Weighted verification tools to evaluate univariate and multivariate probabilistic forecasts for high-impact weather events

    To mitigate the impacts associated with adverse weather conditions, meteorological services issue weather warnings to the general public. These warnings rely heavily on forecasts issued by underlying prediction systems. When deciding which prediction system(s) to utilise to construct warnings, it is important to compare systems in their ability to forecast the occurrence and severity of extreme weather events. However, evaluating forecasts for extreme events is known to be a challenging task. This is exacerbated further by the fact that high-impact weather often manifests as a result of several confounding features, a realisation that has led to considerable research on so-called compound weather events. Both univariate and multivariate methods are therefore required to evaluate forecasts for high-impact weather. In this paper, we discuss weighted verification tools, which allow particular outcomes to be emphasised during forecast evaluation. We review and compare different approaches to construct weighted scoring rules, both in a univariate and multivariate setting, and we leverage existing results on weighted scores to introduce weighted probability integral transform (PIT) histograms, allowing forecast calibration to be assessed conditionally on particular outcomes having occurred. To illustrate the practical benefit afforded by these weighted verification tools, they are employed in a case study to evaluate forecasts for extreme heat events issued by the Swiss Federal Office of Meteorology and Climatology (MeteoSwiss).
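    One standard member of this family of weighted scoring rules is the threshold-weighted continuous ranked probability score (twCRPS), which restricts the usual CRPS integral to outcomes above a threshold of interest. A minimal numerical sketch for an ensemble forecast (the Gaussian toy ensemble, grid bounds, and threshold values are illustrative):

```python
import numpy as np

def tw_crps(members, y, threshold, lo=-10.0, hi=10.0, n_grid=4001):
    """twCRPS(F, y) = integral of w(z) * (F(z) - 1{y <= z})^2 dz,
    with F the empirical ensemble CDF and weight w(z) = 1{z >= threshold}."""
    z = np.linspace(lo, hi, n_grid)
    F = (members[None, :] <= z[:, None]).mean(axis=1)   # empirical CDF on grid
    w = (z >= threshold).astype(float)
    dz = z[1] - z[0]
    return ((w * (F - (z >= y)) ** 2).sum()) * dz       # Riemann sum

rng = np.random.default_rng(0)
members = rng.normal(0.0, 1.0, size=50)
y = 0.7

# With the threshold below the whole grid, this reduces to the ordinary CRPS,
# which for an ensemble equals mean|x_i - y| - 0.5 * mean|x_i - x_j|
crps_ref = (np.abs(members - y).mean()
            - 0.5 * np.abs(members[:, None] - members[None, :]).mean())
crps_num = tw_crps(members, y, threshold=-10.0)
```

    Because the weight only removes nonnegative contributions to the integral, the twCRPS is never larger than the unweighted CRPS, which is a quick sanity check on any implementation.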

    Assessing Predictive Performance: From Precipitation Forecasts over the Tropics to Receiver Operating Characteristic Curves and Back

    Educated decision making involves two major ingredients: probabilistic forecasts for future events or quantities and an assessment of predictive performance. This thesis focuses on the latter topic and illustrates its importance and implications from both theoretical and applied perspectives. Receiver operating characteristic (ROC) curves are key tools for the assessment of predictions for binary events. Despite their popularity and ubiquitous use, the mathematical understanding of ROC curves is still incomplete. We establish the equivalence between ROC curves and cumulative distribution functions (CDFs) on the unit interval and elucidate the crucial role of concavity in interpreting and modeling ROC curves. Under this essential requirement, the classical binormal ROC model is strongly inhibited in its flexibility, and we propose the novel beta ROC model as an alternative. For a class of models that includes the binormal and the beta model, we derive the large-sample distribution of the minimum distance estimator. This allows for uncertainty quantification and statistical tests of goodness-of-fit or equal predictive ability. Turning to empirical examples, we analyze the suitability of both models and find empirical evidence for the increased flexibility of the beta model. A freely available software package called betaROC is currently being prepared for release for the statistical programming language R. Throughout the tropics, probabilistic forecasts for accumulated precipitation are of economic importance. However, it is largely unknown how skillful current numerical weather prediction (NWP) models are at timescales of one to a few days. For the first time, we systematically assess the quality of nine global operational NWP ensembles for three regions in northern tropical Africa, verifying against station and satellite-based observations for the monsoon seasons 2007-2014.
All examined NWP models are uncalibrated and unreliable, in particular for high probabilities of precipitation, and underperform in the prediction of amount and occurrence of precipitation when compared to a climatological reference forecast. Statistical postprocessing corrects systematic deficiencies and realizes the full potential of ensemble forecasts. Postprocessed forecasts are calibrated and reliable and outperform raw ensemble forecasts in all regions and monsoon seasons. Disappointingly, however, they have predictive performance only equal to the climatological reference. This assessment is robust and holds for all examined NWP models, all monsoon seasons, accumulation periods of 1 to 5 days, and station and spatially aggregated satellite-based observations. Arguably, it implies that current NWP ensembles cannot translate information about the atmospheric state into useful information regarding the occurrence or amount of precipitation. We suspect convective parameterization as the likely cause of the poor performance of NWP ensemble forecasts, as it has been shown to be a first-order error source for the realistic representation of organized convection in NWP models. One may ask if the poor performance of NWP ensembles is exclusively confined to northern tropical Africa or if it applies to the tropics in general. In a comprehensive study, we assess the quality of two major NWP ensemble prediction systems (EPSs) for 1- to 5-day accumulated precipitation for ten climatic regions in the tropics and the period 2009-2017. In particular, we investigate their skill regarding the occurrence and amount of precipitation as well as the occurrence of extreme events. Both ensembles exhibit clear calibration problems and are unreliable and overconfident. Nevertheless, they are (slightly) skillful for most climates when compared to the climatological reference, except in tropical and northern arid Africa and alpine climates.
Statistical postprocessing corrects for the lack of calibration and reliability and improves forecast quality. Postprocessed ensemble forecasts are skillful for most regions except the ones mentioned above. The lack of NWP forecast skill in tropical and northern arid Africa and alpine climates calls for alternative approaches to the prediction of precipitation. In a pilot study for northern tropical Africa, we investigate whether it is possible to construct skillful statistical models that rely on information about recent rainfall events. We focus on the prediction of the probability of precipitation and find clear evidence for its modulation by recent precipitation events. The spatio-temporal correlation of rainfall is consistent with meteorological expectations, is reasonably pronounced and stable, and allows the construction of meaningful statistical forecasts. We construct logistic-regression-based forecasts that are reliable, have a higher resolution than the climatological reference forecast, and yield an average improvement of 20% for northern tropical Africa and the period 1998-2014.
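    The final ingredient, a logistic regression for the probability of precipitation (PoP) given recent rainfall, can be sketched on synthetic data. The single persistence predictor and all probabilities below are invented stand-ins for the study's actual predictors; the fit uses plain Newton-Raphson on the log-likelihood:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic station record: did it rain yesterday (0/1)?  The true PoP is
# assumed higher after a rainy day (persistence); the values are invented.
x_prev = rng.binomial(1, 0.4, size=2000).astype(float)
p_true = np.where(x_prev == 1.0, 0.55, 0.25)
y = rng.binomial(1, p_true).astype(float)

X = np.column_stack([np.ones_like(x_prev), x_prev])   # intercept + predictor

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Newton-Raphson iterations for the logistic-regression MLE
beta = np.zeros(2)
for _ in range(25):
    p = sigmoid(X @ beta)
    W = p * (1.0 - p)                       # IRLS weights
    H = X.T @ (X * W[:, None])              # observed information (Hessian)
    beta += np.linalg.solve(H, X.T @ (y - p))

p_after_rain = sigmoid(beta[0] + beta[1])   # fitted PoP after a rainy day
p_after_dry = sigmoid(beta[0])              # fitted PoP after a dry day
```

    With a single binary predictor plus intercept, the fitted PoPs reproduce the empirical rain frequencies in the two groups exactly, which makes the fit easy to verify.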

    Evaluation of ‘GLAMEPS’—a proposed multimodel EPS for short range forecasting

    The Grand Limited Area Model Ensemble Prediction System (GLAMEPS) is prepared for pan-European, short-range probabilistic numerical weather prediction of fine synoptic-scale, quasi-hydrostatic atmospheric flows. Four equally sized ensembles are combined: EuroTEPS, a version of the global ECMWF EPS with a European target; AladEPS, a downscaling of EuroTEPS using the ALADIN model; and HirEPS_K and HirEPS_S, two ensembles using the HIRLAM model nested into EuroTEPS, including 3DVar data assimilation for two control forecasts. The 52-member GLAMEPS thus samples forecast uncertainty through three analysed initial states combined with 12 singular-vector-based perturbations, four different models, and the stochastic physics tendencies in EuroTEPS. Over a 7-week test period in winter 2008, GLAMEPS produced better results than ECMWF’s EPS with 51 ensemble members. Apart from spatial resolution, the improvement is due to the multimodel combination and, to a smaller extent, the dedicated EuroTEPS. Ensemble resolution and reliability are both improved. Combining uncalibrated ensembles is seen to produce a better combined ensemble than the best single-model ensemble of the same size, except when one of the single-model ensembles is considerably better than the others. Bayesian Model Averaging improves reliability but needs further elaboration to account for geographical variations. These conclusions need to be confirmed by long-period evaluations.
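    The core effect, that pooling equally sized single-model ensembles with differing biases can beat each single model, is easy to reproduce in a toy setting scored with the sample CRPS. The biases and spreads below are invented for the sketch and do not correspond to the GLAMEPS members:

```python
import numpy as np

rng = np.random.default_rng(1)
n_days = 1000
truth = rng.normal(0.0, 1.0, n_days)        # synthetic verifying observations

def single_model(bias, spread):
    """A 13-member ensemble with its own systematic bias and spread
    (values are invented, not statistics of any real model)."""
    return truth[:, None] + bias + rng.normal(0.0, spread, (n_days, 13))

pools = [single_model(b, s)
         for b, s in [(0.5, 0.6), (-0.4, 0.7), (0.3, 0.8), (-0.3, 0.6)]]
multi = np.concatenate(pools, axis=1)       # 52-member multimodel ensemble

def crps(ens, y):
    """Sample CRPS averaged over days: mean|x_i - y| - 0.5 mean|x_i - x_j|."""
    term1 = np.abs(ens - y[:, None]).mean()
    term2 = np.abs(ens[:, :, None] - ens[:, None, :]).mean()
    return term1 - 0.5 * term2

scores = [crps(p, truth) for p in pools]    # lower CRPS is better
score_multi = crps(multi, truth)
```

    The pooled ensemble wins here because the single-model biases partially cancel; as the abstract notes, the advantage disappears when one single-model ensemble is much better than the others.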

    Statistical methods for post-processing ensemble weather forecasts

    Until recent times, weather forecasts were deterministic in nature. For example, a forecast might state "The temperature tomorrow will be 20°C." More recently, however, increasing attention has been paid to the uncertainty associated with such predictions. By quantifying the uncertainty of a forecast, for example with a probability distribution, users can make risk-based decisions. The uncertainty in weather forecasts is typically based upon 'ensemble forecasts'. Rather than issuing a single forecast from a numerical weather prediction (NWP) model, ensemble forecasts comprise multiple model runs that differ in either the model physics or the initial conditions. Ideally, ensemble forecasts would provide a representative sample of the possible outcomes of the verifying observations. However, due to model biases and inadequate specification of initial conditions, ensemble forecasts are often biased and underdispersed. As a result, estimates of the most likely values of the verifying observations, and of the associated forecast uncertainty, are often inaccurate. It is therefore necessary to correct, or post-process, ensemble forecasts, using statistical models known as 'ensemble post-processing methods'. To this end, this thesis is concerned with the application of statistical methodology in the field of probabilistic weather forecasting, and in particular ensemble post-processing. Using various datasets, we extend existing work and propose the novel use of statistical methodology to tackle several aspects of ensemble post-processing. Our novel contributions to the field are the following. In Chapter 3 we present a comparison study of several post-processing methods, with a focus on probabilistic forecasts for extreme events. We find that the benefits of ensemble post-processing are larger for forecasts of extreme events than for forecasts of common events.
We show that allowing flexible corrections to the biases in ensemble location is important for the forecasting of extreme events. In Chapter 4 we tackle the complicated problem of post-processing ensemble forecasts without making distributional assumptions, to produce recalibrated ensemble forecasts without the intermediate step of specifying a probability forecast distribution. We propose a latent variable model and make a novel application of measurement error models. We show in three case studies that our distribution-free method is competitive with a popular alternative that makes distributional assumptions. We suggest that our distribution-free method could serve as a useful baseline on which forecasters should seek to improve. In Chapter 5 we address the subject of parameter uncertainty in ensemble post-processing. As in all parametric statistical models, the parameter estimates are subject to uncertainty. We approximate the distribution of the model parameters by bootstrap resampling and demonstrate improvements in forecast skill by incorporating this additional source of uncertainty into out-of-sample probability forecasts. In Chapter 6 we use model diagnostic tools to determine how specific post-processing models may be improved. We subsequently introduce bias-correction schemes that move beyond the standard linear schemes employed in the literature and in practice, particularly in the case of correcting ensemble underdispersion. Finally, we illustrate the complicated problem of assessing the skill of ensemble forecasts whose members are dependent, or correlated. We show that dependent ensemble members can lead to surprising conclusions when employing standard measures of forecast skill.
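    The bootstrap treatment of parameter uncertainty can be sketched for a simple linear bias correction, used here as a stand-in for a full post-processing model; the data and sample sizes are invented:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
ens_mean = rng.normal(15.0, 4.0, n)                    # training ensemble means
obs = 1.0 + 0.9 * ens_mean + rng.normal(0.0, 1.5, n)   # verifying observations

# Point estimate of a linear bias correction: obs ~ a + b * ens_mean
b_hat, a_hat = np.polyfit(ens_mean, obs, 1)

# Bootstrap: refit on resampled training days to approximate the sampling
# distribution of the post-processing parameters
boot = []
for _ in range(1000):
    idx = rng.integers(0, n, n)                        # resample with replacement
    boot.append(np.polyfit(ens_mean[idx], obs[idx], 1))
boot = np.asarray(boot)                                # columns: slope, intercept
slope_sd = boot[:, 0].std()                            # parameter uncertainty
```

    Drawing parameters from `boot` when issuing out-of-sample forecasts, instead of always using the point fit, propagates this extra source of uncertainty into the predictive distribution.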

    Ensemble prediction for nowcasting with a convection-permitting model—I: description of the system and the impact of radar-derived surface precipitation rates

    A key strategy to improve the skill of quantitative predictions of precipitation, as well as of hazardous weather such as severe thunderstorms and flash floods, is to exploit observations of convective activity (e.g. from radar). In this paper, a convection-permitting ensemble prediction system (EPS), aimed at addressing the problems of forecasting localized weather events with relatively short predictability timescales and based on a 1.5 km grid-length version of the Met Office Unified Model, is presented. Particular attention is given to the impact of using predicted observations of radar-derived precipitation intensity in the ensemble transform Kalman filter (ETKF) used within the EPS. Our initial results, based on a 24-member ensemble of forecasts for two summer case studies, show that the convective-scale EPS produces fairly reliable forecasts of temperature, horizontal winds and relative humidity at 1 h lead time, as is evident from inspection of rank histograms. On the other hand, the rank histograms also suggest that the EPS generates too much spread for forecasts of (i) surface pressure and (ii) surface precipitation intensity. This may indicate that the surface pressure observation error standard deviation used to generate the surface pressure rank histograms is too large, and that the excess spread in precipitation may result from non-Gaussian precipitation observation errors. However, further investigations are needed to better understand these findings. Finally, the inclusion of predicted observations of precipitation from radar in the 24-member EPS considered in this paper does not seem to improve the 1 h lead-time forecast skill.
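    Rank histograms of the kind inspected above take only a few lines to compute. The Gaussian toy ensembles below (one reliable, one with too much spread, mirroring the surface pressure and precipitation diagnosis) are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
n_days, m = 5000, 24                      # verification days, ensemble members
obs = rng.normal(0.0, 1.0, n_days)        # synthetic verifying observations

def rank_histogram(ens, obs):
    """Frequency of each rank (0..m) of the observation among the members;
    a reliable ensemble gives a flat histogram near 1/(m+1)."""
    ranks = (ens < obs[:, None]).sum(axis=1)
    return np.bincount(ranks, minlength=ens.shape[1] + 1) / len(obs)

# Reliable case: members drawn from the same distribution as the observation
hist_ok = rank_histogram(rng.normal(0.0, 1.0, (n_days, m)), obs)

# Overdispersed case (too much spread): observations rarely fall outside the
# ensemble, so the central ranks are overpopulated and the histogram is domed
hist_od = rank_histogram(rng.normal(0.0, 2.0, (n_days, m)), obs)
```

    An underdispersed ensemble gives the opposite, U-shaped, signature, with the observation piling up in the two extreme ranks.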