
    Measuring forecast performance in the presence of observation error

    A new framework is introduced for measuring the performance of probability forecasts when the true value of the predictand is observed with error. In these circumstances, proper scoring rules favour good forecasts of observations rather than of truth and yield scores that vary with the quality of the observations. Proper scoring rules can thus favour forecasters who issue worse forecasts of the truth, and can mask real changes in forecast performance if observation quality varies over time. Existing approaches to accounting for observation error provide unsatisfactory solutions to these two problems. A new class of ‘error-corrected’ proper scoring rules is defined that solves both problems by producing unbiased estimates of the scores that would be obtained if the forecasts could be verified against the truth. A general method for constructing error-corrected proper scoring rules is given for categorical predictands, and error-corrected versions of the Dawid-Sebastiani scoring rule are proposed for numerical predictands. The benefits of accounting for observation error in ensemble post-processing and in forecast verification are illustrated in three data examples, including forecasts for the occurrence of tornadoes and of aircraft icing.
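    To illustrate the flavour of the construction for categorical predictands, the sketch below solves, for a binary event, for scores assigned to the noisy observation whose expectation given the truth equals the ordinary Brier score. The error matrix, probabilities, and function names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def brier(p, y):
    """Ordinary Brier score for forecast probability p and outcome y."""
    return (p - y) ** 2

def error_corrected_scores(p, M):
    """Find s_star[z] such that sum_z M[z, y] * s_star[z] = brier(p, y) for
    each true outcome y, where M[z, y] = P(Z = z | Y = y) is an assumed
    (known, invertible) observation-error matrix."""
    b = np.array([brier(p, y) for y in (0, 1)])  # target scores against truth
    return np.linalg.solve(M.T, b)               # scores assigned to z = 0, 1

# Example: the observation flips the true outcome with probability 0.1.
M = np.array([[0.9, 0.1],
              [0.1, 0.9]])  # rows: observed z, columns: true y
s_star = error_corrected_scores(p=0.3, M=M)
print(s_star)  # scores to use when z = 0 and z = 1 are observed
```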

    Fair scores for ensemble forecasts

    The notion of fair scores for ensemble forecasts was introduced recently to reward ensembles whose members behave as though they and the verifying observation are sampled from the same distribution. For the case of forecasting binary outcomes, a characterization is given of a general class of fair scores for ensembles that are interpreted as random samples. This characterization is also used to construct classes of fair scores for ensembles that forecast multi-category and continuous outcomes. The usual Brier, ranked probability, and continuous ranked probability scores for ensemble forecasts are shown to be unfair, while adjusted versions of these scores are shown to be fair. A definition of fairness is also proposed for ensembles whose members are interpreted as being dependent, and it is shown that fair scores exist only for some forms of dependence.
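    As an indication of what such an adjustment looks like in practice, the function below computes the fair Brier score for an ensemble interpreted as a random sample, in the form in which it is commonly stated; it is a sketch for orientation rather than a reproduction of the paper's general characterization.

```python
def fair_brier(i, m, y):
    """Fair Brier score for an m-member ensemble (m >= 2) in which i members
    forecast the event and y (0 or 1) is the verifying outcome. The ordinary
    ensemble Brier score (i/m - y)**2 is reduced by a term that removes the
    bias caused by the finite ensemble size."""
    return (i / m - y) ** 2 - i * (m - i) / (m ** 2 * (m - 1))
```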

    Proper scoring rules for interval probabilistic forecasts

    Interval probabilistic forecasts for a binary event are forecasts issued as a range of probabilities for the occurrence of the event, for example, ‘chance of rain: 10-20%’. Interval probabilistic forecasts can be verified with a scoring rule that assigns a score to each forecast-outcome pair. An important requirement of scoring rules, if they are to provide a faithful assessment of a forecaster, is that they be proper, meaning that they direct forecasters to issue their true beliefs as their forecasts. Proper scoring rules for probabilistic forecasts issued as precise numbers have been studied extensively, but applying such a scoring rule to, for example, the mid-point of an interval probabilistic forecast does not typically produce a proper scoring rule for interval probabilistic forecasts. Complementing parallel work by other authors, we derive a general characterisation of scoring rules that are proper for interval probabilistic forecasts, and from this characterisation we determine particular scoring rules for interval probabilistic forecasts that correspond to the familiar scoring rules used for probabilistic forecasts given as precise probabilities. All the scoring rules we derive apply immediately to rounded probabilistic forecasts, these being a special case of interval probabilistic forecasts.

    Three recommendations for evaluating climate predictions

    Evaluation is important for improving climate prediction systems and for establishing the credibility of their predictions of the future. This paper shows how the choices that must be made about how to evaluate predictions affect the outcome and, ultimately, our view of the prediction system's quality. The aim of evaluation is to measure selected attributes of the predictions, but some attributes are susceptible to having their apparent performance artificially inflated by the presence of climate trends, rendering past performance an unreliable indicator of future performance. We describe a class of performance measures that are immune to such spurious skill. The way in which an ensemble prediction is interpreted also has strong implications for the apparent performance, so we give recommendations about how evaluation should be tailored to different interpretations. Finally, we explore the role of the timescale of the predictand in evaluation and suggest ways to describe the relationship between timescale and performance. The ideas in this paper are illustrated using decadal temperature hindcasts from the CMIP5 archive. This work was part of the EQUIP project (http://www.equip.leeds.ac.uk), funded by NERC Directed Grant NE/H003509/1. The authors thank Leon Hermanson, Doug Smith and Holger Pohlmann for useful discussion, Helen Hanlon for assistance with obtaining data, and two anonymous reviewers for comments that helped us to improve the presentation of our ideas.

    Regime‐dependent statistical post‐processing of ensemble forecasts

    An ensemble forecast comprises a number of realisations of one or more numerical weather prediction (NWP) models, initialised from a variety of initial conditions. These forecasts exhibit systematic errors and biases that can be corrected by statistical post-processing, which yields calibrated forecasts by analysing the statistical relationship between historical forecasts and their corresponding observations. This paper aims to extend post-processing methodology to incorporate the atmospheric circulation. The circulation, or flow, is largely responsible for the weather that we experience, and it is hypothesised here that relationships between the NWP model and the atmosphere depend upon the prevailing flow. Numerous studies have focussed on the tendency of this flow to reduce to a set of recognisable arrangements, known as regimes, which recur and persist at fixed geographical locations. This dynamical phenomenon allows the circulation to be categorised into a small number of regime states. In a highly idealised model of the atmosphere, the Lorenz ’96 system, ensemble forecasts are subjected to well-known post-processing techniques conditional on the system's underlying regime. Two variables are forecast, one of the state variables and one related to the energy of the system, and considerable improvements in forecast skill over standard post-processing are seen when the distribution of the predictand varies with the regime. Advantages of this approach and its inherent challenges are discussed, along with potential extensions for operational forecasters.
    Natural Environment Research Council (NERC)
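    A minimal sketch of the regime-conditional idea, under the simplifying assumptions that the post-processing is a plain bias correction and that a regime label is available for every historical forecast; the function names and data layout are illustrative, not the paper's implementation.

```python
import numpy as np

def regime_bias_corrections(forecasts, observations, regimes):
    """Mean forecast error within each regime.
    forecasts, observations: numpy arrays of matched historical values;
    regimes: array of regime labels of the same length."""
    return {r: np.mean(forecasts[regimes == r] - observations[regimes == r])
            for r in np.unique(regimes)}

def correct(forecast, regime, corrections):
    """Subtract the bias estimated for the regime prevailing at forecast time."""
    return forecast - corrections[regime]
```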

    Best practices for post-processing ensemble climate forecasts, part I: selecting appropriate recalibration methods

    This study describes a systematic approach to selecting optimal statistical recalibration methods and hindcast designs for producing reliable probability forecasts on seasonal-to-decadal time scales. A new recalibration method is introduced that includes adjustments for both unconditional and conditional biases in the mean and variance of the forecast distribution, and for linear time-dependent bias in the mean. The complexity of the recalibration can be varied systematically by restricting the parameters, and simple recalibration methods may outperform more complex ones given limited training data. A new cross-validation methodology is proposed that allows the comparison of multiple recalibration methods and varying training periods using limited data. Part I considers the effect on forecast skill of varying the recalibration complexity and the training period length. The interaction between these factors is analysed for grid-box forecasts of annual mean near-surface temperature from the CanCM4 model. Recalibration methods that include conditional adjustment of the ensemble mean outperform simple bias correction by issuing climatological forecasts where the model has limited skill. Trend-adjusted forecasts outperform forecasts without trend adjustment at almost 75% of grid boxes. The optimal training period is around 30 years for trend-adjusted forecasts, and around 15 years otherwise, and is strongly related to the length of the optimal climatology. Longer training periods may increase overall performance, but at the expense of very poor forecasts where skill is limited.
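    The sketch below shows one simple recalibration of this general kind: the calibrated mean is a linear function of the ensemble mean and of time (absorbing a linear trend bias), and the forecast spread is set to the residual spread. It is a least-squares illustration under those assumptions, not the paper's estimator, and all names are hypothetical.

```python
import numpy as np

def fit_recalibration(ens_mean, obs, years):
    """Fit mu = alpha + beta * ens_mean + gamma * year by least squares and
    estimate the forecast spread from the residuals."""
    X = np.column_stack([np.ones_like(ens_mean), ens_mean, years])
    coef, *_ = np.linalg.lstsq(X, obs, rcond=None)   # alpha, beta, gamma
    resid = obs - X @ coef
    spread = resid.std(ddof=X.shape[1])              # residual standard deviation
    return coef, spread

def recalibrate(ens_mean, year, coef, spread):
    """Return the mean and spread of a Gaussian recalibrated forecast."""
    mu = coef[0] + coef[1] * ens_mean + coef[2] * year
    return mu, spread
```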

    An efficient semiparametric maxima estimator of the extremal index

    The extremal index θ, a measure of the degree of local dependence in the extremes of a stationary process, plays an important role in extreme value analyses. We estimate θ semiparametrically, using the relationship between the distribution of block maxima and the marginal distribution of a process to define a semiparametric model. We show that these semiparametric estimators are simpler and substantially more efficient than their parametric counterparts. We seek to improve efficiency further using maxima over sliding blocks. A simulation study shows that the semiparametric estimators are competitive with the leading estimators. An application to sea-surge heights combines inferences about θ with a standard extreme value analysis of block maxima to estimate marginal quantiles.
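    A rough sketch of a maxima-based estimator of this kind, built only on the assumed relation P(block maximum ≤ x) ≈ F(x)^(nθ) with the empirical distribution function standing in for F, so that -n log F(M) is approximately exponential with rate θ. Disjoint blocks are used, and refinements such as sliding blocks and bias adjustments discussed in the paper are omitted.

```python
import numpy as np

def extremal_index(x, block_length):
    """Crude semiparametric estimate of theta from disjoint block maxima."""
    x = np.asarray(x, float)
    n_blocks = len(x) // block_length
    blocks = x[:n_blocks * block_length].reshape(n_blocks, block_length)
    maxima = blocks.max(axis=1)
    # empirical marginal distribution function evaluated at the block maxima
    F = np.searchsorted(np.sort(x), maxima, side="right") / (len(x) + 1)
    y = -block_length * np.log(F)      # approx. Exp(theta) under the model
    return 1.0 / y.mean()
```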

    A conditional decomposition of proper scores: quantifying the sources of information in a forecast

    Scoring rules condense all information regarding the performance of a probabilistic forecast into a single numerical value, providing a convenient framework with which to objectively rank and compare competing prediction schemes. Although scoring rules provide only a single measure of forecast accuracy, the expected score can be decomposed into components that each assess a distinct aspect of the forecast, such as its calibration or information content. Since these components can depend on several factors, it is useful to evaluate forecast performance under different circumstances; if forecasters were able to identify situations in which their forecasts perform particularly poorly, they could more easily develop their forecast strategy to account for these deficiencies. To help forecasters identify such situations, a novel decomposition of scores is introduced that quantifies conditional forecast biases, allowing for a more detailed examination of the sources of information in the forecast. From this, we claim that decompositions of proper scores provide a broad generalisation of the well-known analysis of variance (ANOVA) framework. The new decomposition is applied to the Brier score, which is then used to evaluate forecasts that the daily maximum temperature will exceed a range of thresholds, issued by the Swiss Federal Office of Meteorology and Climatology (MeteoSwiss). We demonstrate how the additional information provided by this decomposition can be used to improve the performance of these forecasts, by identifying appropriate auxiliary information to include within statistical post-processing methods.
    Natural Environment Research Council (NERC)
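    For background on the kind of decomposition being generalised, the sketch below computes the familiar reliability-resolution-uncertainty decomposition of the Brier score by grouping forecasts into their distinct issued probabilities; it does not reproduce the paper's new conditional decomposition.

```python
import numpy as np

def brier_decomposition(p, y):
    """Reliability, resolution, and uncertainty terms of the Brier score.
    p: issued probabilities; y: binary outcomes (0/1).
    The Brier score equals reliability - resolution + uncertainty."""
    p, y = np.asarray(p, float), np.asarray(y, float)
    ybar = y.mean()
    rel = res = 0.0
    for pk in np.unique(p):
        mask = p == pk
        ok = y[mask].mean()                    # conditional event frequency
        rel += mask.mean() * (pk - ok) ** 2    # reliability contribution
        res += mask.mean() * (ok - ybar) ** 2  # resolution contribution
    unc = ybar * (1 - ybar)                    # uncertainty term
    return rel, res, unc
```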