Hindcasting experiments (conducting a model forecast for a time period in
which observational data are available) are being undertaken increasingly
often by the integrated assessment model (IAM) community, across many scales
of models. When they are undertaken, the results are often evaluated using
global aggregates or otherwise highly aggregated skill scores that mask
deficiencies. We select a set of deviation-based measures that can be applied
on different spatial scales (regional versus global) to make evaluating the
large number of variable–region combinations in IAMs more tractable. We also
identify performance benchmarks for these measures, based on the statistics
of the observational dataset, that allow a model to be evaluated in absolute
terms rather than relative to the performance of other models at similar
tasks. An ideal evaluation method for hindcast experiments in IAMs would
feature both absolute measures for evaluation of a single experiment for a
single model and relative measures to compare the results of multiple
experiments for a single model or the same experiment repeated across
multiple models, such as in community intercomparison studies. The
performance benchmarks make it possible to use this scheme to evaluate a model
in absolute terms, and they provide information about why a model may perform
poorly on a given measure, thereby identifying opportunities for improvement.
To demonstrate how the evaluation method is used and the types of results it
can produce, we apply the measures to the results of a past hindcast
experiment focusing on land allocation in the Global Change Assessment Model
(GCAM) version 3.0. The question of how to more holistically evaluate models
as complex as IAMs is an area for future research. We find quantitative
evidence that global aggregates alone are not sufficient for evaluating IAMs
that require global supply to equal global demand at each time period, such
as GCAM. The results of this work indicate that a single evaluation measure
suitable for all variables in an IAM is unlikely to exist, and therefore
sector-by-sector evaluation may be necessary.
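The idea of an absolute, observation-based benchmark, and the way a global aggregate can hide regional errors when global supply must equal global demand, can be illustrated with a minimal sketch. It assumes root-mean-square error (RMSE) as the deviation-based measure and, as the benchmark, the RMSE of a naive forecast that always predicts the observed mean (which equals the population standard deviation of the observations). These choices, the function names `rmse`, `mean_benchmark`, and `evaluate`, and the two-region land-area data are all hypothetical and are not taken from the study or from GCAM.

```python
import math
from statistics import mean, pstdev


def rmse(model: list[float], obs: list[float]) -> float:
    """Root-mean-square deviation between paired model and observed values."""
    if len(model) != len(obs) or not obs:
        raise ValueError("model and obs must be non-empty and of equal length")
    return math.sqrt(mean((m - o) ** 2 for m, o in zip(model, obs)))


def mean_benchmark(obs: list[float]) -> float:
    """RMSE of a naive forecast that always predicts the observed mean.

    This equals the population standard deviation of the observations, so it
    depends only on the observational data (an absolute benchmark), not on
    the performance of other models. Assumed benchmark, for illustration.
    """
    return pstdev(obs)


def evaluate(name: str, model: list[float], obs: list[float]) -> dict:
    """Score one variable-region series and check it against the benchmark."""
    score = rmse(model, obs)
    bench = mean_benchmark(obs)
    return {"name": name, "rmse": score, "benchmark": bench, "passes": score < bench}


# Hypothetical land-area series (arbitrary units) for two regions over four
# periods. The model over-allocates land in region A and under-allocates it
# in region B by the same amounts, so the global totals match exactly, as
# they would under a global supply-equals-demand constraint.
obs = {"A": [100, 104, 108, 112], "B": [50, 52, 54, 56]}
model = {"A": [100, 110, 120, 130], "B": [50, 46, 42, 38]}

obs["global"] = [a + b for a, b in zip(obs["A"], obs["B"])]
model["global"] = [a + b for a, b in zip(model["A"], model["B"])]

results = {r: evaluate(r, model[r], obs[r]) for r in ("A", "B", "global")}
for res in results.values():
    print(
        f"{res['name']:>6}: RMSE={res['rmse']:.2f} "
        f"benchmark={res['benchmark']:.2f} passes={res['passes']}"
    )
```

In this toy case the global series scores a perfect RMSE of 0.00 and passes its benchmark, while both regional series have an RMSE of about 11.22, well above their benchmarks (about 4.47 and 2.24), so both fail. Evaluating only the global aggregate would report no deficiency at all.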