We review the methods used in many papers to evaluate DSGE models by comparing their simulated moments and other features with their data equivalents. We note that these papers select, scale and characterise the shocks without reference to the data; crucially, they fail to use the joint distribution of the features under comparison. We illustrate this point by recomputing the assessment of a two-country model in a recent paper and find that the paper's conclusions are essentially reversed.
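Why the joint distribution matters can be sketched with a toy case. This is a minimal illustration under stated assumptions, not the paper's computation: the two "moments", their simulated distribution (standard normal, correlation 0.9) and the data values are all hypothetical. The joint check is a Wald statistic, with its critical value taken from the simulated distribution of the same statistic. The one-at-a-time check asks only whether each data moment falls inside its own 95% band.

```python
import numpy as np

def marginal_checks(sim, data, level=0.95):
    """One moment at a time: is each data value inside the central
    `level` interval of its own simulated distribution?"""
    lo = np.percentile(sim, 100 * (1 - level) / 2, axis=0)
    hi = np.percentile(sim, 100 * (1 + level) / 2, axis=0)
    return (data >= lo) & (data <= hi)

def wald_test(sim, data, level=0.95):
    """Joint check: Wald distance of the data from the simulated mean,
    using the simulated covariance of the moments, compared with the
    `level` percentile of the same statistic across the simulations."""
    mean = sim.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(sim, rowvar=False))

    def wald(x):
        d = x - mean
        return d @ cov_inv @ d

    sim_stats = np.array([wald(s) for s in sim])
    data_stat = wald(data)
    critical = np.percentile(sim_stats, 100 * level)
    return data_stat, critical, data_stat <= critical

# Hypothetical model: 1000 simulated draws of two strongly correlated moments.
rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.9], [0.9, 1.0]])
sim = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=1000)

# Hypothetical data moments: each is individually unremarkable,
# but they move in opposite directions.
data = np.array([1.5, -1.5])

print("each moment inside its 95% band:", marginal_checks(sim, data))
stat, crit, ok = wald_test(sim, data)
print(f"Wald = {stat:.1f}, 95% critical value = {crit:.1f}, pass = {ok}")
```

Each data moment passes on its own, yet the pair is rejected jointly: the simulated model implies the two moments move together, while the hypothetical data have them moving in opposite directions, which only a test using their covariance can detect.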