We describe a new test of photometric redshift performance given a
spectroscopic redshift sample. This test complements the traditional comparison
of redshift {\it differences} by testing whether the probability density
functions p(z) have the correct {\it width}. We test two photometric redshift
codes, BPZ and EAZY, on each of two data sets and find that BPZ is consistently
overconfident (the p(z) are too narrow) while EAZY produces approximately the
correct level of confidence. We show that this is because EAZY models the
uncertainty in its spectral energy distribution templates, and that post-hoc
smoothing of the BPZ p(z) provides a reasonable substitute for detailed
modeling of template uncertainties. Either remedy still leaves a small surplus
of galaxies whose spectroscopic redshifts lie very far from the p(z) peaks. Thus, better
modeling of low-probability tails will be needed for high-precision work such
as dark energy constraints with the Large Synoptic Survey Telescope and other
large surveys.
Comment: accepted to MNRA
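
To make the idea of testing p(z) width concrete, the following is a minimal sketch of one possible coverage check. It is not necessarily the exact statistic used in the paper. For each galaxy it computes the smallest highest-probability-density credible level that contains the spectroscopic redshift. If the p(z) are well calibrated, about 68 per cent of galaxies should fall within their 68 per cent credible regions. The function names (`hpd_credibility`, `smooth_pz`, `coverage`), the Gaussian smoothing kernel, and the synthetic galaxies are all assumptions introduced for illustration.

```python
import numpy as np

def hpd_credibility(z_grid, pz, z_spec):
    """Smallest highest-probability-density credible level containing z_spec."""
    pz = np.asarray(pz, dtype=float)
    pz = pz / pz.sum()                          # normalise on a uniform grid
    idx = np.argmin(np.abs(z_grid - z_spec))    # grid point nearest to z_spec
    return pz[pz >= pz[idx]].sum()              # mass at or above p(z_spec)

def smooth_pz(pz, sigma_bins):
    """Convolve p(z) with a Gaussian kernel (assumed form) of width sigma_bins."""
    half = int(np.ceil(4 * sigma_bins))         # truncate kernel at 4 sigma
    x = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (x / sigma_bins) ** 2)
    kernel /= kernel.sum()
    out = np.convolve(pz, kernel, mode="same")
    return out / out.sum()

def coverage(z_grid, pz_list, z_specs, level=0.68):
    """Fraction of galaxies whose z_spec lies inside the `level` HPD region."""
    cred = np.array([hpd_credibility(z_grid, p, zs)
                     for p, zs in zip(pz_list, z_specs)])
    return np.mean(cred <= level)

# Synthetic, purely illustrative data: the true scatter is 0.05 in z,
# but each reported p(z) is a Gaussian of width 0.025 (overconfident).
rng = np.random.default_rng(0)
z_grid = np.linspace(0.0, 2.0, 2001)            # uniform grid, step 0.001
dz = z_grid[1] - z_grid[0]
n_gal, sigma_true, sigma_reported = 2000, 0.05, 0.025

z_peak = rng.uniform(0.5, 1.5, n_gal)
z_spec = z_peak + rng.normal(0.0, sigma_true, n_gal)
pz_raw = [np.exp(-0.5 * ((z_grid - zp) / sigma_reported) ** 2) for zp in z_peak]

# Smooth by the width needed to bring the reported scatter up to the true one.
sigma_extra = np.sqrt(sigma_true**2 - sigma_reported**2)
pz_smooth = [smooth_pz(p, sigma_extra / dz) for p in pz_raw]

frac_raw = coverage(z_grid, pz_raw, z_spec)
frac_smooth = coverage(z_grid, pz_smooth, z_spec)
print(f"68% coverage, raw p(z):      {frac_raw:.2f}")
print(f"68% coverage, smoothed p(z): {frac_smooth:.2f}")
```

In this toy setup the raw coverage comes out well below 0.68, which is the signature of p(z) that are too narrow, while the smoothed p(z) recover coverage close to the nominal level. Real catalogues would also contain outliers far from the peaks, which a single Gaussian smoothing of this kind would not address.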