The findings of the 2023 AAPM Grand Challenge on Deep Generative Modeling for
Learning Medical Image Statistics are reported in this Special Report. The goal
of this challenge was to promote the development of deep generative models
(DGMs) for medical imaging and to emphasize the need for their domain-relevant
assessment via the analysis of relevant image statistics. As part of this Grand
Challenge, a training dataset was developed based on 3D anthropomorphic breast
phantoms from the VICTRE virtual imaging toolbox. A two-stage evaluation
procedure was developed: a preliminary check for memorization and image
quality (based on the Fréchet Inception distance (FID)), followed by a second
stage evaluating the reproducibility of image statistics corresponding to
domain-relevant radiomic features. A summary measure was employed to rank the
submissions. Additional analyses of submissions were performed to assess DGM
performance specific to individual feature families, and to identify various
artifacts. A total of 58 submissions from 12 unique users were received for this
Challenge. The top-ranked submission employed a conditional latent diffusion
model, whereas the joint runners-up employed a generative adversarial network
followed by another network for image super-resolution. We observed that the
overall ranking of the top 9 submissions according to our evaluation method (i)
did not match the FID-based ranking, and (ii) differed with respect to
individual feature families. Another important finding from our additional
analyses was that different DGMs demonstrated similar kinds of artifacts. This
Grand Challenge highlighted the need for domain-specific evaluation to further
DGM design as well as deployment. It also demonstrated that the specification
of a DGM may differ depending on its intended use.
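
For readers unfamiliar with the FID used in the first evaluation stage, the
following is a minimal sketch of the Fréchet distance between Gaussians fitted
to two sets of feature vectors. It is not the Challenge's evaluation code: the
function name frechet_distance is hypothetical, and the features are assumed to
have been extracted already (for the FID proper, they would be Inception
network embeddings of real and generated images, a step omitted here).

```python
import numpy as np


def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two feature sets.

    feats_a, feats_b: arrays of shape (n_samples, n_features).
    Hypothetical helper for illustration; feature extraction is not shown.
    """
    feats_a = np.asarray(feats_a, dtype=np.float64)
    feats_b = np.asarray(feats_b, dtype=np.float64)

    # Sample means and covariances of each feature set.
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.atleast_2d(np.cov(feats_a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(feats_b, rowvar=False))

    # Tr((cov_a cov_b)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative in exact
    # arithmetic; small numerical deviations are clipped.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)
```

A lower value indicates that the two feature distributions are closer in mean
and covariance. Because the measure summarizes only these two moments of a
generic feature embedding, it does not by itself assess domain-relevant
statistics such as radiomic features.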