How to assess cross-modality medical image synthesis remains largely
unexplored. The most widely used measures, such as PSNR and SSIM, focus on
structural features but neglect two properties that matter for medical images:
lesion location and the underlying k-space characteristics. To address this
gap, we propose a new metric, K-CROSS, to spur progress on this challenging
problem.
Specifically, K-CROSS uses a pre-trained multi-modality segmentation network to
predict the lesion location, together with a tumor encoder that represents
lesion features such as texture details and brightness intensities. To further
reflect the frequency-specific information that follows from magnetic resonance
imaging principles, both k-space features and vision features are extracted and
used in our comprehensive encoders with a frequency reconstruction penalty. The
structure-shared encoders are constrained with a similarity loss to capture the
structural information common to both modalities. As a result, features learned
from lesion regions, k-space, and anatomical structures are all captured and
serve as our quality evaluators.
To evaluate K-CROSS, we construct a large-scale cross-modality neuroimaging
perceptual similarity (NIRPS) dataset with 6,000 radiologist judgments.
Extensive experiments demonstrate that the proposed method outperforms other
metrics, especially when compared against the radiologists' judgments on NIRPS.
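
As a rough illustration of why an image-domain score such as PSNR can miss
frequency-specific errors, the following sketch compares two synthetic
perturbations that have the same PSNR but occupy different regions of k-space.
It is not the K-CROSS implementation: the function names (psnr, to_kspace,
kspace_band_error), the toy Gaussian "image", and the radial band limits are
all assumptions made for this example, and the band-limited error is only a
simple stand-in for a frequency-domain penalty.

```python
import numpy as np


def psnr(reference, synthesized, data_range=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, data_range]."""
    mse = np.mean((reference - synthesized) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / mse))


def to_kspace(image):
    """Centered 2D Fourier transform (orthonormal scaling) of a real image."""
    return np.fft.fftshift(np.fft.fft2(image, norm="ortho"))


def kspace_band_error(reference, synthesized, r_min, r_max):
    """Mean squared k-space error restricted to a radial frequency band.

    Hypothetical helper: r_min and r_max are radii in frequency-index units,
    measured from the center of the shifted spectrum.
    """
    diff = to_kspace(reference) - to_kspace(synthesized)
    h, w = reference.shape
    fy = np.arange(h) - h // 2  # after fftshift, zero frequency sits at h // 2
    fx = np.arange(w) - w // 2
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    mask = (radius >= r_min) & (radius < r_max)
    return float(np.sum(np.abs(diff[mask]) ** 2) / reference.size)


# Toy reference image: a smooth Gaussian blob (an assumption, not real MRI data).
size = 64
y, x = np.mgrid[0:size, 0:size]
reference = np.exp(-((x - 32) ** 2 + (y - 32) ** 2) / (2 * 10.0 ** 2))

# Two ripples with identical amplitude but different spatial frequency.
synth_low = reference + 0.05 * np.cos(2 * np.pi * 2 * x / size)
synth_high = reference + 0.05 * np.cos(2 * np.pi * 20 * x / size)

psnr_low = psnr(reference, synth_low)
psnr_high = psnr(reference, synth_high)
band_low = kspace_band_error(reference, synth_low, 10, 33)
band_high = kspace_band_error(reference, synth_high, 10, 33)

print("PSNR (low-frequency ripple): ", psnr_low)
print("PSNR (high-frequency ripple):", psnr_high)
print("High-band k-space error (low-frequency ripple): ", band_low)
print("High-band k-space error (high-frequency ripple):", band_high)
```

Both perturbations have the same mean squared error, so PSNR cannot tell them
apart, whereas the band-limited k-space error separates them clearly. This is
the kind of frequency-specific information that an image-domain measure alone
does not expose.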