In this study, we investigate the feasibility of using state-of-the-art
image perceptual metrics to evaluate audio signals represented as
spectrograms. The approach is motivated by the similarity between the neural
mechanisms of the auditory and visual pathways.
Furthermore, we customise one of these metrics, which has a psychoacoustically
plausible architecture, to account for the peculiarities of sound signals. We
evaluate our customised metric and several baseline metrics on a music
dataset; the results are promising in terms of the correlation between the
metric scores and the perceived quality of the audio as rated by human
evaluators.
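The pipeline outlined above, rendering a signal as a spectrogram and scoring it with an image perceptual metric, can be sketched minimally as follows. This is an illustrative sketch only: it uses a simplified single-window SSIM rather than the paper's customised metric, and the helper names, signal parameters, and STFT settings are assumptions, not the configuration used in the study.

```python
import numpy as np
from scipy.signal import spectrogram

# Simplified SSIM computed over the whole image in one window (rather than
# the usual sliding-window form) -- a minimal stand-in for an image metric.
def global_ssim(a, b, data_range):
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    )

# Hypothetical helper: turn a waveform into a log-magnitude spectrogram
# "image" so that image metrics can be applied to it.
def to_spectrogram_image(x, fs):
    _, _, sxx = spectrogram(x, fs=fs, nperseg=512, noverlap=384)
    return 10.0 * np.log10(sxx + 1e-10)  # dB scale, floored to avoid log(0)

fs = 16_000
t = np.arange(fs) / fs
reference = np.sin(2 * np.pi * 440.0 * t)  # clean 440 Hz tone, 1 second
degraded = reference + 0.05 * np.random.default_rng(0).standard_normal(fs)

ref_img = to_spectrogram_image(reference, fs)
deg_img = to_spectrogram_image(degraded, fs)
score = global_ssim(ref_img, deg_img, data_range=ref_img.max() - ref_img.min())
print(f"spectrogram SSIM: {score:.3f}")
```

In practice the study compares such metric scores against human quality ratings; here the score simply quantifies how much the added noise perturbs the spectrogram image.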