Experiments to understand sensorimotor neural interactions in the human
cortical speech system support the existence of a bidirectional flow of
interactions between the auditory and motor regions. A key function of these
interactions is to enable the brain to `learn' how to control the vocal tract
for speech production. This idea is the impetus for the recently proposed
"MirrorNet", a constrained autoencoder architecture. In this paper, the
MirrorNet is applied to learn, in an unsupervised manner, the controls of a
specific audio synthesizer (DIVA) so as to produce melodies solely from their
auditory spectrograms.
The results demonstrate how the MirrorNet discovers the synthesizer
parameters needed to generate melodies that closely resemble the originals,
generalizes to unseen melodies, and even determines the best set of
parameters to approximate renditions of complex piano melodies generated by a
different synthesizer. This generalizability illustrates the potential of the
MirrorNet to discover from sensory data the controls of arbitrary
motor-plants.
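
To make the constrained-autoencoder idea concrete, the following is a minimal
PyTorch sketch of one training scheme consistent with this description: an
encoder maps spectrograms to synthesizer controls, a decoder reconstructs the
spectrogram, and a learned forward model serves as a differentiable stand-in
for a non-differentiable synthesizer such as DIVA. All names, layer sizes,
losses, and the toy plant function are hypothetical illustrations, not the
actual MirrorNet implementation or the DIVA interface.

import torch
import torch.nn as nn

SPEC_DIM, CTRL_DIM, HID = 128, 16, 256   # hypothetical sizes

def mlp(d_in, d_out):
    return nn.Sequential(nn.Linear(d_in, HID), nn.ReLU(), nn.Linear(HID, d_out))

encoder = nn.Sequential(mlp(SPEC_DIM, CTRL_DIM), nn.Sigmoid())  # spectrogram -> controls in [0, 1]
decoder = mlp(CTRL_DIM, SPEC_DIM)    # controls -> reconstructed spectrogram
fwd_model = mlp(CTRL_DIM, SPEC_DIM)  # differentiable surrogate of the synthesizer

# Toy, non-differentiable stand-in for the real synthesizer (e.g. DIVA):
# any black-box controls -> spectrogram mapping would do here.
plant_weights = torch.randn(CTRL_DIM, SPEC_DIM)
def plant(ctrl):
    with torch.no_grad():
        return torch.tanh(ctrl @ plant_weights)

mse = nn.MSELoss()
opt_fwd = torch.optim.Adam(fwd_model.parameters(), lr=1e-3)
opt_ae = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

spec = torch.rand(32, SPEC_DIM)  # a batch of target spectrogram frames

for step in range(200):
    # (1) Fit the forward model to the plant on the currently inferred controls.
    ctrl = encoder(spec).detach()
    loss_fwd = mse(fwd_model(ctrl), plant(ctrl))
    opt_fwd.zero_grad()
    loss_fwd.backward()
    opt_fwd.step()

    # (2) Train the encoder/decoder so that both the decoder output and the
    #     surrogate synthesizer output match the input spectrogram,
    #     constraining the latent code to act as valid synthesizer controls.
    ctrl = encoder(spec)
    loss_ae = mse(decoder(ctrl), spec) + mse(fwd_model(ctrl), spec)
    opt_ae.zero_grad()
    loss_ae.backward()
    opt_ae.step()

The two loss terms express the "constraint" in the constrained autoencoder:
the latent code must simultaneously support spectrogram reconstruction and
drive the (surrogate) synthesizer to reproduce the input, which forces it to
behave as usable synthesizer controls rather than an arbitrary code.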