The purpose of this study is to investigate how humans interpret musical
scores expressively, and then design machines that sing like humans. We
consider six factors that have a strong influence on the expression of human
singing. The factors are related to the acoustic, phonetic, and musical
features of a real singing signal. Given real singing voices recorded following
the MIDI scores and lyrics, our analysis module can extract the expression
parameters from the real singing signals semi-automatically. The expression
parameters are used to control the singing voice synthesis (SVS) system for
Mandarin Chinese, which is based on the harmonic plus noise model (HNM). The
results of perceptual experiments show that integrating the expression factors
into the SVS system yields a notable improvement in perceptual naturalness,
clearness, and expressiveness. By one-to-one mapping of the real singing signal
and expression controls to the synthesizer, our SVS system can simulate the
interpretation of a real singer with the timbre of a speaker.Comment: 8 pages, technical repor