Predicting the Quality of Synthesized and Natural Speech Impaired by Packet Loss and Coding Using PESQ and P.563 Models
This paper investigates the impact of independent and dependent losses and coding on speech quality predictions
provided by PESQ (also known as ITU-T P.862) and P.563 models, when both naturally-produced and synthesized
speech are used. Two synthesized speech samples generated with two different Text-to-Speech systems
and one naturally-produced sample are investigated. In addition, we assess the variability of PESQ’s and P.563’s
predictions with respect to the type of speech used (naturally-produced or synthesized) and loss conditions as
well as their accuracy, by comparing the predictions with subjective assessments. The results show that there is
no difference between the impact of packet loss on naturally-produced speech and synthesized speech. On the
other hand, the impact of coding differs between the two types of stimuli. In addition, synthesized speech seems
to be insensitive to the degradations introduced by most of the codecs investigated here. The reasons for these findings
are discussed in detail. Finally, it is concluded that both models are capable of predicting the quality of transmitted
synthesized speech under the investigated conditions to a certain degree. As expected, PESQ achieves the
best performance over almost all of the investigated conditions.
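The independent and dependent loss conditions compared in this abstract can be simulated with standard channel models. A minimal sketch follows; the use of a Bernoulli process for independent loss and a two-state Gilbert model for dependent (bursty) loss is an assumption on my part, since the abstract does not name the loss models used, and all function names and parameter values are illustrative:

```python
import numpy as np

def independent_loss(n_packets, p_loss, rng):
    """Independent (Bernoulli) loss: each packet is dropped with
    probability p_loss, regardless of its neighbours."""
    return rng.random(n_packets) < p_loss

def gilbert_loss(n_packets, p_gb, p_bg, rng):
    """Dependent (bursty) loss via a two-state Gilbert model.
    p_gb: P(good -> bad) transition; p_bg: P(bad -> good) transition.
    Packets emitted in the 'bad' state are lost, so losses cluster
    into bursts of mean length 1 / p_bg."""
    lost = np.empty(n_packets, dtype=bool)
    bad = False  # start in the good state
    for i in range(n_packets):
        lost[i] = bad
        if bad:
            bad = rng.random() >= p_bg  # stay bad with prob 1 - p_bg
        else:
            bad = rng.random() < p_gb
    return lost
```

With p_gb = 0.05 and p_bg = 0.45 the Gilbert model has the same long-run loss rate as a 10% Bernoulli channel (p_gb / (p_gb + p_bg) = 0.1), but the losses arrive in bursts, which is the distinction the abstract's "independent and dependent losses" refers to.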
Deep Learning for Audio Signal Processing
Given the recent surge in developments of deep learning, this article
provides a review of the state-of-the-art deep learning techniques for audio
signal processing. Speech, music, and environmental sound processing are
considered side-by-side, in order to point out similarities and differences
between the domains, highlighting general methods, problems, key references,
and potential for cross-fertilization between areas. The dominant feature
representations (in particular, log-mel spectra and raw waveform) and deep
learning models are reviewed, including convolutional neural networks, variants
of the long short-term memory architecture, as well as more audio-specific
neural network models. Subsequently, prominent deep learning application areas
are covered, i.e. audio recognition (automatic speech recognition, music
information retrieval, environmental sound detection, localization and
tracking) and synthesis and transformation (source separation, audio
enhancement, generative models for speech, sound, and music synthesis).
Finally, key issues and future questions regarding deep learning applied to
audio signal processing are identified.
Comment: 15 pages, 2 PDF figures
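The log-mel spectrum named above as a dominant feature representation can be computed from a raw waveform in a few steps: frame, window, FFT, mel filterbank, log. A minimal NumPy sketch, assuming a 16 kHz signal; the frame sizes, hop, and number of mel bands are illustrative defaults, not values taken from the article:

```python
import numpy as np

def mel_filterbank(sr, n_fft, n_mels, fmin=0.0, fmax=None):
    """Triangular mel filterbank, shape (n_mels, n_fft // 2 + 1)."""
    fmax = fmax or sr / 2
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    # Band edges equally spaced on the mel scale, mapped back to FFT bins.
    mels = np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, centre):   # rising slope
            fb[m - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):  # falling slope
            fb[m - 1, k] = (right - k) / max(right - centre, 1)
    return fb

def log_mel_spectrogram(x, sr, n_fft=512, hop=160, n_mels=40):
    """Frame the waveform, apply a Hann window and FFT, pool the power
    spectrum through the mel filterbank, and take the log."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop: i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(mel + 1e-10)  # small floor avoids log(0)
```

The result is a (frames x mel bands) matrix, the time-frequency input that the convolutional and recurrent models reviewed in the article typically consume; the alternative discussed alongside it is feeding the raw waveform directly.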