Quality-Net: An End-to-End Non-intrusive Speech Quality Assessment Model based on BLSTM
Today, most objective speech quality assessment tools (e.g.,
perceptual evaluation of speech quality (PESQ)) are based on the comparison of
the degraded/processed speech with its clean counterpart. The need for a
"golden" reference considerably restricts the practicality of such assessment
tools in real-world scenarios since the clean reference usually cannot be
accessed. On the other hand, human beings can readily evaluate the speech
quality without any reference (e.g., mean opinion score (MOS) tests), implying
the existence of an objective and non-intrusive (no clean reference needed)
quality assessment mechanism. In this study, we propose a novel end-to-end,
non-intrusive speech quality evaluation model, termed Quality-Net, based on
bidirectional long short-term memory. The evaluation of utterance-level quality
in Quality-Net is based on the frame-level assessment. Frame constraints and
sensible initializations of forget gate biases are applied to learn meaningful
frame-level quality assessment from the utterance-level quality label.
Experimental results show that Quality-Net yields a high correlation with PESQ
(0.9 for the noisy speech and 0.84 for the speech processed by speech
enhancement). We believe that Quality-Net has the potential to be used in a wide
variety of applications of speech signal processing.
Comment: Accepted in Interspeech201
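
The abstract describes two computations that a short sketch may make more concrete: deriving an utterance-level score from frame-level scores under a frame constraint, and measuring agreement with PESQ by correlation. The abstract does not give the exact formulas, so the following is only an illustration under stated assumptions: the utterance score is taken to be the mean of the frame scores, the frame constraint is taken to be a squared-error penalty pulling each frame score toward the utterance label, and the correlation is taken to be Pearson's. The function names and the weight alpha are hypothetical and do not come from the paper.

```python
from math import sqrt


def utterance_score(frame_scores):
    """Pool per-frame quality scores into one utterance-level score.

    Assumption: simple averaging over frames.
    """
    return sum(frame_scores) / len(frame_scores)


def frame_constrained_loss(frame_scores, label, alpha=1.0):
    """Squared error on the utterance score plus a frame-level penalty.

    Assumption: the frame constraint is the mean squared distance between
    each frame score and the utterance-level label, weighted by the
    hypothetical parameter alpha.
    """
    utt_err = (label - utterance_score(frame_scores)) ** 2
    frame_err = sum((label - q) ** 2 for q in frame_scores) / len(frame_scores)
    return utt_err + alpha * frame_err


def pearson(xs, ys):
    """Pearson correlation between predicted and reference scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)
```

In this sketch, setting alpha to 0 leaves only the utterance-level error, so the frame scores are free to take any values that average to the label. A positive alpha discourages that, which is one plausible reading of how frame constraints could help learn meaningful frame-level assessments from an utterance-level label.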