
    Wavenet based low rate speech coding

    Traditional parametric coding of speech achieves low bit rates but provides poor reconstruction quality because the underlying model is inadequate. We describe how a WaveNet generative speech model can be used to generate high-quality speech from the bit stream of a standard parametric coder operating at 2.4 kb/s. We compare this parametric coder with a waveform coder based on the same generative model and show that approximating the signal waveform incurs a large rate penalty. Our experiments confirm the high performance of the WaveNet-based coder and show that the system additionally performs implicit bandwidth extension and does not significantly impair a human listener's recognition of the original speaker, even when that speaker was not used during training of the generative model. Comment: 5 pages, 2 figures

    Noise-robust detection of peak-clipping in decoded speech

    Voice quality estimation in combined radio-VoIP networks for dispatching systems

    Voice quality modelling, assessment, and planning are well understood, both in theory and in practice, for common voice communication systems, especially public fixed and mobile telephone networks, including Next Generation Networks (NGN, i.e. Internet Protocol based networks). This article extends voice quality modelling, assessment, and planning to dispatching communication systems based on Internet Protocol (IP) and private radio networks. The network plan, a correction to the E-model calculation, and default values for the model are presented and discussed.

    E-model modification for case of cascade codecs arrangement

    Speech quality assessment is a key concern for voice services, and every provider should ensure adequate connection quality for end users. Speech quality has to be measured by a trusted method, and the results have to correlate with the intelligibility and clarity of the speech as perceived by the listener. Subjective methods can achieve this, but in practice we must rely on objective measurements based on reliable models. One of them is the E-model, which can be considered the most widely adopted method in IP telephony. This method evaluates the transmission path impairments that affect the speech signal, especially delay and packet loss. These parameters, which are common in IP networks, can dramatically affect speech quality. In this article, a new modification of the E-model that takes the cascaded codec arrangement into account is presented. The proposed correction function improves the current non-intrusive computational approach described in ITU-T Recommendation G.107, the so-called E-model.