53 research outputs found
DeepVoCoder: A CNN model for compression and coding of narrow band speech
This paper proposes a convolutional neural network (CNN)-based encoder model to compress and code speech signals directly from raw input speech. Although the model can synthesize wideband speech by implicit bandwidth extension, narrowband is preferred for IP telephony and telecommunications purposes. The model takes time-domain speech samples as inputs and encodes them using a cascade of convolutional filters in multiple layers, where pooling is applied after some layers to downsample the encoded speech by half. The final bottleneck layer of the CNN encoder provides an abstract and compact representation of the speech signal. In this paper, it is demonstrated that this compact representation is sufficient to reconstruct the original speech signal in high quality using the CNN decoder. This paper also discusses the theoretical background of why and how CNN may be used for end-to-end speech compression and coding. The complexity, delay, memory requirements, and bit rate versus quality are discussed in the experimental results.
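The encoder structure described in this abstract (a cascade of convolutional filters, with pooling after some layers halving the length) can be illustrated with a minimal pure-Python sketch. The kernel values, the number of layers, the choice of max pooling, and the names `conv1d_same`, `max_pool_by_2`, and `encode` are all assumptions made for illustration; none of them is taken from the paper.

```python
# Illustrative only: kernels, layer count, and pooling type are assumptions,
# not the DeepVoCoder architecture.

def conv1d_same(signal, kernel):
    """1-D filtering with zero padding so the output length equals the input length.
    Assumes an odd-length kernel."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]

def max_pool_by_2(signal):
    """Keep the larger value of each adjacent pair, halving the length."""
    return [max(signal[i], signal[i + 1]) for i in range(0, len(signal) - 1, 2)]

def encode(samples, kernels, pool_after):
    """Apply each kernel in turn; pool after the layers whose index is in pool_after."""
    x = list(samples)
    for layer, kernel in enumerate(kernels):
        x = conv1d_same(x, kernel)
        if layer in pool_after:
            x = max_pool_by_2(x)
    return x

frame = [float(i % 8) for i in range(64)]   # stand-in for 64 raw speech samples
kernels = [[0.25, 0.5, 0.25]] * 4           # hypothetical smoothing filters
code = encode(frame, kernels, pool_after={1, 3})
print(len(frame), len(code))                # two pooling stages: 64 -> 32 -> 16
```

Each pooling stage halves the number of values, so a bottleneck reached after p pooling stages holds 1/2^p as many values as the input frame. The actual bit rate would also depend on how those values are quantized, which this sketch does not model.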
A code excited linear predictive coder: using a moments algorithm
A speech coding algorithm was developed, based on a new method of selecting the excitation signal from a codebook of residual error sequences. The residual error sequences in the codebook were generated from 512 frames of real speech signals. Linear predictive coding (LPC) inverse filtering was used to obtain the residual signal.
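As a rough illustration of LPC inverse filtering, the sketch below estimates predictor coefficients with the autocorrelation method and the Levinson-Durbin recursion, then computes the residual e[n] = s[n] - sum_k a_k s[n-k]. The predictor order, the synthetic test frame, and all function names are assumptions for illustration; the abstract does not state which LPC analysis method or order was used.

```python
# Illustrative only: analysis method, order, and the test frame are assumptions.

def autocorrelation(frame, max_lag):
    """r[k] = sum_n s[n] * s[n + k] for k = 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the normal equations for predictor coefficients a[1..order]."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                       # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)                # remaining prediction error energy
    return a[1:]

def inverse_filter(frame, coeffs):
    """Residual: each sample minus its prediction from earlier samples."""
    residual = []
    for n, sample in enumerate(frame):
        prediction = sum(a * frame[n - k]
                         for k, a in enumerate(coeffs, start=1) if n - k >= 0)
        residual.append(sample - prediction)
    return residual

frame = [0.9 ** n for n in range(50)]       # synthetic, strongly correlated frame
coeffs = levinson_durbin(autocorrelation(frame, 2), order=2)
residual = inverse_filter(frame, coeffs)
print(sum(e * e for e in residual) < sum(s * s for s in frame))
```

For a correlated signal the residual carries far less energy than the original frame, which is what makes residual sequences a compact excitation codebook.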
Each residual error signal was assigned an index. The index was generated using a moments algorithm. These indices were stored on a Graded Binary Tree. A Binary Search was then used to select the correct index. The use of a Graded Binary Tree in the coding algorithm reduced the search time.
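The idea of indexing each codebook entry by a moment-derived key and then locating an entry by binary search can be sketched as follows. This is only a stand-in: the abstract does not define the moments algorithm or the Graded Binary Tree, so the sketch assumes a single scalar key (the mean second absolute moment) and uses a sorted list with Python's `bisect` module in place of the tree. The names `moment_index`, `build_codebook`, and `lookup` are hypothetical.

```python
import bisect

# Illustrative only: the key definition and the sorted-list storage are assumptions,
# not the thesis's moments algorithm or its Graded Binary Tree.

def moment_index(residual, order=2):
    """Hypothetical scalar key: mean of |x|**order over the sequence."""
    return sum(abs(x) ** order for x in residual) / len(residual)

def build_codebook(sequences):
    """Sort entries by key so they can be binary searched."""
    entries = sorted((moment_index(seq), seq) for seq in sequences)
    keys = [key for key, _ in entries]
    seqs = [seq for _, seq in entries]
    return keys, seqs

def lookup(keys, seqs, target):
    """Binary search for the entry whose key is closest to the target's key."""
    key = moment_index(target)
    pos = bisect.bisect_left(keys, key)
    candidates = [i for i in (pos - 1, pos) if 0 <= i < len(keys)]
    best = min(candidates, key=lambda i: abs(keys[i] - key))
    return seqs[best]

training = [
    [1.0, -1.0, 1.0, -1.0],
    [0.1, 0.2, -0.1, 0.0],
    [3.0, 0.0, -3.0, 0.0],
]
keys, seqs = build_codebook(training)
print(lookup(keys, seqs, [3.0, 0.0, -3.0, 0.0]))   # a sequence from the training set
print(lookup(keys, seqs, [2.9, 0.0, -3.1, 0.0]))   # an unseen sequence
```

Binary search over N sorted keys takes on the order of log2(N) comparisons rather than N, which is consistent with the reduced search time the abstract reports. A single scalar key can collide for different sequences, so a practical index would likely need more than one moment.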
The algorithm faithfully reproduced the original speech when the test residual error signal was chosen from the training data. When the test residual error signal was outside the training data, synthetic speech of a recognisable quality was produced.
Finally, the fundamentals of speech coders are discussed in detail and various developments are suggested.
Perceptual models in speech quality assessment and coding
The ever-increasing demand for good communications/toll-quality speech has created a renewed interest in the perceptual impact of rate compression. Two general areas are investigated in this work, namely speech quality assessment and speech coding.
In the field of speech quality assessment, a model is developed which simulates the processing stages of the peripheral auditory system. At the output of the model a "running" auditory spectrum is obtained. This represents the auditory (spectral) equivalent of any acoustic sound such as speech. Auditory spectra from coded speech segments serve as inputs to a second model. This model simulates the information centre in the brain which performs the speech quality assessment. [Continues.]
- …