Analysis and improvement of the vector quantization in SELP (Stochastically Excited Linear Prediction)
The Stochastically Excited Linear Prediction (SELP) algorithm is described as a speech coding method employing a two-stage vector quantization. The first stage uses an adaptive codebook, which efficiently encodes the periodicity of voiced speech, and the second stage uses a stochastic codebook to encode the remainder of the excitation signal. The adaptive codebook performs well when the pitch period of the speech signal is longer than the frame size. An extension is introduced that improves its performance when the frame size is longer than the pitch period. The performance of the stochastic stage, which improves with frame length, is shown to be best in those sections of the speech signal where a high level of short-term correlation is present. It can be concluded that the SELP algorithm performs best during voiced speech where the pitch period is longer than the frame length.
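The adaptive-codebook stage described above can be illustrated with a minimal Python sketch. It is not the SELP implementation from the paper: the function names are hypothetical, the search matches the target frame directly (without a synthesis filter), and the handling of pitch lags shorter than the frame, by repeating the most recent pitch cycle, is a commonly used approach assumed here rather than the extension the paper introduces.

```python
def adaptive_codebook_vector(past_excitation, lag, frame_len):
    """Build one adaptive-codebook candidate of length frame_len.

    The candidate copies past excitation starting `lag` samples back.
    If lag < frame_len, the last `lag` samples are repeated periodically
    (assumed handling of short pitch periods, not taken from the paper).
    """
    if not 1 <= lag <= len(past_excitation):
        raise ValueError("lag must be between 1 and len(past_excitation)")
    segment = past_excitation[-lag:]
    return [segment[n % lag] for n in range(frame_len)]


def search_adaptive_codebook(past_excitation, target, min_lag, max_lag):
    """Return (lag, gain) maximizing corr^2 / energy against the target frame."""
    best = None
    for lag in range(min_lag, max_lag + 1):
        cand = adaptive_codebook_vector(past_excitation, lag, len(target))
        energy = sum(c * c for c in cand)
        if energy == 0.0:
            continue  # an all-zero candidate cannot match anything
        corr = sum(t * c for t, c in zip(target, cand))
        score = corr * corr / energy
        if best is None or score > best[0]:
            best = (score, lag, corr / energy)  # corr/energy is the optimal gain
    if best is None:
        raise ValueError("no usable candidate in the lag range")
    _, lag, gain = best
    return lag, gain


# Toy example: excitation with a pitch period of 5 samples and an
# 8-sample frame, so the frame is longer than the pitch period.
pattern = [1.0, -0.5, 0.25, 0.0, -1.0]
past = pattern * 4
target = [0.5 * pattern[n % 5] for n in range(8)]
lag, gain = search_adaptive_codebook(past, target, min_lag=3, max_lag=9)
```

In this toy case the search recovers the pitch lag even though it is shorter than the frame, which is the situation the abstract identifies as difficult for the unextended adaptive codebook.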
Performance of a low data rate speech codec for land-mobile satellite communications
In an effort to foster the development of new technologies for the emerging land-mobile satellite communications services, JPL funded two development contracts in 1984, one to the Univ. of Calif., Santa Barbara (UCSB) and the other to the Georgia Inst. of Technology, to develop algorithms and real-time hardware for near-toll-quality speech compression at 4800 bits per second. Both universities have developed and delivered speech codecs to JPL, and the UCSB codec was extensively tested by JPL in a variety of experimental setups. The basic UCSB speech codec algorithms and the test results of the various experiments performed with this codec are presented.
Estimating the Quality of Digitally Transmitted Speech over Satellite Communication Channels
Analogue speech is one of the most natural means used by humans for communication. The emergence of digital modulation and coding techniques has made the transmission of analogue speech (as digital content) over various conduits possible, albeit with inevitable signal degradation as a result of errors inherent in the conversion process. A need naturally arises for determining the quality of speech received at the information sink, with a view to enhancing its robustness to degradation suffered in transit over the communication channel. We present in this paper analytic methods of qualitative assessment of the quality of recovered digitally transmitted speech. A methodology is proposed for determining the intelligibility of speech using the segmental SNR obtained by dividing the speech signal into an integer number M of segments. This methodology has the following advantages: a) it allows the dynamics of change of speech quality to be assessed in real time through statistical modeling, b) it obviates the need for expensive, yet subjective, experimental approaches like MOS, and c) it takes into consideration not only the signal power but also its spectral characteristics, which is a step above the use of Modulated Noise Reference Units (MNRUs). Using the obtained results, a procedure for analysis of speech intelligibility by means of statistical modeling is developed. Keywords: Speech processing, Mean opinion score, MOS, SNR, PCM, Quantization noise
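As a rough illustration of the segmental SNR mentioned in this abstract, the following Python sketch splits a reference signal and its degraded copy into M equal-length segments and computes the SNR of each in dB. This is the conventional time-domain definition; the paper's own formulation, which also accounts for spectral characteristics, is not reproduced here. The function name, the decision to skip silent or error-free segments, and the toy signals are assumptions made for the example.

```python
import math


def segmental_snr(reference, degraded, num_segments):
    """Return (per_segment_snr_db, mean_snr_db) over num_segments equal segments.

    Trailing samples that do not fill a whole segment are ignored.
    Segments with zero signal energy or zero error energy are skipped
    (an assumption of this sketch; other conventions clamp the values).
    """
    if len(reference) != len(degraded):
        raise ValueError("signals must have the same length")
    seg_len = len(reference) // num_segments
    if seg_len == 0:
        raise ValueError("num_segments exceeds the signal length")

    per_segment = []
    for m in range(num_segments):
        start = m * seg_len
        ref = reference[start:start + seg_len]
        deg = degraded[start:start + seg_len]
        signal_energy = sum(x * x for x in ref)
        noise_energy = sum((x - y) ** 2 for x, y in zip(ref, deg))
        if signal_energy == 0.0 or noise_energy == 0.0:
            continue
        per_segment.append(10.0 * math.log10(signal_energy / noise_energy))

    mean = sum(per_segment) / len(per_segment) if per_segment else float("nan")
    return per_segment, mean


# Toy example: a constant signal received with a 10% amplitude error,
# i.e. an error power 100 times below the signal power in every segment.
reference = [1.0] * 8
degraded = [0.9] * 8
per_segment, mean_db = segmental_snr(reference, degraded, num_segments=4)
```

Tracking the per-segment values over time, rather than only their mean, is what allows the change of quality during transmission to be followed.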
Comparative study of perceptual filters suited to audio coders
High-quality audio coders often use a psychoacoustic model to take the properties of the ear into account. We compare perceptual filters computed from a linear prediction with filters obtained from the masking thresholds used in music coders. We observed that the latter do not give better results. While the most natural approach would be to define a better psychoacoustic model, we propose here an intermediate method that gives more degrees of freedom to a standard-type method by treating the zeros of the whitening filter individually.
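The idea of treating the zeros of the whitening filter individually can be sketched in Python under stated assumptions. The abstract does not give the form of the standard method; the sketch assumes the widely used bandwidth-expansion approach, in which each coefficient a_k of the prediction-error filter A(z) is multiplied by gamma^k, which moves every zero toward the origin by the same factor gamma. Applying a separate factor to each zero is shown as one possible reading of the extra degrees of freedom. All function names are hypothetical.

```python
def poly_from_zeros(zeros):
    """Coefficients of prod(1 - z_i * z^-1), in increasing powers of z^-1."""
    coeffs = [1.0 + 0j]
    for z0 in zeros:
        shifted = [0j] + coeffs  # current polynomial multiplied by z^-1
        coeffs = [c - z0 * s for c, s in zip(coeffs + [0j], shifted)]
    return coeffs


def bandwidth_expand(coeffs, gamma):
    """Standard-type method (assumed): a_k -> a_k * gamma**k, one factor for all zeros."""
    return [c * gamma ** k for k, c in enumerate(coeffs)]


def scale_zeros(zeros, factors):
    """Per-zero variant: zero i is moved radially by its own factor."""
    return [f * z0 for f, z0 in zip(factors, zeros)]


# Toy whitening filter with two real zeros (illustrative values only).
zeros = [0.9, 0.5]
a = poly_from_zeros(zeros)

uniform = bandwidth_expand(a, 0.8)                          # same factor everywhere
same_via_zeros = poly_from_zeros(scale_zeros(zeros, [0.8, 0.8]))
individual = poly_from_zeros(scale_zeros(zeros, [0.95, 0.6]))  # one factor per zero
```

With equal factors the per-zero version reduces to ordinary bandwidth expansion, so the individual treatment is a strict generalization of the uniform one in this sketch.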