115 research outputs found

    Scalable Speech Coding for IP Networks

    The emergence of Voice over Internet Protocol (VoIP) has posed new challenges to the development of speech codecs. The key issue in transporting real-time voice packets over IP networks is the lack of any guarantee of reasonable speech quality, owing to packet delay or loss. Most of the widely used narrowband codecs depend on the Code Excited Linear Prediction (CELP) coding technique. CELP uses long-term prediction across frame boundaries, so packet loss causes error propagation, and redundant information must be transmitted to mitigate the problem. The internet Low Bit-rate Codec (iLBC) employs frame-independent coding and is therefore inherently robust to packet loss. However, the original iLBC lacks some key features of speech codecs for IP networks: rate flexibility, scalability, and wideband support. This dissertation presents novel scalable narrowband and wideband speech codecs for IP networks using a frame-independent coding scheme based on the iLBC. Rate flexibility is added to the iLBC by employing the discrete cosine transform (DCT) and scalable algebraic vector quantization (AVQ), and by allocating different numbers of bits to the AVQ. Bit-rate scalability is obtained by adding an enhancement layer to the core layer of the multi-rate iLBC. The enhancement layer encodes the weighted iLBC coding error in the modified DCT (MDCT) domain. The proposed wideband codec employs a bandwidth extension technique to extend the capabilities of existing narrowband codecs and provide wideband coding functionality. The wavelet transform is also used to further enhance the performance of the proposed codec. The performance evaluation results show that the proposed codec is highly robust to packet loss and achieves speech quality equivalent to or higher than that of state-of-the-art codecs under clean channel conditions.

    Recognizing Voice Over IP: A Robust Front-End for Speech Recognition on the World Wide Web

    The Internet Protocol (IP) environment poses two relevant sources of distortion to the speech recognition problem: lossy speech coding and packet loss. In this paper, we propose a new front-end for speech recognition over IP networks. Specifically, we suggest extracting the recognition feature vectors directly from the encoded speech (i.e., the bit stream) instead of decoding it and subsequently extracting the feature vectors. This approach offers two significant benefits. First, the recognition system is only affected by the quantization distortion of the spectral envelope, so we avoid the influence of other sources of distortion introduced by the encoding-decoding process. Second, when packet loss occurs, our front-end becomes more effective since it is not constrained by the error handling mechanism of the codec. We have considered the ITU G.723.1 standard codec, which is one of the most widely used coding algorithms in voice over IP (VoIP), and compared the proposed front-end with the conventional approach in two automatic speech recognition (ASR) tasks, namely speaker-independent isolated digit recognition and speaker-independent continuous speech recognition. In general, our approach outperforms the conventional procedure for a variety of simulated packet loss rates. Furthermore, the improvement grows as network conditions worsen.

    Frequency-warped autoregressive modeling and filtering

    This thesis consists of an introduction and nine articles. The articles are related to the application of frequency-warping techniques to audio signal processing, and in particular to predictive coding of wideband audio signals. The introduction reviews the literature and summarizes the results of the articles. Frequency-warping techniques, or simply warping techniques, are based on modifying a conventional signal processing system so that the inherent frequency representation in the system is changed. It is demonstrated that this can be done for essentially all traditional signal processing algorithms. In audio applications it is beneficial to modify the system so that the new frequency representation is close to that of human hearing. One of the articles is a tutorial paper on the use of warping techniques in audio applications. The majority of the articles study warped linear prediction (WLP) and its use in wideband audio coding. It is proposed that warped linear prediction would be a particularly attractive method for low-delay wideband audio coding. Warping techniques are also applied to various modifications of classical linear predictive coding techniques. This was made possible partly by the introduction, in one of the articles, of a class of new implementation techniques for recursive filters. The proposed implementation algorithm for recursive filters with delay-free loops is a generic technique. This inspired an article that introduces a generalized warped linear predictive coding scheme. One example of the generalized approach is a linear predictive algorithm using an almost logarithmic frequency representation.

    Improving the robustness of CELP-like speech decoders using late-arrival packets information : application to G.729 standard in VoIP

    Voice over Internet is the new trend in the telecommunications and networking industry today. Data and voice are packetized using the Internet Protocol (IP). Various codecs exist to convert raw voice data into packets. The coded and packetized speech is transmitted over the Internet. At the receiving end, some packets are lost, damaged, or arrive late. This is due to constraints such as network delay (jitter), network congestion and network errors. These constraints degrade the quality of speech. Since voice transmission is in real time, the receiver cannot request the retransmission of lost or damaged packets, as this would cause more delay. Instead, concealment methods are applied either at the transmitter side (coder-based) or at the receiver side (decoder-based) to replace these lost or late-arrival packets. This work attempts to implement a novel method for improving the recovery time of concealed speech after packet loss at the receiver. The method has already been integrated in a wideband speech coder (AMR-WB) and significantly improved the quality of speech in the presence of jitter in the arrival time of speech frames at the decoder. In this work, the same method will be integrated in a narrowband speech coder (ITU-T G.729) that is widely used in VoIP applications. The ITU-T G.729 coder defines the standard for coding and decoding speech at 8 kb/s using the Conjugate Structure Algebraic Code-Excited Linear Prediction (CS-ACELP) algorithm.

    Gaussian Mixture Model-based Quantization of Line Spectral Frequencies for Adaptive Multirate Speech Codec

    In this paper, we investigate the use of a Gaussian Mixture Model (GMM)-based quantizer for quantization of the Line Spectral Frequencies (LSFs) in the Adaptive Multi-Rate (AMR) speech codec. We estimate the parametric GMM of the probability density function (pdf) for the prediction error (residual) of the mean-removed LSF parameters that are used in the AMR codec to represent the speech spectral envelope. The studied GMM-based quantizer is based on transform coding using the Karhunen-Loeve transform (KLT) and transform domain scalar quantizers (SQ) individually designed for each Gaussian mixture. We have investigated the applicability of such a quantization scheme in the existing AMR codec by replacing only the AMR LSF quantization algorithm segment. The main novelty in this paper lies in applying and adapting entropy constrained (EC) coding for fixed-rate scalar quantization of transformed residuals, thereby allowing for better adaptation to the local statistics of the source. We study and evaluate the compression efficiency, computational complexity and memory requirements of the proposed algorithm. Experimental results show that the GMM-based EC quantizer provides better rate/distortion performance than the quantization schemes used in the reference AMR codec, saving up to 7.32 bits/frame at much lower rate-independent computational complexity and memory requirements.

    Automatic Speech Codec Identification with Applications to Tampering Detection of Speech Recordings

    In this work, many versions of CELP codecs are explored, and it is observed that different codebooks are used to encode the noisy part of the residual. Taking advantage of the noise patterns these codebooks generate, an algorithm is proposed to detect the GSM-AMR, EFR, HR and SILK codecs. A second algorithm, partly knowledge-based and partly data-driven, is also proposed to improve performance for SILK. The approach is then extended to identify the subframe offset, enabling tampering detection in cellphone speech recordings.

    Comparison of Wideband Earpiece Integrations in Mobile Phone

    Speech in telecommunication networks has traditionally been narrowband, ranging from 300 Hz to 3400 Hz. It can be expected that wideband speech call services will increase their foothold in the market during the coming years. This thesis introduces the basics of speech coding together with the adaptive multirate wideband (AMR-WB) codec. The wideband codec widens the speech band to a new range from 50 Hz to 7000 Hz using a 16 kHz sampling frequency. In practice, the wider band improves speech intelligibility and makes speech more natural and comfortable to listen to. The main focus of this thesis work is to compare two different wideband earpiece integrations. The question is how much the end-user will benefit from a larger earpiece in a mobile phone. To determine earpiece performance, objective free-field measurements were made on the earpiece modules. Measurements were also performed for the phone on a head and torso simulator (HATS), both by wiring the earpieces directly to a power amplifier and over the air during active calls on GSM and WCDMA networks. The objective measurements showed differences between the earpiece integrations in frequency response and distortion, especially at low frequencies. Finally, a subjective listening test was carried out to see whether the end-user notices the difference between the smaller and larger earpiece integrations using narrowband and wideband speech samples. Based on the subjective test results, it can be said that the user can distinguish between the two integrations and that, with the wideband speech codec, a male speaker benefits more from a larger earpiece than a female speaker.

    Performance of a low data rate speech codec for land-mobile satellite communications

    In an effort to foster the development of new technologies for the emerging land mobile satellite communications services, JPL funded two development contracts in 1984, one to the Univ. of Calif., Santa Barbara and the other to the Georgia Inst. of Technology, to develop algorithms and real-time hardware for near-toll-quality speech compression at 4800 bits per second. Both universities have developed and delivered speech codecs to JPL, and the UCSB codec was extensively tested by JPL in a variety of experimental setups. The basic UCSB speech codec algorithms and the test results of the various experiments performed with this codec are presented.

    Frequency Domain Methods for Coding the Linear Predictive Residual of Speech Signals

    The most frequently used speech coding paradigm is ACELP, well known because it encodes speech with high quality while consuming little bandwidth. ACELP performs linear prediction filtering in order to remove the effect of the spectral envelope from the signal. The noise-like excitation is then encoded using algebraic codebooks. The search of this codebook, however, cannot be performed optimally with conventional encoders because of the correlation between the excitation samples. As a result, more complex algorithms are required in order to maintain quality. Four different transformation algorithms have been implemented (DCT, DFT, eigenvalue decomposition and Vandermonde decomposition) in order to decorrelate the samples of the innovative excitation in ACELP. These transformations have been integrated in the ACELP of the EVS codec. The transformed innovative excitation is coded using the envelope-based arithmetic coder. Objective and subjective tests have been carried out to evaluate the quality of the encoding, the degree of decorrelation achieved by the transformations and the computational complexity of the algorithms.