
    New techniques in signal coding


    Improving the robustness of CELP-like speech decoders using late-arrival packets information : application to G.729 standard in VoIP

    Voice over Internet applications are the new trend in the telecommunications and networking industry today. Packetizing of data and voice is done using the Internet Protocol (IP). Various codecs exist to convert the raw voice data into coded form, and the coded, packetized speech is transmitted over the Internet. At the receiving end, some packets are lost, damaged, or arrive late. This is due to constraints such as network delay (jitter), network congestion and network errors. These constraints degrade the quality of speech. Since voice transmission is in real time, the receiver cannot request the retransmission of lost or damaged packets, as this would cause more delay. Instead, concealment methods are applied either at the transmitter side (coder-based) or at the receiver side (decoder-based) to replace these lost or late-arrival packets. This work attempts to implement a novel method for improving the recovery time of concealed speech. The method has already been integrated in a wideband speech coder (AMR-WB) and significantly improved the quality of speech in the presence of jitter in the arrival time of speech frames at the decoder. In this work, the same method will be integrated into a narrowband speech coder (ITU-T G.729) that is widely used in VoIP applications. The ITU-T G.729 coder defines the standard for coding and decoding speech at 8 kb/s using the Conjugate-Structure Algebraic Code-Excited Linear Prediction (CS-CELP) algorithm.
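    The decoder-side idea above (conceal a missing frame at its playout time, then use the same frame if it arrives late to resynchronise the decoder state and shorten recovery) can be illustrated with a small toy sketch. This is a conceptual illustration only, not the G.729 reference code or this work's implementation; the per-frame parameter vectors, the attenuation factor, and the one-frame-late assumption are invented for the example.

```python
# Toy illustration (not the G.729 reference code) of decoder-side concealment
# that also exploits frames arriving too late to be played out.
# Assumption: each "frame" is reduced to a vector of decoder parameters, and the
# decoder keeps a memory vector that normal decoding would update frame by frame.
import numpy as np

def playout_with_late_update(frames, lost, late_by_one):
    """frames: per-frame parameter vectors; lost: indices never received;
    late_by_one: indices that miss their playout time but arrive one frame later."""
    memory = np.zeros_like(frames[0])      # decoder state (e.g. adaptive-codebook memory)
    output = []
    for n, frame in enumerate(frames):
        if n in lost or n in late_by_one:
            estimate = 0.7 * memory        # conventional concealment: attenuated extrapolation
        else:
            estimate = frame
        output.append(estimate)
        memory = estimate                  # the state drifts after every concealed frame
        if n in late_by_one:
            memory = frame                 # the late frame arrives now: resynchronise the
                                           # decoder state so later frames recover faster
    return output

frames = [np.full(4, float(n + 1)) for n in range(6)]
print(playout_with_late_update(frames, lost={2}, late_by_one={4}))
```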

    DeepVoCoder: A CNN model for compression and coding of narrow band speech

    This paper proposes a convolutional neural network (CNN)-based encoder model to compress and code the speech signal directly from raw input speech. Although the model can synthesize wideband speech by implicit bandwidth extension, narrowband is preferred for IP telephony and telecommunications purposes. The model takes time-domain speech samples as input and encodes them using a cascade of convolutional filters in multiple layers, where pooling is applied after some layers to downsample the encoded speech by half. The final bottleneck layer of the CNN encoder provides an abstract and compact representation of the speech signal. The paper demonstrates that this compact representation is sufficient for the CNN decoder to reconstruct the original speech signal at high quality. The paper also discusses the theoretical background of why and how CNNs may be used for end-to-end speech compression and coding. Complexity, delay, memory requirements, and bit rate versus quality are discussed in the experimental results.
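    As a rough illustration of the structure described above (time-domain input, cascaded 1-D convolutions, pooling that halves the temporal resolution after some layers, a compact bottleneck, and a mirrored decoder), here is a minimal PyTorch sketch. The layer counts, channel widths, kernel sizes, and downsampling factor are assumptions for illustration only, not the paper's actual DeepVoCoder configuration.

```python
# Minimal sketch of a CNN speech autoencoder in the spirit of the abstract:
# raw waveform in, convolutional encoder with pooling, compact bottleneck,
# transposed-convolution decoder back to the waveform.
# All hyperparameters below are illustrative assumptions, not the paper's values.
import torch
import torch.nn as nn

class CNNSpeechCoder(nn.Module):
    def __init__(self, channels=(32, 64, 128), bottleneck=16):
        super().__init__()
        enc, in_ch = [], 1
        for ch in channels:
            enc += [nn.Conv1d(in_ch, ch, kernel_size=9, padding=4),
                    nn.ReLU(),
                    nn.MaxPool1d(2)]          # halves the temporal resolution
            in_ch = ch
        enc += [nn.Conv1d(in_ch, bottleneck, kernel_size=1)]  # compact representation
        self.encoder = nn.Sequential(*enc)

        dec, in_ch = [nn.Conv1d(bottleneck, channels[-1], kernel_size=1)], channels[-1]
        for ch in list(reversed(channels[:-1])) + [1]:
            dec += [nn.ConvTranspose1d(in_ch, ch, kernel_size=4, stride=2, padding=1),
                    nn.ReLU() if ch != 1 else nn.Tanh()]
            in_ch = ch
        self.decoder = nn.Sequential(*dec)

    def forward(self, x):                      # x: (batch, 1, samples)
        code = self.encoder(x)                 # bottleneck features to be quantised/coded
        return self.decoder(code), code

x = torch.randn(2, 1, 1024)                    # e.g. 128 ms of 8 kHz speech
y, code = CNNSpeechCoder()(x)
print(code.shape, y.shape)                     # the code is 8x shorter in time than the input
```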

    Comparison of Wideband Earpiece Integrations in Mobile Phone

    Speech in telecommunication networks has traditionally been narrowband, spanning 300 Hz to 3400 Hz. It can be expected that wideband speech call services will gain a stronger foothold in the market over the coming years. This thesis introduces the basics of speech coding together with the Adaptive Multi-Rate Wideband (AMR-WB) codec. The wideband codec widens the speech band to 50-7000 Hz using a 16 kHz sampling frequency. In practice, the wider band improves speech intelligibility and makes speech sound more natural and comfortable to listen to. The main focus of this thesis is to compare two different wideband earpiece integrations: how much does the end-user benefit from a larger earpiece in a mobile phone? To determine earpiece performance, objective free-field measurements were made on the earpiece modules. Measurements were also performed on the phone mounted on a head and torso simulator (HATS), both with the earpieces wired directly to a power amplifier and over the air during active calls on GSM and WCDMA networks. The objective measurements showed differences between the two integrations in frequency response and distortion, especially at low frequencies. Finally, a subjective listening test was carried out to determine whether the end-user can hear the difference between the smaller and larger earpiece integrations using narrowband and wideband speech samples. Based on the listening test results, the user can distinguish between the two integrations, and a male speaker benefits more from the larger earpiece than a female speaker when a wideband speech codec is used.

    An investigation into glottal waveform based speech coding

    Coding of voiced speech by extraction of the glottal waveform has shown promise in improving the efficiency of speech coding systems. This thesis describes an investigation into the performance of such a system. The effect of reverberation on the radiation impedance at the lips is shown to be negligible under normal conditions. The accuracy of the Image Method for adding artificial reverberation to anechoic speech recordings is also established. A new algorithm for Glottal Closure Instant detection, Pre-emphasised Maximum Likelihood Epoch Detection (PMLED), is proposed. The algorithm is tested on natural speech and is shown to be both accurate and robust. Two techniques for glottal waveform estimation, Closed Phase Inverse Filtering (CPIF) and Iterative Adaptive Inverse Filtering (IAIF), are compared. In tandem with an LF model fitting procedure, both techniques display a high degree of accuracy; however, IAIF is found to be slightly more robust. Based on these results, a Glottal Excited Linear Predictive (GELP) coding system for voiced speech is proposed and tested. Using a differential LF parameter quantisation scheme, the system achieves speech quality similar to that of US Federal Standard 1016 CELP at a lower mean bit rate while incurring no extra delay.
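    A simplified illustration of the inverse-filtering idea behind CPIF and IAIF follows: fit an all-pole vocal-tract model to a speech frame and pass the speech through the inverse of that model to obtain an estimate of the glottal source. This is a bare-bones sketch, not the thesis's CPIF or IAIF procedures; the model order, windowing, and the synthetic test signal are illustrative assumptions.

```python
# Bare-bones LPC inverse filtering: estimate an all-pole vocal-tract filter from a
# speech frame and apply its inverse A(z) to recover a glottal-source estimate.
# Simplified stand-in for CPIF/IAIF, with illustrative parameter choices.
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lpc(frame, order=18):
    """Autocorrelation-method LPC coefficients a_1..a_p."""
    w = frame * np.hamming(len(frame))
    r = np.correlate(w, w, mode="full")[len(w) - 1:]
    return solve_toeplitz(r[:order], r[1:order + 1])

def inverse_filter(frame, order=18):
    """Filter the frame with A(z) = 1 - sum_k a_k z^{-k} to estimate the glottal source."""
    a = lpc(frame, order)
    return lfilter(np.concatenate(([1.0], -a)), [1.0], frame)

# Synthetic example: an impulse-train excitation through a fixed all-pole filter;
# inverse filtering should largely recover the excitation.
fs, f0 = 16000, 120
excitation = np.zeros(fs // 10)
excitation[::fs // f0] = 1.0
tract = [1.0, -1.3, 0.9]                       # toy vocal-tract denominator (stable poles)
speech = lfilter([1.0], tract, excitation)
estimate = inverse_filter(speech)
corr = np.dot(estimate, excitation) / (np.linalg.norm(estimate) * np.linalg.norm(excitation))
print(round(float(corr), 3))                   # similarity between recovered source and true excitation
```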

    Scalable Speech Coding for IP Networks

    The emergence of Voice over Internet Protocol (VoIP) has posed new challenges to the development of speech codecs. The key issue in transporting real-time voice packets over IP networks is the lack of any guarantee of reasonable speech quality under packet delay or loss. Most of the widely used narrowband codecs depend on the Code-Excited Linear Prediction (CELP) coding technique. CELP exploits long-term prediction across frame boundaries, which causes error propagation in the case of packet loss and requires the transmission of redundant information to mitigate the problem. The internet Low Bit-rate Codec (iLBC) employs frame-independent coding and therefore inherently possesses high robustness to packet loss. However, the original iLBC lacks some of the key features of speech codecs for IP networks: rate flexibility, scalability, and wideband support. This dissertation presents novel scalable narrowband and wideband speech codecs for IP networks using a frame-independent coding scheme based on the iLBC. Rate flexibility is added to the iLBC by employing the discrete cosine transform (DCT) and scalable algebraic vector quantization (AVQ) and by allocating different numbers of bits to the AVQ. Bit-rate scalability is obtained by adding an enhancement layer to the core layer of the multi-rate iLBC; the enhancement layer encodes the weighted iLBC coding error in the modified DCT (MDCT) domain. The proposed wideband codec employs a bandwidth extension technique to extend existing narrowband codecs with wideband coding functionality. The wavelet transform is also used to further enhance the performance of the proposed codec. Performance evaluation results show that the proposed codec provides high robustness to packet loss and achieves equivalent or higher speech quality than state-of-the-art codecs under clean-channel conditions.
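    The core/enhancement layering can be sketched as follows. The snippet uses a deliberately crude stand-in for the core codec (a coarse uniform quantiser) and plain scalar quantisation of the DCT of the coding error in place of the codec's AVQ, MDCT, and weighting, purely to show how an enhancement layer refines the core layer; none of the numbers correspond to the actual design.

```python
# Sketch of bit-rate scalable layering: a core layer plus an enhancement layer that
# re-encodes the core coding error in a transform (here plain DCT) domain.
# The "core codec" below is a crude uniform quantiser standing in for the multi-rate
# iLBC core, and scalar quantisation stands in for AVQ; all values are illustrative.
import numpy as np
from scipy.fft import dct, idct

def core_encode_decode(frame, step=0.05):
    """Stand-in core layer: coarse uniform quantisation of the waveform."""
    return np.round(frame / step) * step

def enhancement_encode(error, step=0.01):
    """Enhancement layer: quantise the core coding error in the DCT domain."""
    return np.round(dct(error, norm="ortho") / step).astype(int)

def enhancement_decode(indices, step=0.01):
    return idct(indices * step, norm="ortho")

frame = np.sin(2 * np.pi * 200 * np.arange(160) / 8000)   # one 20 ms frame at 8 kHz
core = core_encode_decode(frame)                           # what a core-only decoder outputs
enh_bits = enhancement_encode(frame - core)                # extra layer, sent only if the bit rate allows
refined = core + enhancement_decode(enh_bits)              # scalable decoder: core (+ enhancement)

print(np.max(np.abs(frame - core)), np.max(np.abs(frame - refined)))
# The refined output should show a much smaller error; dropping the enhancement
# layer simply falls back to core-layer quality.
```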

    Speech coding at medium bit rates using analysis by synthesis techniques

    Speech coding at medium bit rates using analysis by synthesis technique

    Quantisation mechanisms in multi-prototype waveform coding

    Prototype Waveform Coding is one of the most promising methods for speech coding at low bit rates over telecommunications networks. This thesis investigates quantisation mechanisms in Multi-Prototype Waveform (MPW) coding, and two prototype waveform quantisation algorithms for speech coding at bit rates of 2.4 kb/s are proposed. Speech coders based on these algorithms have been found capable of producing coded speech with perceptual quality equivalent to that generated by the US Federal Standard 1016 CELP algorithm at 4.8 kb/s. The two proposed prototype waveform quantisation algorithms are based on Prototype Waveform Interpolation (PWI). The first algorithm uses an open-loop architecture (Open-Loop Quantisation). In this algorithm, the speech residual is represented as a series of prototype waveforms (PWs). The PWs are extracted in both voiced and unvoiced speech, time-aligned and quantised, and, at the receiver, the excitation is reconstructed by smooth interpolation between them. For low-bit-rate coding, the PW is decomposed into a slowly evolving waveform (SEW) and a rapidly evolving waveform (REW). The SEW is coded using vector quantisation of both its magnitude and phase spectra, with the codebook search based on the best match between the SEW and the SEW codebook vectors. The REW phase spectrum is not quantised but is recovered using Gaussian noise. The REW magnitude spectrum, on the other hand, is either quantised at a certain update rate or simply derived from the SEW behaviour.
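    The SEW/REW split described above amounts to separating the slowly varying and rapidly varying parts of the prototype-waveform sequence, typically by low-pass filtering the sequence of PW spectra along the evolution (time) axis. Below is a minimal numpy sketch of that idea, using a simple moving average as the low-pass filter and Gaussian-noise phase for the REW at the decoder; the filter, frame setup, and random spectra are illustrative assumptions, not the thesis's parameters.

```python
# Minimal sketch of SEW/REW decomposition of a sequence of prototype waveform (PW)
# spectra: the SEW is the slowly evolving part (low-pass filtered along the PW index),
# the REW is the remainder.  At the decoder the REW phase is regenerated from Gaussian
# noise, as in the scheme described above.  All parameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_pw, n_bins = 40, 32                     # 40 prototype waveforms, 32 spectral bins each
pw_spectra = rng.standard_normal((n_pw, n_bins)) + 1j * rng.standard_normal((n_pw, n_bins))

def moving_average(x, length=8):
    """Low-pass filter along the PW (evolution) axis."""
    kernel = np.ones(length) / length
    return np.apply_along_axis(lambda col: np.convolve(col, kernel, mode="same"), 0, x)

sew = moving_average(pw_spectra)          # slowly evolving waveform (to be vector quantised)
rew = pw_spectra - sew                    # rapidly evolving waveform

# Decoder-side REW reconstruction: keep (or coarsely code) the magnitude,
# replace the phase with that of Gaussian noise.
noise_phase = np.angle(rng.standard_normal(rew.shape) + 1j * rng.standard_normal(rew.shape))
rew_decoded = np.abs(rew) * np.exp(1j * noise_phase)
pw_decoded = sew + rew_decoded            # recombined prototype spectra

print(sew.shape, rew.shape, pw_decoded.shape)
```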