
    Scalable Speech Coding for IP Networks

    The emergence of Voice over Internet Protocol (VoIP) has posed new challenges to the development of speech codecs. The key issue in transporting real-time voice packets over IP networks is the lack of any guarantee of reasonable speech quality in the presence of packet delay or loss. Most widely used narrowband codecs depend on the Code Excited Linear Prediction (CELP) coding technique. CELP uses long-term prediction across frame boundaries, so a lost packet causes error propagation, and redundant information must be transmitted to mitigate the problem. The internet Low Bit-rate Codec (iLBC) employs frame-independent coding and is therefore inherently robust to packet loss. However, the original iLBC lacks some key features of speech codecs for IP networks: rate flexibility, scalability, and wideband support. This dissertation presents novel scalable narrowband and wideband speech codecs for IP networks using a frame-independent coding scheme based on the iLBC. Rate flexibility is added to the iLBC by employing the discrete cosine transform (DCT) and scalable algebraic vector quantization (AVQ), and by allocating different numbers of bits to the AVQ. Bit-rate scalability is obtained by adding an enhancement layer to the core layer of the multi-rate iLBC. The enhancement layer encodes the weighted iLBC coding error in the modified DCT (MDCT) domain. The proposed wideband codec employs a bandwidth extension technique to extend existing narrowband codecs with wideband coding functionality. The wavelet transform is also used to further enhance the performance of the proposed codec. The performance evaluation results show that the proposed codec is highly robust to packet loss and achieves speech quality equivalent to or higher than state-of-the-art codecs under clean channel conditions.

    ADAPTIVE SPEECH QUALITY IN VOICE-OVER-IP COMMUNICATIONS

    The quality of VoIP communication relies significantly on the network that transports the voice packets because this network does not usually guarantee the available bandwidth, delay, and loss that are critical for real-time voice traffic. The solution proposed here is to manage the voice-over-IP stream dynamically, changing parameters as needed to assure quality. The main objective of this dissertation is to develop an adaptive speech encoding system that can be applied to conventional (telephony-grade) and wideband voice communications. This comprehensive study includes the investigation and development of three key components of the system. First, to manage VoIP quality dynamically, a tool is needed to measure real-time changes in quality. The E-model, which exists for narrowband communication, is extended to a single computational technique that measures speech quality for narrowband and wideband VoIP codecs. This part of the dissertation also develops important theoretical work in the area of wideband telephony. The second system component is a variable speech-encoding algorithm. Although VoIP performance is affected by multiple codecs and network-based factors, only three factors can be managed dynamically: voice payload size, speech compression and jitter buffer management. Using an existing adaptive jitter-buffer algorithm, voice packet-size and compression variation are studied as they affect speech quality under different network conditions. This study explains the relationships among multiple parameters as they affect speech transmission and its resulting quality. Then, based on these two components, the third system component is a novel adaptive-rate control algorithm that establishes the interaction between a VoIP sender and receiver, and manages voice quality in real-time. Simulations demonstrate that the system provides better average voice quality than traditional VoIP
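The E-model summarizes transmission quality as a rating factor R, which is commonly converted to an estimated mean opinion score (MOS). As a minimal sketch, the conversion below follows the standard narrowband mapping published in ITU-T G.107. It is not the dissertation's wideband extension, and the function name `r_to_mos` is illustrative only.

```python
def r_to_mos(r: float) -> float:
    """Map an E-model rating factor R to an estimated MOS.

    Uses the narrowband mapping from ITU-T G.107; the wideband
    extension described in the abstract is NOT reproduced here.
    """
    if r <= 0:
        return 1.0          # floor of the MOS scale
    if r >= 100:
        return 4.5          # ceiling of the estimated MOS
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


# G.107's default narrowband rating (R = 93.2) maps to a MOS of about 4.41.
print(round(r_to_mos(93.2), 2))
```

An adaptive sender could, in principle, evaluate such a mapping for each candidate codec and packet size and pick the setting with the highest estimated MOS. How the dissertation's rate-control algorithm actually does this is not stated in the abstract.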

    Low Delay Sparse and Mixed Excitation CELP Coders for Wideband Speech Coding

    Code Excited Linear Prediction (CELP) algorithms are proposed for compression of speech in an 8 kHz band at switched or variable bit rate and with algorithmic delay not exceeding 2 ms. Two structures of low-delay CELP coders are analyzed: low-delay sparse excitation and mixed excitation CELP. Sparse excitation is based on MP-MLQ and multilayer models. The mixed excitation CELP algorithm stems from the narrowband G.728 standard. As opposed to the G.728 LD-CELP coder, the mixed excitation codebook consists of pseudorandom vectors and sequences obtained with Long-Term Prediction (LTP). Variable rate coding consists in maximizing vector dimension while keeping the required speech quality. Good speech quality (MOS = 3.9 according to the PESQ algorithm) is obtained at an average bit rate of 33.5 kbit/s.

    Simulation of Wideband Speech Signal Reconstruction Using the Spectral Shifting Method (SIMULASI REKONSTRUKSI SINYAL BICARA PITA LEBAR MENGGUNAKAN METODE PERGESERAN SPEKTRAL)

    ABSTRACT: In most communication systems, speech is transmitted in narrowband, covering frequencies from 300 Hz to 3400 Hz. Compared with normal speech, which generally contains perceptually significant energy from about 50 Hz up to 8 kHz, narrowband speech sounds muffled and is less intelligible, particularly in fricatives such as /s/ and /f/, because the high-frequency information is discarded in transmission. Wideband speech coding allows frequencies up to 8 kHz to be transmitted, but it requires more bandwidth and a higher bit rate than narrowband transmission. Wideband reconstruction is a scheme that adds a synthesized highband signal to narrowband speech to produce a higher-quality wideband speech signal. The synthesized highband is based entirely on information contained in the narrowband speech, so from a coding perspective it is achieved with no increase in bandwidth or bit rate. Wideband reconstruction can function as a post-processor for any narrowband telephone receiver, or it can be combined with any narrowband speech coder to produce a very low bit rate wideband speech coder. Applications include higher-quality mobile telephony, teleconferencing, and internet telephony. This final project simulates a bandwidth extension system that uses the spectral shifting method for the highband excitation and a codebook with linear mapping to estimate the highband envelope. The algorithm was shown to work, although listening tests confirmed that audible artefacts are introduced in the reconstructed signal. In the objective and subjective tests, the spectral shifting system obtained a preference share of 50% of respondents and a maximum mean SNR of 5.13 dB. The best-performing parameters were Euclidean distance with K=1 for KNN classification, and correlation distance with 256 clusters for K-means clustering. Computation time was 0.144 s for spectral shifting, 0.138 s for spectral folding, and 164.2 s for the codebook. The average subjective DMOS was 3.65 for spectral shifting, compared with 2 for spectral folding. Further research and quality improvement are still needed before the system can be deployed in receiving terminals. Keywords: wideband reconstruction, narrowband, highband, codebook, spectral shifting.
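The two highband excitation methods compared in this abstract, spectral folding and spectral shifting, can be illustrated on a single test tone. The sketch below is a toy illustration under stated assumptions, not the thesis's algorithm: the 1 kHz tone, the 4 kHz shift frequency, and all function names are hypothetical, and the codebook-based envelope estimation is omitted. Folding is modelled as zero insertion, which mirrors the narrowband spectrum around 4 kHz. Shifting is modelled as cosine modulation, which moves the spectrum by the modulation frequency.

```python
import cmath
import math


def dft_magnitude(x, k):
    """Magnitude of DFT bin k of the real sequence x (direct sum, no FFT)."""
    n = len(x)
    return abs(sum(s * cmath.exp(-2j * math.pi * k * i / n)
                   for i, s in enumerate(x)))


def spectral_folding(x):
    """Upsample by 2 by inserting a zero after every sample.

    The narrowband spectrum is mirrored around the old Nyquist frequency.
    """
    y = []
    for s in x:
        y.extend((s, 0.0))
    return y


def spectral_shifting(x, f_shift, fs):
    """Modulate x (already sampled at fs) by a cosine at f_shift Hz."""
    return [s * math.cos(2 * math.pi * f_shift * i / fs)
            for i, s in enumerate(x)]


FS_NB, FS_WB = 8000, 16000      # narrowband / wideband sampling rates
F_TONE, F_SHIFT = 1000, 4000    # assumed test tone and shift frequency
N_WB = 320                      # 20 ms at 16 kHz -> 50 Hz per DFT bin

# Folding: 1 kHz tone at 8 kHz, zero-inserted to 16 kHz.
tone_nb = [math.sin(2 * math.pi * F_TONE * i / FS_NB) for i in range(N_WB // 2)]
folded = spectral_folding(tone_nb)

# Shifting: the same tone, assumed already ideally interpolated to 16 kHz.
tone_wb = [math.sin(2 * math.pi * F_TONE * i / FS_WB) for i in range(N_WB)]
shifted = spectral_shifting(tone_wb, F_SHIFT, FS_WB)


def bin_of(freq_hz):
    """DFT bin index of freq_hz for an N_WB-point transform at FS_WB."""
    return freq_hz * N_WB // FS_WB


# Folding keeps 1 kHz and creates a mirror image at 8 kHz - 1 kHz = 7 kHz.
print("folded :", round(dft_magnitude(folded, bin_of(1000))),
      round(dft_magnitude(folded, bin_of(7000))))
# Shifting moves 1 kHz to 4 kHz +/- 1 kHz, i.e. 3 kHz and 5 kHz.
print("shifted:", round(dft_magnitude(shifted, bin_of(3000))),
      round(dft_magnitude(shifted, bin_of(5000))))
```

In a complete system the generated highband would still be high-pass filtered and shaped with an estimated spectral envelope before being added to the narrowband signal. The abstract assigns that envelope estimation step to the codebook and linear mapping.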

    Comparison of Wideband Earpiece Integrations in Mobile Phone

    The speech carried in telecommunication networks has traditionally been narrowband, ranging from 300 Hz to 3400 Hz. Wideband speech call services can be expected to increase their foothold in the market during the coming years. This thesis introduces the basics of speech coding together with the adaptive multi-rate wideband (AMR-WB) codec. The wideband codec widens the speech band to a range from 50 Hz to 7000 Hz using a 16 kHz sampling frequency. In practice, the wider band improves speech intelligibility and makes speech sound more natural and comfortable to listen to. The main focus of this thesis is to compare two different wideband earpiece integrations in a mobile phone. The question is how much the end user benefits from a larger earpiece. To determine earpiece performance, objective free-field measurements were made on the earpiece modules. Measurements were also made on the phone using a head and torso simulator (HATS), both with the earpieces wired directly to a power amplifier and over the air with an active call on GSM and WCDMA networks. The objective measurements showed differences between the two integrations in frequency response and distortion, especially at low frequencies. Finally, a subjective listening test was conducted to see whether the end user notices the difference between the smaller and larger earpiece integrations, using narrowband and wideband speech samples. Based on the listening test results, users can distinguish between the two integrations, and a male speaker benefits more than a female speaker from the larger earpiece with the wideband speech codec.

    A General Framework for Analyzing, Characterizing, and Implementing Spectrally Modulated, Spectrally Encoded Signals

    Fourth generation (4G) communications will support many capabilities while providing universal, high speed access. One potential enabler for these capabilities is software defined radio (SDR). When controlled by cognitive radio (CR) principles, the required waveform diversity is achieved via a synergistic union called CR-based SDR. Research is rapidly progressing in SDR hardware and software venues, but current CR-based SDR research lacks the theoretical foundation and analytic framework to permit efficient implementation. This limitation is addressed here by introducing a general framework for analyzing, characterizing, and implementing spectrally modulated, spectrally encoded (SMSE) signals within CR-based SDR architectures. Given orthogonal frequency division multiplexing (OFDM) is a 4G candidate signal, OFDM-based signals are collectively classified as SMSE since modulation and encoding are spectrally applied. The proposed framework provides analytic commonality and unification of SMSE signals. Applicability is first shown for candidate 4G signals, and resultant analytic expressions agree with published results. Implementability is then demonstrated in multiple coexistence scenarios via modeling and simulation to reinforce practical utility
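The remark that OFDM modulation and encoding are "spectrally applied" can be made concrete with a toy example: data symbols are placed on frequency bins, and the time-domain waveform is obtained by an inverse DFT. This is a minimal sketch of plain OFDM only. It does not reproduce the SMSE framework itself, and the 8-subcarrier size, the QPSK mapping, and all names are assumptions chosen for illustration.

```python
import cmath
import math


def idft(spectrum):
    """Inverse DFT: frequency-domain symbols -> time-domain samples."""
    n = len(spectrum)
    return [sum(c * cmath.exp(2j * math.pi * k * t / n)
                for k, c in enumerate(spectrum)) / n
            for t in range(n)]


def dft(samples):
    """Forward DFT: time-domain samples -> frequency-domain symbols."""
    n = len(samples)
    return [sum(s * cmath.exp(-2j * math.pi * k * t / n)
                for t, s in enumerate(samples))
            for k in range(n)]


# Hypothetical Gray-coded QPSK mapping: two bits per subcarrier.
QPSK = {(0, 0): 1 + 1j, (0, 1): -1 + 1j, (1, 1): -1 - 1j, (1, 0): 1 - 1j}

bits = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0]
symbols = [QPSK[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]

# Modulation happens in the frequency domain: one symbol per subcarrier.
waveform = idft(symbols)

# An ideal receiver (no channel, no noise) recovers the symbols with a DFT.
recovered = dft(waveform)
max_error = max(abs(a - b) for a, b in zip(symbols, recovered))
print("subcarriers:", len(symbols), "max recovery error < 1e-9:", max_error < 1e-9)
```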