36 research outputs found

    Predictive multiple-scale lattice VQ for LSF quantization

    Full text link

    Spectral Envelope Modelling for Full-Band Speech Coding

    Get PDF
    Speech coding considering historically narrow-band was in the latest years significantly improved by widening the coded audio bandwidth. However, existing speech coders still employ a limited band source-filter model extended by parametric coding of the higher band. In this thesis, a full-band source-filter model is considered and especially its spectral magnitude envelope modelling. To match full-band operating mode, we modified, tuned and compared two methods, Linear Predictive Coding (LPC) and Distribution Quantization (DQ). LPC uses autoregressive modeling, while DQ quantifies the energy ratios between parts of the spectrum. Parameters of both methods were quantized with multi-stage vector quantization. Objective and subjective evaluations indicate the two methods used in a full-band source-filter coding scheme perform on the same range and are competitive against conventional speech coders requiring an extra bandwidth extension

    PENGKODEAN SUARA PITA LEBAR

    Get PDF
    Makalah ini menampilkan studi literatur tentang pengkode suara pita lebar yang ditujukan untuk aplikasi pada sistem komunikasi bergerak generasi ke-tiga (3G). Teknologi 3G telah memberi peluang penggunaan suara pita lebar (frekuensi 50-7000 Hz) untuk meningkatkan kualitas komunikasi suara. Suara pita lebar telah terbukti mampu membuat suara terdengar lebih alami (naturalness), memudahkan pendengar membedakan fricative sounds, dan mengurangi tingkat kelelahan dalam berkomunikasi (listener fatigue). Perkembangan penelitian tentang metode pengkodean dan metode kuantisasi vektor terhadap LPC parameter pada pengkode suara pita lebar disampaikan beserta algoritma yang digunakan untuk perancangan quantiser vektor

    A Parametric Approach for Efficient Speech Storage, Flexible Synthesis and Voice Conversion

    Get PDF
    During the past decades, many areas of speech processing have benefited from the vast increases in the available memory sizes and processing power. For example, speech recognizers can be trained with enormous speech databases and high-quality speech synthesizers can generate new speech sentences by concatenating speech units retrieved from a large inventory of speech data. However, even in today's world of ever-increasing memory sizes and computational resources, there are still lots of embedded application scenarios for speech processing techniques where the memory capacities and the processor speeds are very limited. Thus, there is still a clear demand for solutions that can operate with limited resources, e.g., on low-end mobile devices. This thesis introduces a new segmental parametric speech codec referred to as the VLBR codec. The novel proprietary sinusoidal speech codec designed for efficient speech storage is capable of achieving relatively good speech quality at compression ratios beyond the ones offered by the standardized speech coding solutions, i.e., at bitrates of approximately 1 kbps and below. The efficiency of the proposed coding approach is based on model simplifications, mode-based segmental processing, and the method of adaptive downsampling and quantization. The coding efficiency is also further improved using a novel flexible multi-mode matrix quantizer structure and enhanced dynamic codebook reordering. The compression is also facilitated using a new perceptual irrelevancy removal method. The VLBR codec is also applied to text-to-speech synthesis. In particular, the codec is utilized for the compression of unit selection databases and for the parametric concatenation of speech units. It is also shown that the efficiency of the database compression can be further enhanced using speaker-specific retraining of the codec. Moreover, the computational load is significantly decreased using a new compression-motivated scheme for very fast and memory-efficient calculation of concatenation costs, based on techniques and implementations used in the VLBR codec. Finally, the VLBR codec and the related speech synthesis techniques are complemented with voice conversion methods that allow modifying the perceived speaker identity which in turn enables, e.g., cost-efficient creation of new text-to-speech voices. The VLBR-based voice conversion system combines compression with the popular Gaussian mixture model based conversion approach. Furthermore, a novel method is proposed for converting the prosodic aspects of speech. The performance of the VLBR-based voice conversion system is also enhanced using a new approach for mode selection and through explicit control of the degree of voicing. The solutions proposed in the thesis together form a complete system that can be utilized in different ways and configurations. The VLBR codec itself can be utilized, e.g., for efficient compression of audio books, and the speech synthesis related methods can be used for reducing the footprint and the computational load of concatenative text-to-speech synthesizers to levels required in some embedded applications. The VLBR-based voice conversion techniques can be used to complement the codec both in storage applications and in connection with speech synthesis. It is also possible to only utilize the voice conversion functionality, e.g., in games or other entertainment applications

    Speech coding

    Full text link

    Nouvelles techniques de quantification vectorielle algébrique basées sur le codage de Voronoi : application au codage AMR-WB+

    Get PDF
    L'objet de cette thèse est l'étude de la quantification (vectorielle) par réseau de points et de son application au modèle de codage audio ACELP/TCX multi-mode. Le modèle ACELP/TCX constitue une solution possible au problème du codage audio universel---par codage universel, on entend la représentation unifiée de bonne qualité des signaux de parole et de musique à différents débits et fréquences d'échantillonnage. On considère ici comme applications la quantification des coefficients de prédiction linéaire et surtout le codage par transformée au sein du modèle TCX; l'application au codage TCX a un fort intérêt pratique, car le modèle TCX conditionne en grande partie le caractère universel du codage ACELP/TCX. La quantification par réseau de points est une technique de quantification par contrainte, exploitant la structure linéaire des réseaux réguliers. Elle a toujours été considérée, par rapport à la quantification vectorielle non structurée, comme une technique prometteuse du fait de sa complexité réduite (en stockage et quantité de calculs). On montre ici qu'elle possède d'autres avantages importants: elle rend possible la construction de codes efficaces en dimension relativement élevée et à débit arbitrairement élevé, adaptés au codage multi-débit (par transformée ou autre); en outre, elle permet de ramener la distorsion à la seule erreur granulaire au prix d'un codage à débit variable. Plusieurs techniques de quantification par réseau de points sont présentées dans cette thèse. Elles sont toutes élaborées à partir du codage de Voronoï. Le codage de Voronoï quasi-ellipsoïdal est adapté au codage d'une source gaussienne vectorielle dans le contexte du codage paramétrique de coefficients de prédiction linéaire à l'aide d'un modèle de mélange gaussien. La quantification vectorielle multi-débit par extension de Voronoï ou par codage de Voronoï à troncature adaptative est adaptée au codage audio par transformée multi-débit. L'application de la quantification vectorielle multi-débit au codage TCX est plus particulièrement étudiée. Une nouvelle technique de codage algébrique de la cible TCX est ainsi conçue à partir du principe d'allocation des bits par remplissage inverse des eaux

    Sparsity in Linear Predictive Coding of Speech

    Get PDF
    nrpages: 197status: publishe

    Speech spectrum non-stationarity detection based on line spectrum frequencies and related applications

    Get PDF
    Ankara : Department of Electrical and Electronics Engineering and The Institute of Engineering and Sciences of Bilkent University, 1998.Thesis (Master's) -- Bilkent University, 1998.Includes bibliographical references leaves 124-132In this thesis, two new speech variation measures for speech spectrum nonstationarity detection are proposed. These measures are based on the Line Spectrum Frequencies (LSF) and the spectral values at the LSF locations. They are formulated to be subjectively meaningful, mathematically tractable, and also have low computational complexity property. In order to demonstrate the usefulness of the non-stationarity detector, two applications are presented: The first application is an implicit speech segmentation system which detects non-stationary regions in speech signal and obtains the boundaries of the speech segments. The other application is a Variable Bit-Rate Mixed Excitation Linear Predictive (VBR-MELP) vocoder utilizing a novel voice activity detector to detect silent regions in the speech. This voice activity detector is designed to be robust to non-stationary background noise and provides efficient coding of silent sections and unvoiced utterances to decrease the bit-rate. Simulation results are also presented.Ertan, Ali ErdemM.S
    corecore