153 research outputs found

    CELP ALGORITHM AND IMPLEMENTATION FOR SPEECH COMPRESSION

    ABSTRACT: This paper describes a fast algorithm and implementation of code-excited linear predictive (CELP) speech coding. It presents principles of the algorithm, including (i) fast conversion of line spectrum pair (LSP) parameters to linear predictive coding (LPC) parameters, and (ii) fast searches of the parameters of the adaptive and stochastic codebooks. The algorithm can readily be used for speech compression applications, such as (i) high-quality, low-bit-rate speech transmission in point-to-point or store-and-forward (network-based) mode, and (ii) efficient speech storage in speech recording or multimedia databases. The implementation performs in real time and near real time on various platforms, including an IBM PC/AT equipped with a TMS320C30 module, an IBM PC 486, a SUN SPARCstation 2, a SUN SPARCstation 5, and an IBM PowerPC (Power 590).
    1. INTRODUCTION. Why is CELP useful? Obtaining an efficient representation of speech at low bit rates for communication or storage has been a problem of considerable importance, for technical as well as economic reasons. Telephone-quality digital speech in pulse code modulation (PCM) form requires a 64 kbit/s rate, which cannot be transmitted in real time through the 6 kHz and 30 kHz channel capacities of the HF and VHF bands, respectively. Voice mail and multimedia employ speech storage and demand efficient ways of storing speech, since one minute of PCM speech already requires 480 kbytes of storage space. Even if the channel can accommodate real-time speech, speech compression allows more communication connections to share the precious channel. Similarly, speech compression allows more speech messages to be stored in storage of the same size.
    This paper describes a speech compression technique for those purposes, called code-excited linear predictive (CELP) coding [Atal86] [JlaJS93], which obtains bit rates as low as 4.8 kbit/s, giving a compression ratio of up to 13:1. The importance of CELP goes beyond its quality-versus-bit-rate performance, as it provides a generic structure for future generations of perceptual speech coders. If further compression is still required, the coder minimizes the perceptibility of the error by exploiting masking properties of human speech perception. To a certain extent, the speech energy itself perceptually masks the distortion; thus the same energy level of distortion has a different perceptual effect when applied to speech signals with different energy levels. This approach promises a new level of higher-quality, lower-bit-rate speech compression. One novelty of CELP is in incorporating the masking property into a working, practical scheme. Such incorporation is non-trivial because perceptual distortion measures lack the tractable means that have often been available with the traditional distortion-energy measure.
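The codebook search at the heart of CELP can be sketched as an exhaustive analysis-by-synthesis loop. Note the paper's contribution is a *fast* search; the brute-force version below is only illustrative, and the codebook size, filter order, and frame length are made-up toy values:

```python
# Minimal sketch of a CELP stochastic-codebook search (illustrative
# parameters, not the paper's fast-search method). For each candidate
# codevector, pass it through the synthesis filter 1/A(z) and pick the
# index and gain minimizing the squared error against the target.
import numpy as np

def synthesize(excitation, lpc):
    """Run an excitation through the all-pole synthesis filter 1/A(z)."""
    out = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = excitation[n]
        for i, a in enumerate(lpc, start=1):
            if n - i >= 0:
                acc -= a * out[n - i]
        out[n] = acc
    return out

def search_codebook(target, codebook, lpc):
    """Return (best index, best gain) minimizing squared error."""
    best_idx, best_gain, best_err = None, 0.0, np.inf
    for k, code in enumerate(codebook):
        s = synthesize(code, lpc)
        denom = np.dot(s, s)
        if denom == 0.0:
            continue
        g = np.dot(target, s) / denom          # optimal gain, closed form
        err = np.dot(target - g * s, target - g * s)
        if err < best_err:
            best_idx, best_gain, best_err = k, g, err
    return best_idx, best_gain

rng = np.random.default_rng(0)
lpc = [0.5]                                    # toy 1st-order predictor
codebook = rng.standard_normal((8, 40))        # 8 codevectors, 40 samples
target = 2.0 * synthesize(codebook[3], lpc)    # target is a scaled entry
idx, gain = search_codebook(target, codebook, lpc)
print(idx, gain)                               # recovers entry 3, gain 2.0
```

A production coder would apply a perceptual weighting filter to the error before the search, which is exactly where the masking argument above enters.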

    Scalable and perceptual audio compression

    This thesis deals with scalable perceptual audio compression. Two scalable perceptual solutions, as well as a scalable-to-lossless solution, are proposed and investigated. One of the scalable perceptual solutions is built around sinusoidal modelling of the audio signal, whilst the other is built on a transform coding paradigm. The scalable coders are shown to scale both in a waveform-matching manner and in a psychoacoustic manner. To measure the psychoacoustic scalability of the systems investigated in this thesis, the psychoacoustic parameters of the original signal are compared with those of the synthesized signal. The psychoacoustic parameters used are loudness, sharpness, tonality and roughness. This analysis technique is a novel method used in this thesis, and it allows an insight into the perceptual distortion introduced by any coder analyzed in this manner.
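The comparison idea described in that abstract, computing a psychoacoustic parameter on both the original and the synthesized signal and measuring their similarity, can be sketched roughly as follows. A crude per-frame loudness proxy (a power law on frame energy) stands in here for a full loudness model; the function names and frame size are illustrative, not the thesis's:

```python
# Hedged sketch: compare a psychoacoustic parameter track of an original
# signal against that of a coded version. The loudness proxy below
# (mean frame power raised to 0.3) only stands in for a real model.
import numpy as np

def loudness_proxy(signal, frame=256):
    """Per-frame loudness proxy: (mean power)^0.3 for each frame."""
    n = len(signal) // frame
    frames = signal[: n * frame].reshape(n, frame)
    return (frames ** 2).mean(axis=1) ** 0.3

def similarity(a, b):
    """Normalized correlation between two parameter tracks."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(1)
original = rng.standard_normal(4096)
coded = original + 0.05 * rng.standard_normal(4096)  # mild coding noise
sim = similarity(loudness_proxy(original), loudness_proxy(coded))
print(sim)  # close to 1.0 for a perceptually transparent coder
```

Repeating the same comparison for sharpness, tonality and roughness tracks gives the four-parameter analysis the thesis proposes.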

    A robust low bit rate quad-band excitation LSP vocoder.

    by Chiu Kim Ming. Thesis (M.Phil.)--Chinese University of Hong Kong, 1994. Includes bibliographical references (leaves 103-108).
    Chapter 1 --- Introduction
    1.1 --- Speech production
    1.2 --- Low bit rate speech coding
    Chapter 2 --- Speech analysis & synthesis
    2.1 --- Linear prediction of speech signal
    2.2 --- LPC vocoder
    2.2.1 --- Pitch and voiced/unvoiced decision
    2.2.2 --- Spectral envelope representation
    2.3 --- Excitation
    2.3.1 --- Regular pulse excitation and multipulse excitation
    2.3.2 --- Coded excitation and vector sum excitation
    2.4 --- Multiband excitation
    2.5 --- Multiband excitation vocoder
    Chapter 3 --- Dual-band and quad-band excitation
    3.1 --- Dual-band excitation
    3.2 --- Quad-band excitation
    3.3 --- Parameters determination
    3.3.1 --- Pitch detection
    3.3.2 --- Voiced/unvoiced pattern generation
    3.4 --- Excitation generation
    Chapter 4 --- A low bit rate Quad-Band Excitation LSP Vocoder
    4.1 --- Architecture of QBELSP vocoder
    4.2 --- Coding of excitation parameters
    4.2.1 --- Coding of pitch value
    4.2.2 --- Coding of voiced/unvoiced pattern
    4.3 --- Spectral envelope estimation and coding
    4.3.1 --- Spectral envelope & the gain value
    4.3.2 --- Line Spectral Pairs (LSP)
    4.3.3 --- Coding of LSP frequencies
    4.3.4 --- Coding of gain value
    Chapter 5 --- Performance evaluation
    5.1 --- Spectral analysis
    5.2 --- Subjective listening test
    5.2.1 --- Mean Opinion Score (MOS)
    5.2.2 --- Diagnostic Rhyme Test (DRT)
    Chapter 6 --- Conclusions and discussions
    References
    Appendix A --- Subroutine of pitch detection
    Appendix B --- Subroutine of voiced/unvoiced decision
    Appendix C --- Subroutine of LPC coefficients calculation using Durbin's recursive method
    Appendix D --- Subroutine of LSP calculation using Chebyshev polynomials
    Appendix E --- Single syllable word pairs for Diagnostic Rhyme Test
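Appendix C of the thesis above computes LPC coefficients with Durbin's recursive method. The textbook Levinson-Durbin recursion it refers to can be sketched as follows (a generic implementation, not the thesis's own subroutine):

```python
# Generic Levinson-Durbin recursion for LPC analysis: solves the
# autocorrelation normal equations in O(p^2) instead of O(p^3).
import numpy as np

def levinson_durbin(r, order):
    """Return (a[1..order], final prediction error) for autocorrelation r.

    Convention: the predictor estimates s[n] ~ sum_i a[i] * s[n-i].
    """
    a = np.zeros(order + 1)
    e = r[0]                                   # zeroth-order error
    for i in range(1, order + 1):
        # reflection coefficient from the current residual correlation
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / e
        a_new = a.copy()
        a_new[i] = k
        a_new[1:i] = a[1:i] - k * a[i - 1:0:-1]  # update lower coefficients
        a = a_new
        e *= (1.0 - k * k)                       # shrink prediction error
    return a[1:], e

# Autocorrelation of an AR(1) process with coefficient 0.9 is r[k] = 0.9^k,
# so an order-2 analysis should recover a1 = 0.9 and a2 = 0.
r = np.array([1.0, 0.9, 0.81])
coeffs, err = levinson_durbin(r, 2)
print(coeffs)  # -> [0.9, 0.0]
```

The reflection coefficients k produced along the way are exactly the quantities the LSP representation (Appendix D, via Chebyshev polynomials) repackages for robust quantization.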

    Audio Processing and Loudness Estimation Algorithms with iOS Simulations

    abstract: The processing power and storage capacity of portable devices have improved considerably over the past decade. This has motivated the implementation of sophisticated audio and other signal processing algorithms on such mobile devices. Of particular interest in this thesis is audio/speech processing based on perceptual criteria. Specifically, estimating parameters of human auditory models, such as auditory patterns and loudness, involves computationally intensive operations that can strain device resources. Hence, strategies for implementing computationally efficient human auditory models for loudness estimation are studied in this thesis. Existing algorithms for reducing computation in auditory pattern and loudness estimation are examined, and improved algorithms are proposed to overcome their limitations. In addition, real-time applications such as perceptual loudness estimation and loudness equalization using auditory models have been implemented. A software implementation of loudness estimation on iOS devices is also reported. Beyond the loudness estimation algorithms and software, this thesis project also created new illustrations of speech and audio processing concepts for research and education. As a result, a new suite of speech/audio DSP functions was developed and integrated into the award-winning educational iOS app 'iJDSP'. These functions are described in detail in this thesis. Several enhancements to the architecture of the application have also been introduced to provide the supporting framework for speech/audio processing. Frame-by-frame processing and visualization functionalities have been developed to facilitate speech/audio processing. In addition, facilities for easy sound recording, processing and audio rendering have been developed to provide students, practitioners and researchers with an enriched DSP simulation tool. Simulations and assessments have also been developed for use in classes and for training practitioners and students.
    Dissertation/Thesis, M.S. Electrical Engineering, 201
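The frame-by-frame loudness equalization this abstract mentions can be illustrated with a minimal sketch. Everything below is an assumption for illustration, not iJDSP's API: a simple RMS level stands in for a model-based loudness estimate, and each frame is scaled toward a target level.

```python
# Illustrative frame-by-frame loudness equalization: scale each
# non-overlapping frame so its RMS matches a target level. A real
# system would use an auditory-model loudness estimate instead of RMS.
import numpy as np

def equalize_loudness(signal, target_rms=0.1, frame=512, eps=1e-12):
    """Scale each frame of `signal` to roughly `target_rms` RMS."""
    out = signal.astype(float).copy()
    for start in range(0, len(out) - frame + 1, frame):
        seg = out[start:start + frame]
        rms = np.sqrt(np.mean(seg ** 2))
        out[start:start + frame] = seg * (target_rms / (rms + eps))
    return out

rng = np.random.default_rng(2)
x = np.concatenate([0.5 * rng.standard_normal(1024),
                    0.01 * rng.standard_normal(1024)])  # loud then quiet
y = equalize_loudness(x)   # both halves now sit at the same level
```

Hard per-frame gains like this would click at frame boundaries; a practical implementation smooths the gain across frames, which is one reason the overlap-aware frame pipeline described in the abstract matters.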

    New techniques in signal coding


    Enhanced Spectral Modeling for Sinusoidal Speech Coders
