566 research outputs found

    Simulation and implementation of a linear predictive coder

    The main objective of this research was to design and build a Linear Predictive Coder (LPC) based on the TMS320 processor, and to incorporate it in the design of a low bit rate voice coding server for a Cambridge Ring. To decide on a suitable algorithm for the LPC, extensive simulations were carried out on a BBC computer. The computer was interfaced to a frame store which, although originally intended to store video information, served as a suitable store for speech. Up to six seconds of speech could be fed in from a microphone in real time for analysis. The BBC was fitted with a second processor, but in spite of this the processing times were very slow. [Continues.]

    Frequency-warped autoregressive modeling and filtering

    This thesis consists of an introduction and nine articles. The articles concern the application of frequency-warping techniques to audio signal processing and, in particular, to predictive coding of wideband audio signals. The introduction reviews the literature and summarizes the results of the articles. Frequency-warping techniques, or simply warping techniques, modify a conventional signal processing system so that its inherent frequency representation is changed. It is demonstrated that this can be done for essentially all traditional signal processing algorithms. In audio applications it is beneficial to modify the system so that the new frequency representation is close to that of human hearing. One of the articles is a tutorial paper on the use of warping techniques in audio applications. The majority of the articles study warped linear prediction (WLP) and its use in wideband audio coding. It is proposed that warped linear prediction would be a particularly attractive method for low-delay wideband audio coding. Warping techniques are also applied to various modifications of classical linear predictive coding techniques. This was made possible partly by the introduction, in one of the articles, of a class of new implementation techniques for recursive filters. The proposed implementation algorithm for recursive filters having delay-free loops is a generic technique. This inspired an article that introduces a generalized warped linear predictive coding scheme. One example of the generalized approach is a linear predictive algorithm using an almost logarithmic frequency representation.

    Selected methods for improving synthesis speech quality using linear predictive coding: system description, coefficient smoothing and streak

    Technical report. This report develops two generalizations of the standard Linear Predictive Coding (LPC) implementation of a narrow band speech compression system. The purpose of each method is to improve the speech quality that is available from a standard LPC system.

    Quantisation mechanisms in multi-prototype waveform coding

    Prototype Waveform Coding is one of the most promising methods for speech coding at low bit rates over telecommunications networks. This thesis investigates quantisation mechanisms in Multi-Prototype Waveform (MPW) coding, and proposes two prototype waveform quantisation algorithms for speech coding at a bit rate of 2.4 kb/s. Speech coders based on these algorithms have been found capable of producing coded speech with perceptual quality equivalent to that of the US Federal Standard 1016 CELP algorithm at 4.8 kb/s. The two proposed prototype waveform quantisation algorithms are based on Prototype Waveform Interpolation (PWI). The first algorithm uses an open loop architecture (Open Loop Quantisation). In this algorithm, the speech residual is represented as a series of prototype waveforms (PWs). The PWs are extracted in both voiced and unvoiced speech, time aligned and quantised and, at the receiver, the excitation is reconstructed by smooth interpolation between them. For low bit rate coding, the PW is decomposed into a slowly evolving waveform (SEW) and a rapidly evolving waveform (REW). The SEW is coded using vector quantisation on both magnitude and phase spectra. The SEW codebook search is based on the best match between the SEW and the SEW codebook vector. The REW phase spectrum is not quantised but is recovered using Gaussian noise. The REW magnitude spectrum, on the other hand, can be either quantised at a certain update rate or derived solely from the behaviour of the SEW.

    The development of speech coding and the first standard coder for public mobile telephony

    This thesis describes in its core chapter (Chapter 4) the original algorithmic and design features of the first coder for public mobile telephony, the GSM full-rate speech coder, as standardized in 1988. It has never been described in so much detail as presented here. The coder is put in a historical perspective by two preceding chapters on the history of speech production models and the development of speech coding techniques until the mid 1980s, respectively. In the epilogue a brief review is given of later developments in speech coding. The introductory Chapter 1 starts with some preliminaries. It is defined what speech coding is and the reader is introduced to speech coding standards and the standardization institutes which set them. Then, the attributes of a speech coder playing a role in standardization are explained. Subsequently, several applications of speech coders, including mobile telephony, will be discussed and the state of the art in speech coding will be illustrated on the basis of some worldwide recognized standards. Chapter 2 starts with a summary of the features of speech signals and their source, the human speech organ. Then, historical models of speech production which form the basis of different kinds of modern speech coders are discussed. Starting with a review of ancient mechanical models, we will arrive at the electrical source-filter model of the 1930s. Subsequently, the acoustic-tube models as they arose in the 1950s and 1960s are discussed. Finally the 1970s are reviewed, which brought the discrete-time filter model on the basis of linear prediction. In a unique way the logical sequencing of these models is exposed, and the links are discussed. Whereas the historical models are discussed in a narrative style, the acoustic tube models and the linear prediction technique as applied to speech are subject to more mathematical analysis in order to create a sound basis for the treatise of Chapter 4. This trend continues in Chapter 3, whenever instrumental in completing that basis. In Chapter 3 the reader is taken by the hand on a guided tour through time during which successive speech coding methods pass in review. In an original way special attention is paid to the evolutionary aspect. Specifically, for each newly proposed method it is discussed what it added to the known techniques of the time. After presenting the relevant predecessors starting with Pulse Code Modulation (PCM) and the early vocoders of the 1930s, we will arrive at Residual-Excited Linear Predictive (RELP) coders, Analysis-by-Synthesis systems and Regular-Pulse Excitation in 1984. The latter forms the basis of the GSM full-rate coder. In Chapter 4, which constitutes the core of this thesis, explicit forms of Multi-Pulse Excited (MPE) and Regular-Pulse Excited (RPE) analysis-by-synthesis coding systems are developed. Starting from current pulse-amplitude computation methods in 1984, which included solving sets of equations (typically of order 10-16) two hundred times a second, several explicit-form designs are considered by which solving sets of equations in real time is avoided. Then, the design of a specific explicit-form RPE coder and an associated efficient architecture are described. The explicit forms and the resulting architectural features have never been published in so much detail as presented here. Implementation of such a codec enabled real-time operation on a state-of-the-art single-chip digital signal processor of the time. This coder, at a bit rate of 13 kbit/s, has been selected as the Full-Rate GSM standard in 1988. Its performance is recapitulated. Chapter 5 is an epilogue briefly reviewing the major developments in speech coding technology after 1988. Many speech coding standards have been set, for mobile telephony as well as for other applications, since then. The chapter is concluded by an outlook.

    Comparison of CELP speech coder with a wavelet method

    This thesis compares the speech quality of the Code Excited Linear Predictor (CELP, Federal Standard 1016) speech coder with that of a new wavelet method for compressing speech. The two are compared by means of subjective listening tests. The test signals used are clean signals (i.e. with no background noise), speech signals with room noise, and speech signals with artificial noise added. Results indicate that for clean signals and signals with predominantly voiced components the CELP standard performs better than the wavelet method, but for signals with room noise the wavelet method performs much better than CELP. For signals with artificial noise added, the results are mixed and depend on the noise level: CELP performs better at low noise levels and the wavelet method performs better at higher noise levels.

    Multirate Frequency Transformations: Wideband AM-FM Demodulation with Applications to Signal Processing and Communications

    The AM-FM (amplitude and frequency modulation) signal model finds numerous applications in image processing, communications, and speech processing. The traditional approaches to demodulating signals in this category are the analytic signal approach, frequency tracking, and the energy operator approach. These approaches, however, assume that the amplitude and frequency components are slowly time-varying (i.e., narrowband) and incur significant demodulation error in wideband scenarios. In this thesis, we extend to large wideband-to-narrowband conversion factors a two-stage approach to wideband AM-FM demodulation that combines multirate frequency transformations (MFT), enacted through a combination of multirate systems, with traditional demodulation techniques such as the Teager-Kaiser energy operator demodulation (ESA) approach. The MFT module comprises multirate interpolation and heterodyning and converts the wideband AM-FM signal into a narrowband signal, while the demodulation module, such as ESA, demodulates the narrowband signal into its constituent amplitude and frequency components, which are then transformed back to yield estimates for the wideband signal. This MFT-ESA approach is then applied to: (a) wideband image demodulation and fingerprint demodulation, where multidimensional energy separation is employed, (b) wideband first-formant demodulation in vowels, and (c) wideband CPM demodulation with partial response signaling. These applications demonstrate its validity in both monocomponent and multicomponent scenarios as an effective AM-FM signal demodulation and analysis technique for image processing, speech processing, and communications.

    Computer speech synthesis: a systematic method to extract synthesis parameters for formant synthesizers.

    by Yu Wai Leung. Thesis (M.Phil.)--Chinese University of Hong Kong, 1993. Includes bibliographical references (leaves 94-96).
    Abstract
    Introduction
    Chapter 1. Human speech and its production model
        1.1 The human vocal system
        1.2 Speech production mechanism
        1.3 Acoustic properties of human speech
        1.4 Modeling the speech production process
        1.5 Speech as the spoken form of a language
    Chapter 2. Speech analysis techniques
        2.1 Short time speech analysis and speech segmentation
        2.2 Pre-emphasis
        2.3 Linear predictive analysis
        2.4 Formant tracking
        2.5 Pitch determination
    Chapter 3. Speech synthesis technology
        3.1 Overview
        3.2 Articulatory synthesis
        3.3 Concatenation synthesis
        3.4 LPC synthesis
        3.5 Formant speech synthesis
        3.6 Synthesis by rule
    Chapter 4. LSYNTH: A parallel formant synthesizer
        4.1 Overview
        4.2 Synthesizer configuration: cascade and parallel
        4.3 Structure of LSYNTH
    Chapter 5. Automatic formant parameter extraction for parallel formant synthesizers
        5.1 Introduction
        5.2 The idea of a feedback analysis system
        5.3 Overview of the feedback analysis system
        5.4 Iterative spectral matching algorithm
        5.5 Results and discussions
    Chapter 6. Generate formant trajectories in synthesis-by-rule systems
        6.1 Formant trajectories generation in synthesis-by-rule systems
        6.2 Modeling formant transitions
        6.3 Conventional formant transition calculation
        6.4 The 4-point Bezier curve model
        6.5 Modeling of formant transitions for Cantonese
    Chapter 7. Some listening test results
        7.1 Introduction
        7.2 Tone recognition test
        7.3 Cantonese final recognition test
        7.4 Problems and discussions
    Conclusion
    References
    Appendix A: The Cantonese phonetic system
    Appendix B: TPIT, A tone trajectory generator for Cantonese