555 research outputs found

    An introduction to statistical parametric speech synthesis

    Low bit rate digital speech signal processing systems

    Novel Pitch Detection Algorithm With Application to Speech Coding

    This thesis introduces a novel method for accurate pitch detection and speech segmentation, named Multi-feature, Autocorrelation (ACR) and Wavelet Technique (MAWT). MAWT uses feature extraction, and ACR applied on Linear Predictive Coding (LPC) residuals, with a wavelet-based refinement step. MAWT opens the way for a unique approach to modeling: although speech is divided into segments, the success of voicing decisions is not crucial. Experiments demonstrate the superiority of MAWT in pitch period detection accuracy over existing methods, and illustrate its advantages for speech segmentation. These advantages are more pronounced for gain-varying and transitional speech, and under noisy conditions
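    The abstract names autocorrelation (ACR) applied to LPC residuals as one ingredient of MAWT. The block below is a minimal sketch of that single ingredient only, not the thesis's algorithm: the multi-feature stage and the wavelet-based refinement are omitted, and the function names, the LPC order of 10, the 50-400 Hz search range, and the synthetic test signal are all assumptions made for illustration.

```python
import numpy as np


def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC: solve the normal equations R a = r."""
    n = len(frame)
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-9 * r[0] * np.eye(order)  # tiny diagonal loading for stability
    return np.linalg.solve(R, r[1:order + 1])


def lpc_residual(frame, a):
    """Inverse-filter the frame: e[n] = s[n] - sum_k a[k] * s[n - k]."""
    inverse_filter = np.concatenate(([1.0], -a))
    return np.convolve(frame, inverse_filter)[:len(frame)]


def acr_pitch(frame, fs, order=10, f_min=50.0, f_max=400.0):
    """Estimate pitch (Hz) from the autocorrelation peak of the LPC residual."""
    e = lpc_residual(frame, lpc_coefficients(frame, order))
    lag_min, lag_max = int(fs / f_max), int(fs / f_min)
    n = len(e)
    acr = np.array([np.dot(e[:n - lag], e[lag:])
                    for lag in range(lag_min, lag_max + 1)])
    best_lag = lag_min + int(np.argmax(acr))
    return fs / best_lag


def synthetic_voiced(fs=8000, f0=100.0, duration=0.1, seed=0):
    """Hypothetical test signal: noisy impulse train through a 2-pole resonator."""
    rng = np.random.default_rng(seed)
    n = int(fs * duration)
    excitation = np.zeros(n)
    excitation[::int(round(fs / f0))] = 1.0
    excitation += 0.01 * rng.standard_normal(n)
    radius, theta = 0.95, 2 * np.pi * 500.0 / fs  # resonance near 500 Hz
    a1, a2 = 2 * radius * np.cos(theta), -radius * radius
    s = np.zeros(n)
    for i in range(n):
        s[i] = excitation[i]
        if i >= 1:
            s[i] += a1 * s[i - 1]
        if i >= 2:
            s[i] += a2 * s[i - 2]
    return s


if __name__ == "__main__":
    print(acr_pitch(synthetic_voiced(), fs=8000))
```

    Working on the residual rather than the raw waveform removes most of the vocal-tract resonance, so the autocorrelation peak reflects the excitation period more directly.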

    On the color of voices: the relationship between cochlear implant users’ voice cue perception and speech intelligibility in cocktail-party scenarios

    Cochlear implants (CIs) are neuroprosthetic devices that are surgically implanted to restore functional hearing in deaf and hard-of-hearing individuals. Most CI users can understand speech well in quiet situations, yet, it becomes quite challenging for them to understand speech in crowded environments, especially when multiple people are speaking simultaneously. This dissertation investigated whether such difficulties are related to the poor representation of voice cues in the implant arising from degraded spectral and temporal resolution from signal processing strategies. Human voices are characterized by their pitch (F0), in addition to a second dimension called the vocal-tract length (VTL). This dimension directly scales with the size of the speaker and, therefore, plays a crucial role in the distinction between male and female talkers, or between adults and children. The research questions were: whether CI users’ speech intelligibility in the presence of a competing talker (speech-on-speech; SoS) is related to their sensitivity to the F0 and VTL differences between the speakers, whether this relationship is influenced by the spectral resolution in the implant, and whether optimizing signal processing algorithms could improve the perception of such voice cues. The data showed that CI users’ SoS intelligibility was related to how sensitive they were to both F0 and VTL differences, and that this relationship was influenced by the spectral resolution in the implant. The data also provided evidence that CI users can draw a benefit from voice differences between male and female speakers, but not between female speakers and children. In addition, spectral enhancement techniques and optimization of some implant parameters were both shown to contribute to an improvement in SoS intelligibility and VTL sensitivity, respectively. These findings lay the foundations for future optimizations of the implant to improve CI users’ speech intelligibility in noisy settings

    Speech Compression Techniques: An Overview

    Speech is the natural means of human communication. The aim of speech coding is to compress the speech signal to the highest possible compression ratio while maintaining user acceptability. This paper discusses two major approaches to speech compression: waveform coders (pulse code modulation, adaptive differential pulse code modulation, sub-band coding, transform coding) and vocoders (linear predictive coder, formant coder/synthesis)
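    To make the simplest waveform coder in that list concrete, here is a hedged sketch of uniform pulse code modulation. It is an illustration only and is not taken from the paper: the function names, the [-1, 1] input range, and the 8 kHz / 8-bit figures in the usage line are assumptions, and practical telephony PCM additionally applies companding, which is left out here.

```python
import numpy as np


def pcm_encode(x, bits):
    """Uniformly quantize samples in [-1, 1] to integer codes 0 .. 2**bits - 1."""
    levels = 2 ** bits
    x = np.clip(np.asarray(x, dtype=float), -1.0, 1.0)
    codes = np.floor((x + 1.0) / 2.0 * levels).astype(int)
    return np.clip(codes, 0, levels - 1)  # x == 1.0 falls into the top cell


def pcm_decode(codes, bits):
    """Map each code back to the midpoint of its quantization cell."""
    levels = 2 ** bits
    return (np.asarray(codes) + 0.5) / levels * 2.0 - 1.0


def bit_rate(sample_rate_hz, bits):
    """Raw PCM bit rate in bit/s: one code word per sample."""
    return sample_rate_hz * bits


if __name__ == "__main__":
    x = np.linspace(-1.0, 1.0, 1001)
    err = np.max(np.abs(pcm_decode(pcm_encode(x, 8), 8) - x))
    print(err, bit_rate(8000, 8))
```

    The reconstruction error is bounded by half a quantization step, so each bit removed from the code word doubles the worst-case error while lowering the bit rate proportionally; that is the compression-versus-acceptability trade-off the abstract refers to.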

    Voice technology and BBN

    The following research was discussed: (1) speech signal processing; (2) automatic speech recognition; (3) continuous speech understanding; (4) speaker recognition; (5) speech compression; (6) subjective and objective evaluation of speech communication systems; (7) measurement of the intelligibility and quality of speech when degraded by noise or other masking stimuli; (8) speech synthesis; (9) instructional aids for second-language learning and for training of the deaf; and (10) investigation of speech correlates of psychological stress. Experimental psychology, control systems, and human factors engineering, which are often relevant to the proper design and operation of speech systems, are also described

    The development of speech coding and the first standard coder for public mobile telephony

    This thesis describes in its core chapter (Chapter 4) the original algorithmic and design features of the first coder for public mobile telephony, the GSM full-rate speech coder, as standardized in 1988. It has never been described in so much detail as presented here. The coder is put in a historical perspective by two preceding chapters on the history of speech production models and the development of speech coding techniques until the mid 1980s, respectively. In the epilogue a brief review is given of later developments in speech coding. The introductory Chapter 1 starts with some preliminaries. It is defined what speech coding is and the reader is introduced to speech coding standards and the standardization institutes which set them. Then, the attributes of a speech coder playing a role in standardization are explained. Subsequently, several applications of speech coders - including mobile telephony - will be discussed and the state of the art in speech coding will be illustrated on the basis of some worldwide recognized standards. Chapter 2 starts with a summary of the features of speech signals and their source, the human speech organ. Then, historical models of speech production which form the basis of different kinds of modern speech coders are discussed. Starting with a review of ancient mechanical models, we will arrive at the electrical source-filter model of the 1930s. Subsequently, the acoustic-tube models as they arose in the 1950s and 1960s are discussed. Finally the 1970s are reviewed which brought the discrete-time filter model on the basis of linear prediction. In a unique way the logical sequencing of these models is exposed, and the links are discussed. Whereas the historical models are discussed in a narrative style, the acoustic tube models and the linear prediction technique as applied to speech, are subject to more mathematical analysis in order to create a sound basis for the treatise of Chapter 4. This trend continues in Chapter 3, whenever instrumental in completing that basis. In Chapter 3 the reader is taken by the hand on a guided tour through time during which successive speech coding methods pass in review. In an original way special attention is paid to the evolutionary aspect. Specifically, for each newly proposed method it is discussed what it added to the known techniques of the time. After presenting the relevant predecessors starting with Pulse Code Modulation (PCM) and the early vocoders of the 1930s, we will arrive at Residual-Excited Linear Predictive (RELP) coders, Analysis-by-Synthesis systems and Regular-Pulse Excitation in 1984. The latter forms the basis of the GSM full-rate coder. In Chapter 4, which constitutes the core of this thesis, explicit forms of Multi-Pulse Excited (MPE) and Regular-Pulse Excited (RPE) analysis-by-synthesis coding systems are developed. Starting from current pulse-amplitude computation methods in 1984, which included solving sets of equations (typically of order 10-16) two hundred times a second, several explicit-form designs are considered by which solving sets of equations in real time is avoided. Then, the design of a specific explicit-form RPE coder and an associated efficient architecture are described. The explicit forms and the resulting architectural features have never been published in so much detail as presented here. Implementation of such a codec enabled real-time operation on a state-of-the-art single-chip digital signal processor of the time. This coder, at a bit rate of 13 kbit/s, has been selected as the Full-Rate GSM standard in 1988. Its performance is recapitulated. Chapter 5 is an epilogue briefly reviewing the major developments in speech coding technology after 1988. Many speech coding standards have been set, for mobile telephony as well as for other applications, since then. The chapter is concluded by an outlook
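    As a rough illustration of the regular-pulse idea mentioned in this abstract, the sketch below picks one of several regularly spaced pulse grids from a residual subframe by maximum energy. It is a simplified assumption-laden sketch, not the thesis's explicit-form design: the 40-sample subframe (5 ms at an assumed 8 kHz sampling rate, consistent with "two hundred times a second"), the decimation factor of 3, the four candidate grids of 13 pulses, and all function names are taken as assumptions from the commonly published description of the GSM full-rate coder rather than from the abstract, and the weighting filter, long-term prediction, and pulse quantization are omitted.

```python
import numpy as np

SUBFRAME_LEN = 40  # assumed: 5 ms at 8 kHz
DECIMATION = 3     # assumed spacing between retained pulses
NUM_GRIDS = 4      # assumed number of candidate grid offsets
PULSES = 13        # assumed pulses kept per subframe


def rpe_select_grid(residual):
    """Return (grid offset, pulses) for the candidate grid with the most energy."""
    residual = np.asarray(residual, dtype=float)
    if residual.shape != (SUBFRAME_LEN,):
        raise ValueError("expected a subframe of %d samples" % SUBFRAME_LEN)
    candidates = [residual[m::DECIMATION][:PULSES] for m in range(NUM_GRIDS)]
    energies = [float(np.dot(c, c)) for c in candidates]
    grid = int(np.argmax(energies))
    return grid, candidates[grid]


def rpe_reconstruct(grid, pulses):
    """Place the retained pulses back on their grid; other samples stay zero."""
    out = np.zeros(SUBFRAME_LEN)
    out[grid:grid + DECIMATION * PULSES:DECIMATION] = pulses
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    e = rng.standard_normal(SUBFRAME_LEN)
    grid, pulses = rpe_select_grid(e)
    print(grid, len(pulses))
```

    Selecting a grid by energy involves only sums of squares, which hints at why a regular-pulse structure can avoid solving a set of equations per subframe; how the thesis actually derives its explicit forms is not stated in the abstract.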