555 research outputs found

An introduction to statistical parametric speech synthesis

Author: King Simon
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/10/2011

Edinburgh Research Explorer

Low bit rate digital speech signal processing systems

Author: Ahmadi S.
Publication venue: Department of Electrical Engineering, Imperial College London
Publication date: 01/01/1980

Imperial Users onl

Spiral - Imperial College Digital Repository

Non linear frequency compression with particular reference to helium speech.

Author: Al-Sulaifanie Bayez K.
Publication date: 01/01/1984

OPUS

Novel Pitch Detection Algorithm With Application to Speech Coding

Author: Kura Vijay
Publication venue: ScholarWorks@UNO
Publication date: 19/12/2003

This thesis introduces a novel method for accurate pitch detection and speech segmentation, named Multi-feature, Autocorrelation (ACR) and Wavelet Technique (MAWT). MAWT uses feature extraction, and ACR applied on Linear Predictive Coding (LPC) residuals, with a wavelet-based refinement step. MAWT opens the way for a unique approach to modeling: although speech is divided into segments, the success of voicing decisions is not crucial. Experiments demonstrate the superiority of MAWT in pitch period detection accuracy over existing methods, and illustrate its advantages for speech segmentation. These advantages are more pronounced for gain-varying and transitional speech, and under noisy conditions.
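
As a rough illustration of the autocorrelation (ACR) step mentioned in this abstract, the sketch below estimates the pitch of one frame by picking the lag with the largest autocorrelation. It is not MAWT: it works on the raw waveform rather than on LPC residuals and has no feature extraction or wavelet refinement. The function name, the 80-400 Hz search range, the 8 kHz sample rate and the synthetic test frame are all assumptions made for the example.

```python
import math


def estimate_pitch_autocorr(frame, sample_rate, f0_min=80.0, f0_max=400.0):
    """Estimate F0 in Hz as sample_rate / lag, using the lag with peak autocorrelation.

    Hypothetical helper for illustration; the search range is an assumption.
    """
    n = len(frame)
    lag_min = int(sample_rate / f0_max)               # shortest pitch period searched
    lag_max = min(int(sample_rate / f0_min), n - 1)   # longest pitch period searched
    best_lag, best_value = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        overlap = n - lag
        # Autocorrelation at this lag, normalised by the number of overlapping samples.
        value = sum(frame[i] * frame[i + lag] for i in range(overlap)) / overlap
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag


# Synthetic voiced-like frame: a 125 Hz fundamental plus its second harmonic.
sample_rate = 8000
true_f0 = 125.0
frame = [
    math.sin(2 * math.pi * true_f0 * t / sample_rate)
    + 0.5 * math.sin(2 * math.pi * 2 * true_f0 * t / sample_rate)
    for t in range(512)
]
f0 = estimate_pitch_autocorr(frame, sample_rate)
print(f"estimated F0: {f0:.1f} Hz")
```

Limiting the lag search to a plausible pitch range is what keeps the estimator from locking onto a multiple of the true period; a method that runs on LPC residuals, as the abstract describes, would apply the same peak-picking idea to a spectrally flattened signal.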

University of New Orleans

On the color of voices: the relationship between cochlear implant users’ voice cue perception and speech intelligibility in cocktail-party scenarios

Author: El Boghdady Nawal
Publication venue: 'University of Groningen Press'
Publication date: 01/01/2019

Cochlear implants (CIs) are neuroprosthetic devices that are surgically implanted to restore functional hearing in deaf and hard-of-hearing individuals. Most CI users can understand speech well in quiet situations, yet it becomes quite challenging for them to understand speech in crowded environments, especially when multiple people are speaking simultaneously. This dissertation investigated whether such difficulties are related to the poor representation of voice cues in the implant arising from degraded spectral and temporal resolution from signal processing strategies. Human voices are characterized by their pitch (F0), in addition to a second dimension called the vocal-tract length (VTL). This dimension directly scales with the size of the speaker and, therefore, plays a crucial role in the distinction between male and female talkers, or between adults and children. The research questions were: whether CI users’ speech intelligibility in the presence of a competing talker (speech-on-speech; SoS) is related to their sensitivity to the F0 and VTL differences between the speakers, whether this relationship is influenced by the spectral resolution in the implant, and whether optimizing signal processing algorithms could improve the perception of such voice cues. The data showed that CI users’ SoS intelligibility was related to how sensitive they were to both F0 and VTL differences, and that this relationship was influenced by the spectral resolution in the implant. The data also provided evidence that CI users can draw a benefit from voice differences between male and female speakers, but not between female speakers and children. In addition, spectral enhancement techniques and optimization of some implant parameters were both shown to contribute to an improvement in SoS intelligibility and VTL sensitivity, respectively. These findings lay the foundations for future optimizations of the implant to improve CI users’ speech intelligibility in noisy settings.

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen

Speech Compression Techniques: An Overview

Author: Juhi Singh
Publication venue: 'Auricle Technologies, Pvt., Ltd.'
Publication date: 31/05/2017

Speech is the natural means of human communication. The aim of speech coding is to compress the speech signal to the highest possible compression ratio while maintaining user acceptability. This paper discusses two major approaches to speech compression: waveform coders (pulse code modulation, adaptive differential pulse code modulation, sub-band coding, transform coding) and vocoders (linear predictive coder, formant coder/synthesis).

International Journal on Recent and Innovation Trends in Computing and Communication

Voice technology and BBN

Author: Wolf Jared J.

The following research was discussed: (1) speech signal processing; (2) automatic speech recognition; (3) continuous speech understanding; (4) speaker recognition; (5) speech compression; (6) subjective and objective evaluation of speech communication systems; (7) measurement of the intelligibility and quality of speech when degraded by noise or other masking stimuli; (8) speech synthesis; (9) instructional aids for second-language learning and for training of the deaf; and (10) investigation of speech correlates of psychological stress. Experimental psychology, control systems, and human factors engineering, which are often relevant to the proper design and operation of speech systems, are described.

NASA Technical Reports Server

The development of speech coding and the first standard coder for public mobile telephony

Author: Sluijter R.J.
Publication venue: Technische Universiteit Eindhoven
Publication date: 01/01/2005

This thesis describes in its core chapter (Chapter 4) the original algorithmic and design features of the first coder for public mobile telephony, the GSM full-rate speech coder, as standardized in 1988. It has never been described in so much detail as presented here. The coder is put in a historical perspective by two preceding chapters on the history of speech production models and the development of speech coding techniques until the mid 1980s, respectively. In the epilogue a brief review is given of later developments in speech coding. The introductory Chapter 1 starts with some preliminaries. It is defined what speech coding is and the reader is introduced to speech coding standards and the standardization institutes which set them. Then, the attributes of a speech coder playing a role in standardization are explained. Subsequently, several applications of speech coders - including mobile telephony - will be discussed and the state of the art in speech coding will be illustrated on the basis of some worldwide recognized standards. Chapter 2 starts with a summary of the features of speech signals and their source, the human speech organ. Then, historical models of speech production which form the basis of different kinds of modern speech coders are discussed. Starting with a review of ancient mechanical models, we will arrive at the electrical source-filter model of the 1930s. Subsequently, the acoustic-tube models as they arose in the 1950s and 1960s are discussed. Finally the 1970s are reviewed which brought the discrete-time filter model on the basis of linear prediction. In a unique way the logical sequencing of these models is exposed, and the links are discussed. Whereas the historical models are discussed in a narrative style, the acoustic tube models and the linear prediction technique as applied to speech, are subject to more mathematical analysis in order to create a sound basis for the treatise of Chapter 4. This trend continues in Chapter 3, whenever instrumental in completing that basis. In Chapter 3 the reader is taken by the hand on a guided tour through time during which successive speech coding methods pass in review. In an original way special attention is paid to the evolutionary aspect. Specifically, for each newly proposed method it is discussed what it added to the known techniques of the time. After presenting the relevant predecessors starting with Pulse Code Modulation (PCM) and the early vocoders of the 1930s, we will arrive at Residual-Excited Linear Predictive (RELP) coders, Analysis-by-Synthesis systems and Regular-Pulse Excitation in 1984. The latter forms the basis of the GSM full-rate coder. In Chapter 4, which constitutes the core of this thesis, explicit forms of Multi-Pulse Excited (MPE) and Regular-Pulse Excited (RPE) analysis-by-synthesis coding systems are developed. Starting from current pulse-amplitude computation methods in 1984, which included solving sets of equations (typically of order 10-16) two hundred times a second, several explicit-form designs are considered by which solving sets of equations in real time is avoided. Then, the design of a specific explicit-form RPE coder and an associated efficient architecture are described. The explicit forms and the resulting architectural features have never been published in so much detail as presented here. Implementation of such a codec enabled real-time operation on a state-of-the-art single-chip digital signal processor of the time. This coder, at a bit rate of 13 kbit/s, has been selected as the Full-Rate GSM standard in 1988. Its performance is recapitulated. Chapter 5 is an epilogue briefly reviewing the major developments in speech coding technology after 1988. Many speech coding standards have been set, for mobile telephony as well as for other applications, since then. The chapter is concluded by an outlook.
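
To give a feel for what "regular-pulse excitation" means in this abstract, the sketch below picks, from a short block of prediction residual, the evenly spaced grid of samples that carries the most energy and keeps only those pulses. It is a simplified illustration, not the explicit-form design developed in the thesis. The function name, the 40-sample block (5 ms, assuming 8 kHz sampling), the decimation factor of 3 and the four candidate grid offsets are assumptions for the example, and the perceptual weighting and pulse quantisation steps of a real coder are left out.

```python
def select_rpe_grid(residual, decimation=3, num_grids=4):
    """Return (offset, pulses) for the regularly spaced grid with maximum energy.

    Hypothetical helper: candidate grid m keeps residual[m + decimation * i].
    """
    # Number of pulses that fits on every candidate grid without running off the block.
    pulses_per_grid = (len(residual) - num_grids) // decimation + 1
    best_offset, best_pulses, best_energy = 0, [], -1.0
    for offset in range(num_grids):
        pulses = [residual[offset + decimation * i] for i in range(pulses_per_grid)]
        energy = sum(p * p for p in pulses)
        if energy > best_energy:
            best_offset, best_pulses, best_energy = offset, pulses, energy
    return best_offset, best_pulses


# Toy residual: most of the energy sits on every third sample, starting at index 2.
residual = [0.0] * 40
for i in range(2, 40, 3):
    residual[i] = 1.0
residual[0] = 0.5  # a little energy on another grid

offset, pulses = select_rpe_grid(residual)
print(offset, len(pulses))
```

Because the pulse positions are fixed by a single grid offset, only that offset and the pulse amplitudes need to be transmitted. This is part of why such a scheme suits a low-rate coder, and it involves no equation solving in the selection step.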

Repository TU/e

Pure OAI Repository