778 research outputs found

    Neural Dynamics of Phonetic Trading Relations for Variable-Rate CV Syllables

    The perception of CV syllables exhibits a trading relationship between the voice onset time (VOT) of a consonant and the duration of the following vowel. Percepts of [ba] and [wa], for example, can depend on the durations of the consonant and vowel segments, with an increase in the duration of the subsequent vowel switching the percept of the preceding consonant from [w] to [b]. A neural model, called PHONET, is proposed to account for these findings. In the model, C and V inputs are filtered by parallel auditory streams that respond preferentially to transient and sustained properties of the acoustic signal, as in vision. These streams are represented by working memories that adjust their processing rates to cope with variable acoustic input rates. More rapid transient inputs can cause greater activation of the transient stream which, in turn, can automatically gain-control the processing rate of the sustained stream. An invariant percept obtains when the relative activations of the C and V representations in the two streams remain unchanged. The trading relation can then be simulated in terms of how different experimental manipulations affect this ratio. It is suggested that the brain can use the duration of a subsequent vowel to make the [b]/[w] distinction because the speech code is a resonant event that emerges between working-memory activation patterns and the nodes that categorize them.
    Funding: Advanced Research Projects Agency (90-0083); Air Force Office of Scientific Research (F19620-92-J-0225); Pacific Sierra Research Corporation (91-6075-2)
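
    The invariance claim above (the percept is stable as long as the ratio of transient-stream to sustained-stream activation is preserved) can be illustrated numerically. The following Python sketch is a toy stand-in for PHONET, not the published model: the leaky-integrator equations, the rate constant k, and all durations are invented for illustration.

        import numpy as np

        def stream_activation(duration_ms, rate):
            # Leaky integration of one segment: activation saturates with
            # the product (processing rate x segment duration).
            return 1.0 - np.exp(-rate * duration_ms)

        def percept_ratio(vot_ms, vowel_ms, speech_rate=1.0, k=0.005):
            # Both working memories speed up in proportion to the input
            # rate (the gain control described in the abstract), so a
            # uniformly faster token leaves every rate x duration product,
            # and hence the C/V activation ratio, unchanged.
            c_act = stream_activation(vot_ms, k * speech_rate)
            v_act = stream_activation(vowel_ms, k * speech_rate)
            return c_act / v_act

        print(percept_ratio(40, 200, speech_rate=1.0))  # baseline token
        print(percept_ratio(20, 100, speech_rate=2.0))  # twice as fast: same ratio
        print(percept_ratio(40, 400, speech_rate=1.0))  # vowel lengthened alone:
                                                        # ratio shifts, [w] -> [b]

    Lengthening only the vowel moves the ratio while a uniform rate change does not, which is the trading relation the model is meant to capture.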

    Quantisation mechanisms in multi-prototype waveform coding

    Prototype waveform coding is one of the most promising methods for speech coding at low bit rates over telecommunications networks. This thesis investigates quantisation mechanisms in Multi-Prototype Waveform (MPW) coding, and two prototype waveform quantisation algorithms for speech coding at 2.4 kb/s are proposed. Speech coders based on these algorithms have been found capable of producing coded speech with perceptual quality equivalent to that generated by the US Federal Standard 1016 CELP algorithm at 4.8 kb/s. Both proposed algorithms are based on Prototype Waveform Interpolation (PWI). The first uses an open-loop architecture (open-loop quantisation). In this algorithm, the speech residual is represented as a series of prototype waveforms (PWs). The PWs are extracted in both voiced and unvoiced speech, time-aligned and quantised, and, at the receiver, the excitation is reconstructed by smooth interpolation between them. For low-bit-rate coding, each PW is decomposed into a slowly evolving waveform (SEW) and a rapidly evolving waveform (REW). The SEW is coded using vector quantisation of both the magnitude and phase spectra, with the codebook search based on the best match between the SEW and the SEW codebook vector. The REW phase spectrum is not quantised but is instead recovered using Gaussian noise. The REW magnitude spectrum, on the other hand, can be either quantised at a given update rate or derived from the behaviour of the SEW.
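
    The SEW/REW split described above is essentially a temporal filtering of the prototype-waveform sequence: the SEW is the low-pass component of each spectral coefficient's track across successive PWs, and the REW is the remainder. A minimal sketch of that decomposition follows, with a simple moving average standing in for the low-pass filter actually used; the function name, filter span, and test data are illustrative, not from the thesis.

        import numpy as np

        def decompose_pw(pw_tracks, span=5):
            # pw_tracks: (n_prototypes, n_coeffs) array of aligned PW
            # spectra, one prototype extracted per frame of the residual.
            kernel = np.ones(span) / span
            # Low-pass filter each coefficient's evolution across PWs.
            sew = np.apply_along_axis(
                lambda track: np.convolve(track, kernel, mode="same"),
                axis=0, arr=pw_tracks)
            rew = pw_tracks - sew  # the rapidly evolving remainder
            return sew, rew

        # Slowly drifting tracks (voiced-like input): energy lands in the
        # SEW. Noise-like tracks land in the REW instead, which is why the
        # REW phase can be replaced by Gaussian noise at the decoder.
        pws = np.random.randn(40, 16).cumsum(axis=0) * 0.1
        sew, rew = decompose_pw(pws)
        print(np.var(sew), np.var(rew))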

    A multiband excited waveform-interpolated 2.35-kbps speech codec for bandlimited channels


    New Directions in Subband Coding

    Two very different subband coders are described. The first is a modified dynamic bit-allocation subband coder (D-SBC) designed for variable-rate coding situations and easily adaptable to noisy channel environments. It can operate at rates as low as 12 kb/s and still give good-quality speech. The second is a 16 kb/s waveform coder based on a combination of subband coding and vector quantization (VQ-SBC). The key feature of this coder is its short coding delay, which makes it suitable for real-time communication networks. The speech quality of both coders has been enhanced by adaptive postfiltering. The coders have been implemented on a single AT&T DSP32 signal processor.
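
    The dynamic bit allocation at the heart of a D-SBC gives each subband a share of the frame's bit budget according to its short-term energy. Below is a sketch of the classic log-variance allocation rule; this is the generic textbook form under assumed band energies, not the coder's actual procedure.

        import numpy as np

        def allocate_bits(band_vars, total_bits):
            # Closed-form rule: b_k = B/N + 0.5 * log2(var_k / gm), where
            # gm is the geometric mean of the subband variances.
            n = len(band_vars)
            gm = np.exp(np.mean(np.log(band_vars)))
            bits = total_bits / n + 0.5 * np.log2(band_vars / gm)
            bits = np.maximum(np.round(bits), 0).astype(int)
            # Repair rounding drift one bit at a time, adding where the
            # marginal gain (~ var / 4**bits) is largest and removing
            # where it is smallest, so the total meets the budget.
            while bits.sum() < total_bits:
                bits[np.argmax(band_vars / 4.0 ** bits)] += 1
            while bits.sum() > total_bits:
                nonzero = np.where(bits > 0)[0]
                k = nonzero[np.argmin((band_vars / 4.0 ** bits)[nonzero])]
                bits[k] -= 1
            return bits

        band_vars = np.array([4.0, 1.0, 0.25, 0.0625])  # high to low energy
        print(allocate_bits(band_vars, total_bits=16))  # -> [6 4 4 2]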

    Subband adaptive filtering for acoustic echo control using allpass polyphase IIR filterbanks


    Hybrid techniques for speech coding


    Anthropomorphic Coding of Speech and Audio: A Model Inversion Approach

    Auditory modeling is a well-established methodology that provides insight into human perception and facilitates the extraction of the signal features most relevant to the listener. The aim of this paper is to provide a tutorial on perceptual speech and audio coding using an invertible auditory model. In this approach, the audio signal is converted into an auditory representation, which is quantized and coded; upon decoding, it is transformed back into the acoustic domain. This transformation converts a complex distortion criterion into a simple one, thus facilitating low-complexity quantization. We briefly review past work on auditory models and describe in more detail the components of our invertible model and its inversion procedure, that is, the method to reconstruct the signal from the output of the auditory model. We summarize attempts to use the auditory representation for low-bit-rate coding. Our approach also allows the inherent redundancy of the human auditory system to be exploited for multiple description (joint source-channel) coding.
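
    The pipeline described here (analyze into an invertible representation, quantize with a simple criterion in that domain, then invert back) can be outlined with any invertible transform standing in for the auditory model. In the sketch below an STFT is a purely illustrative placeholder for the paper's auditory model, and the uniform quantizer step size is arbitrary.

        import numpy as np
        from scipy.signal import stft, istft

        def code_decode(x, fs, step=0.05):
            # Analysis: acoustic signal -> invertible transform domain
            # (stand-in for the paper's auditory representation).
            _, _, X = stft(x, fs=fs, nperseg=256)
            # Uniform quantization in the transform domain replaces the
            # complex perceptual distortion criterion of waveform coding.
            Xq = step * (np.round(X.real / step) + 1j * np.round(X.imag / step))
            # Synthesis: invert the transform back to the acoustic domain.
            _, x_hat = istft(Xq, fs=fs, nperseg=256)
            return x_hat

        fs = 8000
        x = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)  # 1 s test tone
        x_hat = code_decode(x, fs)
        n = min(len(x), len(x_hat))
        err = x[:n] - x_hat[:n]
        print(10 * np.log10(np.sum(x[:n] ** 2) / np.sum(err ** 2)), "dB SNR")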

    Phone-based speech synthesis using neural network with articulatory control

    by Lo Wai Kit. Thesis (M.Phil.), Chinese University of Hong Kong, 1996. Includes bibliographical references (leaves 151-160). Contents:
    Chapter 1  Introduction
      1.1  Applications of Speech Synthesis
        1.1.1  Human-Machine Interface
        1.1.2  Speech Aids
        1.1.3  Text-to-Speech (TTS) System
        1.1.4  Speech Dialogue System
      1.2  Current Status in Speech Synthesis
        1.2.1  Concatenation Based
        1.2.2  Parametric Based
        1.2.3  Articulatory Based
        1.2.4  Application of Neural Networks in Speech Synthesis
      1.3  The Proposed Neural Network Speech Synthesis
        1.3.1  Motivation
        1.3.2  Objectives
      1.4  Thesis Outline
    Chapter 2  Linguistic Basics for Speech Synthesis
      2.1  Relations between Linguistics and Speech Synthesis
      2.2  Basic Phonology and Phonetics
        2.2.1  Phonology
        2.2.2  Phonetics
        2.2.3  Prosody
      2.3  Transcription Systems
        2.3.1  The Employed Transcription System
      2.4  Cantonese Phonology
        2.4.1  Some Properties of Cantonese
        2.4.2  Initial
        2.4.3  Final
        2.4.4  Lexical Tone
        2.4.5  Variations
      2.5  The Vowel Quadrilaterals
    Chapter 3  Speech Synthesis Technology
      3.1  Human Speech Production
      3.2  Important Issues in Speech Synthesis Systems
        3.2.1  Controllability
        3.2.2  Naturalness
        3.2.3  Complexity
        3.2.4  Information Storage
      3.3  Units for Synthesis
      3.4  Types of Synthesizer
        3.4.1  Copy Concatenation
        3.4.2  Vocoder
        3.4.3  Articulatory Synthesis
    Chapter 4  Neural Network Speech Synthesis with Articulatory Control
      4.1  Neural Network Approximation
        4.1.1  The Approximation Problem
        4.1.2  Network Approach for Approximation
      4.2  Artificial Neural Networks for Phone-Based Speech Synthesis
        4.2.1  Network Approximation for Speech Signal Synthesis
        4.2.2  Feedforward Backpropagation Neural Network
        4.2.3  Radial Basis Function Network
        4.2.4  Parallel Operating Synthesizer Networks
      4.3  Template Storage and Control for the Synthesizer Network
        4.3.1  Implicit Template Storage
        4.3.2  Articulatory Control Parameters
      4.4  Summary
    Chapter 5  Prototype Implementation of the Synthesizer Network
      5.1  Implementation of the Synthesizer Network
        5.1.1  Network Architectures
        5.1.2  Spectral Templates for Training
        5.1.3  System Requirements
      5.2  Subjective Listening Test
        5.2.1  Sample Selection
        5.2.2  Test Procedure
        5.2.3  Results
        5.2.4  Analysis
      5.3  Summary
    Chapter 6  Simplified Articulatory Control for the Synthesizer Network
      6.1  Coarticulatory Effects in Speech Production
        6.1.1  Acoustic Effect
        6.1.2  Prosodic Effect
      6.2  Control in Various Synthesis Techniques
        6.2.1  Copy Concatenation
        6.2.2  Formant Synthesis
        6.2.3  Articulatory Synthesis
      6.3  Articulatory Control Model Based on the Vowel Quadrilateral
        6.3.1  Modeling of Variations with the Articulatory Control Model
      6.4  Voice Correspondence
        6.4.1  For Nasal Sounds: Inter-Network Correspondence
        6.4.2  In Flat-Tongue Space: Intra-Network Correspondence
      6.5  Summary
    Chapter 7  Pause Duration Properties in Cantonese Phrases
      7.1  The Prosodic Feature: Inter-Syllable Pause
      7.2  Experiment for Measuring Inter-Syllable Pauses in Cantonese Phrases
        7.2.1  Speech Material Selection
        7.2.2  Experimental Procedure
        7.2.3  Results
      7.3  Characteristics of Inter-Syllable Pauses in Cantonese Phrases
        7.3.1  Pause Duration Characteristics for Initials after a Pause
        7.3.2  Pause Duration Characteristics for Finals before a Pause
        7.3.3  General Observations
        7.3.4  Other Observations
      7.4  Application of Pause-Duration Statistics to the Synthesis System
      7.5  Summary
    Chapter 8  Conclusion and Further Work
      8.1  Conclusion
      8.2  Further Extension Work
        8.2.1  Regularization Network Optimized on ISD
        8.2.2  Incorporation of Non-Articulatory Parameters into the Control Space
        8.2.3  Experiments on Other Prosodic Features
        8.2.4  Application of Voice Correspondence to Cantonese Coda Discrimination
    Appendix A  Cantonese Initials and Finals
      A.1  Tables of All Cantonese Initials and Finals
    Appendix B  Using a Distortion Measure as the Error Function in a Neural Network
      B.1  Formulation of the Itakura-Saito Distortion Measure as a Neural Network Error Function
      B.2  Formulation of a Modified Itakura-Saito Distortion (MISD) Measure as a Neural Network Error Function
    Appendix C  Orthogonal Least Squares Algorithm for RBFNet Training
      C.1  Orthogonal Least Squares Learning Algorithm for Radial Basis Function Network Training
    Appendix D  Phrase Lists
      D.1  Two-Syllable Phrase List for the Pause Duration Experiment
        D.1.1  Two-syllable words (兩字詞)
      D.2  Three/Four-Syllable Phrase List for the Pause Duration Experiment
        D.2.1  Phrases (片語)