85 research outputs found

Mel cepstral coefficient modification based on the Glimpse Proportion measure for improving the intelligibility of HMM-generated synthetic speech in noise

Author: King S.
Valentini-Botinhao C.
Yamagishi J.
Publication date: 01/09/2012

Edinburgh Research Explorer

Using an intelligibility measure to create noise robust cepstral coefficients for HMM-based speech synthesis

Author: King S.
Valentini-Botinhao C.
Yamagishi J.
Publication date: 01/05/2012

Edinburgh Research Explorer

Speech intelligibility in cars: the effect of speaking style, noise and listener age

Author: Valentini-Botinhao Cassia
Yamagishi Junichi
Publication venue: 'International Speech Communication Association'
Publication date: 24/08/2017

Crossref

Edinburgh Research Explorer

Using neighbourhood density and selective SNR boosting to increase the intelligibility of synthetic speech in noise

Author: King Simon
Valentini-Botinhao Cassia
Wester Mirjam
Yamagishi Junichi
Publication date: 01/08/2013

Edinburgh Research Explorer

Detection and analysis of attention errors in sequence-to-sequence text-to-speech

Author: King Simon
Valentini-Botinhao Cassia
Publication venue: 'International Speech Communication Association'
Publication date: 30/08/2021

Edinburgh Research Explorer

Combining perceptually-motivated spectral shaping with loudness and duration modification for intelligibility enhancement of HMM-based synthetic speech in noise

Author: King S.
Stylianou Y.
Valentini-Botinhao C.
Yamagishi J.
Publication date: 01/08/2013

Edinburgh Research Explorer

Cepstral analysis based on the Glimpse proportion measure for improving the intelligibility of HMM-based synthetic speech in noise

Author: King S.
Maia R.
Valentini-Botinhao C.
Yamagishi J.
Zen H.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2012

In this paper we introduce a new cepstral coefficient extraction method based on an intelligibility measure for speech in noise, the Glimpse Proportion measure. This new method aims to increase the intelligibility of speech in noise by modifying the clean speech, and has applications in scenarios such as public announcement and car navigation systems. We first explain how the Glimpse Proportion measure operates and further show how we approximated it to integrate it into an existing spectral envelope parameter extraction method commonly used in the HMM-based speech synthesis framework. We then demonstrate how this new method changes the modelled spectrum according to the characteristics of the noise and show results for a listening test with vocoded and HMM-based synthetic speech. The test indicates that the proposed method can significantly improve intelligibility of synthetic speech in speech-shaped noise. Index Terms: cepstral coefficient extraction, objective measure for speech intelligibility, Lombard speech, HMM-based speech synthesis
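The abstract above names the Glimpse Proportion measure but does not state its formula. As a rough illustration only, the sketch below assumes the commonly cited definition: the fraction of time-frequency cells in which the speech level exceeds the noise level by some threshold. The function name `glimpse_proportion`, the 3 dB default threshold, and the plain dB-matrix inputs are assumptions made for this example. It is not the approximation the authors integrate into cepstral extraction, and it omits any auditory filterbank front end.

```python
import numpy as np


def glimpse_proportion(speech_db, noise_db, threshold_db=3.0):
    """Fraction of time-frequency cells where speech exceeds noise by threshold_db.

    speech_db, noise_db: same-shape arrays (channels x frames) of levels in dB.
    threshold_db: local SNR threshold; the 3 dB default is an assumption.
    """
    speech_db = np.asarray(speech_db, dtype=float)
    noise_db = np.asarray(noise_db, dtype=float)
    if speech_db.shape != noise_db.shape:
        raise ValueError("speech and noise representations must have the same shape")
    # A cell counts as a "glimpse" when its local SNR is above the threshold.
    glimpses = speech_db > noise_db + threshold_db
    return float(np.mean(glimpses))


# Toy 2 x 2 example (hypothetical levels, not data from the paper).
speech = [[10.0, 0.0], [5.0, 20.0]]
noise = [[0.0, 0.0], [3.0, 10.0]]
gp = glimpse_proportion(speech, noise)
```

In the toy example, two of the four cells clear the threshold, so `gp` is 0.5.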

CiteSeerX

Crossref

Edinburgh Research Explorer

Puffin: pitch-synchronous neural waveform generation for fullband speech on modest devices

Author: Valentini-Botinhao Cassia
Watts Oliver
Wihlborg Lovisa
Publication date: 25/11/2022

We present a neural vocoder designed with low-powered Alternative and Augmentative Communication devices in mind. By combining elements of successful modern vocoders with established ideas from an older generation of technology, our system is able to produce high quality synthetic speech at 48 kHz on devices where neural vocoders are otherwise prohibitively complex. The system is trained adversarially using differentiable pitch synchronous overlap add, and reduces complexity by relying on pitch synchronous Inverse Short-Time Fourier Transform (ISTFT) to generate speech samples. Our system achieves comparable quality with a strong (HiFi-GAN) baseline while using only a fraction of the compute. We present results of a perceptual evaluation as well as an analysis of system complexity. Comment: ICASSP 2023 submission

arXiv.org e-Print Archive

Edinburgh Research Explorer

Speech Waveform Reconstruction using Convolutional Neural Networks with Noise and Periodic Inputs

Author: King Simon
Valentini-Botinhao Cassia
Watts Oliver
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 17/04/2019

Edinburgh Research Explorer

Evaluating speech intelligibility enhancement for HMM-based synthetic speech in noise

Author: King Simon
Valentini-Botinhao Cassia
Yamagishi Junichi
Publication date: 01/01/2012

It is possible to increase the intelligibility of speech in noise by enhancing the clean speech signal. In this paper we demonstrate the effects of modifying the spectral envelope of synthetic speech according to the environmental noise. To achieve this, we modify Mel cepstral coefficients according to an intelligibility measure that accounts for glimpses of speech in noise: the Glimpse Proportion measure. We evaluate this method against a baseline synthetic voice trained only with normal speech and a topline voice trained with Lombard speech, as well as natural speech. The intelligibility of these voices was measured when mixed with speech-shaped noise and with a competing speaker at three different levels. The Lombard voices, both natural and synthetic, were more intelligible than the normal voices in all conditions. For speech-shaped noise, the proposed modified voice was as intelligible as the Lombard synthetic voice without requiring any recordings of Lombard speech, which are hard to obtain. However, in the case of competing talker noise, the Lombard synthetic voice was more intelligible than the proposed modified voice. Index Terms: HMM-based speech synthesis, intelligibility of speech in noise, Lombard speech
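The evaluation described above mixes speech with a masker at three levels. The abstract does not say how the mixtures were built, so the following is only a minimal sketch. It assumes that "levels" means signal-to-noise ratios computed from average power over the whole utterance. The function name `mix_at_snr` and the toy signals are hypothetical.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Scale noise so the speech-to-noise power ratio equals snr_db, then add it."""
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: len(speech)]  # trim the masker to the utterance length
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    if noise_power == 0.0:
        raise ValueError("noise has zero power and cannot be scaled")
    # Gain g chosen so that speech_power / (g^2 * noise_power) = 10^(snr_db / 10).
    gain = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise


# Toy signals (hypothetical, not data from the paper).
rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 200 * np.arange(16000) / 16000)
noise = rng.standard_normal(16000)
mixed = mix_at_snr(speech, noise, snr_db=-4.0)
```

Subtracting `speech` from `mixed` recovers the scaled masker, which gives a quick way to check that the requested ratio was reached.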

CiteSeerX

Edinburgh Research Explorer