Speech Waveform Synthesis From MFCC Sequences With Generative Adversarial Networks
This paper proposes a method for generating speech from filterbank mel
frequency cepstral coefficients (MFCC), which are widely used in speech
applications, such as ASR, but are generally considered unusable for speech
synthesis. First, we predict fundamental frequency and voicing information from
MFCCs with an autoregressive recurrent neural net. Second, the spectral
envelope information contained in MFCCs is converted to all-pole filters, and a
pitch-synchronous excitation model matched to these filters is trained.
Finally, we introduce a generative adversarial network-based noise model to
add a realistic high-frequency stochastic component to the modeled excitation
signal. The results show that high-quality speech reconstruction can be
obtained, given only MFCC information at test time.
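The second step, converting the spectral envelope carried by MFCCs into an all-pole filter, can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes the MFCCs were computed with an orthonormal DCT-II, and it skips the mel-to-linear frequency warping: the power envelope passed to `envelope_to_autocorr` is assumed to be already sampled on a uniform grid from 0 to the Nyquist frequency. All function names are hypothetical.

```python
import math


def mfcc_to_log_mel(mfcc, n_mels):
    """Invert a (possibly truncated) orthonormal DCT-II.

    Returns a smoothed log-mel envelope with n_mels points.
    Assumption: the MFCCs came from an orthonormal DCT-II of log-mel energies.
    """
    out = []
    for m in range(n_mels):
        s = mfcc[0] / math.sqrt(n_mels)
        for k in range(1, len(mfcc)):
            s += (math.sqrt(2.0 / n_mels) * mfcc[k]
                  * math.cos(math.pi * k * (m + 0.5) / n_mels))
        out.append(s)
    return out


def envelope_to_autocorr(power, order):
    """Autocorrelation lags 0..order from a one-sided power envelope.

    `power` holds n_half + 1 samples on a uniform grid from 0 to Nyquist.
    This is the inverse DFT of the symmetric power spectrum, real part only.
    """
    n_half = len(power) - 1
    n_fft = 2 * n_half
    r = []
    for k in range(order + 1):
        s = power[0] + power[n_half] * math.cos(math.pi * k)
        for j in range(1, n_half):
            s += 2.0 * power[j] * math.cos(2.0 * math.pi * j * k / n_fft)
        r.append(s / n_fft)
    return r


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for A(z) = 1 + a1 z^-1 + ... + ap z^-p.

    Returns the coefficient list [1, a1, ..., ap] and the prediction error.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this order
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err
```

A quick sanity check for the sketch: for an envelope generated by a first-order all-pole filter, the recursion should recover that filter's single coefficient, with the higher-order coefficients close to zero.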
MFCCGAN: A Novel MFCC-Based Speech Synthesizer Using Adversarial Learning
In this paper, we introduce MFCCGAN as a novel speech synthesizer based on
adversarial learning that adopts MFCCs as input and generates raw speech
waveforms. Benefiting from the capabilities of the GAN model, it produces
speech with higher intelligibility than WORLD, a rule-based MFCC-based speech
synthesizer.
We evaluated the model using a popular intrusive objective speech
intelligibility measure (STOI) and a quality measure (NISQA score).
Experimental results show that the proposed system outperforms Librosa MFCC
inversion (by about 26% up to 53% in STOI and 16% up to 78% in NISQA score)
and achieves a rise of about 10% in intelligibility and about 4% in
naturalness compared with the conventional rule-based vocoder WORLD, which is
used in the CycleGAN-VC family. However, WORLD needs additional data such as
F0. Finally, using a perceptual loss based on STOI in the discriminators could
improve the quality further. WebMUSHRA-based subjective tests also show the
quality of the proposed approach. Comment: ICASSP 2023 - 2023 IEEE
International Conference on Acoustics, Speech and Signal Processing (ICASSP)
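The percentage gains quoted above can be read with a short Python sketch. It assumes the figures are relative increases over the baseline score, which the abstract does not state explicitly. The function name and the scores are hypothetical and chosen only to show the arithmetic; they are not results from the paper.

```python
def relative_increase(baseline, proposed):
    """Return the relative increase of `proposed` over `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return 100.0 * (proposed - baseline) / baseline


# Hypothetical STOI scores, not taken from the paper.
baseline_stoi = 0.50
proposed_stoi = 0.65
print(f"{relative_increase(baseline_stoi, proposed_stoi):.1f}%")  # prints 30.0%
```

Under this reading, a gain of "about 26%" would mean the proposed system's score is roughly 1.26 times the baseline score.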
Intelligent Instruction-Based IoT Framework for Smart Home Applications using Speech Recognition
This paper presents the design of a smart home using Internet of Things (IoT) and machine learning technology. The design is primarily based on the LoRaWAN protocol, and the main objective of the work was to establish an IoT network that integrates sensors, a gateway, a network server and a data visualization system. More importantly, an intelligent speech recognition system is designed and presented in detail as part of this work, to achieve a novel, futuristic smart home design framework with an intelligent instruction-based operation mechanism. In low-noise conditions, the success rate of speaker recognition is above 90% on the THCHS-30 dataset.