6 research outputs found

Speech Waveform Synthesis From MFCC Sequences With Generative Adversarial Networks

Authors: Airaksinen, Manu; Alku, Paavo; Bollepalli, Bajibabu; Juvela, Lauri; Kameoka, Hirokazu; Wang, Xin; Yamagishi, Junichi
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 03/04/2018

This paper proposes a method for generating speech from filterbank mel frequency cepstral coefficients (MFCC), which are widely used in speech applications, such as ASR, but are generally considered unusable for speech synthesis. First, we predict fundamental frequency and voicing information from MFCCs with an autoregressive recurrent neural net. Second, the spectral envelope information contained in MFCCs is converted to all-pole filters, and a pitch-synchronous excitation model matched to these filters is trained. Finally, we introduce a generative adversarial network-based noise model to add a realistic high-frequency stochastic component to the modeled excitation signal. The results show that high-quality speech reconstruction can be obtained, given only MFCC information at test time.
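
The abstract does not say how the spectral envelope is turned into an all-pole filter. The following is a minimal sketch of one common textbook route, not the authors' procedure. It assumes a one-sided linear-frequency power spectrum has already been recovered from the MFCCs (that step is omitted), converts it to autocorrelation lags, and solves for the filter coefficients with the Levinson-Durbin recursion. The function names and the flat test spectrum are hypothetical.

```python
import math


def autocorr_from_power_spectrum(power, order):
    """Inverse DFT of a one-sided power spectrum (bins 0..N/2).

    Returns autocorrelation lags 0..order (Wiener-Khinchin relation).
    """
    n_half = len(power) - 1      # the spectrum holds N/2 + 1 bins
    n_fft = 2 * n_half
    lags = []
    for lag in range(order + 1):
        # DC and Nyquist bins appear once; all other bins appear twice.
        acc = power[0] + power[n_half] * math.cos(math.pi * lag)
        for k in range(1, n_half):
            acc += 2.0 * power[k] * math.cos(2.0 * math.pi * k * lag / n_fft)
        lags.append(acc / n_fft)
    return lags


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for an all-pole model.

    Returns (a, err): a[0] == 1 and the filter is 1 / A(z), where
    A(z) = sum_j a[j] * z^-j; err is the final prediction error.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i]
        for j in range(1, i):
            acc += a[j] * r[i - j]
        k = -acc / err           # reflection coefficient for this order
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


# Hypothetical input: a flat (white) power spectrum with 5 one-sided bins.
flat_spectrum = [1.0] * 5
r = autocorr_from_power_spectrum(flat_spectrum, order=2)
a, err = levinson_durbin(r, order=2)
print(a, err)
```

For the flat spectrum, the coefficients come out close to [1, 0, 0], because a white spectrum has no envelope to model. A real envelope would give non-trivial coefficients.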

arXiv.org e-Print Archive

Crossref

Edinburgh Research Explorer

Aaltodoc Publication Archive

MFCCGAN: A Novel MFCC-Based Speech Synthesizer Using Adversarial Learning

Authors: Hasanabadi, Mohammad Reza; Behdad, Majid; Gharavian, Davood
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 22/06/2023

In this paper, we introduce MFCCGAN, a novel speech synthesizer based on adversarial learning that takes MFCCs as input and generates raw speech waveforms. Benefiting from the capabilities of the GAN model, it produces speech with higher intelligibility than WORLD, a rule-based MFCC-based speech synthesizer. We evaluated the model using a popular intrusive objective speech intelligibility measure (STOI) and a quality measure (NISQA score). Experimental results show that the proposed system outperforms Librosa MFCC inversion (by about 26% to 53% in STOI and 16% to 78% in NISQA score) and achieves a rise of about 10% in intelligibility and about 4% in naturalness compared with the conventional rule-based vocoder WORLD, which is used in the CycleGAN-VC family. However, WORLD needs additional data such as F0. Finally, using a perceptual loss based on STOI in the discriminators could improve the quality further. WebMUSHRA-based subjective tests also show the quality of the proposed approach.

Comment: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

arXiv.org e-Print Archive

Intelligent Instruction-Based IoT Framework for Smart Home Applications using Speech Recognition

Authors: Abbasi, Qammer; Abdulghani, Amir; Ansari, Shuja; Ge, Yao; Imran, Muhammad Ali
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 10/09/2020

This paper presents the design of a smart home using Internet of Things (IoT) and machine learning technology. The design is primarily based on the LoRaWAN protocol, and the main objective of this work was to establish an IoT network that integrates sensors, a gateway, a network server, and a data visualization system. More importantly, an intelligent speech recognition system is designed and presented in detail as part of this work, to achieve a novel, futuristic smart home design framework with an intelligent instruction-based operation mechanism. In the case of low noise, the success rate of speaker recognition is above 90% based on the THCHS-30 dataset.

Crossref

Enlighten