6 research outputs found

    Speech Waveform Synthesis From MFCC Sequences With Generative Adversarial Networks

    This paper proposes a method for generating speech from filterbank mel frequency cepstral coefficients (MFCC), which are widely used in speech applications, such as ASR, but are generally considered unusable for speech synthesis. First, we predict fundamental frequency and voicing information from MFCCs with an autoregressive recurrent neural net. Second, the spectral envelope information contained in MFCCs is converted to all-pole filters, and a pitch-synchronous excitation model matched to these filters is trained. Finally, we introduce a generative adversarial network-based noise model to add a realistic high-frequency stochastic component to the modeled excitation signal. The results show that high-quality speech reconstruction can be obtained, given only MFCC information at test time.
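The abstract above does not say how the spectral envelope is turned into an all-pole filter. One conventional route, assumed here for illustration and not necessarily the authors' procedure, is to recover a power spectrum from the MFCCs, take its inverse Fourier transform to obtain autocorrelation values, and solve the resulting Yule-Walker equations with the Levinson-Durbin recursion. The sketch below covers only that last step; the function name `levinson_durbin` and its argument layout are hypothetical.

```python
def levinson_durbin(r, order):
    """Fit an all-pole model to autocorrelation values (illustrative sketch).

    r     : autocorrelation sequence r[0], r[1], ..., r[order]
    order : number of poles in the model
    Returns (a, err), where a = [1, a_1, ..., a_order] are the coefficients
    of A(z) in the all-pole filter 1 / A(z), and err is the final
    prediction-error power.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        # Correlation between the current prediction error and lag i.
        acc = r[i]
        for j in range(1, i):
            acc += a[j] * r[i - j]
        k = -acc / err  # reflection coefficient for this stage

        # Update the lower-order coefficients, then append the new one.
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a

        # Each stage shrinks the error power by a factor of (1 - k^2).
        err *= 1.0 - k * k
    return a, err


# Autocorrelation of a first-order process with r[k] = 0.5 ** k.
coeffs, residual = levinson_durbin([1.0, 0.5, 0.25], order=2)
```

For this first-order input the recursion returns a first coefficient of -0.5, a second coefficient of zero, and an error power of 0.75, which is what a single-pole model predicts.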

    MFCCGAN: A Novel MFCC-Based Speech Synthesizer Using Adversarial Learning

    In this paper, we introduce MFCCGAN, a novel speech synthesizer based on adversarial learning that takes MFCCs as input and generates raw speech waveforms. Benefiting from the capabilities of the GAN model, it produces speech with higher intelligibility than the rule-based, MFCC-based speech synthesizer WORLD. We evaluated the model with a popular intrusive objective speech intelligibility measure (STOI) and a quality measure (NISQA score). Experimental results show that the proposed system outperforms Librosa MFCC inversion (by about 26% up to 53% in STOI and 16% up to 78% in NISQA score) and achieves a rise of about 10% in intelligibility and about 4% in naturalness compared with the conventional rule-based vocoder WORLD, which is used in the CycleGAN-VC family. However, WORLD needs additional data such as F0. Finally, using a perceptual loss based on STOI in the discriminators could further improve quality. WebMUSHRA-based subjective tests also show the quality of the proposed approach. Comment: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

    Intelligent Instruction-Based IoT Framework for Smart Home Applications using Speech Recognition

    This paper presents the design of a smart home using Internet of Things (IoT) and machine learning technology. The design is primarily based on the LoRaWAN protocol, and the main objective of the work was to establish an IoT network that integrates sensors, a gateway, a network server, and a data visualization system. More importantly, an intelligent speech recognition system is designed and presented in detail as part of this work, yielding a novel smart home design framework with an intelligent instruction-based operation mechanism. Under low-noise conditions, the success rate of speaker recognition is above 90% on the THCHS-30 dataset.