49 research outputs found

    MFCCGAN: A Novel MFCC-Based Speech Synthesizer Using Adversarial Learning

    In this paper, we introduce MFCCGAN, a novel speech synthesizer based on adversarial learning that takes MFCCs as input and generates raw speech waveforms. Benefiting from the capabilities of the GAN model, it produces speech with higher intelligibility than the rule-based MFCC-based speech synthesizer WORLD. We evaluated the model using a popular intrusive objective speech intelligibility measure (STOI) and a quality measure (NISQA score). Experimental results show that our proposed system outperforms Librosa MFCC inversion (by an increase of about 26% up to 53% in STOI and 16% up to 78% in NISQA score) and achieves a rise of about 10% in intelligibility and about 4% in naturalness in comparison with the conventional rule-based vocoder WORLD that is used in the CycleGAN-VC family. However, WORLD needs additional data like F0. Finally, using a perceptual loss in the discriminators based on STOI could further improve the quality. WebMUSHRA-based subjective tests also show the quality of the proposed approach. Comment: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

    An investigation into glottal waveform based speech coding

    Coding of voiced speech by extraction of the glottal waveform has shown promise in improving the efficiency of speech coding systems. This thesis describes an investigation into the performance of such a system. The effect of reverberation on the radiation impedance at the lips is shown to be negligible under normal conditions. Also, the accuracy of the Image Method for adding artificial reverberation to anechoic speech recordings is established. A new algorithm, Pre-emphasised Maximum Likelihood Epoch Detection (PMLED), for Glottal Closure Instant detection is proposed. The algorithm is tested on natural speech and is shown to be both accurate and robust. Two techniques for glottal waveform estimation, Closed Phase Inverse Filtering (CPIF) and Iterative Adaptive Inverse Filtering (IAIF), are compared. In tandem with an LF model fitting procedure, both techniques display a high degree of accuracy. However, IAIF is found to be slightly more robust. Based on these results, a Glottal Excited Linear Predictive (GELP) coding system for voiced speech is proposed and tested. Using a differential LF parameter quantisation scheme, the system achieves speech quality similar to that of U.S. Federal Standard 1016 CELP at a lower mean bit rate while incurring no extra delay.

    Speech coding at medium bit rates using analysis by synthesis techniques


    The Impact Of The Development Of ICT In Several Hungarian Economic Sectors

    As the author could not find a satisfactory mathematical and statistical method in the literature for studying the effect of information and communication technology (ICT) on enterprises, the author proposed a new research and analysis method and used it to study the Hungarian economic sectors. The question of which factors affect their net income is vital for enterprises. First, the author studied some potential indicators related to economic sectors; those indicators were then compared to the net income of the surveyed enterprises. The resulting data showed that the growing penetration of electronic marketplaces contributed to the change in the net income of enterprises to the greatest extent. Furthermore, among all the potential indicators, it was the only one directly influencing the net income of enterprises. With the help of the compound indicator and the financial data of the studied economic sectors, the author attempted to find a connection between the development level of ICT and profitability. Profitability and productivity are influenced by many other factors as well. As the effect of the other factors could not be measured, the results, shown in a coordinate system, are not complete but are informative. The highest increment of specific Gross Value Added was produced by the fields of ‘Manufacturing’, ‘Electricity, gas and water supply’, ‘Transport, storage and communication’ and ‘Financial intermediation’. With the exception of ‘Electricity, gas and water supply’, the other economic sectors belong to the group of underdeveloped branches (below 50 percent). On the other hand, ‘Construction’, ‘Health and social work’ and ‘Hotels and restaurants’ can be seen as laggards, so they fall in the lower left part of the coordinate system. ‘Agriculture, hunting and forestry’ can also be classified as a laggard economic sector, but as the effect of the compound indicator on the increment of Gross Value Added was less significant, it falls in the upper left part of the coordinate system. A trend line drawn through the points has a positive gradient; that is, the higher the usage of ICT devices, the greater the improvement detected in the specific Gross Value Added.

    Proceedings of the Second International Mobile Satellite Conference (IMSC 1990)

    Presented here are the proceedings of the Second International Mobile Satellite Conference (IMSC), held June 17-20, 1990 in Ottawa, Canada. Topics covered include future mobile satellite communications concepts, aeronautical applications, modulation and coding, propagation and experimental systems, mobile terminal equipment, network architecture and control, regulatory and policy considerations, vehicle antennas, and speech compression.

    Glottal-synchronous speech processing

    Glottal-synchronous speech processing is a field of speech science where the pseudoperiodicity of voiced speech is exploited. Traditionally, speech processing involves segmenting and processing short speech frames of predefined length; this may fail to exploit the inherent periodic structure of voiced speech, which glottal-synchronous speech frames have the potential to harness. Glottal-synchronous frames are often derived from the glottal closure instants (GCIs) and glottal opening instants (GOIs). The SIGMA algorithm was developed for the detection of GCIs and GOIs from the Electroglottograph signal with a measured accuracy of up to 99.59%. For GCI and GOI detection from speech signals, the YAGA algorithm provides a measured accuracy of up to 99.84%. Multichannel speech-based approaches are shown to be more robust to reverberation than single-channel algorithms. The GCIs are applied to real-world applications including speech dereverberation, where SNR is improved by up to 5 dB, and to prosodic manipulation, where the importance of voicing detection in glottal-synchronous algorithms is demonstrated by subjective testing. The GCIs are further exploited in a new area of data-driven speech modelling, providing new insights into speech production and a set of tools to aid deployment into real-world applications. The technique is shown to be applicable in areas of speech coding, identification and artificial bandwidth extension of telephone speech.

    Sparsity in Linear Predictive Coding of Speech
