49 research outputs found

MFCCGAN: A Novel MFCC-Based Speech Synthesizer Using Adversarial Learning

Author: Gharavian Mohammad Reza Hasanabadi Majid Behdad Davood
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 22/06/2023

In this paper, we introduce MFCCGAN as a novel speech synthesizer based on adversarial learning that adopts MFCCs as input and generates raw speech waveforms. Benefiting from the GAN model capabilities, it produces speech with higher intelligibility than the rule-based MFCC-based speech synthesizer WORLD. We evaluated the model based on a popular intrusive objective speech intelligibility measure (STOI) and a quality measure (NISQA score). Experimental results show that our proposed system outperforms Librosa MFCC inversion (by an increase of about 26% up to 53% in STOI and 16% up to 78% in NISQA score) and achieves a rise of about 10% in intelligibility and about 4% in naturalness in comparison with the conventional rule-based vocoder WORLD that is used in the CycleGAN-VC family. However, WORLD needs additional data like F0. Finally, using perceptual loss in discriminators based on STOI could improve the quality further. WebMUSHRA-based subjective tests also show the quality of the proposed approach. Comment: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

arXiv.org e-Print Archive

An investigation into glottal waveform based speech coding

Author: Bleakley Christopher J.
Publication venue: Dublin City University. School of Electronic Engineering
Publication date: 01/01/1995

Coding of voiced speech by extraction of the glottal waveform has shown promise in improving the efficiency of speech coding systems. This thesis describes an investigation into the performance of such a system. The effect of reverberation on the radiation impedance at the lips is shown to be negligible under normal conditions. Also, the accuracy of the Image Method for adding artificial reverberation to anechoic speech recordings is established. A new algorithm, Pre-emphasised Maximum Likelihood Epoch Detection (PMLED), for Glottal Closure Instant detection is proposed. The algorithm is tested on natural speech and is shown to be both accurate and robust. Two techniques for glottal waveform estimation, Closed Phase Inverse Filtering (CPIF) and Iterative Adaptive Inverse Filtering (IAIF), are compared. In tandem with an LF model fitting procedure, both techniques display a high degree of accuracy. However, IAIF is found to be slightly more robust. Based on these results, a Glottal Excited Linear Predictive (GELP) coding system for voiced speech is proposed and tested. Using a differential LF parameter quantisation scheme, the system achieves speech quality similar to that of U.S. Federal Standard 1016 CELP at a lower mean bit rate while incurring no extra delay.

DCU Online Research Access Service

Artificial voicing of whispered speech

Author: Patrícia Cristina Ramalho de Oliveira
Publication date: 23/07/2015

Repositório Aberto da Universidade do Porto

Speech coding at medium bit rates using analysis by synthesis techniques

Author: Nikolaos Gouvianakis (7201001)
Publication date: 12/08/2019

Loughborough University Institutional Repository

The Impact Of The Development Of ICT In Several Hungarian Economic Sectors

Author: Sasvári Péter László
Publication date: 01/01/2011

As the author could not find a reassuring mathematical and statistical method in the literature for studying the effect of information communication technology on enterprises, the author suggested a new research and analysis method that he also used to study the Hungarian economic sectors. The question of what factors have an effect on their net income is vital for enterprises. At first, the author studied some potential indicators related to economic sectors, then those indicators were compared to the net income of the surveyed enterprises. The resulting data showed that the growing penetration of electronic marketplaces contributed to the change of the net income of enterprises to the greatest extent. Furthermore, among all the potential indicators, it was the only indicator directly influencing the net income of enterprises. With the help of the compound indicator and the financial data of the studied economic sectors, the author made an attempt to find a connection between the development level of ICT and profitability. Profitability and productivity are influenced by a lot of other factors as well. As the effect of the other factors could not be measured, the results, shown in a coordinate system, are not complete but informative. The highest increment of specific Gross Value Added was produced by the fields of ‘Manufacturing’, ‘Electricity, gas and water supply’, ‘Transport, storage and communication’ and ‘Financial intermediation’. With the exception of ‘Electricity, gas and water supply’, the other economic sectors belong to the group of underdeveloped branches (below 50 percent). On the other hand, ‘Construction’, ‘Health and social work’ and ‘Hotels and restaurants’ can be seen as laggards, so they got into the lower left part of the coordinate system. ‘Agriculture, hunting and forestry’ can also be classified as a laggard economic sector, but as the effect of the compound indicator on the increment of Gross Value Added was less significant, it can be found in the upper left part of the coordinate system. Drawing a trend line through the points makes it clear that the gradient is positive, that is, the higher the usage of ICT devices, the greater the improvement that can be detected in the specific Gross Value Added.

Repository of the Academy's Library

Compensation for missing voice coding frames in packet transmission systems

Author: Kohler Mary Antoinette
Publication date: 01/05/2000

SHAREOK repository

Proceedings of the Second International Mobile Satellite Conference (IMSC 1990)

Author: Huck R. W.
Rafferty William
Reekie D. Hugh M.

Presented here are the proceedings of the Second International Mobile Satellite Conference (IMSC), held June 17-20, 1990 in Ottawa, Canada. Topics covered include future mobile satellite communications concepts, aeronautical applications, modulation and coding, propagation and experimental systems, mobile terminal equipment, network architecture and control, regulatory and policy considerations, vehicle antennas, and speech compression.

NASA Technical Reports Server

Glottal-synchronous speech processing

Author: Thomas Mark R P
Publication venue: Electrical and Electronic Engineering, Imperial College London
Publication date: 01/01/2010

Glottal-synchronous speech processing is a field of speech science where the pseudoperiodicity of voiced speech is exploited. Traditionally, speech processing involves segmenting and processing short speech frames of predefined length; this may fail to exploit the inherent periodic structure of voiced speech which glottal-synchronous speech frames have the potential to harness. Glottal-synchronous frames are often derived from the glottal closure instants (GCIs) and glottal opening instants (GOIs). The SIGMA algorithm was developed for the detection of GCIs and GOIs from the Electroglottograph signal with a measured accuracy of up to 99.59%. For GCI and GOI detection from speech signals, the YAGA algorithm provides a measured accuracy of up to 99.84%. Multichannel speech-based approaches are shown to be more robust to reverberation than single-channel algorithms. The GCIs are applied to real-world applications including speech dereverberation, where SNR is improved by up to 5 dB, and to prosodic manipulation where the importance of voicing detection in glottal-synchronous algorithms is demonstrated by subjective testing. The GCIs are further exploited in a new area of data-driven speech modelling, providing new insights into speech production and a set of tools to aid deployment into real-world applications. The technique is shown to be applicable in areas of speech coding, identification and artificial bandwidth extension of telephone speech.

Spiral - Imperial College Digital Repository

OpenGrey Repository

Sparsity in Linear Predictive Coding of Speech

Author: Giacobello Daniele
Publication venue: Multimedia Information and Signal Processing, Institute of Electronic Systems, Aalborg University
Publication date: 01/01/2010

nrpages: 197; status: published

Lirias

VBN

Towards a general model for secure speech communications

Author: Anderson William Robert
Publication venue: University of Waterloo
Publication date: 01/01/1997

University of Waterloo's Institutional Repository