MFCCGAN: A Novel MFCC-Based Speech Synthesizer Using Adversarial Learning
In this paper, we introduce MFCCGAN as a novel speech synthesizer based on
adversarial learning that adopts MFCCs as input and generates raw speech
waveforms. Benefiting from the capabilities of the GAN model, it produces speech
with higher intelligibility than the rule-based MFCC-based speech synthesizer WORLD.
We evaluated the model using a popular intrusive objective speech
intelligibility measure (STOI) and a quality measure (NISQA score). Experimental
results show that our proposed system outperforms Librosa MFCC inversion (an
increase of about 26% to 53% in STOI and 16% to 78% in NISQA score) and achieves a
rise of about 10% in intelligibility and about 4% in naturalness over the
conventional rule-based vocoder WORLD, which is used in the CycleGAN-VC family.
Moreover, WORLD needs additional data such as F0. Finally, using a STOI-based
perceptual loss in the discriminators could improve the quality further. WebMUSHRA-based
subjective tests also show the quality of the proposed approach.
Comment: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech
and Signal Processing (ICASSP)
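One reason a plain MFCC inversion baseline loses detail can be sketched as follows. MFCCs are commonly obtained as the leading DCT coefficients of a frame's log-mel energies, so inverting a truncated set only recovers a smoothed log-mel envelope. The snippet below is a minimal pure-Python illustration under that assumption; the function names `dct2` and `idct2` and the `log_mel` values are hypothetical, and this is neither the Librosa implementation nor the MFCCGAN model.

```python
import math

def dct2(x):
    """Orthonormal DCT-II of a sequence x."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def idct2(c, n):
    """Inverse of dct2 for length n; coefficients missing from c count as zero."""
    out = []
    for i in range(n):
        s = 0.0
        for k, ck in enumerate(c):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            s += scale * ck * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(s)
    return out

# Hypothetical log-mel energies for one frame (illustrative values only).
log_mel = [2.0, 3.5, 1.0, 4.2, 0.5, 2.8, 3.1, 0.9]

mfcc = dct2(log_mel)                            # all 8 cepstral coefficients
recon_full = idct2(mfcc, len(log_mel))          # exact round trip
recon_trunc = idct2(mfcc[:4], len(log_mel))     # keep only the first 4 coefficients

err_full = max(abs(a - b) for a, b in zip(log_mel, recon_full))
err_trunc = max(abs(a - b) for a, b in zip(log_mel, recon_trunc))
print(f"max error, all coefficients: {err_full:.2e}")
print(f"max error, 4 coefficients:   {err_trunc:.2e}")
```

Keeping every coefficient reproduces the frame up to rounding, while dropping the upper half leaves a visible error. A learned synthesizer has to fill in this missing detail, along with the phase that MFCCs discard.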
An investigation into glottal waveform based speech coding
Coding of voiced speech by extraction of the glottal waveform has shown promise in improving the efficiency of speech coding systems. This thesis describes an investigation into the performance of such a system.
The effect of reverberation on the radiation impedance at the lips is shown to be negligible under normal conditions. Also, the accuracy of the Image Method for adding artificial reverberation to anechoic speech recordings is established.
A new algorithm, Pre-emphasised Maximum Likelihood Epoch Detection (PMLED), for Glottal Closure Instant detection is proposed. The algorithm is tested on natural speech and is shown to be both accurate and robust.
Two techniques for glottal waveform estimation, Closed Phase Inverse Filtering (CPIF) and Iterative Adaptive Inverse Filtering (IAIF), are compared. In tandem with an LF model fitting procedure, both techniques display a high degree of accuracy. However, IAIF is found to be slightly more robust.
Based on these results, a Glottal Excited Linear Predictive (GELP) coding system for voiced speech is proposed and tested. Using a differential LF parameter quantisation scheme, the system achieves speech quality similar to that of U.S. Federal Standard 1016 CELP at a lower mean bit rate while incurring no extra delay.
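The inverse-filtering idea that CPIF and IAIF build on can be illustrated with ordinary linear prediction: estimate an all-pole model of the signal, then pass the signal through the inverse of that model to expose the excitation. The sketch below uses the autocorrelation method with the Levinson-Durbin recursion on a synthetic signal. It is not CPIF, IAIF, or the thesis's GELP coder; the filter coefficients in `true_coeffs` and all function names are assumptions chosen for illustration, and a single impulse stands in for the glottal excitation.

```python
def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of a finite sequence."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Predictor coefficients a[1..order] with x[n] ~ sum_k a[k] * x[n-k]."""
    a = [0.0] * (order + 1)
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] - sum(a[k] * r[m - k] for k in range(1, m))
        k_m = acc / err                      # reflection coefficient
        new_a = a[:]
        new_a[m] = k_m
        for k in range(1, m):
            new_a[k] = a[k] - k_m * a[m - k]
        a = new_a
        err *= 1.0 - k_m * k_m               # updated prediction error energy
    return a[1:], err

def inverse_filter(x, coeffs):
    """Residual e[n] = x[n] - sum_k coeffs[k-1] * x[n-k]."""
    out = []
    for n in range(len(x)):
        pred = sum(c * x[n - k] for k, c in enumerate(coeffs, start=1) if n - k >= 0)
        out.append(x[n] - pred)
    return out

# Synthetic "vocal tract": a stable two-pole filter (hypothetical coefficients).
true_coeffs = [1.3, -0.6]
excitation = [1.0] + [0.0] * 399             # one impulse as a stand-in excitation
signal = []
for n, e in enumerate(excitation):
    y = e
    for k, c in enumerate(true_coeffs, start=1):
        if n - k >= 0:
            y += c * signal[n - k]
    signal.append(y)

est_coeffs, pred_err = levinson_durbin(autocorr(signal, 2), 2)
residual = inverse_filter(signal, est_coeffs)
print("estimated coefficients:", [round(c, 4) for c in est_coeffs])
print("largest residual sample after n=0:", max(abs(v) for v in residual[1:]))
```

On this idealised input the estimated coefficients match the synthetic filter and the residual collapses back to the impulse. Real voiced speech is harder because the glottal source is not an impulse, which is the motivation for restricting the analysis to the closed phase (CPIF) or iterating the estimate (IAIF).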
Speech coding at medium bit rates using analysis by synthesis techniques
The Impact Of The Development Of ICT In Several Hungarian Economic Sectors
As the author could not find a satisfactory mathematical and statistical method in the literature for studying the effect of information and communication technology (ICT) on enterprises, he proposes a new research and analysis method and applies it to the Hungarian economic sectors. The question of which factors affect their net income is vital for enterprises. First, the author studied some potential indicators related to the economic sectors; these indicators were then compared with the net income of the surveyed enterprises. The resulting data showed that the growing penetration of electronic marketplaces contributed most to the change in the net income of enterprises.
Furthermore, among all the potential indicators, it was the only one directly influencing the net income of enterprises.
With the help of the compound indicator and the financial data of the studied economic sectors, the author attempted to find a connection between the development level of ICT and profitability. Profitability and productivity are influenced by many other factors as well. As the effect of these other factors could not be measured, the results, shown in a coordinate system, are not complete but are informative.
The highest increment of specific Gross Value Added was produced by the fields of "Manufacturing", "Electricity, gas and water supply", "Transport, storage and communication" and "Financial intermediation". With the exception of "Electricity, gas and water supply", these economic sectors belong to the group of underdeveloped branches (below 50 percent).
On the other hand, "Construction", "Health and social work" and "Hotels and restaurants" can be seen as laggards, so they fall in the lower left part of the coordinate system.
"Agriculture, hunting and forestry" can also be classified as a laggard economic sector, but as the effect of the compound indicator on the increment of Gross Value Added was less significant, it can be found in the upper left part of the coordinate system. A trend line drawn through the points shows a positive gradient: the higher the usage of ICT devices, the greater the improvement detected in the specific Gross Value Added.
Proceedings of the Second International Mobile Satellite Conference (IMSC 1990)
Presented here are the proceedings of the Second International Mobile Satellite Conference (IMSC), held June 17-20, 1990 in Ottawa, Canada. Topics covered include future mobile satellite communications concepts, aeronautical applications, modulation and coding, propagation and experimental systems, mobile terminal equipment, network architecture and control, regulatory and policy considerations, vehicle antennas, and speech compression.
Glottal-synchronous speech processing
Glottal-synchronous speech processing is a field of speech science where the pseudoperiodicity
of voiced speech is exploited. Traditionally, speech processing involves segmenting
and processing short speech frames of predefined length; this may fail to exploit the inherent
periodic structure of voiced speech which glottal-synchronous speech frames have
the potential to harness. Glottal-synchronous frames are often derived from the glottal
closure instants (GCIs) and glottal opening instants (GOIs).
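The contrast between fixed-length and glottal-synchronous framing can be sketched as below. The GCI positions are hypothetical sample indices and the signal is a placeholder ramp, so no detector such as SIGMA or YAGA is involved. The two-period frame convention (from the previous GCI to the next one) is a common choice in pitch-synchronous processing and is assumed here rather than taken from the thesis.

```python
def fixed_frames(signal, length, hop):
    """Conventional framing: equal-length frames at a constant hop."""
    return [signal[s:s + length] for s in range(0, len(signal) - length + 1, hop)]

def glottal_synchronous_frames(signal, gcis):
    """One frame per interior GCI, spanning the previous GCI to the next one."""
    frames = []
    for prev_gci, next_gci in zip(gcis[:-2], gcis[2:]):
        frames.append(signal[prev_gci:next_gci])
    return frames

signal = list(range(100))        # placeholder samples, value == sample index
gcis = [10, 30, 52, 71, 93]      # hypothetical GCI sample indices

sync = glottal_synchronous_frames(signal, gcis)
fixed = fixed_frames(signal, length=40, hop=20)

print("glottal-synchronous frame lengths:", [len(f) for f in sync])
print("fixed frame lengths:              ", [len(f) for f in fixed])
```

The glottal-synchronous frame lengths follow the local pitch period, whereas the fixed frames cut across cycle boundaries wherever the hop happens to land.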
The SIGMA algorithm was developed for the detection of GCIs and GOIs from
the Electroglottograph signal with a measured accuracy of up to 99.59%. For GCI and
GOI detection from speech signals, the YAGA algorithm provides a measured accuracy
of up to 99.84%. Multichannel speech-based approaches are shown to be more robust to
reverberation than single-channel algorithms.
The GCIs are applied to real-world applications including speech dereverberation,
where SNR is improved by up to 5 dB, and to prosodic manipulation where the importance
of voicing detection in glottal-synchronous algorithms is demonstrated by subjective
testing. The GCIs are further exploited in a new area of data-driven speech modelling,
providing new insights into speech production and a set of tools to aid deployment into
real-world applications. The technique is shown to be applicable in areas of speech coding,
identification and artificial bandwidth extension of telephone speech.