Coding overcomplete representations of audio using the MCLT
We propose a system for audio coding using the modulated complex
lapped transform (MCLT). In general, it is difficult to encode signals using
overcomplete representations without incurring a penalty in rate-distortion
performance. We show that the penalty can be significantly reduced for
MCLT-based representations, without the need for iterative methods of
sparsity reduction. We achieve that via a magnitude-phase polar quantization
and the use of magnitude and phase prediction. Compared to systems based
on quantization of orthogonal representations such as the modulated lapped
transform (MLT), the new system allows for reduced warbling artifacts and
more precise computation of frequency-domain auditory masking functions.
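The magnitude-phase polar quantization the abstract mentions can be sketched as follows. This is an illustrative toy quantizer (uniform magnitude steps, uniform phase bins), not the paper's actual design, and it omits the magnitude and phase prediction stages; all function names and parameters are hypothetical.

```python
import numpy as np

def polar_quantize(coeffs, mag_step=0.5, n_phase_levels=16):
    """Quantize complex transform coefficients in polar form:
    uniform scalar quantization of magnitude, uniform quantization
    of phase on [0, 2*pi). Illustrative sketch only."""
    mag = np.abs(coeffs)
    phase = np.angle(coeffs)
    phase_step = 2 * np.pi / n_phase_levels
    q_mag = np.round(mag / mag_step).astype(int)          # magnitude index
    q_phase = (np.round(phase / phase_step).astype(int)
               % n_phase_levels)                          # phase index
    return q_mag, q_phase

def polar_dequantize(q_mag, q_phase, mag_step=0.5, n_phase_levels=16):
    """Reconstruct complex coefficients from polar indices."""
    phase_step = 2 * np.pi / n_phase_levels
    return q_mag * mag_step * np.exp(1j * q_phase * phase_step)
```

Quantizing magnitude and phase separately (rather than real and imaginary parts) lets the coder allocate precision to the perceptually dominant magnitude component, which is one reason the polar form suits MCLT coefficients.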
Digital Signal Processing
Contains research objectives and reports on sixteen research projects.U.S. Navy - Office of Naval Research (Contract N00014-75-C-0852)National Science Foundation FellowshipNational Science Foundation (Grant ENG76-24117)U.S. Navy - Office of Naval Research (Contract N00014-77-C-0257)U.S. Air Force (Contract F19628-80-C-0002)U.S. Navy - Office of Naval Research (Contract N00014-75-C-0951)Schlumberger-Doll Research Center FellowshipHertz Foundation FellowshipGovernment of Pakistan ScholarshipU.S. Navy - Office of Naval Research (Contract N00014-77-C-0196
Expediting TTS Synthesis with Adversarial Vocoding
Recent approaches in text-to-speech (TTS) synthesis employ neural network
strategies to vocode perceptually-informed spectrogram representations directly
into listenable waveforms. Such vocoding procedures create a computational
bottleneck in modern TTS pipelines. We propose an alternative approach which
utilizes generative adversarial networks (GANs) to learn mappings from
perceptually-informed spectrograms to simple magnitude spectrograms which can
be heuristically vocoded. Through a user study, we show that our approach
significantly outperforms naïve vocoding strategies while being hundreds of
times faster than neural network vocoders used in state-of-the-art TTS systems.
We also show that our method can be used to achieve state-of-the-art results in
unsupervised synthesis of individual words of speech.
Comment: Published as a conference paper at INTERSPEECH 201
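A common heuristic for vocoding a simple magnitude spectrogram, of the kind the abstract refers to, is Griffin-Lim phase recovery. The sketch below is a minimal implementation assuming the magnitude came from an STFT with the same parameters; the parameter values are illustrative and not taken from the paper.

```python
import numpy as np
from scipy.signal import stft, istft

def griffin_lim(mag, n_iter=32, nperseg=512, noverlap=384):
    """Heuristically recover a waveform from a magnitude spectrogram
    by alternating ISTFT/STFT projections (Griffin-Lim). Assumes
    `mag` was produced by scipy's stft with matching parameters."""
    rng = np.random.default_rng(0)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))  # random initial phase
    for _ in range(n_iter):
        _, x = istft(mag * phase, nperseg=nperseg, noverlap=noverlap)
        _, _, spec = stft(x, nperseg=nperseg, noverlap=noverlap)
        phase = np.exp(1j * np.angle(spec))  # keep phase, reimpose magnitude
    _, x = istft(mag * phase, nperseg=nperseg, noverlap=noverlap)
    return x
```

Each iteration inverts the current complex spectrogram, re-analyses the result, and keeps only the new phase while restoring the target magnitude; the procedure needs no trained model, which is what makes this style of vocoding fast relative to neural vocoders.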
Comparison of Signal Reconstruction Methods for the Azimuth Discrimination and Resynthesis Algorithm
The Azimuth Discrimination and Resynthesis (ADRess) algorithm has been shown to produce high-quality sound
source separation results for intensity-panned stereo recordings. There are, however, artifacts such as phasiness
which become apparent in the separated signals under certain conditions. This is largely because only the
magnitude spectra for the separated sources are estimated. Each source is then resynthesised using the phase
information obtained from the original mixture. This paper describes the nature and origin of the associated artifacts
and proposes alternative techniques for resynthesising the separated signals. A comparison of each technique is then
presented.
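The baseline resynthesis the abstract describes, combining each source's estimated magnitude spectrum with the phase of the original mixture, can be sketched as below. This is a generic illustration of that scheme, not the ADRess implementation; the function name and STFT parameters are hypothetical, and the source magnitude must share the mixture's STFT parameters.

```python
import numpy as np
from scipy.signal import stft, istft

def resynth_with_mixture_phase(source_mag, mixture, nperseg=1024, noverlap=768):
    """Resynthesise a separated source from its estimated magnitude
    spectrogram, borrowing the phase of the original mixture.
    `source_mag` must match the shape of the mixture's STFT."""
    _, _, mix_spec = stft(mixture, nperseg=nperseg, noverlap=noverlap)
    mix_phase = np.angle(mix_spec)
    # Pair the estimated magnitudes with the mixture phase, then invert.
    _, source = istft(source_mag * np.exp(1j * mix_phase),
                      nperseg=nperseg, noverlap=noverlap)
    return source
```

Because the mixture phase generally does not match the true source phase, overlapping analysis frames interfere during the inverse transform, which is the origin of the phasiness artifacts the paper examines.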