
    Coding overcomplete representations of audio using the MCLT

    We propose a system for audio coding using the modulated complex lapped transform (MCLT). In general, it is difficult to encode signals using overcomplete representations without incurring a penalty in rate-distortion performance. We show that this penalty can be significantly reduced for MCLT-based representations, without the need for iterative methods of sparsity reduction. We achieve this via magnitude-phase polar quantization and the use of magnitude and phase prediction. Compared to systems based on quantization of orthogonal representations such as the modulated lapped transform (MLT), the new system allows for reduced warbling artifacts and more precise computation of frequency-domain auditory masking functions.
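    A minimal sketch of the magnitude-phase (polar) quantization with inter-frame prediction described above, assuming NumPy; the step sizes, the simple persistence predictors, and the function name are illustrative assumptions rather than the paper's actual codec.

```python
# Sketch only: polar quantization of complex MCLT-like coefficients with
# simple inter-frame magnitude/phase prediction. Step sizes and the
# persistence predictors are illustrative assumptions.
import numpy as np

def polar_quantize(coeffs, prev_coeffs, mag_step=0.05, phase_step=np.pi / 32):
    """Quantize complex coefficients in magnitude/phase (polar) form,
    coding only the residuals against a prediction from the previous frame."""
    mag, phase = np.abs(coeffs), np.angle(coeffs)

    # Predict the current frame from the previous one.
    mag_pred = np.abs(prev_coeffs)
    phase_pred = np.angle(prev_coeffs)

    # Quantize the prediction residuals on uniform grids.
    mag_idx = np.round((mag - mag_pred) / mag_step).astype(int)
    phase_res = np.angle(np.exp(1j * (phase - phase_pred)))  # wrap to (-pi, pi]
    phase_idx = np.round(phase_res / phase_step).astype(int)

    # Reconstruction the decoder would produce from the transmitted indices.
    mag_hat = np.maximum(mag_pred + mag_idx * mag_step, 0.0)
    phase_hat = phase_pred + phase_idx * phase_step
    recon = mag_hat * np.exp(1j * phase_hat)
    return mag_idx, phase_idx, recon
```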

    Digital Signal Processing

    Contains research objectives and reports on sixteen research projects. Sponsors: U.S. Navy - Office of Naval Research (Contract N00014-75-C-0852); National Science Foundation Fellowship; National Science Foundation (Grant ENG76-24117); U.S. Navy - Office of Naval Research (Contract N00014-77-C-0257); U.S. Air Force (Contract F19628-80-C-0002); U.S. Navy - Office of Naval Research (Contract N00014-75-C-0951); Schlumberger-Doll Research Center Fellowship; Hertz Foundation Fellowship; Government of Pakistan Scholarship; U.S. Navy - Office of Naval Research (Contract N00014-77-C-0196)

    Expediting TTS Synthesis with Adversarial Vocoding

    Recent approaches in text-to-speech (TTS) synthesis employ neural network strategies to vocode perceptually-informed spectrogram representations directly into listenable waveforms. Such vocoding procedures create a computational bottleneck in modern TTS pipelines. We propose an alternative approach which utilizes generative adversarial networks (GANs) to learn mappings from perceptually-informed spectrograms to simple magnitude spectrograms, which can be heuristically vocoded. Through a user study, we show that our approach significantly outperforms naïve vocoding strategies while being hundreds of times faster than the neural network vocoders used in state-of-the-art TTS systems. We also show that our method can be used to achieve state-of-the-art results in unsupervised synthesis of individual words of speech. Comment: Published as a conference paper at INTERSPEECH 201
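    The "heuristically vocoded" step above refers to recovering a waveform from a plain magnitude spectrogram without a neural vocoder; Griffin-Lim phase estimation is a standard way to do this. The sketch below illustrates only that final step, assuming librosa and placeholder STFT parameters, with the GAN-predicted magnitude replaced by one computed from a synthetic test tone.

```python
# Sketch of heuristic vocoding via Griffin-Lim; in the paper's pipeline the
# magnitude spectrogram would be predicted by a GAN from a mel spectrogram.
import numpy as np
import librosa

def heuristic_vocode(mag_spectrogram, n_iter=32, hop_length=256, win_length=1024):
    """Recover a waveform from a linear-frequency magnitude spectrogram."""
    return librosa.griffinlim(
        mag_spectrogram,
        n_iter=n_iter,
        hop_length=hop_length,
        win_length=win_length,
    )

# Usage with a stand-in magnitude spectrogram (a 440 Hz tone); the STFT
# parameters here are illustrative, not those used in the paper.
sr = 22050
t = np.linspace(0.0, 1.0, sr, endpoint=False)
y = 0.5 * np.sin(2 * np.pi * 440.0 * t)
mag = np.abs(librosa.stft(y, n_fft=1024, hop_length=256))
waveform = heuristic_vocode(mag)
```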

    Comparison of Signal Reconstruction Methods for the Azimuth Discrimination and Resynthesis Algorithm

    The Azimuth Discrimination and Resynthesis (ADRess) algorithm has been shown to produce high-quality sound source separation results for intensity-panned stereo recordings. There are, however, artifacts such as phasiness which become apparent in the separated signals under certain conditions. This is largely because only the magnitude spectra of the separated sources are estimated; each source is then resynthesised using the phase information obtained from the original mixture. This paper describes the nature and origin of the associated artifacts and proposes alternative techniques for resynthesising the separated signals. A comparison of each technique is then presented.
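    For concreteness, the resynthesis step the abstract attributes the artifacts to (combining each source's estimated magnitude spectrum with the phase of the original mixture, then inverting) can be sketched as follows; librosa and the STFT parameters are illustrative assumptions, and the ADRess magnitude estimation itself is not shown.

```python
# Sketch of magnitude-only resynthesis with borrowed mixture phase; this is
# the baseline approach whose phasiness artifacts the paper examines.
import numpy as np
import librosa

def resynthesize_with_mixture_phase(source_mag, mixture, n_fft=4096, hop_length=1024):
    """Invert an estimated source magnitude spectrogram using the phase
    of the (mono) mixture signal. `source_mag` must have the same
    time-frequency shape as the mixture's STFT."""
    mix_stft = librosa.stft(mixture, n_fft=n_fft, hop_length=hop_length)
    mixture_phase = np.angle(mix_stft)
    # Pair the estimated magnitude with the mixture phase, bin by bin.
    source_stft = source_mag * np.exp(1j * mixture_phase)
    return librosa.istft(source_stft, hop_length=hop_length)
```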