4,514 research outputs found
Unifying background models over complex audio using entropy
In this paper we extend an existing audio background-modelling technique, making it more robust in complex audio environments. Background determination is used as an initial stage in the analysis of audio for surveillance and monitoring applications; knowledge of the background serves to highlight unusual or infrequent sounds. An existing approach uses an online, adaptive Gaussian mixture model (GMM), with multiple distributions modelling variations in the background. The method used to select the background distributions of the GMM causes the existing technique to fail when applied to complex audio. We propose a method that incorporates further information, the proximity of distributions determined using entropy, to build a more complete background model. The method modelled the background of complex audio scenes more robustly.
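The entropy-based proximity idea can be illustrated with univariate Gaussian components: merging two heavily overlapping components barely raises the mixture's entropy, while merging distinct ones raises it sharply. The merge criterion below is a hypothetical sketch of this kind of test, not the paper's exact formula.

```python
import math

def gaussian_entropy(var):
    """Differential entropy of a 1-D Gaussian with variance `var`."""
    return 0.5 * math.log(2 * math.pi * math.e * var)

def merge_moment_match(w1, mu1, var1, w2, mu2, var2):
    """Moment-matched single Gaussian replacing a 2-component mixture."""
    w = w1 + w2
    mu = (w1 * mu1 + w2 * mu2) / w
    var = (w1 * (var1 + mu1 ** 2) + w2 * (var2 + mu2 ** 2)) / w - mu ** 2
    return mu, var

def entropy_proximity(w1, mu1, var1, w2, mu2, var2):
    """Entropy increase caused by merging two GMM components.

    Near zero: the components overlap and can be treated as one
    background process.  Large: they model distinct sound sources.
    (Illustrative criterion only.)
    """
    w = w1 + w2
    _, var_m = merge_moment_match(w1, mu1, var1, w2, mu2, var2)
    h_merged = gaussian_entropy(var_m)
    h_parts = (w1 / w) * gaussian_entropy(var1) + (w2 / w) * gaussian_entropy(var2)
    return h_merged - h_parts
```

Identical components give a proximity of zero; components whose means are far apart relative to their variances give a large positive value, so a threshold on this quantity can group nearby distributions into a single background model.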
Crossmodal Attentive Skill Learner
This paper presents the Crossmodal Attentive Skill Learner (CASL), integrated
with the recently-introduced Asynchronous Advantage Option-Critic (A2OC)
architecture [Harb et al., 2017] to enable hierarchical reinforcement learning
across multiple sensory inputs. We provide concrete examples where the approach
not only improves performance in a single task, but accelerates transfer to new
tasks. We demonstrate the attention mechanism anticipates and identifies useful
latent features, while filtering irrelevant sensor modalities during execution.
We modify the Arcade Learning Environment [Bellemare et al., 2013] to support
audio queries, and conduct evaluations of crossmodal learning in the Atari 2600
game Amidar. Finally, building on the recent work of Babaeizadeh et al. [2017],
we open-source a fast hybrid CPU-GPU implementation of CASL.
Comment: International Conference on Autonomous Agents and Multiagent Systems (AAMAS) 2018; NIPS 2017 Deep Reinforcement Learning Symposium
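The core of crossmodal attention, weighting each sensory stream by a learned relevance score before fusing, can be sketched as a softmax over per-modality scores. This is an illustrative fragment under assumed interfaces, not the CASL/A2OC implementation (which learns the scores inside the option-critic network).

```python
import numpy as np

def crossmodal_attention(features, scores):
    """Fuse per-modality feature vectors by softmax-weighted averaging.

    features: dict modality -> feature vector (equal lengths)
    scores:   dict modality -> scalar relevance (e.g. from a learned head)
    Returns the fused feature and the attention weights per modality.
    """
    names = sorted(features)
    s = np.array([scores[n] for n in names], dtype=float)
    w = np.exp(s - s.max())          # numerically stable softmax
    w /= w.sum()
    fused = sum(wi * features[n] for wi, n in zip(w, names))
    return fused, dict(zip(names, w))
```

With equal scores every modality contributes equally; raising the score of, say, the audio stream shifts the fused representation toward it, which is how irrelevant modalities get filtered during execution.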
Learning Deep Structured Models
Many problems in real-world applications involve predicting several random
variables which are statistically related. Markov random fields (MRFs) are a
great mathematical tool to encode such relationships. The goal of this paper is
to combine MRFs with deep learning algorithms to estimate complex
representations while taking into account the dependencies between the output
random variables. Towards this goal, we propose a training algorithm that is
able to learn structured models jointly with deep features that form the MRF
potentials. Our approach is efficient as it blends learning and inference and
makes use of GPU acceleration. We demonstrate the effectiveness of our
algorithm in the tasks of predicting words from noisy images, as well as
multi-class classification of Flickr photographs. We show that joint learning
of the deep features and the MRF parameters results in significant performance
gains.
Comment: 11 pages including references
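The "dependencies between the output random variables" are what joint MRF inference exploits: on a chain (e.g. the characters of a word predicted from noisy images), MAP decoding combines per-position deep-network scores with pairwise transition scores. The Viterbi-style max-sum routine below is a sketch of that inference step under assumed score shapes, not the authors' blended learning-and-inference algorithm.

```python
import numpy as np

def chain_map_decode(unary, pairwise):
    """MAP inference on a chain MRF via max-sum (Viterbi).

    unary:    (T, K) per-position label scores, e.g. deep-network outputs
    pairwise: (K, K) transition scores between neighbouring labels
    Returns the jointly highest-scoring label sequence.
    """
    T, K = unary.shape
    score = unary[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + pairwise      # (K_prev, K_cur)
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + unary[t]
    labels = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        labels.append(int(back[t][labels[-1]]))
    return labels[::-1]
```

Note that the joint decode can overrule a locally preferred label: a strong transition score toward label consistency smooths out a position whose unary evidence is weak, which independent per-position prediction cannot do.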
Unifying Robustness and Fidelity: A Comprehensive Study of Pretrained Generative Methods for Speech Enhancement in Adverse Conditions
Enhancing speech signal quality in adverse acoustic environments is a
persistent challenge in speech processing. Existing deep-learning-based
enhancement methods often struggle to remove background noise and
reverberation in real-world scenarios, degrading the listening experience. To
address these challenges, we propose a novel approach that uses pre-trained
generative methods to resynthesize clean, anechoic speech from degraded inputs.
This study leverages pre-trained vocoder or codec models to synthesize
high-quality speech while enhancing robustness in challenging scenarios.
Generative methods effectively handle information loss in speech signals,
resulting in regenerated speech that has improved fidelity and reduced
artifacts. By harnessing the capabilities of pre-trained models, we achieve
faithful reproduction of the original speech in adverse conditions.
Experimental evaluations on both simulated datasets and realistic samples
demonstrate the effectiveness and robustness of our proposed methods.
In particular, by leveraging the codec model, we achieve superior subjective
scores on both simulated and realistic recordings. The generated speech
exhibits enhanced audio quality with reduced background noise and
reverberation. Our findings
highlight the potential of pre-trained generative techniques in speech
processing, particularly in scenarios where traditional methods falter. Demos
are available at https://whmrtm.github.io/SoundResynthesis.
Comment: Paper in submission
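The resynthesis pipeline the abstract describes, degraded audio in, compact representation out, waveform regenerated by a pretrained generative decoder, has a simple structural shape. The `encode`/`decode` callables below are hypothetical stand-ins for a real vocoder or codec pair, not the paper's models; the toy pair only demonstrates the data flow.

```python
import numpy as np

def resynthesize(noisy_wave, encode, decode):
    """Resynthesis-style enhancement: encode degraded audio into a
    robust intermediate representation, then regenerate the waveform
    with a pretrained generative decoder (interfaces assumed here)."""
    features = encode(noisy_wave)   # compact, noise-robust representation
    return decode(features)         # regenerated "clean" waveform

# Toy stand-ins: features are per-frame means; decoding repeats them.
FRAME = 160
toy_encode = lambda w: w.reshape(-1, FRAME).mean(axis=1)
toy_decode = lambda f: np.repeat(f, FRAME)
```

The key design point carried by this shape is that the output is generated rather than filtered: artifacts and missing content are replaced by what the pretrained decoder considers plausible clean speech, which is why such methods degrade gracefully where mask-based enhancement falters.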