The IOA System for Deep Noise Suppression Challenge using a Framework Combining Dynamic Attention and Recursive Learning

Authors: Cheng Linjuan, Li Andong, Li Xiaodong, Peng Renhua, Zheng Chengshi
Publication date: 12/05/2020

This technical report describes the system we submitted to the Deep Noise Suppression Challenge and presents its results for the non-real-time track. To refine the estimation results stage by stage, we utilize recursive learning, a training protocol that aggregates information across multiple stages with a memory mechanism. An attention generator network is designed to dynamically control the feature distribution of the noise reduction network. To improve phase recovery accuracy, we adopt complex spectral mapping, decoding both the real and imaginary spectra. On the final blind test set, the average MOS improvements of the submitted system in the noreverb, reverb, and realrec categories are 0.49, 0.24, and 0.36, respectively.
Comment: 4 pages, 2 figures

arXiv.org e-Print Archive
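The abstract credits decoding both real and imaginary spectra with better phase recovery. The minimal NumPy sketch below illustrates why, under loudly stated assumptions: it is not the IOA network, the "estimates" are oracle values standing in for a trained decoder, and the spectrum shape, noise level, and helper names (complex_from_parts, magnitude_only, mean_phase_error) are hypothetical. It contrasts complex mapping with magnitude-only enhancement, which reuses the noisy phase.

```python
import numpy as np

def complex_from_parts(real_est, imag_est):
    """Recombine separately decoded real and imaginary spectra into one complex STFT."""
    return real_est + 1j * imag_est

def magnitude_only(mag_est, noisy_spec):
    """Magnitude-only enhancement for comparison: the noisy phase is reused unchanged."""
    return mag_est * np.exp(1j * np.angle(noisy_spec))

def mean_phase_error(est, ref):
    """Mean absolute phase difference (radians) between two complex spectra."""
    return float(np.mean(np.abs(np.angle(est * np.conj(ref)))))

# Synthetic (frames, frequency bins) spectra; 161 bins is an assumed STFT size.
rng = np.random.default_rng(0)
shape = (100, 161)
clean = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
noise = 0.5 * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))
noisy = clean + noise

# Oracle outputs stand in for a trained decoder (hypothetical, illustration only).
est_complex = complex_from_parts(clean.real, clean.imag)
est_mag_only = magnitude_only(np.abs(clean), noisy)

err_complex = mean_phase_error(est_complex, clean)
err_mag_only = mean_phase_error(est_mag_only, clean)
print(f"complex mapping phase error: {err_complex:.3f} rad")
print(f"magnitude-only phase error:  {err_mag_only:.3f} rad")
```

Even with a perfect magnitude estimate, the magnitude-only path inherits the noisy phase error, whereas a decoder that predicts real and imaginary parts can, in principle, correct phase as well.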

Kernel-based Sensor Fusion with Application to Audio-Visual Voice Activity Detection

Authors: Cohen Israel, Dov David, Talmon Ronen
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 11/04/2016

In this paper, we address the problem of multi-view data fusion in the presence of noise and interferences. Recent studies have approached this problem using kernel methods, relying particularly on a product of kernels constructed separately for each view. From a graph-theoretic point of view, we analyze this fusion approach in a discrete setting. More specifically, based on a statistical model for the connectivity between data points, we propose an algorithm for selecting the kernel bandwidth, a parameter that, as we show, has important implications for the robustness of this fusion approach to interferences. We then consider the fusion of audio-visual speech signals measured by a single microphone and by a video camera pointed at the speaker's face. Specifically, we address the task of voice activity detection, i.e., the detection of speech and non-speech segments, in the presence of structured interferences such as keyboard taps and office noise. We propose an algorithm for voice activity detection based on the audio-visual signal. Simulation results show that the proposed algorithm outperforms competing fusion and voice activity detection approaches. In addition, we demonstrate that a proper selection of the kernel bandwidth indeed leads to improved performance.

arXiv.org e-Print Archive
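The product-of-kernels fusion mentioned in the abstract can be sketched in NumPy as follows. This is a minimal illustration under assumptions, not the authors' algorithm: it uses Gaussian kernels, and it swaps in the common median heuristic for the bandwidth because the paper's connectivity-based selection rule is not reproduced here. The feature sizes and helper names (pairwise_sq_dists, median_bandwidth, gaussian_kernel, fuse_views) are hypothetical.

```python
import numpy as np

def pairwise_sq_dists(x):
    """Squared Euclidean distances between all rows of x, shape (n_samples, n_features)."""
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    return np.maximum(d2, 0.0)  # clip tiny negative values caused by round-off

def median_bandwidth(d2):
    """Median heuristic (a common default; NOT the paper's bandwidth-selection algorithm)."""
    off_diagonal = d2[~np.eye(d2.shape[0], dtype=bool)]
    return float(np.median(off_diagonal))

def gaussian_kernel(x, eps=None):
    """Gaussian affinity matrix exp(-||xi - xj||^2 / eps) for one view."""
    d2 = pairwise_sq_dists(x)
    if eps is None:
        eps = median_bandwidth(d2)
    return np.exp(-d2 / eps), eps

def fuse_views(audio, video, eps_audio=None, eps_video=None):
    """Product-of-kernels fusion: elementwise product of per-view kernels,
    then row normalization into a Markov (random-walk) matrix."""
    k_audio, eps_audio = gaussian_kernel(audio, eps_audio)
    k_video, eps_video = gaussian_kernel(video, eps_video)
    k = k_audio * k_video  # Hadamard product, not a matrix product
    p = k / k.sum(axis=1, keepdims=True)
    return k, p, eps_audio, eps_video

# Hypothetical per-frame features: 13 audio and 20 video coefficients per frame.
rng = np.random.default_rng(1)
audio = rng.standard_normal((50, 13))
video = rng.standard_normal((50, 20))
k, p, eps_a, eps_v = fuse_views(audio, video)
print(f"bandwidths: audio={eps_a:.2f}, video={eps_v:.2f}")
```

Because the product of two Gaussian kernels equals exp(-(d_a^2/eps_a + d_v^2/eps_v)), two frames are strongly connected only if they are close in both views. Shrinking a bandwidth toward zero drives off-diagonal affinities to zero, and enlarging it pushes them toward one, which is one way to see why the bandwidth choice matters.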