Acoustic echo and noise canceller for personal hands-free video IP phone
This paper presents the implementation and evaluation of a proposed acoustic echo and noise canceller (AENC) for videotelephony-enabled personal hands-free Internet protocol (IP) phones. The canceller has the following features: noise-robust performance, low processing delay, and low computational complexity. The AENC employs adaptive digital filter (ADF) and noise reduction (NR) methods that can effectively eliminate undesired acoustic echo and background noise included in a microphone signal, even in a noisy environment. The ADF method controls the step size according to the level of disturbance, such as background noise, which minimizes the effect of the disturbance in a noisy environment. The NR method estimates the noise level on the assumption that the noise amplitude spectrum is constant over a short period, an assumption that does not hold for the amplitude spectrum of speech. In addition, this paper presents a method for decreasing the computational complexity of the ADF process without increasing the processing delay, making the processing suitable for real-time implementation. The experimental results demonstrate that the proposed AENC suppresses echo and noise sufficiently in a noisy environment, resulting in natural-sounding speech.
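The step-size control idea can be sketched with a normalized LMS (NLMS) filter. This is a minimal illustration, not the paper's algorithm: the function name, the `noise_power` argument (an externally supplied estimate of the background-noise power), and the specific rule that shrinks the step size are all assumptions made for the example.

```python
import numpy as np

def nlms_echo_canceller(far_end, mic, num_taps=32, mu_max=0.5,
                        noise_power=0.0, eps=1e-8):
    """NLMS echo canceller whose step size shrinks as the disturbance grows.

    noise_power is a hypothetical, externally supplied estimate of the
    background-noise power at the microphone (not taken from the paper).
    Returns the echo-cancelled signal and the final filter coefficients.
    """
    w = np.zeros(num_taps)        # current estimate of the echo path
    x_buf = np.zeros(num_taps)    # recent far-end samples, newest first
    err = np.zeros(len(mic))
    for n in range(len(mic)):
        # Shift the newest far-end sample into the buffer.
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = far_end[n]
        # Error = microphone signal minus the predicted echo.
        e = mic[n] - w @ x_buf
        err[n] = e
        x_power = x_buf @ x_buf
        # Illustrative control rule (an assumption): the step size tends to
        # mu_max when the far-end energy dominates and to 0 when the noise
        # dominates, so a noisy error signal perturbs the filter less.
        mu = mu_max * x_power / (x_power + num_taps * noise_power + eps)
        # Standard NLMS coefficient update.
        w = w + mu * e * x_buf / (x_power + eps)
    return err, w
```

With `noise_power=0.0` this reduces to plain NLMS with step size `mu_max`; a larger `noise_power` slows adaptation in exchange for robustness against the disturbance.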
DNN-Based Source Enhancement to Increase Objective Sound Quality Assessment Score
We propose a training method for deep neural network (DNN)-based source enhancement to increase objective sound quality assessment (OSQA) scores such as the perceptual evaluation of speech quality (PESQ). In many conventional studies, DNNs have been used as a mapping function to estimate time-frequency masks and trained to minimize an analytically tractable objective function such as the mean squared error (MSE). Since OSQA scores have been used widely for sound-quality evaluation, constructing DNNs to increase OSQA scores would be better than using the minimum-MSE to create high-quality output signals. However, since most OSQA scores are not analytically tractable, i.e., they are black boxes, the gradient of the objective function cannot be calculated by simply applying back-propagation. To calculate the gradient of the OSQA-based objective function, we formulated a DNN optimization scheme on the basis of black-box optimization, which is used for training a computer that plays a game. For a black-box-optimization scheme, we adopt the policy gradient method for calculating the gradient on the basis of a sampling algorithm. To simulate output signals using the sampling algorithm, DNNs are used to estimate the probability-density function of the output signals that maximize OSQA scores. The OSQA scores are calculated from the simulated output signals, and the DNNs are trained to increase the probability of generating the simulated output signals that achieve high OSQA scores. Through several experiments, we found that OSQA scores significantly increased by applying the proposed method, even though the MSE was not minimized.
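The policy gradient step described in this abstract can be illustrated on a toy problem. The sketch below is an assumption-laden simplification, not the authors' implementation: a per-bin logit vector stands in for the DNN, a Gaussian with fixed standard deviation stands in for the estimated probability-density function, and `black_box_score` is a hypothetical placeholder for an OSQA score such as PESQ, used only through its value.

```python
import numpy as np

def black_box_score(estimate, clean):
    # Hypothetical stand-in for an OSQA score: only the returned value is
    # used by the optimizer, never its gradient.
    return -np.mean(np.abs(estimate - clean))

def train_mask_policy(noisy, clean, steps=500, samples=16,
                      sigma=0.1, lr=2.0, seed=0):
    """Learn a time-frequency mask with a score-function (REINFORCE) estimator."""
    rng = np.random.default_rng(seed)
    theta = np.zeros_like(noisy)  # logits; a DNN would produce these
    for _ in range(steps):
        mean_mask = 1.0 / (1.0 + np.exp(-theta))
        # Sample candidate masks from a Gaussian policy around the mean mask.
        noise = rng.standard_normal((samples,) + noisy.shape)
        masks = mean_mask + sigma * noise
        # Score each simulated output signal with the black box.
        scores = np.array([black_box_score(m * noisy, clean) for m in masks])
        # Subtract the batch mean as a baseline to reduce variance.
        advantages = scores - scores.mean()
        # Gradient of log N(mask; mean_mask, sigma^2) w.r.t. mean_mask is
        # (mask - mean_mask) / sigma^2; weight it by the advantage.
        grad_mean = (advantages[:, None] * (masks - mean_mask)).mean(axis=0)
        grad_mean /= sigma ** 2
        # Chain rule through the sigmoid, then gradient ascent on the score.
        theta += lr * grad_mean * mean_mask * (1.0 - mean_mask)
    return 1.0 / (1.0 + np.exp(-theta))
```

Samples that score above the batch average pull the mean mask toward themselves, which is the sense in which training increases the probability of generating high-scoring outputs.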
Wavenumber-Domain Signal Processing of Sound: Plane-Wave, Circular Harmonic, and Spherical Harmonic Expansions and Array Signal Processing
Array signal processing with multiple microphones or loudspeakers plays an important role in various applications because it can handle the spatial information of sound. This article outlines the fundamentals of the spatial Fourier transforms for elements placed on a line, a circle, and a sphere, starting from the solutions of the wave equation. It also describes how applications such as wave field synthesis and directivity control can be carried out in the wavenumber domain.
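For the circular-array case, the spatial Fourier transform reduces to a discrete Fourier transform over the microphone angles. The sketch below assumes M equally spaced microphones at angles phi_m = 2*pi*m/M and computes the coefficients a_n = (1/M) * sum_m p(phi_m) * exp(-j*n*phi_m); the function name and the sampling layout are illustrative assumptions, and the radial terms (Bessel or Hankel functions) that a full treatment would divide out are omitted.

```python
import numpy as np

def circular_harmonic_coefficients(pressure, max_order):
    """Circular harmonic (angular Fourier) coefficients of sampled pressure.

    pressure: complex samples at M equally spaced angles phi_m = 2*pi*m/M.
    Returns the orders -max_order..max_order and the matching coefficients.
    """
    pressure = np.asarray(pressure, dtype=complex)
    num_mics = len(pressure)
    if 2 * max_order + 1 > num_mics:
        # More orders than microphones would alias onto each other.
        raise ValueError("need at least 2*max_order + 1 microphones")
    phi = 2.0 * np.pi * np.arange(num_mics) / num_mics
    orders = np.arange(-max_order, max_order + 1)
    # Each row of the analysis matrix is exp(-j * n * phi_m) for one order n.
    basis = np.exp(-1j * np.outer(orders, phi))
    coeffs = basis @ pressure / num_mics
    return orders, coeffs
```

For example, a field sampled as `2*exp(1j*phi) + 0.5*exp(-2j*phi)` on eight microphones yields coefficient 2 at order 1, 0.5 at order -2, and zero elsewhere within orders -3 to 3.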
Two-dimensional exterior sound field reproduction using two rigid circular loudspeaker arrays
In exterior sound field reproduction using loudspeaker arrays, such as a single circular array, there is a trade-off between the reproduction accuracy and the filter gain of the loudspeaker array. With the aim of reproducing complex sound fields with a lower filter gain, an asymmetrical array geometry with reflections between two or more rigid arrays is introduced. This paper proposes a sound field reproduction method using two rigid circular loudspeaker arrays in the circular harmonic domain. Transfer functions that account for the multiple scattering between the two rigid baffles can be represented in the circular harmonic domain. By repeatedly transforming the expansion coefficients between the two coordinate systems, the circular harmonic expansion is applied to the reproduced sound field in a mixed coordinate system. The driving function of the loudspeaker arrays is then derived through a mode expansion. Numerical simulations were conducted to verify the accuracy of the reproduced sound field.