37 research outputs found
Blind MultiChannel Identification and Equalization for Dereverberation and Noise Reduction based on Convolutive Transfer Function
This paper addresses the problems of blind channel identification and
multichannel equalization for speech dereverberation and noise reduction. The
time-domain cross-relation method is not suitable for blind room impulse
response identification, due to the near-common zeros of the long impulse
responses. We extend the cross-relation method to the short-time Fourier
transform (STFT) domain, in which the time-domain impulse responses are
approximately represented by the convolutive transfer functions (CTFs) with
far fewer coefficients. The CTFs suffer from the common zeros caused by the
oversampled STFT. We propose to identify CTFs based on the STFT with the
oversampled signals and the critically sampled CTFs, which is a good compromise
between the frequency aliasing of the signals and the common zeros problem of
CTFs. In addition, a normalization of the CTFs is proposed to remove the gain
ambiguity across sub-bands. In the STFT domain, the identified CTFs are used for
multichannel equalization, in which the sparsity of speech signals is
exploited. We propose to perform inverse filtering by minimizing the
$\ell_1$-norm of the source signal with the relaxed $\ell_2$-norm fitting error
between the microphone signals and the convolution of the estimated source
signal and the CTFs used as a constraint. This method is advantageous in that
the noise can be reduced by relaxing the $\ell_2$-norm to a tolerance
corresponding to the noise power, and the tolerance can be automatically set.
The experiments confirm the efficiency of the proposed method even under
conditions with high reverberation levels and intense noise.
Comment: 13 pages, 5 figures, 5 tables
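The cross-relation idea that this abstract builds on can be illustrated with a minimal time-domain sketch. It is not the paper's STFT-domain method: it assumes two noise-free real-valued channels of known length, and the names `convolution_matrix` and `cross_relation_identify` are hypothetical helpers introduced only for this example. Because both microphone signals share the same source, x1 * h2 = x2 * h1, so the stacked channel vector lies in the null space of a matrix built from the observations.

```python
import numpy as np

def convolution_matrix(x, num_taps):
    """Return the matrix C such that C @ h == np.convolve(x, h) for len(h) == num_taps."""
    n = len(x)
    c = np.zeros((n + num_taps - 1, num_taps))
    for k in range(num_taps):
        c[k:k + n, k] = x
    return c

def cross_relation_identify(x1, x2, num_taps):
    """Estimate two FIR channels, up to a common scale factor, from their outputs."""
    # Cross-relation: x1 * h2 - x2 * h1 = 0, written as A @ [h1; h2] = 0.
    a = np.hstack([-convolution_matrix(x2, num_taps),
                   convolution_matrix(x1, num_taps)])
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(a, full_matrices=False)
    h = vt[-1]
    return h[:num_taps], h[num_taps:]

# Toy data (assumed for illustration): white source, two short channels
# without common zeros.
rng = np.random.default_rng(0)
s = rng.standard_normal(200)
h1_true = np.array([1.0, 0.5, -0.3])
h2_true = np.array([0.8, -0.4, 0.2])
x1 = np.convolve(s, h1_true)
x2 = np.convolve(s, h2_true)

h1_est, h2_est = cross_relation_identify(x1, x2, num_taps=3)
# Blind identification leaves a scale ambiguity; fix it here using the
# known first tap, which is only possible in a simulation.
scale = h1_true[0] / h1_est[0]
h1_est, h2_est = scale * h1_est, scale * h2_est
```

When the channels share zeros or nearly share them, the smallest singular value is no longer well separated from the next one, which is the difficulty the abstract attributes to long room impulse responses.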
System Identification with Applications in Speech Enhancement
With the increasing popularity of integrating hands-free telephony on mobile portable devices
and the rapid development of voice over internet protocol, identification of acoustic
systems has become desirable for compensating distortions introduced to speech signals
during transmission, and hence enhancing the speech quality. The objective of this research
is to develop system identification algorithms for speech enhancement applications
including network echo cancellation and speech dereverberation.
A supervised adaptive algorithm for sparse system identification is developed for
network echo cancellation. Based on the framework of selective-tap updating scheme
on the normalized least mean squares algorithm, the MMax and sparse partial update
tap-selection strategies are exploited in the frequency domain to achieve fast convergence
performance with low computational complexity. Through demonstrating how
the sparseness of the network impulse response varies in the transformed domain, the
multidelay filtering structure is incorporated to reduce the algorithmic delay.
Blind identification of SIMO acoustic systems for speech dereverberation in the
presence of common zeros is then investigated. First, the problem of common zeros is
defined and extended to include the presence of near-common zeros. Two clustering algorithms
are developed to quantify the number of these zeros so as to facilitate the study
of their effect on blind system identification and speech dereverberation. To mitigate such
effect, two algorithms are developed where the two-stage algorithm based on channel
decomposition identifies common and non-common zeros sequentially; and the forced
spectral diversity approach combines spectral shaping filters and channel undermodelling
for deriving a modified system that leads to an improved dereverberation performance.
Additionally, a solution to the scale factor ambiguity problem in subband-based blind system identification is developed, which motivates further research on subband-based
dereverberation techniques. Comprehensive simulations and discussions demonstrate
the effectiveness of the aforementioned algorithms. A discussion on possible directions
of prospective research on system identification techniques concludes this thesis.
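The MMax tap-selection strategy mentioned in this abstract can be sketched in its simplest time-domain form. This is an assumption-laden illustration, not the thesis's frequency-domain or multidelay algorithm: at each step only the taps aligned with the largest-magnitude input samples are updated. The function name `mmax_nlms`, the step size and the toy system are all chosen for this example.

```python
import numpy as np

def mmax_nlms(x, d, num_taps, num_update, mu=0.5, eps=1e-8):
    """NLMS system identification that updates only the `num_update` taps
    whose input samples currently have the largest magnitude (MMax)."""
    w = np.zeros(num_taps)
    buf = np.zeros(num_taps)  # buf[k] holds x[n - k]
    for n in range(len(x)):
        buf = np.roll(buf, 1)
        buf[0] = x[n]
        e = d[n] - w @ buf  # a priori error
        # Indices of the num_update largest |x[n - k]| values.
        sel = np.argpartition(np.abs(buf), -num_update)[-num_update:]
        # Normalise by the energy of the full input vector.
        w[sel] += mu * e * buf[sel] / (buf @ buf + eps)
    return w

# Toy sparse system (assumed): 16 taps, 3 of them non-zero, no noise.
rng = np.random.default_rng(1)
h = np.zeros(16)
h[[2, 5, 11]] = [1.0, -0.5, 0.25]
x = rng.standard_normal(5000)
d = np.convolve(x, h)[:len(x)]

w = mmax_nlms(x, d, num_taps=16, num_update=8)
```

Updating half of the taps per iteration halves the cost of the coefficient update, at the price of somewhat slower convergence than full-update NLMS.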
Multichannel Speech Separation and Enhancement Using the Convolutive Transfer Function
This paper addresses the problem of speech separation and enhancement from
multichannel convolutive and noisy mixtures, assuming known mixing
filters}. We propose to perform the speech separation and enhancement task in
the short-time Fourier transform domain, using the convolutive transfer
function (CTF) approximation. Compared to time-domain filters, the CTF has far
fewer taps; consequently it has fewer near-common zeros among channels and lower
computational complexity. The work proposes three speech-source recovery
methods, namely: i) the multichannel inverse filtering method, i.e. the
multiple input/output inverse theorem (MINT), is exploited in the CTF domain,
and for the multi-source case, ii) a beamforming-like multichannel inverse
filtering method applying single source MINT and using power minimization,
which is suitable whenever the source CTFs are not all known, and iii) a
constrained Lasso method, where the sources are recovered by minimizing the
$\ell_1$-norm to impose their spectral sparsity, with the constraint that the
$\ell_2$-norm fitting cost, between the microphone signals and the mixing model
involving the unknown source signals, is less than a tolerance. The noise can
be reduced by setting the tolerance according to the noise power. Experiments under
various acoustic conditions are carried out to evaluate the three proposed
methods. The comparison between them as well as with the baseline methods is
presented.
Comment: Submitted to IEEE/ACM Transactions on Audio, Speech and Language Processing
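The multiple input/output inverse theorem (MINT) named in this abstract can be illustrated with a small real-valued sketch. It is not the paper's CTF-domain implementation, which works per frequency band on complex-valued CTFs. It assumes two known FIR channels of equal length without common zeros, and `mint_inverse_filters` is a hypothetical helper name. The inverse filters are chosen so that the sum of the channel-filter convolutions equals a unit impulse.

```python
import numpy as np

def convolution_matrix(h, num_taps):
    """Return the matrix C such that C @ g == np.convolve(h, g) for len(g) == num_taps."""
    n = len(h)
    c = np.zeros((n + num_taps - 1, num_taps))
    for k in range(num_taps):
        c[k:k + n, k] = h
    return c

def mint_inverse_filters(channels, filter_len, delay=0):
    """Least-squares filters g_m such that sum_m h_m * g_m approximates a
    unit impulse at index `delay`. All channels must have the same length."""
    big_h = np.hstack([convolution_matrix(h, filter_len) for h in channels])
    target = np.zeros(big_h.shape[0])
    target[delay] = 1.0
    g, *_ = np.linalg.lstsq(big_h, target, rcond=None)
    return np.split(g, len(channels))

# Toy channels (assumed), coprime so that exact inversion is possible.
channels = [np.array([1.0, 0.5, -0.3]), np.array([0.8, -0.4, 0.2])]
# With two channels of length L, filters of length L - 1 already suffice.
filters = mint_inverse_filters(channels, filter_len=2)

# Overall response of the equalised system; ideally [1, 0, 0, 0].
equalised = sum(np.convolve(h, g) for h, g in zip(channels, filters))
```

A single channel generally has no exact FIR inverse; the exact solution here relies on having at least two channels without common zeros, which is why near-common zeros degrade the method.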
Multichannel Online Dereverberation based on Spectral Magnitude Inverse Filtering
This paper addresses the problem of multichannel online dereverberation. The
proposed method is carried out in the short-time Fourier transform (STFT)
domain, and for each frequency band independently. In the STFT domain, the
time-domain room impulse response is approximately represented by the
convolutive transfer function (CTF). The multichannel CTFs are adaptively
identified based on the cross-relation method, and using the recursive least
squares criterion. Instead of the complex-valued CTF convolution model, we use a
nonnegative convolution model between the STFT magnitude of the source signal
and the CTF magnitude, which is just a coarse approximation of the former
model, but is shown to be more robust against the CTF perturbations. Based on
this nonnegative model, we propose an online STFT magnitude inverse filtering
method. The inverse filters of the CTF magnitude are formulated based on the
multiple-input/output inverse theorem (MINT), and adaptively estimated based on
the gradient descent criterion. Finally, the inverse filtering is applied to
the STFT magnitude of the microphone signals, obtaining an estimate of the STFT
magnitude of the source signal. Experiments regarding both speech enhancement
and automatic speech recognition are conducted, which demonstrate that the
proposed method can effectively suppress reverberation, even for the difficult
case of a moving speaker.
Comment: Paper submitted to IEEE/ACM Transactions on Audio, Speech and
Language Processing. IEEE Signal Processing Letters, 201
Adaptive inverse filtering of room acoustics
Equalization techniques for high order, multichannel, FIR systems are important for dereverberation of speech observed in reverberation using multiple microphones. In this case the multichannel system represents the room impulse responses (RIRs). The existence of near-common zeros in multichannel RIRs can slow down the convergence rate of adaptive inverse filtering algorithms. In this paper, the effect of common and near-common zeros on both the closed-form and the adaptive inverse filtering algorithms is studied. An adaptive shortening algorithm of room acoustics is presented based on this study.
Energy-efficient wideband transceiver with per-band equalisation and synchronisation
To emit in the TV white space (TVWS) spectrum, the regulator has requested very strict spectral masks, which can be fulfilled using an FFT-modulated filter-bank multi-carrier (FBMC) system to extract one or several TVWS channels in the 470-790 MHz range. Such a system reduces the channel dispersion, but even with a near-perfectly reconstructing filter bank, the need for equalisation and synchronisation remains. In this work, we propose a per-band equalisation and synchronisation approach, performed by a constant modulus algorithm running concurrently with a decision-directed adaptation process for faster convergence and reduced phase ambiguity. We compare symbol- and fractionally-spaced versions, and investigate their fixed-point implementation on an FPGA. We compare the performance of the different systems in terms of mean squared error, computational cost, and robustness towards noise.
Filter Optimization for Personal Sound Zones Systems
Personal Sound Zones (PSZ) systems deliver different sounds to a number of listeners sharing an acoustic space through the use of loudspeakers together with signal processing techniques. These systems have attracted a lot of attention in recent years because of the wide range of applications that would benefit from the generation of individual listening zones, e.g., domestic or automotive audio applications. A key aspect of PSZ systems, at least for low and mid frequencies, is the optimization of the filters used to process the sound signals. Different algorithms have been proposed in the literature for computing those filters, each exhibiting some advantages and disadvantages.
In this work, the state-of-the-art algorithms for PSZ systems are reviewed, and their performance in a reverberant environment is evaluated experimentally. Aspects such as the acoustic isolation between zones, the reproduction error, the energy of the filters, and the delay of the system are considered in the evaluations. Furthermore, computationally efficient strategies to obtain the filters are studied, and their computational complexity is compared too. The performance and computational evaluations reveal the main limitations of the state-of-the-art algorithms. In particular, the existing solutions cannot offer low computational complexity and at the same time good performance for short system delays.
Thus, a novel algorithm based on subband filtering that mitigates these limitations is proposed for PSZ systems. In addition, the proposed algorithm offers more versatility than the existing algorithms, since different system configurations, such as different filter lengths or sets of loudspeakers, can be used in each subband. The proposed algorithm is experimentally evaluated and tested in a reverberant environment, and its efficacy in mitigating the limitations of the existing solutions is demonstrated. Finally, the effect of the target responses in the optimization is discussed, and a novel approach that is based on windowing the target responses is proposed. The proposed approach is experimentally evaluated in two rooms with different reverberation times. The evaluation results reveal that an appropriate windowing of the target responses can reduce the interference level between zones.
Molés Cases, V. (2022). Filter Optimization for Personal Sound Zones Systems [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/18611
Residual echo signal in critically sampled subband acoustic echo cancellers based on IIR and FIR filter banks
Published version