
    Blind MultiChannel Identification and Equalization for Dereverberation and Noise Reduction based on Convolutive Transfer Function

    This paper addresses the problems of blind channel identification and multichannel equalization for speech dereverberation and noise reduction. The time-domain cross-relation method is not suitable for blind room impulse response identification, due to the near-common zeros of the long impulse responses. We extend the cross-relation method to the short-time Fourier transform (STFT) domain, in which the time-domain impulse responses are approximately represented by convolutive transfer functions (CTFs) with far fewer coefficients. The CTFs suffer from the common zeros caused by the oversampled STFT. We propose to identify the CTFs based on the STFT with oversampled signals and critically sampled CTFs, which is a good compromise between the frequency aliasing of the signals and the common-zeros problem of the CTFs. In addition, a normalization of the CTFs is proposed to remove the gain ambiguity across sub-bands. In the STFT domain, the identified CTFs are used for multichannel equalization, in which the sparsity of speech signals is exploited. We propose to perform inverse filtering by minimizing the $\ell_1$-norm of the source signal, using as a constraint the relaxed $\ell_2$-norm fitting error between the microphone signals and the convolution of the estimated source signal with the CTFs. This method is advantageous in that the noise can be reduced by relaxing the $\ell_2$-norm to a tolerance corresponding to the noise power, and the tolerance can be set automatically. The experiments confirm the efficiency of the proposed method even under conditions with high reverberation levels and intense noise. Comment: 13 pages, 5 figures, 5 tables
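As a rough illustration of the cross-relation idea that this abstract builds on, the sketch below identifies two short FIR channels in the time domain from their outputs alone, using the identity x1 * h2 = x2 * h1 (with * denoting convolution). It is a toy under stated assumptions, not the paper's STFT-domain method: the filters are short random vectors with no common zeros, there is no noise, and the names `conv_matrix` and `cross_relation_identify` are hypothetical.

```python
import numpy as np

def conv_matrix(x, n_cols):
    """Matrix X of shape (len(x) + n_cols - 1, n_cols) with X @ h == np.convolve(x, h)."""
    n = len(x)
    X = np.zeros((n + n_cols - 1, n_cols))
    for k in range(n_cols):
        X[k:k + n, k] = x
    return X

def cross_relation_identify(x1, x2, L):
    """Estimate two length-L channels, up to a common scale, from x1 * h2 - x2 * h1 = 0."""
    A = np.hstack([conv_matrix(x1, L), -conv_matrix(x2, L)])
    # The right singular vector of the smallest singular value spans the null space.
    _, _, vt = np.linalg.svd(A, full_matrices=False)
    v = vt[-1]
    h2_est, h1_est = v[:L], v[L:]
    return h1_est, h2_est

# Toy data (assumption): white-noise source, two random 4-tap channels, no noise.
rng = np.random.default_rng(0)
s = rng.standard_normal(400)
h1_true = rng.standard_normal(4)
h2_true = rng.standard_normal(4)
x1 = np.convolve(s, h1_true)
x2 = np.convolve(s, h2_true)

h1_est, h2_est = cross_relation_identify(x1, x2, L=4)

# The estimate is only defined up to a scale factor, so compare directions.
true_vec = np.concatenate([h1_true, h2_true])
est_vec = np.concatenate([h1_est, h2_est])
cos = abs(true_vec @ est_vec) / (np.linalg.norm(true_vec) * np.linalg.norm(est_vec))
print(cos)
```

The scale ambiguity visible here is the same kind of gain ambiguity that, per the abstract, has to be normalized across sub-bands once the method is applied per frequency band.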

    System Identification with Applications in Speech Enhancement

    With the increasing popularity of hands-free telephony on mobile portable devices and the rapid development of voice over internet protocol, identification of acoustic systems has become desirable for compensating distortions introduced to speech signals during transmission, and hence enhancing the speech quality. The objective of this research is to develop system identification algorithms for speech enhancement applications including network echo cancellation and speech dereverberation. A supervised adaptive algorithm for sparse system identification is developed for network echo cancellation. Based on the framework of the selective-tap updating scheme for the normalized least mean squares algorithm, the MMax and sparse partial update tap-selection strategies are exploited in the frequency domain to achieve fast convergence with low computational complexity. By demonstrating how the sparseness of the network impulse response varies in the transformed domain, the multidelay filtering structure is incorporated to reduce the algorithmic delay. Blind identification of SIMO acoustic systems for speech dereverberation in the presence of common zeros is then investigated. First, the problem of common zeros is defined and extended to include the presence of near-common zeros. Two clustering algorithms are developed to quantify the number of these zeros so as to facilitate the study of their effect on blind system identification and speech dereverberation. To mitigate this effect, two algorithms are developed: the two-stage algorithm based on channel decomposition identifies common and non-common zeros sequentially, and the forced spectral diversity approach combines spectral shaping filters and channel undermodelling to derive a modified system that leads to improved dereverberation performance. Additionally, a solution to the scale factor ambiguity problem in subband-based blind system identification is developed, which motivates further research on subband-based dereverberation techniques. Comprehensive simulations and discussions demonstrate the effectiveness of the aforementioned algorithms. A discussion of possible directions for future research on system identification techniques concludes this thesis.
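The selective-tap idea mentioned in this abstract can be sketched in a simplified form. The example below is a plain time-domain MMax-NLMS, in which only the M taps aligned with the largest-magnitude input samples are updated at each step. It is not the thesis's frequency-domain or multidelay algorithm; the function name `mmax_nlms`, the step size, and the toy sparse echo path are all assumptions made for illustration.

```python
import numpy as np

def mmax_nlms(x, d, L, M, mu=0.5, eps=1e-8):
    """NLMS that updates only the M taps with the largest |input| at each step."""
    w = np.zeros(L)
    e = np.zeros(len(x))
    for n in range(L - 1, len(x)):
        u = x[n - L + 1:n + 1][::-1]          # u[k] = x[n - k], newest sample first
        e[n] = d[n] - w @ u                   # a priori error
        # Indices of the M largest-magnitude entries of the input vector.
        idx = np.argpartition(np.abs(u), L - M)[L - M:]
        w[idx] += mu * e[n] * u[idx] / (u @ u + eps)
    return w, e

# Toy sparse echo path (assumption): 16 taps, 3 of them nonzero, no noise.
rng = np.random.default_rng(2)
h = np.zeros(16)
h[[2, 5, 11]] = [1.0, -0.5, 0.25]
x = rng.standard_normal(5000)
d = np.convolve(x, h)[:len(x)]

w, e = mmax_nlms(x, d, L=16, M=8)
misalignment = np.linalg.norm(w - h) / np.linalg.norm(h)
print(misalignment)
```

Updating half of the taps per sample roughly halves the cost of the coefficient update, while the taps that are skipped are those carrying the least input energy at that instant.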

    Multichannel Speech Separation and Enhancement Using the Convolutive Transfer Function

    This paper addresses the problem of speech separation and enhancement from multichannel convolutive and noisy mixtures, assuming known mixing filters. We propose to perform the speech separation and enhancement task in the short-time Fourier transform domain, using the convolutive transfer function (CTF) approximation. Compared to time-domain filters, the CTF has far fewer taps; consequently, it has fewer near-common zeros among channels and lower computational complexity. The work proposes three speech-source recovery methods, namely: i) the multichannel inverse filtering method, i.e. the multiple input/output inverse theorem (MINT), is exploited in the CTF domain, and for the multi-source case, ii) a beamforming-like multichannel inverse filtering method applying single-source MINT and using power minimization, which is suitable whenever the source CTFs are not all known, and iii) a constrained Lasso method, where the sources are recovered by minimizing the $\ell_1$-norm to impose their spectral sparsity, with the constraint that the $\ell_2$-norm fitting cost, between the microphone signals and the mixing model involving the unknown source signals, is less than a tolerance. The noise can be reduced by setting the tolerance according to the noise power. Experiments under various acoustic conditions are carried out to evaluate the three proposed methods. The comparison between them as well as with the baseline methods is presented. Comment: Submitted to IEEE/ACM Transactions on Audio, Speech and Language Processing
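To make the MINT step concrete, here is a minimal single-source sketch in the time domain: given two known channels with no common zeros, it solves for inverse filters g1 and g2 such that h1 * g1 + h2 * g2 is a unit impulse. The paper applies this idea to CTFs per frequency band; the real-valued random channels, the function name `mint_inverse`, and the choice of inverse-filter length L - 1 are assumptions for illustration only.

```python
import numpy as np

def conv_matrix(h, n_cols):
    """Matrix H of shape (len(h) + n_cols - 1, n_cols) with H @ g == np.convolve(h, g)."""
    n = len(h)
    H = np.zeros((n + n_cols - 1, n_cols))
    for k in range(n_cols):
        H[k:k + n, k] = h
    return H

def mint_inverse(channels, Lg, delay=0):
    """Least-squares inverse filters g_m such that sum_m h_m * g_m is a delayed impulse."""
    H = np.hstack([conv_matrix(h, Lg) for h in channels])
    target = np.zeros(H.shape[0])
    target[delay] = 1.0
    g, *_ = np.linalg.lstsq(H, target, rcond=None)
    return np.split(g, len(channels)), target

# Toy channels (assumption): two random 8-tap filters, which generically share no zeros.
rng = np.random.default_rng(1)
h1 = rng.standard_normal(8)
h2 = rng.standard_normal(8)

# With two channels of length L, inverse filters of length L - 1 give a square system.
(g1, g2), target = mint_inverse([h1, h2], Lg=7)
equalized = np.convolve(h1, g1) + np.convolve(h2, g2)
print(np.max(np.abs(equalized - target)))
```

A single channel generally has no exact FIR inverse, which is why the multichannel formulation matters; it also shows why common or near-common zeros are a problem, since they make the stacked matrix singular or ill-conditioned.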

    Multichannel Online Dereverberation based on Spectral Magnitude Inverse Filtering

    This paper addresses the problem of multichannel online dereverberation. The proposed method is carried out in the short-time Fourier transform (STFT) domain, and for each frequency band independently. In the STFT domain, the time-domain room impulse response is approximately represented by the convolutive transfer function (CTF). The multichannel CTFs are adaptively identified based on the cross-relation method, using the recursive least squares criterion. Instead of the complex-valued CTF convolution model, we use a nonnegative convolution model between the STFT magnitude of the source signal and the CTF magnitude, which is only a coarse approximation of the former model, but is shown to be more robust against CTF perturbations. Based on this nonnegative model, we propose an online STFT magnitude inverse filtering method. The inverse filters of the CTF magnitude are formulated based on the multiple-input/output inverse theorem (MINT), and adaptively estimated by gradient descent. Finally, the inverse filtering is applied to the STFT magnitude of the microphone signals, obtaining an estimate of the STFT magnitude of the source signal. Experiments on both speech enhancement and automatic speech recognition are conducted, which demonstrate that the proposed method can effectively suppress reverberation, even for the difficult case of a moving speaker. Comment: Paper submitted to IEEE/ACM Transactions on Audio, Speech and Language Processing. IEEE Signal Processing Letters, 201

    Adaptive inverse filtering of room acoustics

    Equalization techniques for high-order, multichannel FIR systems are important for dereverberation of speech observed in reverberation using multiple microphones. In this case the multichannel system represents the room impulse responses (RIRs). The existence of near-common zeros in multichannel RIRs can slow down the convergence rate of adaptive inverse filtering algorithms. In this paper, the effect of common and near-common zeros on both the closed-form and the adaptive inverse filtering algorithms is studied. An adaptive shortening algorithm for room acoustics is presented based on this study.

    Application of channel shortening to acoustic channel equalization in the presence of noise and estimation error


    Energy-efficient wideband transceiver with per-band equalisation and synchronisation

    To emit in the TV white space (TVWS) spectrum, the regulator has requested very strict spectral masks, which can be fulfilled using an FFT-modulated filter-bank multi-carrier (FBMC) system to extract one or several TVWS channels in the 470-790 MHz range. Such a system reduces the channel dispersion, but even with a near-perfectly reconstructing filter bank, the need for equalisation and synchronisation remains. In this work, we propose a per-band equalisation and synchronisation approach, performed by a constant modulus algorithm running concurrently with a decision-directed adaptation process for faster convergence and reduced phase ambiguity. We compare symbol- and fractionally-spaced versions, and investigate their fixed-point implementation on an FPGA. We compare the performance of the different systems in terms of mean squared error, computational cost, and robustness towards noise.
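For readers unfamiliar with the constant modulus algorithm named in this abstract, the sketch below shows a basic symbol-spaced CMA equalizer adapting blindly on QPSK symbols sent through a short dispersive channel. It omits the concurrent decision-directed stage, the filter bank, and the fixed-point aspects described above; the channel taps, step size, equalizer length, and function names are assumptions chosen only to make the example run.

```python
import numpy as np

def cma_equalize(r, n_taps=11, mu=1e-3, radius=1.0):
    """Symbol-spaced CMA: stochastic gradient descent on E[(|y|^2 - radius)^2]."""
    w = np.zeros(n_taps, dtype=complex)
    w[n_taps // 2] = 1.0                      # centre-spike initialisation
    y = np.zeros(len(r), dtype=complex)
    for n in range(n_taps - 1, len(r)):
        u = r[n - n_taps + 1:n + 1][::-1]     # newest sample first
        y[n] = w @ u
        err = (np.abs(y[n]) ** 2 - radius) * y[n]
        w -= mu * err * np.conj(u)
    return w, y

def dispersion(z, radius=1.0):
    """Mean squared deviation of |z|^2 from the target modulus."""
    return np.mean((np.abs(z) ** 2 - radius) ** 2)

# Toy link (assumption): unit-modulus QPSK through a mild 3-tap channel, no noise.
rng = np.random.default_rng(3)
n_sym = 20000
symbols = (rng.choice([-1.0, 1.0], n_sym) + 1j * rng.choice([-1.0, 1.0], n_sym)) / np.sqrt(2)
channel = np.array([1.0, 0.3 + 0.2j, 0.1])
received = np.convolve(symbols, channel)[:n_sym]

w, y = cma_equalize(received)
disp_before = dispersion(received[-2000:])
disp_after = dispersion(y[-2000:])
print(disp_before, disp_after)
```

Because the CMA cost depends only on the output modulus, the equalized constellation is recovered up to an arbitrary phase rotation, which is the phase ambiguity that the abstract's decision-directed process is meant to reduce.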

    Filter Optimization for Personal Sound Zones Systems

    Personal Sound Zones (PSZ) systems deliver different sounds to a number of listeners sharing an acoustic space through the use of loudspeakers together with signal processing techniques. These systems have attracted a lot of attention in recent years because of the wide range of applications that would benefit from the generation of individual listening zones, e.g., domestic or automotive audio applications. A key aspect of PSZ systems, at least for low and mid frequencies, is the optimization of the filters used to process the sound signals. Different algorithms have been proposed in the literature for computing those filters, each exhibiting some advantages and disadvantages. In this work, the state-of-the-art algorithms for PSZ systems are reviewed, and their performance in a reverberant environment is evaluated. Aspects such as the acoustic isolation between zones, the reproduction error, the energy of the filters, and the delay of the system are considered in the evaluations. Furthermore, computationally efficient strategies to obtain the filters are studied, and their computational complexity is also compared. The performance and computational evaluations reveal the main limitations of the state-of-the-art algorithms. In particular, the existing solutions cannot offer low computational complexity and, at the same time, good performance for short system delays. Thus, a novel algorithm based on subband filtering that mitigates these limitations is proposed for PSZ systems. In addition, the proposed algorithm offers more versatility than the existing algorithms, since different system configurations, such as different filter lengths or sets of loudspeakers, can be used in each subband. The proposed algorithm is experimentally evaluated and tested in a reverberant environment, and its efficacy in mitigating the limitations of the existing solutions is demonstrated. Finally, the effect of the target responses in the optimization is discussed, and a novel approach based on windowing the target responses is proposed. The proposed approach is experimentally evaluated in two rooms with different reverberation levels. The evaluation results reveal that an appropriate windowing of the target responses can reduce the interference level between zones. Molés Cases, V. (2022). Filter Optimization for Personal Sound Zones Systems [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/18611