    Clustering Inverse Beamforming and multi-domain acoustic imaging approaches for vehicles NVH

    The noise perceived inside a vehicle cabin is a highly relevant aspect in the assessment of its overall quality. Experimental acoustic imaging methods, such as beamforming and acoustic holography, are used to identify the main sources contributing to the noise perceived inside the vehicle. The aim of this thesis is to provide tools for carrying out detailed quantitative analyses with these techniques, which to date have been relegated to preliminary study phases, by proposing a modular approach based on the analysis of vibro-acoustic phenomena in the frequency domain, the time domain, and the rotation-angle domain of the rotating elements typically found in a vehicle. This reduces design time and cost while ensuring a higher quality of the vibro-acoustic package. The proposed paradigm combines pre- and post-processing algorithms with inverse acoustic imaging techniques to study relevant problems such as the identification of sound sources outside or inside the cabin and of the noise produced by rotating devices. The main innovative element of the thesis is the technique named Clustering Inverse Beamforming. It relies on a statistical approach that increases the accuracy (dynamic range, localization and quantification) of an acoustic image by combining solutions of the same inverse problem obtained from different sub-samples of the available experimental information, thereby randomly varying its mathematical formulation. This procedure ensures the reconstruction of the identified sound sources in the frequency and time domains. An innovative method has also been proposed for reconstructing sound sources in the angle domain where necessary. The proposed methods are supported by theoretical arguments and experimental validation at academic and industrial scale.
The interior sound perceived in vehicle cabins is a very important attribute for the user. Experimental acoustic imaging methods such as beamforming and Near-field Acoustic Holography are used in vehicle noise and vibration studies because they can identify the noise sources contributing to the overall noise perceived inside the cabin. However, these techniques are often relegated to the troubleshooting phase, so additional experiments are required for more detailed NVH analyses. It is therefore desirable that such methods evolve towards more refined solutions capable of providing more extensive and detailed information. This thesis proposes a modular and multi-domain approach involving direct and inverse acoustic imaging techniques for providing quantitative and accurate results in the frequency, time and angle domains, thus targeting three relevant types of problems in vehicle NVH: identification of exterior sources affecting interior noise, interior noise source identification, and analysis of noise sources produced by rotating machines. The core finding of this thesis is a novel inverse acoustic imaging method named Clustering Inverse Beamforming (CIB). The method is grounded in statistical processing based on an Equivalent Source Method formulation. In this way, an accurate localization, a reliable ranking of the identified sources in the frequency domain, and their separation into uncorrelated phenomena are obtained. CIB is also exploited in this work to reconstruct the time evolution of the sources sought. Finally, a methodology is proposed for decomposing the acoustic image of the sound field generated by a rotating machine as a function of the angular evolution of the machine shaft.
This set of findings aims at contributing to the advent of a new paradigm of acoustic imaging applications in vehicle NVH, supporting all the stages of vehicle design with time-saving and cost-efficient experimental techniques. The proposed approaches are validated on several simulated and real experiments.
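The angle-domain analysis mentioned in the abstract above presupposes that a signal sampled uniformly in time can be re-expressed as a function of shaft angle. The following is a minimal sketch of such angular resampling by linear interpolation. It is not the thesis's method; the function names, the run-up profile, and the choice of 16 samples per revolution are illustrative assumptions.

```python
import math
from bisect import bisect_right

def interpolate(x, xs, ys):
    """Linear interpolation of ys(xs) at x; xs must be strictly increasing."""
    i = min(max(bisect_right(xs, x) - 1, 0), len(xs) - 2)
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])

def resample_to_angle(shaft_angle, signal, samples_per_rev):
    """Resample a time-sampled signal onto a uniform shaft-angle grid.

    shaft_angle[n] is the (monotonically increasing) shaft angle in radians
    at the same instant as signal[n], e.g. obtained from a tachometer.
    """
    step = 2.0 * math.pi / samples_per_rev
    count = int((shaft_angle[-1] - shaft_angle[0]) / step) + 1
    targets = [shaft_angle[0] + k * step for k in range(count)]
    return targets, [interpolate(a, shaft_angle, signal) for a in targets]

# Hypothetical run-up: the shaft accelerates from 5 to 25 rev/s in one second,
# and the signal is a pure second-order component (two cycles per revolution).
fs = 2000.0
times = [n / fs for n in range(2001)]
shaft_angle = [2.0 * math.pi * (5.0 * t + 10.0 * t * t) for t in times]
signal = [math.cos(2.0 * a) for a in shaft_angle]

angles, resampled = resample_to_angle(shaft_angle, signal, samples_per_rev=16)
```

In the time domain the frequency of this signal sweeps with shaft speed, whereas on the angle grid it becomes strictly periodic with a period of eight samples, which is what makes a per-angle decomposition meaningful.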

    Robust Multichannel Microphone Beamforming

    In this thesis, a method for the design and implementation of a spatially robust multichannel microphone beamforming system is presented. A set of spatial correlation functions is derived for 2D and 3D far-field/near-field scenarios based on von Mises(-Fisher), Gaussian, and uniform source location distributions. These correlation functions are used to design spatially robust beamformers and blocking beamformers (nullformers) that enhance or suppress a known source whose location is not perfectly known, due to either an incorrect location estimate or movement of the target while the beamformers are active. The spatially robust beam/null-formers form signal and interferer-plus-noise references, which can be further processed via a blind source separation algorithm to remove mutual components, removing the interference and sensor noise from the signal path and vice versa. The noise reduction performance of the combined beamforming and blind source separation system approaches that of a perfect-information MVDR beamformer under reverberant conditions. It is demonstrated that the proposed algorithm can be implemented with good performance on low-power hardware similar to current mobile platforms, using a four-element microphone array.
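To make the notion of a spatial correlation function under an uncertain source location concrete, here is a minimal numerical sketch for the simplest case named in the abstract: two microphones, a 2D far-field source, and a von Mises distribution over the direction of arrival. The thesis derives such functions analytically; this Monte Carlo estimate, the function name, and all parameter values are assumptions made only for illustration.

```python
import cmath
import math
import random

def spatial_correlation(spacing_m, freq_hz, mean_doa_rad, kappa,
                        n_draws=20000, speed_of_sound=343.0, seed=0):
    """Monte Carlo estimate of E[exp(j*k*d*cos(theta))] for two microphones.

    theta is the far-field direction of arrival measured from the array axis,
    drawn from a von Mises distribution with mean `mean_doa_rad` and
    concentration `kappa` (kappa = 0 gives a uniform distribution).
    """
    rng = random.Random(seed)
    wavenumber = 2.0 * math.pi * freq_hz / speed_of_sound
    total = 0j
    for _ in range(n_draws):
        theta = rng.vonmisesvariate(mean_doa_rad, kappa)
        total += cmath.exp(1j * wavenumber * spacing_m * math.cos(theta))
    return total / n_draws

# A well-localized broadside source versus a completely unknown direction.
concentrated = spatial_correlation(0.05, 1000.0, math.pi / 2, kappa=500.0)
uniform = spatial_correlation(0.05, 1000.0, math.pi / 2, kappa=0.0)
```

A large `kappa` yields a correlation magnitude close to one, as for a point source, while smaller values reduce it. That reduction is the information a spatially robust design can use in place of a single assumed steering direction.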

    Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments

    Eliminating the negative effect of non-stationary environmental noise is a long-standing research topic for automatic speech recognition that still remains an important challenge. Data-driven supervised approaches, including ones based on deep neural networks, have recently emerged as potential alternatives to traditional unsupervised approaches and, with sufficient training, can alleviate the shortcomings of the unsupervised methods in various real-life acoustic environments. In this light, we review recently developed, representative deep learning approaches for tackling non-stationary additive and convolutional degradation of speech, with the aim of providing guidelines for those involved in the development of environmentally robust speech recognition systems. We separately discuss single- and multi-channel techniques developed for the front-end and back-end of speech recognition systems, as well as joint front-end and back-end training frameworks.

    Perceptually motivated blind source separation of convolutive audio mixtures

    Array signal processing robust to pointing errors

    The objective of this thesis is to design computationally efficient DOA (direction-of-arrival) estimation algorithms and beamformers robust to pointing errors, by harnessing the antenna geometrical information and the received signals. Initially, two fast root-MUSIC-type DOA estimation algorithms are developed, which can be applied to arbitrary arrays. Instead of computing all roots, the first proposed iterative algorithm calculates only the wanted roots. The second, IDFT-based method obtains the DOAs by scanning a few circles in parallel, so that rooting is avoided. Both proposed algorithms have asymptotically similar performance to the extended root-MUSIC at a lower computational burden. The second main contribution of this thesis concerns the matched direction beamformer (MDB), without using the interference subspace. The manifold vector of the desired signal is modeled as a vector lying in a known linear subspace, but the associated linear combination vector is otherwise unknown due to pointing errors. This vector can be found by computing the principal eigenvector of a certain rank-one matrix. An MDB is then constructed which is robust to both pointing errors and overestimation of the signal subspace dimension. Finally, an interference cancellation beamformer robust to pointing errors is considered. By means of vector space projections, much of the pointing error can be eliminated. A one-step power estimation is derived using the theory of covariance fitting. An estimate-and-subtract interference canceller beamformer is then proposed, in which the power inversion problem is avoided and the interferences can be cancelled completely.

    Distributed audio network for speech enhancement in challenging noise backgrounds

    This paper presents a new approach to enhancing speech based on a distributed microphone network. Each microphone is used to simultaneously classify the input either as one of the noise types or as speech. To enhance the speech signal, a modified spectral subtraction approach is used that utilises the sound information of the entire network to update the noise model even during speech. This improves the reduction of the ambient noise, especially for non-stationary noise types such as street or beach noise. Experiments demonstrate the effectiveness of the proposed system.
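The idea of updating a noise model from other nodes while the local node observes speech can be sketched as follows. This is a generic magnitude spectral subtraction with a spectral floor and a recursive noise update, not the paper's modified algorithm; the function names, the smoothing rate, and the floor value are hypothetical.

```python
def update_noise_model(noise_mag, noise_frames, rate=0.1):
    """Move the noise magnitude estimate toward the mean of frames that
    other network nodes have classified as noise (hypothetical update rule)."""
    if not noise_frames:
        return list(noise_mag)
    mean_frame = [sum(bins) / len(noise_frames) for bins in zip(*noise_frames)]
    return [(1.0 - rate) * n + rate * m for n, m in zip(noise_mag, mean_frame)]

def spectral_subtract(noisy_mag, noise_mag, over_subtraction=1.0, floor=0.05):
    """Subtract the noise magnitude per frequency bin, keeping a spectral
    floor so that no bin becomes negative or exactly zero."""
    return [max(y - over_subtraction * n, floor * y)
            for y, n in zip(noisy_mag, noise_mag)]

# One frame with three frequency bins (magnitudes are illustrative values).
noise = update_noise_model([1.0, 1.0], [[2.0, 0.0], [2.0, 2.0]], rate=0.5)
enhanced = spectral_subtract([1.0, 0.5, 0.2], [0.2, 0.2, 0.3])
```

Because `update_noise_model` only needs frames labelled as noise by any node, the estimate can keep tracking a non-stationary background during local speech activity, which is the situation the abstract highlights.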

    Subband beamforming with higher order statistics for distant speech recognition

    This dissertation presents novel beamforming methods for distant speech recognition (DSR). Such techniques can relieve users of the necessity of wearing close-talking microphones. DSR systems are useful in many applications, such as humanoid robots, voice control systems for automobiles, and automatic meeting transcription systems. A main problem in DSR is that recognition performance is seriously degraded when a speaker is far from the microphones. To avoid this degradation, noise and reverberation should be removed from the signals received by the microphones. Acoustic beamforming techniques have the potential to enhance speech from the far field with little distortion, since they can maintain a distortionless constraint for a look direction. In beamforming, multiple signals propagating from a position are captured with multiple microphones. Typical conventional beamformers then adjust their weights so as to minimize the variance of their own outputs subject to a distortionless constraint in a look direction. The variance is the average of the second power (square) of the beamformer's outputs. Accordingly, the conventional beamformer is considered to use second-order statistics (SOS) of the beamformer's outputs. Conventional beamforming techniques can effectively place a null on any source of interference. However, the desired signal is also canceled in reverberant environments, which is known as the signal cancellation problem. Many algorithms have been developed to avoid that problem, but none of them fundamentally solves it in reverberant environments. While many efforts have been made to overcome the signal cancellation problem in the field of acoustic beamforming, researchers have also addressed another research issue with microphone arrays, namely blind source separation (BSS) [1].
BSS techniques aim at separating sources from a mixture of signals without information about the geometry of the microphone array or the positions of the sources. This is achieved by multiplying the input signals by an un-mixing matrix, which is constructed so that the outputs are stochastically independent. Measuring the stochastic independence of the signals is based on the theory of independent component analysis (ICA) [1]. The field of ICA is based on the fact that distributions of information-bearing signals are not Gaussian, whereas distributions of sums of various signals are close to Gaussian. There are two popular criteria for measuring the degree of non-Gaussianity, namely kurtosis and negentropy. As described in detail in this thesis, both criteria use moments higher than the second. Accordingly, they are referred to as higher-order statistics (HOS), in contrast to SOS. HOS has not been well studied in the field of acoustic beamforming, although Arai et al. showed the similarity between acoustic beamforming and BSS [2]. This thesis investigates new beamforming algorithms which take HOS into consideration. The new beamforming methods adjust the beamformer's weights based on one of the following criteria: (i) minimum mutual information of the two beamformers' outputs, (ii) maximum negentropy of the beamformer's outputs, and (iii) maximum kurtosis of the beamformer's outputs. As shown in this thesis, those algorithms do not suffer from signal cancellation. Notice that, in contrast to the BSS algorithms, the new beamforming techniques can keep the distortionless constraint for the direction of interest. The effectiveness of the new techniques is finally demonstrated through a series of distant automatic speech recognition experiments on real data recorded with real sensors, unlike other work in which signals artificially convolved with measured impulse responses are considered.
Significant improvements are achieved by the beamforming algorithms proposed here.
This dissertation presents new methods for distant speech recognition. These methods make it possible to dispense with close-talking microphones. Speech recognition systems that do without close-talking microphones are useful in many applications, such as humanoid robots, voice control systems for cars, and automatic meeting transcription systems. A main problem in distant speech recognition is that recognition accuracy drops sharply as the distance between speaker and microphone increases. It is therefore essential to remove the disturbances, namely background noise, reverberation and echo, from the microphone signals. Using several microphones allows the desired signal to be spatially separated from the disturbances. This method is called acoustic beamforming. Conventional acoustic beamformers adapt their weights so that the variance of the output signal is minimized, subject to the signal in the look direction satisfying a distortionless constraint. The variance is defined as the mean square of the output signal. Conventional beamforming methods therefore use second-order statistics (SOS) of the output signal. Conventional beamformers can suppress interfering sources efficiently, but unfortunately also the desired signal. This unwanted suppression of the desired signal is called signal cancellation, and many algorithms have been developed to avoid it. None of these algorithms, however, works effectively in reverberant environments. A further method of separating the desired signal from the disturbances, this time without using geometric information, is called blind source separation (BSS) [1]. Here, a matrix multiplication is applied to the input signal.
The matrix must be constructed so that the output signals are statistically independent of one another. Statistical independence is measured with the theory of independent component analysis (ICA) [1]. ICA assumes that information-bearing signals, such as speech, are not Gaussian distributed, whereas sums of signals, such as background noise, are Gaussian distributed. There are two common ways of determining the degree of non-Gaussianity: kurtosis and negentropy. As described in this work, these use moments higher than the second, and the methods are therefore referred to as higher-order statistics (HOS). Although Arai et al. showed that beamforming and BSS are similar, HOS have so far not been used in acoustic beamforming [2], which continues to rely on SOS. In this dissertation, new beamforming algorithms based on HOS are developed and evaluated. The new beamforming methods adapt their weights according to one of the following criteria: minimum mutual information of two beamformer outputs, maximum negentropy of the beamformer outputs, and maximum kurtosis of the beamformer outputs. Speech recognition experiments (measured in word error rate) show that the beamforming techniques developed here also successfully suppress interfering sources in reverberant environments, which is a clear advantage over conventional methods.
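The premise behind the kurtosis criterion in the abstract above, that a single information-bearing signal is far from Gaussian while a sum of several signals is closer to Gaussian, can be checked numerically. The sketch below is only an illustration under assumptions: Laplace-distributed samples stand in for speech-like signals, and the sample size and the number of summed sources are arbitrary choices.

```python
import random

def excess_kurtosis(samples):
    """Fourth standardized moment minus 3; zero for a Gaussian distribution."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [x - mean for x in samples]
    m2 = sum(c * c for c in centered) / n
    m4 = sum(c ** 4 for c in centered) / n
    return m4 / (m2 * m2) - 3.0

def laplace_sample(rng):
    """The difference of two i.i.d. exponential variates is Laplace distributed."""
    return rng.expovariate(1.0) - rng.expovariate(1.0)

rng = random.Random(0)
n_samples = 20000

# One heavy-tailed source versus the sum of eight independent sources.
single = [laplace_sample(rng) for _ in range(n_samples)]
mixture = [sum(laplace_sample(rng) for _ in range(8)) for _ in range(n_samples)]

k_single = excess_kurtosis(single)
k_mixture = excess_kurtosis(mixture)
```

The single source shows a clearly positive excess kurtosis, while the mixture's value is much closer to zero. A beamformer that maximizes the kurtosis of its output is therefore pushed toward an output dominated by one source rather than a mixture.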
