92 research outputs found

    Spherical microphone array acoustic rake receivers

    Get PDF
    Several signal independent acoustic rake receivers are proposed for speech dereverberation using spherical microphone arrays. The proposed rake designs take advantage of multipaths, by separately capturing and combining early reflections with the direct path. We investigate several approaches in combining reflections with the direct path source signal, including the development of beam patterns that point nulls at all preceding reflections. The proposed designs are tested in experimental simulations and their dereverberation performances evaluated using objective measures. For the tested configuration, the proposed designs achieve higher levels of dereverberation compared to conventional signal independent beamforming systems; achieving up to 3.6 dB improvement in the direct-to-reverberant ratio over the plane-wave decomposition beamformer

    Implementation and evaluation of a low complexity microphone array for speaker recognition

    Get PDF
    Includes bibliographical references (leaves 83-86).This thesis discusses the application of a microphone array employing a noise canceling beamforming technique for improving the robustness of speaker recognition systems in a diffuse noise field

    Robust Multichannel Microphone Beamforming

    No full text
    In this thesis, a method for the design and implementation of a spatially robust multichannel microphone beamforming system is presented. A set of spatial correlation functions are derived for 2D and 3D far-field/near-field scenarios based on von Mises(-Fisher), Gaussian, and uniform source location distributions. These correlation functions are used to design spatially robust beamformers and blocking beamformers (nullformers) designed to enhance or suppress a known source, where the target source location is not perfectly known due to either an incorrect location estimate or movement of the target while the beamformers are active. The spatially robust beam/null-formers form signal and interferer plus noise references which can be further processed via a blind source separation algorithm to remove mutual components - removing the interference and sensor noise from the signal path and vice versa. The noise reduction performance of the combined beamforming and blind source separation system approaches that of a perfect information MVDR beamformer under reverberant conditions. It is demonstrated that the proposed algorithm can be implemented on low-power hardware with good performance on hardware similar to current mobile platforms using a four-element microphone array

    Subband beamforming with higher order statistics for distant speech recognition

    Get PDF
    This dissertation presents novel beamforming methods for distant speech recognition (DSR). Such techniques can relieve users from the necessity of putting on close talking microphones. DSR systems are useful in many applications such as humanoid robots, voice control systems for automobiles, automatic meeting transcription systems and so on. A main problem in DSR is that recognition performance is seriously degraded when a speaker is far from the microphones. In order to avoid the degradation, noise and reverberation should be removed from signals received with the microphones. Acoustic beamforming techniques have a potential to enhance speech from the far field with little distortion since they can maintain a distortionless constraint for a look direction. In beamforming, multiple signals propagating from a position are captured with multiple microphones. Typical conventional beamformers then adjust their weights so as to minimize the variance of their own outputs subject to a distortionless constraint in a look direction. The variance is the average of the second power (square) of the beamformer\u27s outputs. Accordingly, it is considered that the conventional beamformer uses second orderstatistics (SOS) of the beamformer\u27s outputs. The conventional beamforming techniques can effectively place a null on any source of interference. However, the desired signal is also canceled in reverberant environments, which is known as the signal cancellation problem. To avoid that problem, many algorithms have been developed. However, none of the algorithms can essentially solve the signal cancellation problem in reverberant environments. While many efforts have been made in order to overcome the signal cancellation problem in the field of acoustic beamforming, researchers have addressed another research issue with the microphone array, that is, blind source separation (BSS) [1]. The BSS techniques aim at separating sources from the mixture of signals without information about the geometry of the microphone array and positions of sources. It is achieved by multiplying an un-mixing matrix with input signals. The un-mixing matrix is constructed so that the outputs are stochastically independent. Measuring the stochastic independence of the signals is based on the theory of the independent component analysis (ICA) [1]. The field of ICA is based on the fact that distributions of information-bearing signals are not Gaussian and distributions of sums of various signals are close to Gaussian. There are two popular criteria for measuring the degree of the non-Gaussianity, namely, kurtosis and negentropy. As described in detail in this thesis, both criteria use more than the second moment. Accordingly, it is referred to as higher order statistics (HOS) in contrast to SOS. HOS is not considered in the field of acoustic beamforming well although Arai et al. showed the similarity between acoustic beamforming and BSS [2]. This thesis investigates new beamforming algorithms which take into consideration higher-order statistics (HOS). The new beamforming methods adjust the beamformer\u27s weights based on one of the following criteria: • minimum mutual information of the two beamformer\u27s outputs, • maximum negentropy of the beamformer\u27s outputs and • maximum kurtosis of the beamformer\u27s outputs. Those algorithms do not suffer from the signal cancellation, which is shown in this thesis. Notice that the new beamforming techniques can keep the distortionless constraint for the direction of interest in contrast to the BSS algorithms. The effectiveness of the new techniques is finally demonstrated through a series of distant automatic speech recognition experiments on real data recorded with real sensors unlike other work where signals artificially convolved with measured impulse responses are considered. Significant improvements are achieved by the beamforming algorithms proposed here.Diese Dissertation präsentiert neue Methoden zur Spracherkennung auf Entfernung. Mit diesen Methoden ist es möglich auf Nahbesprechungsmikrofone zu verzichten. Spracherkennungssysteme, die auf Nahbesprechungsmikrofone verzichten, sind in vielen Anwendungen nützlich, wie zum Beispiel bei Humanoiden-Robotern, in Voice Control Systemen für Autos oder bei automatischen Transcriptionssystemen von Meetings. Ein Hauptproblem in der Spracherkennung auf Entfernung ist, dass mit zunehmendem Abstand zwischen Sprecher und Mikrofon, die Genauigkeit der Spracherkennung stark abnimmt. Aus diesem Grund ist es elementar die Störungen, nämlich Hintergrundgeräusche, Hall und Echo, aus den Mikrofonsignalen herauszurechnen. Durch den Einsatz von mehreren Mikrofonen ist eine räumliche Trennung des Nutzsignals von den Störungen möglich. Diese Methode wird als akustisches Beamformen bezeichnet. Konventionelle akustische Beamformer passen ihre Gewichte so an, dass die Varianz des Ausgangssignals minimiert wird, wobei das Signal in "Blickrichtung" die Bedingung der Verzerrungsfreiheit erfüllen muss. Die Varianz ist definiert als das quadratische Mittel des Ausgangssignals.Somit werden bei konventionellen Beamformingmethoden Second-Order Statistics (SOS) des Ausgangssignals verwendet. Konventionelle Beamformer können Störquellen effizient unterdrücken, aber leider auch das Nutzsignal. Diese unerwünschte Unterdrückung des Nutzsignals wird im Englischen signal cancellation genannt und es wurden bereits viele Algorithmen entwickelt um dies zu vermeiden. Keiner dieser Algorithmen, jedoch, funktioniert effektiv in verhallter Umgebung. Eine weitere Methode das Nutzsignal von den Störungen zu trennen, diesesmal jedoch ohne die geometrische Information zu nutzen, wird Blind Source Separation (BSS) [1] genannt. Hierbei wird eine Matrixmultiplikation mit dem Eingangssignal durchgeführt. Die Matrix muss so konstruiert werden, dass die Ausgangssignale statistisch unabhängig voneinander sind. Die statistische Unabhängigkeit wird mit der Theorie der Independent Component Analysis (ICA) gemessen [1]. Die ICA nimmt an, dass informationstragende Signale, wie z.B. Sprache, nicht gaußverteilt sind, wohingegen die Summe der Signale, z.B. das Hintergrundrauschen, gaußverteilt sind. Es gibt zwei gängige Arten um den Grad der Nichtgaußverteilung zu bestimmen, Kurtosis und Negentropy. Wie in dieser Arbeit beschrieben, werden hierbei höhere Momente als das zweite verwendet und somit werden diese Methoden als Higher-Order Statistics (HOS) bezeichnet. Obwohl Arai et al. zeigten, dass sich Beamforming und BSS ähnlich sind, werden HOS beim akustischen Beamforming bisher nicht verwendet [2] und beruhen weiterhin auf SOS. In der hier vorliegenden Dissertation werden neue Beamformingalgorithmen entwickelt und evaluiert, die auf HOS basieren. Die neuen Beamformingmethoden passen ihre Gewichte anhand eines der folgenden Kriterien an: • Minimum Mutual Information zweier Beamformer Ausgangssignale • Maximum Negentropy der Beamformer Ausgangssignale und • Maximum Kurtosis der Beamformer Ausgangssignale. Es wird anhand von Spracherkennerexperimenten (gemessen in Wortfehlerrate) gezeigt, dass die hier entwickelten Beamformingtechniken auch erfolgreich Störquellen in verhallten Umgebungen unterdrücken, was ein klarer Vorteil gegenüber den herkömmlichen Methoden ist

    Distributed Microphone Array System for Two-way Audio Communication

    Get PDF
    Tässä työssä esitellään hajautettu mikrofoniryhmäjärjestelmä kahdensuuntaisessa äänikommunikaatiossa. Järjestelmän tavoitteena on paikallistaa hallitseva puhuja ja tallentaa puhesignaali mahdollisimman korkealaatuisesti. Työssä esiteltävässä järjestelmässä jokainen mikrofoniryhmä toimii polynomirakenteella parametrisoituna keilanmuodostajana (PBF), joka mahdollistaa jatkuvan keilanohjauksen. Hallitsevan puhelähteen suunta päätellään PBF:n jokaisen keilan ulostulotehoista. Lopuksi yhdistämällä jokaisen PBF:n kaikkien keilojen ulostulotehot muodostetaan avaruudellinen todennäköisyysfunktio (SLF), jonka suurin arvo määrää puhujan paikan. Puhesignaali tallennetaan ohjaamalla puhujaa lähinnä olevan PBF:n keila puhujan suuntaan. Tässä työssä esiteltävän järjestelmän toiminta arvioitiin simuloidulla ja mitatulla datalla. Arvionti näyttää, että toteutettu järjestelmä pystyy paikallistamaan puhujan noin 40 cm paikannustarkkuudella ja järjestelmä vaimentaa muista suunnista tulevia häiriölähteitä noin 15 dB. Lopuksi järjestelmä toteutettiin reaaliakaisena systeeminä Pure Data signaalinkäsittelyympäristössä.In this work a distributed microphone array system for two-way audio communication is presented. The goal of the system is to locate the dominant speaker and capture the speech signal with highest possible quality. In the presented system each microphone array works as a Polynomial Beamformer (PBF) thus enabling continuous beam steering. The output power of each PBF beam is used to determine the direction of the dominant speech source. Finally, a Spatial Likelihood Function (SLF) is formed by combining the output beam powers of each microphone array and the speaker is determined to be in the point that has highest value of SLF. The audio signal capture is done by steering the closest microphone array to the direction of the speaker. The presented audio capture front-end was evaluated with simulated and measured data. The evaluation shows that the implemented system gives approximately 40 cm localization accuracy and 15 dB attenuation of interference sources. Finally the system was implemented to run in real-time in the Pure Data signal processing environment

    MVDR broadband beamforming using polynomial matrix techniques

    Get PDF
    This thesis addresses the formulation of and solution to broadband minimum variance distortionless response (MVDR) beamforming. Two approaches to this problem are considered, namely, generalised sidelobe canceller (GSC) and Capon beamformers. These are examined based on a novel technique which relies on polynomial matrix formulations. The new scheme is based on the second order statistics of the array sensor measurements in order to estimate a space-time covariance matrix. The beamforming problem can be formulated based on this space-time covariance matrix. Akin to the narrowband problem, where an optimum solution can be derived from the eigenvalue decomposition (EVD) of a constant covariance matrix, this utility is here extended to the broadband case. The decoupling of the space-time covariance matrix in this case is provided by means of a polynomial matrix EVD. The proposed approach is initially exploited to design a GSC beamformer for a uniform linear array, and then extended to the constrained MVDR, or Capon, beamformer and also the GSC with an arbitrary array structure. The uniqueness of the designed GSC comes from utilising the polynomial matrix technique, and its ability to steer the array beam towards an off-broadside direction without the pre-steering stage that is associated with conventional approaches to broadband beamformers. To solve the broadband beamforming problem, this thesis addresses a number of additional tools. A first one is the accurate construction of both the steering vectors based on fractional delay filters, which are required for the broadband constraint formulation of a beamformer, as for the construction of the quiescent beamformer. In the GSC case, we also discuss how a block matrix can be obtained, and introduce a novel paraunitary matrix completion algorithm. For the Capon beamformer, the polynomial extension requires the inversion of a polynomial matrix, for which a residue-based method is proposed that offers better accuracy compared to previously utilised approaches. These proposed polynomial matrix techniques are evaluated in a number of simulations. The results show that the polynomial broadband beamformer (PBBF) steersthe main beam towards the direction of the signal of interest (SoI) and protects the signal over the specified bandwidth, and at the same time suppresses unwanted signals by placing nulls in their directions. In addition to that, the PBBF is compared to the standard time domain broadband beamformer in terms of their mean square error performance, beam-pattern, and computation complexity. This comparison shows that the PBBF can offer a significant reduction in computation complexity compared to its standard counterpart. Overall, the main benefits of this approach include beam steering towards an arbitrary look direction with no need for pre-steering step, and a potentially significant reduction in computational complexity due to the decoupling of dependencies of the quiescent beamformer, blocking matrix, and the adaptive filter compared to a standard broadband beamformer implementation.This thesis addresses the formulation of and solution to broadband minimum variance distortionless response (MVDR) beamforming. Two approaches to this problem are considered, namely, generalised sidelobe canceller (GSC) and Capon beamformers. These are examined based on a novel technique which relies on polynomial matrix formulations. The new scheme is based on the second order statistics of the array sensor measurements in order to estimate a space-time covariance matrix. The beamforming problem can be formulated based on this space-time covariance matrix. Akin to the narrowband problem, where an optimum solution can be derived from the eigenvalue decomposition (EVD) of a constant covariance matrix, this utility is here extended to the broadband case. The decoupling of the space-time covariance matrix in this case is provided by means of a polynomial matrix EVD. The proposed approach is initially exploited to design a GSC beamformer for a uniform linear array, and then extended to the constrained MVDR, or Capon, beamformer and also the GSC with an arbitrary array structure. The uniqueness of the designed GSC comes from utilising the polynomial matrix technique, and its ability to steer the array beam towards an off-broadside direction without the pre-steering stage that is associated with conventional approaches to broadband beamformers. To solve the broadband beamforming problem, this thesis addresses a number of additional tools. A first one is the accurate construction of both the steering vectors based on fractional delay filters, which are required for the broadband constraint formulation of a beamformer, as for the construction of the quiescent beamformer. In the GSC case, we also discuss how a block matrix can be obtained, and introduce a novel paraunitary matrix completion algorithm. For the Capon beamformer, the polynomial extension requires the inversion of a polynomial matrix, for which a residue-based method is proposed that offers better accuracy compared to previously utilised approaches. These proposed polynomial matrix techniques are evaluated in a number of simulations. The results show that the polynomial broadband beamformer (PBBF) steersthe main beam towards the direction of the signal of interest (SoI) and protects the signal over the specified bandwidth, and at the same time suppresses unwanted signals by placing nulls in their directions. In addition to that, the PBBF is compared to the standard time domain broadband beamformer in terms of their mean square error performance, beam-pattern, and computation complexity. This comparison shows that the PBBF can offer a significant reduction in computation complexity compared to its standard counterpart. Overall, the main benefits of this approach include beam steering towards an arbitrary look direction with no need for pre-steering step, and a potentially significant reduction in computational complexity due to the decoupling of dependencies of the quiescent beamformer, blocking matrix, and the adaptive filter compared to a standard broadband beamformer implementation

    Akustinen monimikrofonikeilanmuodostus käytännöllisessä laitteessa

    Get PDF
    Signal quality and noiselessness is important for Automatic Speech Recognition and communications. These can be achieved with the help of microphone arrays and beamforming. Multiple microphones can be used to pick sound coming from one direction while attenuating other directions. In this thesis, three arrays and three beamformers are compared. The target is to see how different structures affect the output signal and if one of the arrays is superior to the others. The results show that the baffled array appears bigger than the reference array or the array on the top surface of a cylinder, but it adds complexity to the beamformers. For Automatic Speech Recognition, all three arrays performed the same, but there were clear differences between the beamforming algorithms. Thus the array to use depends on the use-case and especially on the industrial design of the device.Äänenlaatu ja kohinattomuus ovat tärkeitä automaattisen puheentunnistuksen ja puheviestinnän toimivuuden kannalta. Mikrofoniryhmiä ja keilanmuodostusta voidaan käyttää parantamaan signaalin laatua ja vähentämään kohinaa nauhoittamalla vain yhdestä suunnasta tulevaa ääntä. Tässä työssä vertaillaan kolmea mikrofoniryhmää ja kolmea keilanmuodostusalgoritmia. Tarkoituksena on selvittää miten erilaiset rakenteet, joihin mikrofoniryhmä on kiinnitettynä vaikuttaa nauhoitettuun signaaliin, ja jos jokin ryhmistä on merkittävästi parempi kuin muut. Tulokset osoittavat, että sylinterin ulkoreunalle kiinnitetty mikrofoniryhmä vaikuttaa akustisesti suuremmalta kuin referenssi mikrofoniryhmä tai sylinterin päälipinnalle kiinnitetty mikrofoniryhmä, mutta se lisää keilanmuodostusalgoritmin monimutkaisuutta. Automaattisessa puheentunnistuksessa kaikki kolme mikrofoniryhmää suoriutuivat yhtä hyvin, mutta keilanmuodostusalgoritmien suorituskyvyssä oli selkeät erot. Käytännössä siis mikrofoniryhmä kannattaa valita käyttötarkoitukseen ja etenkin lopullisen tuotteen teollisen muotoiluun sopivaksi

    Broadband adaptive beamforming with low complexity and frequency invariant response

    No full text
    This thesis proposes different methods to reduce the computational complexity as well as increasing the adaptation rate of adaptive broadband beamformers. This is performed exemplarily for the generalised sidelobe canceller (GSC) structure. The GSC is an alternative implementation of the linearly constrained minimum variance beamformer, which can utilise well-known adaptive filtering algorithms, such as the least mean square (LMS) or the recursive least squares (RLS) to perform unconstrained adaptive optimisation.A direct DFT implementation, by which broadband signals are decomposed into frequency bins and processed by independent narrowband beamforming algorithms, is thought to be computationally optimum. However, this setup fail to converge to the time domain minimum mean square error (MMSE) if signal components are not aligned to frequency bins, resulting in a large worst case error. To mitigate this problem of the so-called independent frequency bin (IFB) processor, overlap-save based GSC beamforming structures have been explored. This system address the minimisation of the time domain MMSE, with a significant reduction in computational complexity when compared to time-domain implementations, and show a better convergence behaviour than the IFB beamformer. By studying the effects that the blocking matrix has on the adaptive process for the overlap-save beamformer, several modifications are carried out to enhance both the simplicity of the algorithm as well as its convergence speed. These modifications result in the GSC beamformer utilising a significantly lower computational complexity compare to the time domain approach while offering similar convergence characteristics.In certain applications, especially in the areas of acoustics, there is a need to maintain constant resolution across a wide operating spectrum that may extend across several octaves. To attain constant beamwidth is difficult, particularly if uniformly spaced linear sensor array are employed for beamforming, since spatial resolution is reciprocally proportional to both the array aperture and the frequency. A scaled aperture arrangement is introduced for the subband based GSC beamformer to achieve near uniform resolution across a wide spectrum, whereby an octave-invariant design is achieved. This structure can also be operated in conjunction with adaptive beamforming algorithms. Frequency dependent tapering of the sensor signals is proposed in combination with the overlap-save GSC structure in order to achieve an overall frequency-invariant characteristic. An adaptive version is proposed for frequency-invariant overlap-save GSC beamformer. Broadband adaptive beamforming algorithms based on the family of least mean squares (LMS) algorithms are known to exhibit slow convergence if the input signal is correlated. To improve the convergence of the GSC when based on LMS-type algorithms, we propose the use of a broadband eigenvalue decomposition (BEVD) to decorrelate the input of the adaptive algorithm in the spatial dimension, for which an increase in convergence speed can be demonstrated over other decorrelating measures, such as the Karhunen-Loeve transform. In order to address the remaining temporal correlation after BEVD processing, this approach is combined with subband decomposition through the use of oversampled filter banks. The resulting spatially and temporally decorrelated GSC beamformer provides further enhanced convergence speed over spatial or temporal decorrelation methods on their own

    Acoustic event detection and localization using distributed microphone arrays

    Get PDF
    Automatic acoustic scene analysis is a complex task that involves several functionalities: detection (time), localization (space), separation, recognition, etc. This thesis focuses on both acoustic event detection (AED) and acoustic source localization (ASL), when several sources may be simultaneously present in a room. In particular, the experimentation work is carried out with a meeting-room scenario. Unlike previous works that either employed models of all possible sound combinations or additionally used video signals, in this thesis, the time overlapping sound problem is tackled by exploiting the signal diversity that results from the usage of multiple microphone array beamformers. The core of this thesis work is a rather computationally efficient approach that consists of three processing stages. In the first, a set of (null) steering beamformers is used to carry out diverse partial signal separations, by using multiple arbitrarily located linear microphone arrays, each of them composed of a small number of microphones. In the second stage, each of the beamformer output goes through a classification step, which uses models for all the targeted sound classes (HMM-GMM, in the experiments). Then, in a third stage, the classifier scores, either being intra- or inter-array, are combined using a probabilistic criterion (like MAP) or a machine learning fusion technique (fuzzy integral (FI), in the experiments). The above-mentioned processing scheme is applied in this thesis to a set of complexity-increasing problems, which are defined by the assumptions made regarding identities (plus time endpoints) and/or positions of sounds. In fact, the thesis report starts with the problem of unambiguously mapping the identities to the positions, continues with AED (positions assumed) and ASL (identities assumed), and ends with the integration of AED and ASL in a single system, which does not need any assumption about identities or positions. The evaluation experiments are carried out in a meeting-room scenario, where two sources are temporally overlapped; one of them is always speech and the other is an acoustic event from a pre-defined set. Two different databases are used, one that is produced by merging signals actually recorded in the UPCÂżs department smart-room, and the other consists of overlapping sound signals directly recorded in the same room and in a rather spontaneous way. From the experimental results with a single array, it can be observed that the proposed detection system performs better than either the model based system or a blind source separation based system. Moreover, the product rule based combination and the FI based fusion of the scores resulting from the multiple arrays improve the accuracies further. On the other hand, the posterior position assignment is performed with a very small error rate. Regarding ASL and assuming an accurate AED system output, the 1-source localization performance of the proposed system is slightly better than that of the widely-used SRP-PHAT system, working in an event-based mode, and it even performs significantly better than the latter one in the more complex 2-source scenario. Finally, though the joint system suffers from a slight degradation in terms of classification accuracy with respect to the case where the source positions are known, it shows the advantage of carrying out the two tasks, recognition and localization, with a single system, and it allows the inclusion of information about the prior probabilities of the source positions. It is worth noticing also that, although the acoustic scenario used for experimentation is rather limited, the approach and its formalism were developed for a general case, where the number and identities of sources are not constrained
    • …
    corecore