
    Filter Bank Design for Subband Adaptive Beamforming and Application to Speech Recognition

    We present a new filter bank design method for subband adaptive beamforming. Filter bank design for adaptive filtering poses many problems not encountered in more traditional applications such as subband coding of speech or music. The popular class of perfect reconstruction filter banks is not well suited to applications involving adaptive filtering, because perfect reconstruction is achieved through alias cancellation, which functions correctly only if the outputs of the individual subbands are \emph{not} subject to arbitrary magnitude scaling and phase shifts. In this work, we design analysis and synthesis prototypes for modulated filter banks so as to minimize each aliasing term individually. We then show that the \emph{total response error} can be driven to zero by constraining the analysis and synthesis prototypes to be \emph{Nyquist($M$)} filters. We show that the proposed filter banks are more robust to the aliasing caused by adaptive beamforming than conventional designs. Furthermore, we demonstrate the effectiveness of our design technique through a set of automatic speech recognition experiments on the multi-channel, far-field speech data from the \emph{PASCAL Speech Separation Challenge}. In our system, speech signals are first transformed into the subband domain with the proposed filter banks, and thereafter the subband components are processed with a beamforming algorithm. Following beamforming, post-filtering and binary masking are performed to further enhance the speech by removing residual noise and undesired speech. The experimental results show that our beamforming system with the proposed filter banks achieves the best recognition performance, a 39.6% word error rate (WER), with half the computation of the conventional filter banks, whereas the perfect reconstruction filter banks yield a 44.4% WER.
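As a rough illustration of the modulated filter bank structure described above (not the optimized design procedure from the paper), the following Python sketch builds a windowed-sinc prototype, checks the Nyquist(M) property, and performs oversampled subband analysis with a DFT-modulated bank. The prototype, band count, and decimation factor are illustrative assumptions.

```python
import numpy as np

M, D = 8, 4                      # subbands and decimation factor (2x oversampled), illustrative
L = 8 * M + 1                    # prototype length
n = np.arange(L) - (L - 1) / 2   # centred time index

# Windowed-sinc lowpass with cutoff pi/M: a simple Nyquist(M) prototype,
# i.e. every M-th tap away from the centre tap is zero.
h = np.sinc(n / M) / M * np.hamming(L)
centre = (L - 1) // 2
side_taps = np.concatenate([h[centre + M::M], h[centre - M::-M]])
print("Nyquist(M) property:", np.allclose(side_taps, 0.0, atol=1e-12))

def analysis(x, h, M, D):
    """Split x into M complex subbands with a DFT-modulated filter bank, decimated by D."""
    idx = np.arange(len(h))
    return np.stack([np.convolve(x, h * np.exp(2j * np.pi * k * idx / M))[::D]
                     for k in range(M)])

x = np.random.randn(4096)
X = analysis(x, h, M, D)
print(X.shape)                   # (M, number of decimated frames)
```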

    Subband beamforming with higher order statistics for distant speech recognition

    This dissertation presents novel beamforming methods for distant speech recognition (DSR). Such techniques relieve users from the necessity of wearing close-talking microphones. DSR systems are useful in many applications such as humanoid robots, voice control systems for automobiles, and automatic meeting transcription systems. A main problem in DSR is that recognition performance is seriously degraded when a speaker is far from the microphones. In order to avoid this degradation, noise and reverberation should be removed from the signals received by the microphones. Acoustic beamforming techniques have the potential to enhance speech from the far field with little distortion, since they can maintain a distortionless constraint for a look direction. In beamforming, multiple signals propagating from a position are captured with multiple microphones. Typical conventional beamformers then adjust their weights so as to minimize the variance of their own outputs subject to a distortionless constraint in a look direction. The variance is the average of the second power (square) of the beamformer's outputs. Accordingly, the conventional beamformer can be said to use second order statistics (SOS) of its outputs. Conventional beamforming techniques can effectively place a null on any source of interference. However, the desired signal is also canceled in reverberant environments, which is known as the signal cancellation problem. Many algorithms have been developed to avoid that problem, but none of them essentially solves signal cancellation in reverberant environments. While many efforts have been made to overcome the signal cancellation problem in the field of acoustic beamforming, researchers have addressed another research issue with the microphone array, namely blind source separation (BSS) [1]. BSS techniques aim at separating sources from a mixture of signals without information about the geometry of the microphone array or the positions of the sources. This is achieved by multiplying the input signals with an un-mixing matrix, which is constructed so that the outputs are stochastically independent. Measuring the stochastic independence of the signals is based on the theory of independent component analysis (ICA) [1]. The field of ICA rests on the fact that the distributions of information-bearing signals are not Gaussian, whereas the distributions of sums of various signals are close to Gaussian. There are two popular criteria for measuring the degree of non-Gaussianity, namely kurtosis and negentropy. As described in detail in this thesis, both criteria use more than the second moment. Accordingly, they are referred to as higher order statistics (HOS), in contrast to SOS. HOS have not been well explored in the field of acoustic beamforming, although Arai et al. showed the similarity between acoustic beamforming and BSS [2]. This thesis investigates new beamforming algorithms which take higher-order statistics (HOS) into consideration. The new beamforming methods adjust the beamformer's weights based on one of the following criteria: • minimum mutual information of the two beamformers' outputs, • maximum negentropy of the beamformer's outputs, and • maximum kurtosis of the beamformer's outputs. As shown in this thesis, these algorithms do not suffer from signal cancellation.
Notice that the new beamforming techniques maintain the distortionless constraint for the direction of interest, in contrast to the BSS algorithms. The effectiveness of the new techniques is finally demonstrated through a series of distant automatic speech recognition experiments on real data recorded with real sensors, unlike other work where signals artificially convolved with measured impulse responses are considered. Significant improvements are achieved by the beamforming algorithms proposed here.
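The contrast between the second-order criterion of conventional beamforming and the higher-order criteria listed above can be illustrated numerically. The sketch below computes empirical excess kurtosis and a standard moment-based negentropy approximation, using a Laplace-distributed signal as a stand-in for super-Gaussian speech; it does not show the actual weight-adaptation algorithms from the thesis.

```python
import numpy as np

def excess_kurtosis(y):
    """Empirical excess kurtosis: ~0 for Gaussian data, positive for super-Gaussian
    signals such as clean speech (a higher-order statistic)."""
    y = y - np.mean(y)
    var = np.mean(y ** 2)        # the second-order statistic used by conventional beamformers
    return np.mean(y ** 4) / var ** 2 - 3.0

def negentropy_approx(y):
    """Moment-based negentropy approximation J(y) ~ E[y^3]^2/12 + kurt(y)^2/48:
    zero for Gaussian data, larger the more non-Gaussian the data are."""
    y = (y - np.mean(y)) / np.std(y)
    return np.mean(y ** 3) ** 2 / 12.0 + excess_kurtosis(y) ** 2 / 48.0

gauss = np.random.randn(100_000)
laplace = np.random.laplace(size=100_000)   # super-Gaussian stand-in for speech
print(excess_kurtosis(gauss), excess_kurtosis(laplace))       # ~0 vs ~3
print(negentropy_approx(gauss), negentropy_approx(laplace))   # ~0 vs clearly positive
```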

    To separate speech! a system for recognizing simultaneous speech

    The PASCAL Speech Separation Challenge (SSC) is based on a corpus of sentences from the Wall Street Journal task read by two speakers simultaneously and captured with two circular eight-channel microphone arrays. This work describes our system for the recognition of such simultaneous speech. Our system has four principal components: a person tracker returns the locations of both active speakers, as well as segmentation information for the utterances, which are often of unequal length; two beamformers in generalized sidelobe canceller (GSC) configuration separate the simultaneous speech by setting their active weight vectors according to a minimum mutual information (MMI) criterion; a postfilter and binary mask operating on the outputs of the beamformers further enhance the separated speech; and finally an automatic speech recognition (ASR) engine based on a weighted finite-state transducer (WFST) returns the most likely word hypotheses for the separated streams. In addition to optimizing each of these components, we investigated the effect of the filter bank design used to perform subband analysis and synthesis during beamforming. On the SSC development data, our system achieved a word error rate of 39.6%.
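For readers unfamiliar with the GSC structure mentioned above, a minimal sketch of how one subband snapshot passes through a generalized sidelobe canceller is given below. The steering vector, blocking-matrix construction, and array size are illustrative assumptions, not the configuration of the challenge system.

```python
import numpy as np

def gsc_output(X, w_q, B, w_a):
    """One subband snapshot through a GSC: the quiescent beamformer keeps the
    look direction distortionless, while the blocking matrix B removes the
    look-direction component before the adaptive active weights w_a."""
    upper = np.vdot(w_q, X)                   # w_q^H X
    lower = np.vdot(w_a, B.conj().T @ X)      # w_a^H B^H X
    return upper - lower

C = 8                                         # eight-channel array, as in the SSC setup
d = np.ones(C) / np.sqrt(C)                   # hypothetical broadside steering vector
w_q = d / (d.conj() @ d)                      # quiescent weights with w_q^H d = 1
B = np.linalg.svd(d.reshape(1, -1))[2][1:].conj().T   # columns orthogonal to d
w_a = np.zeros(C - 1, dtype=complex)          # active weights, adapted e.g. by the MMI criterion
X = np.random.randn(C) + 1j * np.random.randn(C)
print(gsc_output(X, w_q, B, w_a))
```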

    Minimum Mutual Information Beamforming for Simultaneous Active Speakers

    In this work, we consider an acoustic beamforming application where two speakers are simultaneously active. We construct one subband-domain beamformer in \emph{generalized sidelobe canceller} (GSC) configuration for each source. In contrast to normal practice, we then jointly optimize the \emph{active weight vectors} of both GSCs to obtain two output signals with \emph{minimum mutual information} (MMI). Assuming that the subband snapshots are Gaussian-distributed, this MMI criterion reduces to the requirement that the \emph{cross-correlation coefficient} of the subband outputs of the two GSCs vanishes. We also compare separation performance under the Gaussian assumption with that obtained from several super-Gaussian probability density functions (pdfs), namely the Laplace, $K_0$, and $\Gamma$ pdfs. Our proposed technique provides effective nulling of the undesired source, but without the signal cancellation problems seen in conventional beamforming. Moreover, our technique does not suffer from the source permutation and scaling ambiguities encountered in conventional blind source separation algorithms. We demonstrate the effectiveness of our proposed technique through a series of far-field automatic speech recognition experiments on data from the \emph{PASCAL Speech Separation Challenge} (SSC). On the SSC development data, the simple delay-and-sum beamformer achieves a word error rate (WER) of 70.4\%. The MMI beamformer under a Gaussian assumption achieves a 55.2\% WER, which is further reduced to 52.0\% with a $K_0$ pdf, whereas the WER for data recorded with a close-talking microphone is 21.6\%.
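Under the Gaussian assumption, the MMI criterion depends only on the cross-correlation coefficient of the two beamformer outputs. The sketch below computes that coefficient and the corresponding mutual information for synthetic complex data; it illustrates the criterion only, not the adaptation procedure used in the experiments.

```python
import numpy as np

def gaussian_mutual_info(y1, y2):
    """Mutual information of two zero-mean subband outputs under a circular complex
    Gaussian assumption: I = -log(1 - |rho|^2), so minimising the MI is equivalent
    to driving the cross-correlation coefficient rho towards zero."""
    rho = np.mean(y1 * np.conj(y2)) / np.sqrt(np.mean(np.abs(y1) ** 2) * np.mean(np.abs(y2) ** 2))
    return -np.log(1.0 - np.abs(rho) ** 2), rho

# Illustrative check with partially correlated complex noise (not SSC data):
n = 50_000
s = (np.random.randn(n) + 1j * np.random.randn(n)) / np.sqrt(2)
v = (np.random.randn(n) + 1j * np.random.randn(n)) / np.sqrt(2)
y1, y2 = s, 0.6 * s + 0.8 * v
mi, rho = gaussian_mutual_info(y1, y2)
print(f"|rho| = {abs(rho):.3f}, MI = {mi:.3f} nats")
```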

    Wideband data-independent beamforming for subarrays

    The desire to operate large antenna arrays for e.g. RADAR applications over a wider frequency range is currently limited by the hardware, which due to weight, cost and size only permits complex multipliers behind each element. In contrast, wideband processing would have to rely on tap delay lines enabling digital filters for every element. As an intermediate step, in this thesis we consider a design where elements are grouped into subarrays, within which elements are still individually controlled by narrowband complex weights, but where each subarray output is given a tap delay line or finite impulse response digital filter for further wideband processing. Firstly, this thesis explores how a tap delay line attached to every subarray can be designed as a delay-and-sum beamformer. This filter is set to realise a fractional delay design based on a windowed sinc function. At the element level, we show that designing a narrowband beam w.r.t. a centre frequency of wideband operation is suboptimal, and suggest an optimisation technique that can yield sufficiently accurate gain over a frequency band of interest for an arbitrary look direction, which however comes at the cost of reduced aperture efficiency, as well as significantly increased sidelobes. We also suggest an adaptive method to enhance the frequency characteristic of a partial wideband array design, by utilising subarrays pointing in different directions in different frequency bands - resolved by means of a filter bank - to adaptively suppress undesired components in the beam patterns of the subarrays. Finally, the thesis proposes a novel array design approach obtained by rotational tiling of subarrays such that the overall array aperture is densely constructed from the same geometric subarray by rotation and translation only. Since the grating lobes of differently oriented subarrays do not necessarily align, an effective grating lobe attenuation w.r.t. the main beam is achieved. Based on a review of findings from geometry, a number of designs are highlighted and transformed into numerical examples, and the theoretically expected grating lobe suppression is compared to uniformly spaced arrays. Supported by a number of models and simulations, the thesis thus suggests various numerical and hardware design techniques, mainly the addition of a tap delay line per subarray and some added processing overhead, that can help to construct a large partial wideband array close in wideband performance to currently existing hardware.
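A windowed-sinc fractional-delay FIR of the kind realised by the per-subarray tap delay lines can be sketched as follows. The tap count, window choice, spacing, sampling rate, and steering angle are illustrative assumptions, not values from the thesis.

```python
import numpy as np

def fractional_delay_fir(delay, num_taps=33):
    """Windowed-sinc FIR approximating a (possibly non-integer) sample delay,
    usable as a tap delay line in a delay-and-sum beamformer."""
    n = np.arange(num_taps)
    centre = (num_taps - 1) / 2.0
    h = np.sinc(n - centre - delay)   # ideal fractional-delay impulse response, shifted
    h *= np.hamming(num_taps)         # window to control truncation sidelobes
    return h / np.sum(h)              # normalise DC gain to unity

# Example: delays for four subarray outputs spaced 4 cm apart, steered to 30 degrees.
c, fs, spacing = 343.0, 16_000.0, 0.04
delays = spacing * np.sin(np.deg2rad(30.0)) / c * fs * np.arange(4)
filters = [fractional_delay_fir(d) for d in delays]
print(np.round(delays, 3))            # delays in samples applied by each FIR
```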

    Modelling the nonstationarity of speech in the maximum negentropy beamformer

    State-of-the-art automatic speech recognition (ASR) systems can achieve very low word error rates (WERs) of below 5% on data recorded with headsets. However, in many situations such as ASR at meetings or in the car, far-field microphones on the table, walls or devices such as laptops are preferable to microphones that have to be worn close to the user's mouth. Unfortunately, the distance between speakers and microphones introduces significant noise and reverberation, and as a consequence the WERs of current ASR systems on this data tend to be unacceptably high (30-50% upwards). The use of a microphone array, i.e. several microphones, can alleviate the problem somewhat by performing spatial filtering: beamforming techniques combine the sensors' output in a way that focuses the processing on a particular direction. Assuming that the signal of interest comes from a different direction than the noise, this can improve the signal quality and reduce the WER by filtering out sounds coming from non-relevant directions. Historically, array processing techniques developed from research on non-speech data, e.g. in the fields of sonar and radar, and as a consequence most techniques were not created to specifically address beamforming in the context of ASR. While this generality can be seen as an advantage in theory, it also means that these methods ignore characteristics which could be used to improve the process in a way that benefits ASR. An example of beamforming adapted to speech processing is the recently proposed maximum negentropy beamformer (MNB), which exploits the statistical characteristics of speech as follows. "Clean" headset speech differs from noisy or reverberant speech in its statistical distribution, which is much less Gaussian in the clean case. Since negentropy is a measure of non-Gaussianity, choosing beamformer weights that maximise the negentropy of the output leads to speech that is closer to clean speech in its distribution, and this in turn has been shown to lead to improved WERs [Kumatani et al., 2009]. In this thesis several refinements of the MNB algorithm are proposed and evaluated. Firstly, a number of modifications to the original MNB configuration are proposed based on theoretical or practical concerns. These changes concern the probability density function (pdf) used to model speech, the estimation of the pdf parameters, and the method of calculating the negentropy. Secondly, a further step is taken to reflect the characteristics of speech by introducing time-varying pdf parameters. The original MNB uses fixed estimates per utterance, which do not account for the nonstationarity of speech. Several time-dependent variance estimates are therefore proposed, beginning with a simple moving average window and including the HMM-MNB, which derives the variance estimate from a set of auxiliary hidden Markov models. All beamformer algorithms presented in this thesis are evaluated through far-field ASR experiments on the Multi-Channel Wall Street Journal Audio-Visual Corpus, a database of utterances captured with real far-field sensors, in a realistic acoustic environment, and spoken by real speakers.
While the proposed methods do not lead to an improvement in ASR performance, a more efficient MNB algorithm is developed, and it is shown that comparable results can be achieved with significantly less data than all frames of the utterance, a result which is of particular relevance for real-time implementations.
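The simplest of the time-varying estimates mentioned above, a moving-average window over the subband power, can be sketched as follows; the window length and test signal are illustrative assumptions, and the HMM-MNB variant is not shown.

```python
import numpy as np

def moving_average_variance(y, win=31):
    """Time-varying variance estimate of a subband sequence from a centred
    moving-average window, in contrast to a single per-utterance estimate."""
    power = np.abs(y) ** 2
    kernel = np.ones(win) / win
    return np.convolve(power, kernel, mode="same")

# Nonstationary toy signal: the local estimate tracks the changing variance,
# while the global (per-utterance) estimate does not.
y = np.concatenate([0.1 * np.random.randn(500), 2.0 * np.random.randn(500)])
local = moving_average_variance(y)
print(f"global variance: {np.var(y):.2f}, local range: {local.min():.2f}..{local.max():.2f}")
```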

    MVDR broadband beamforming using polynomial matrix techniques

    This thesis addresses the formulation of and solution to broadband minimum variance distortionless response (MVDR) beamforming. Two approaches to this problem are considered, namely generalised sidelobe canceller (GSC) and Capon beamformers. These are examined based on a novel technique which relies on polynomial matrix formulations. The new scheme is based on the second order statistics of the array sensor measurements in order to estimate a space-time covariance matrix, and the beamforming problem can be formulated in terms of this matrix. Akin to the narrowband problem, where an optimum solution can be derived from the eigenvalue decomposition (EVD) of a constant covariance matrix, this utility is here extended to the broadband case. The decoupling of the space-time covariance matrix in this case is provided by means of a polynomial matrix EVD. The proposed approach is initially exploited to design a GSC beamformer for a uniform linear array, and then extended to the constrained MVDR, or Capon, beamformer and also the GSC with an arbitrary array structure. The uniqueness of the designed GSC comes from utilising the polynomial matrix technique, and its ability to steer the array beam towards an off-broadside direction without the pre-steering stage that is associated with conventional approaches to broadband beamformers. To solve the broadband beamforming problem, this thesis addresses a number of additional tools. The first is the accurate construction of steering vectors based on fractional delay filters, which are required both for the broadband constraint formulation of a beamformer and for the construction of the quiescent beamformer. In the GSC case, we also discuss how a blocking matrix can be obtained, and introduce a novel paraunitary matrix completion algorithm. For the Capon beamformer, the polynomial extension requires the inversion of a polynomial matrix, for which a residue-based method is proposed that offers better accuracy compared to previously utilised approaches. These proposed polynomial matrix techniques are evaluated in a number of simulations. The results show that the polynomial broadband beamformer (PBBF) steers the main beam towards the direction of the signal of interest (SoI) and protects the signal over the specified bandwidth, while suppressing unwanted signals by placing nulls in their directions. In addition, the PBBF is compared to the standard time domain broadband beamformer in terms of mean square error performance, beam pattern, and computational complexity. This comparison shows that the PBBF can offer a significant reduction in computational complexity compared to its standard counterpart. Overall, the main benefits of this approach include beam steering towards an arbitrary look direction with no need for a pre-steering step, and a potentially significant reduction in computational complexity due to the decoupling of the dependencies between the quiescent beamformer, blocking matrix, and adaptive filter compared to a standard broadband beamformer implementation.
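The space-time covariance matrix underlying the polynomial formulation collects one spatial covariance matrix per lag. A minimal estimator is sketched below with an illustrative array size and lag range; the polynomial EVD and the beamformer design itself are not shown.

```python
import numpy as np

def space_time_covariance(X, max_lag):
    """Estimate R[tau] = E{ x[n] x^H[n - tau] } for tau = -max_lag..max_lag from
    snapshots X of shape (channels, samples); stacking the lags gives the
    polynomial (lag-dependent) covariance matrix used by the broadband formulation."""
    M, N = X.shape
    R = np.zeros((2 * max_lag + 1, M, M), dtype=complex)
    for i, tau in enumerate(range(-max_lag, max_lag + 1)):
        if tau >= 0:
            R[i] = X[:, tau:] @ X[:, :N - tau].conj().T / (N - tau)
        else:
            R[i] = X[:, :N + tau] @ X[:, -tau:].conj().T / (N + tau)
    return R

X = np.random.randn(4, 10_000) + 1j * np.random.randn(4, 10_000)
R = space_time_covariance(X, max_lag=5)
print(R.shape)       # (11, 4, 4): one spatial covariance matrix per lag
```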

    Efficient Multiband Algorithms for Blind Source Separation

    The problem of blind source separation refers to recovering the original signals, called source signals, from mixed signals, called observation signals, in a reverberant environment. The mixture is a function of a sequence of original speech signals mixed in a reverberant room. The objective is to separate the mixed signals and obtain the original signals without degradation and without prior information about the features of the sources. The strategy used to achieve this objective is to use multiple bands that work at a lower rate, have a lower computational cost, and converge more quickly than the conventional scheme. Our motivation is the competitive results of unequal-passbands scheme applications in terms of convergence speed. The objective of this research is to improve unequal-passbands schemes by improving the speed of convergence and reducing the computational cost. The first proposed work is a novel maximally decimated unequal-passbands scheme. This scheme uses multiple bands that let it work at a reduced sampling rate and with low computational cost. An adaptation approach is derived with an adaptation step that improves the convergence speed. The performance of the proposed scheme was measured in several ways. First, the mean square errors of various bands are measured and the results are compared to a maximally decimated equal-passbands scheme, which is currently the best performing method. The results show that the proposed scheme has a faster convergence rate than the maximally decimated equal-passbands scheme. Second, when the scheme is tested for white and coloured inputs using a low number of bands, it does not yield good results; but when the number of bands is increased, the speed of convergence is enhanced. Third, the scheme is tested for quick changes. It is shown that the performance of the proposed scheme is similar to that of the equal-passbands scheme. Fourth, the scheme is also tested in a stationary state. The experimental results confirm the theoretical work. For more challenging scenarios, an unequal-passbands scheme with over-sampled decimation is proposed; the greater the number of bands, the more efficient the separation. First, the results are compared to the currently best performing method. Second, an experimental comparison is made between the proposed multiband scheme and the conventional scheme. The results show that the convergence speed and the signal-to-interference ratio of the proposed scheme are higher than those of the conventional scheme, and the computational cost is lower than that of the conventional scheme.
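The computational argument for the multiband scheme (shorter adaptive filters running at a decimated rate) can be made concrete with a rough operation count; all numbers below are illustrative assumptions, not figures from the thesis.

```python
# Rough cost comparison: one fullband adaptive filter of length Lf updated at rate fs
# versus K subband filters of length ~Lf/D each, running at the decimated rate fs/D.
fs, Lf = 16_000, 4096            # illustrative sampling rate and fullband filter length
K, D = 16, 12                    # number of bands and decimation factor (oversampled: D < K)

fullband_mults_per_sec = fs * Lf                     # one length-Lf update per input sample
subband_mults_per_sec = K * (fs / D) * (Lf / D)      # K shorter filters at the lower rate
print(f"fullband: {fullband_mults_per_sec:.2e}, subband: {subband_mults_per_sec:.2e}, "
      f"ratio: {fullband_mults_per_sec / subband_mults_per_sec:.1f}x")
```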

    Source Separation for Hearing Aid Applications
