198 research outputs found

    Spatial dissection of a soundfield using spherical harmonic decomposition

    A real-world soundfield often comprises contributions from multiple desired and undesired sound sources. The performance of many acoustic systems, such as automatic speech recognition, audio surveillance, and teleconferencing, relies on their ability to extract the desired sound components from such a mixed environment. Existing solutions to this problem are constrained by various fundamental limitations and require enforcing different priors depending on acoustic conditions such as reverberation and the spatial distribution of sound sources. With the growing emphasis on and integration of audio applications in diverse technologies such as smart home and virtual reality appliances, it is imperative to advance source separation technology in order to overcome the limitations of traditional approaches. To that end, we exploit the harmonic decomposition model to dissect a mixed soundfield into its underlying desired and undesired components based on source and signal characteristics. By analysing the spatial projection of a soundfield, we achieve multiple outcomes: (i) soundfield separation with respect to distinct source regions, (ii) source separation in a mixed soundfield using a modal coherence model, and (iii) direction of arrival (DOA) estimation of multiple overlapping sound sources through pattern recognition of the modal coherence of a soundfield. We first employ an array of higher order microphones for soundfield separation in order to reduce hardware requirements and implementation complexity. Subsequently, we develop novel mathematical models for the modal coherence of noisy and reverberant soundfields that facilitate convenient ways of estimating DOA and power spectral densities, leading to robust source separation algorithms. The modal-domain approach to soundfield/source separation allows us to circumvent several practical limitations of the existing techniques and enhance the performance and robustness of the system. The proposed methods are presented with several practical applications and performance evaluations using simulated and real-life datasets.
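    The modal projection at the heart of this approach can be illustrated with a minimal numpy sketch, assuming an idealized first-order setting rather than the higher-order decomposition developed in the thesis: a pressure pattern sampled over the sphere is projected onto a real spherical harmonic basis by least squares, and the degree-1 coefficients recover the source DOA. The cardioid-style pattern and all names are illustrative.

```python
import numpy as np

def real_sh_basis(dirs):
    # Real spherical harmonics up to order 1 at unit vectors `dirs` (N, 3);
    # columns ordered [Y_0^0, Y_1^-1 (y), Y_1^0 (z), Y_1^1 (x)].
    x, y, z = dirs[:, 0], dirs[:, 1], dirs[:, 2]
    c0 = 0.5 / np.sqrt(np.pi)            # degree-0 normalisation
    c1 = np.sqrt(3.0 / (4.0 * np.pi))    # degree-1 normalisation
    return np.stack([np.full_like(x, c0), c1 * y, c1 * z, c1 * x], axis=1)

def fibonacci_sphere(n):
    # Nearly uniform sampling of the unit sphere (Fibonacci spiral).
    i = np.arange(n)
    phi = np.pi * (3.0 - np.sqrt(5.0)) * i
    z = 1.0 - 2.0 * (i + 0.5) / n
    r = np.sqrt(1.0 - z * z)
    return np.stack([r * np.cos(phi), r * np.sin(phi), z], axis=1)

# A first-order (cardioid-like) pressure pattern due to one far-field source.
source = np.array([0.6, 0.0, 0.8])       # unit DOA vector (illustrative)
dirs = fibonacci_sphere(256)
pressure = 0.5 + 0.5 * dirs @ source

# Least-squares projection onto the SH basis = modal decomposition.
Y = real_sh_basis(dirs)
coeffs, *_ = np.linalg.lstsq(Y, pressure, rcond=None)

# The degree-1 coefficients, reordered to (x, y, z), point along the DOA.
doa = np.array([coeffs[3], coeffs[1], coeffs[2]])
doa /= np.linalg.norm(doa)
print(doa)  # ≈ [0.6, 0.0, 0.8]
```

    The same projection generalizes to higher orders, where additional coefficients capture finer spatial detail of the mixed field.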

    Real-time Microphone Array Processing for Sound-field Analysis and Perceptually Motivated Reproduction

    This thesis details real-time implementations of sound-field analysis and perceptually motivated reproduction methods for visualisation and auralisation purposes. For the former, various methods for visualising the relative distribution of sound energy from one point in space are investigated and contrasted, including a novel reformulation of the cross-pattern coherence (CroPaC) algorithm that integrates a new side-lobe suppression technique. For auralisation applications, listening tests were conducted to compare ambisonics reproduction with a novel headphone formulation of the directional audio coding (DirAC) method. The results indicate that the side-lobe-suppressed CroPaC method offers greater spatial selectivity in reverberant conditions than other popular approaches, and that the new DirAC formulation yields higher perceived spatial accuracy than the ambisonics method.

    Real-time implementation of a nonlinear signal-dependent acoustic beamformer

    This thesis implements a real-time acoustic beamforming system incorporating the cross-pattern coherence (CroPaC) post-filtering method. The real-time implementation consists of a signal-independent beamformer used for spatial discrimination of a sound field. The beamformer's signal is post-filtered by modulating it with a parameter derived from the cross-spectrum of two directional microphone signals. The post-filter is implemented to enhance beamforming performance (an increase in signal-to-noise ratio), because beamformers are not efficient in environments with high levels of reverberation. The post-filtering method has previously been implemented in MATLAB for non-real-time use, and this system is the first real-time implementation of an acoustic beamforming system utilizing it. The implementation is programmed in the C programming language for the graphical signal processing program Max, developed by Cycling '74. It uses time-frequency domain processing and the spherical Fourier transform to decompose a sound field into spherical harmonic signals. The implementation can be used with microphone arrays of up to 32 microphone capsules laid over a rigid sphere in uniform or nearly uniform arrangements. The real-time implementation can be utilized in many applications that require the algorithm to work in real time, such as teleconferencing and acoustic cameras.
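    The cross-spectrum post-filter can be sketched offline in a few lines, assuming synthetic time-frequency spectra instead of the Max/C real-time pipeline; the channel construction and the frame and bin counts are invented for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
n_frames, n_bins = 400, 128

def cnoise(shape):
    # Unit-power circular complex Gaussian noise.
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

# Hypothetical spectra of two coincident directional channels: the desired
# source occupies the lower half of the bins and is identical in both
# channels; the upper half holds diffuse noise, independent between channels.
src = cnoise((n_frames, n_bins // 2))
ch1 = np.concatenate([src, cnoise((n_frames, n_bins // 2))], axis=1)
ch2 = np.concatenate([src, cnoise((n_frames, n_bins // 2))], axis=1)

# CroPaC-style gain: half-wave-rectified, normalised cross-spectrum averaged
# over frames; coherent bins -> gain near 1, incoherent bins -> gain near 0.
cross = np.mean(np.real(ch1 * np.conj(ch2)), axis=0)
power = 0.5 * np.mean(np.abs(ch1) ** 2 + np.abs(ch2) ** 2, axis=0)
gain = np.clip(cross / power, 0.0, 1.0)

print(gain[: n_bins // 2].mean(), gain[n_bins // 2:].mean())
```

    Coherent source bins drive the rectified cross-spectrum toward 1 while independent diffuse noise drives it toward 0, which is why modulating the beamformer output by this gain raises the signal-to-noise ratio in reverberant rooms.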

    Proceedings of the EAA Spatial Audio Signal Processing symposium: SASP 2019

    International audience

    Spatial Multizone Soundfield Reproduction Design

    It is desirable for people sharing a physical space to access different multimedia information streams simultaneously. For a good user experience, the interference between the different streams should be held to a minimum. This is straightforward for the video component but currently difficult for the audio component. Spatial multizone soundfield reproduction, which aims to provide an individual sound environment to each of a set of listeners without the use of physical isolation or headphones, has drawn significant attention from researchers in recent years. The realization of multizone soundfield reproduction is a conceptually challenging problem, as most current soundfield reproduction techniques concentrate on a single zone. This thesis considers the theory and design of a multizone soundfield reproduction system using arrays of loudspeakers in given complex environments. We first introduce a novel method for spatial multizone soundfield reproduction based on describing the desired multizone soundfield as an orthogonal expansion of formulated basis functions over the desired reproduction region. This provides the theoretical basis for both 2-D (height invariant) and 3-D soundfield reproduction in this work. We then extend the reproduction of the multizone soundfield over the desired region to reverberant environments, based on the identification of the acoustic transfer function (ATF) from the loudspeakers over the desired reproduction region using sparse methods. The simulation results confirm that the method leads to a significantly reduced number of required microphones for accurate multizone sound reproduction compared with the state of the art, while also facilitating reproduction over a wide frequency range. In addition, we focus on improvements to the proposed multizone reproduction system with regard to practical implementation. 
    The so-called 2.5D multizone soundfield reproduction is considered to accurately reproduce the desired multizone soundfield over a selected 2-D plane at a height approximately level with the listener's ears, using a single array of loudspeakers in 3-D reverberant settings. We then propose an adaptive reverberation cancellation method for multizone soundfield reproduction within the desired region and simplify the prior soundfield measurement process. Simulation results suggest that the proposed method provides a faster convergence rate than comparable approaches under the same hardware provision. Finally, we conduct a real-world implementation based on the proposed theoretical work. The experimental results show that we can achieve a very noticeable acoustic energy contrast between the signals recorded in the bright zone and the quiet zone, especially for the system implementation with reverberation equalization.
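    The bright/quiet-zone idea can be sketched with a free-field least-squares design, assuming a hypothetical 16-loudspeaker circular array and invented zone geometry (this is not the thesis's reverberant 2.5D design):

```python
import numpy as np

rng = np.random.default_rng(0)
k = 2 * np.pi * 500 / 343.0                      # wavenumber at 500 Hz

ang = np.linspace(0.0, 2 * np.pi, 16, endpoint=False)
spk = 2.0 * np.stack([np.cos(ang), np.sin(ang)], axis=1)   # speakers, r = 2 m

# Sample points in a "bright" zone (sound delivered) and a "quiet" zone.
bright = np.array([0.5, 0.0]) + 0.05 * rng.standard_normal((6, 2))
quiet = np.array([-0.5, 0.0]) + 0.05 * rng.standard_normal((6, 2))

def transfer(points):
    # Free-space Green's function between each point and each loudspeaker.
    r = np.linalg.norm(points[:, None, :] - spk[None, :, :], axis=2)
    return np.exp(-1j * k * r) / (4 * np.pi * r)

Hb, Hq = transfer(bright), transfer(quiet)
p_des = np.exp(-1j * k * bright[:, 0])           # plane wave along +x

# Regularised least squares: match the bright zone, penalise quiet-zone energy.
lam, eps = 100.0, 1e-9
A = Hb.conj().T @ Hb + lam * Hq.conj().T @ Hq + eps * np.eye(16)
w = np.linalg.solve(A, Hb.conj().T @ p_des)

contrast_db = 10 * np.log10(np.mean(np.abs(Hb @ w) ** 2)
                            / np.mean(np.abs(Hq @ w) ** 2))
print(round(contrast_db, 1))
```

    The weight `lam` on the quiet-zone term trades reproduction accuracy in the bright zone against residual energy in the quiet zone, which is the basic contrast mechanism the experiments above measure.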

    Surround by Sound: A Review of Spatial Audio Recording and Reproduction

    In this article, a systematic overview of various recording and reproduction techniques for spatial audio is presented. While binaural recording and rendering is designed to resemble the human two-ear auditory system and reproduce sounds specifically for a listener's two ears, soundfield recording and reproduction using a large number of microphones and loudspeakers replicates an acoustic scene within a region. These two fundamentally different types of techniques are discussed in the paper. A recently popular area, multi-zone reproduction, is also briefly reviewed. The paper concludes with a discussion of the current state of the field and open problems. The authors acknowledge National Natural Science Foundation of China (NSFC) grant No. 61671380 and Australian Research Council Discovery Scheme DE 150100363.

    Fundamental and Harmonic Ultrasound Image Joint Restoration

    Ultrasound imaging retains its place among the leading imaging modalities thanks to its ability to reveal anatomy and to inspect organ motion and blood flow in real time, in a non-invasive and non-ionizing manner, with low cost, ease of use and fast image reconstruction. Nevertheless, ultrasound imaging has intrinsic limitations in terms of spatial resolution. Improving the spatial resolution of ultrasound images is an ongoing challenge, and much work has long focused on optimizing the acquisition device. High-resolution ultrasound imaging achieves this goal through the use of specialized probes, but today faces physical and technological limits. Harmonic imaging is the specialists' intuitive solution for increasing resolution at acquisition time; however, it suffers from attenuation with depth. An alternative solution for improving resolution is to develop post-processing techniques such as ultrasound image restoration. The objective of this thesis is to study the nonlinearity of ultrasound echoes in the restoration process and to demonstrate the benefit of incorporating harmonic ultrasound images into this process. We therefore present a new ultrasound image restoration method that uses both the fundamental and harmonic components of the observed image. Most existing methods are based on a linear image formation model. Under the first-order Born approximation, the RF image is assumed to be a 2D convolution between the reflectivity function and the impulse response of the system. The resulting inverse problem is formed and solved using an ADMM-type algorithm. 
    More precisely, we propose to recover the unknown reflectivity function by minimizing a cost function composed of two data-fidelity terms, corresponding to the linear (fundamental) and nonlinear (first harmonic) components of the observed image, and a sparsity-based regularization term to stabilize the solution. To account for the depth attenuation of harmonic images, an attenuation term is introduced into the forward model of the harmonic image, based on a spectral analysis performed on the observed RF signals. The proposed method was first applied in two steps, estimating the impulse response first, followed by the reflectivity function. In a second stage, a solution for jointly estimating the impulse response and the reflectivity function is proposed, along with another solution accounting for the spatial variability of the impulse response. The benefit of the proposed method is demonstrated on synthetic and in vivo results and compared with conventional restoration methods.
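    A minimal 1-D analogue of the joint restoration can be sketched with ISTA, a simpler proximal-gradient relative of the ADMM solver used in the thesis; the pulses, the depth-attenuation profile and all parameters are invented for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 256
t = np.arange(-16, 16)
pulse = lambda f: np.exp(-t ** 2 / 18.0) * np.cos(2 * np.pi * f * t)
hf = np.zeros(n); hf[t % n] = pulse(0.12)    # fundamental pulse (circular)
hh = np.zeros(n); hh[t % n] = pulse(0.24)    # first-harmonic pulse
Hf, Hh = np.fft.fft(hf), np.fft.fft(hh)

x_true = np.zeros(n)
x_true[[40, 90, 150, 200]] = [1.0, -0.8, 1.2, 0.6]   # sparse reflectivity
att = np.linspace(1.0, 0.3, n)               # depth attenuation of the harmonic

conv = lambda H, v: np.real(np.fft.ifft(H * np.fft.fft(v)))
yf = conv(Hf, x_true) + 0.01 * rng.standard_normal(n)         # fundamental image
yh = att * conv(Hh, x_true) + 0.01 * rng.standard_normal(n)   # harmonic image

# ISTA: gradient step on both data-fidelity terms, then soft-thresholding (l1).
lam = 0.02
L = np.max(np.abs(Hf)) ** 2 + np.max(np.abs(Hh)) ** 2   # Lipschitz bound (att <= 1)
x = np.zeros(n)
for _ in range(400):
    g = conv(np.conj(Hf), conv(Hf, x) - yf) \
        + conv(np.conj(Hh), att * (att * conv(Hh, x) - yh))
    x = x - g / L
    x = np.sign(x) * np.maximum(np.abs(x) - lam / L, 0.0)

# The sparse estimate should be much closer to the true reflectivity
# than the blurred fundamental observation is.
print(np.linalg.norm(x - x_true), np.linalg.norm(yf - x_true))
```

    The two fidelity terms play the same role as in the thesis: the harmonic channel contributes extra bandwidth where the fundamental is weak, while the attenuation profile down-weights its unreliable deep samples.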

    Subband beamforming with higher order statistics for distant speech recognition

    This dissertation presents novel beamforming methods for distant speech recognition (DSR). Such techniques can relieve users of the necessity of wearing close-talking microphones. DSR systems are useful in many applications such as humanoid robots, voice control systems for automobiles, automatic meeting transcription systems, and so on. A main problem in DSR is that recognition performance degrades seriously when the speaker is far from the microphones. In order to avoid this degradation, noise and reverberation must be removed from the signals received by the microphones. Acoustic beamforming techniques have the potential to enhance speech from the far field with little distortion, since they can maintain a distortionless constraint for a look direction. In beamforming, multiple signals propagating from a position are captured with multiple microphones. Typical conventional beamformers then adjust their weights so as to minimize the variance of their outputs subject to a distortionless constraint in the look direction. The variance is the average of the square of the beamformer's outputs; accordingly, the conventional beamformer is considered to use second-order statistics (SOS) of the beamformer's outputs. Conventional beamforming techniques can effectively place a null on any source of interference. However, the desired signal is also canceled in reverberant environments, which is known as the signal cancellation problem. Many algorithms have been developed to avoid this problem, but none of them essentially solves signal cancellation in reverberant environments. While many efforts have been made to overcome the signal cancellation problem in the field of acoustic beamforming, researchers have addressed another research issue with the microphone array, namely blind source separation (BSS) [1]. 
    BSS techniques aim at separating sources from a mixture of signals without information about the geometry of the microphone array or the positions of the sources. This is achieved by multiplying the input signals by an un-mixing matrix, constructed so that the outputs are stochastically independent. Measuring the stochastic independence of the signals is based on the theory of independent component analysis (ICA) [1]. The field of ICA builds on the fact that the distributions of information-bearing signals are not Gaussian, whereas the distributions of sums of various signals are close to Gaussian. There are two popular criteria for measuring the degree of non-Gaussianity, namely kurtosis and negentropy. As described in detail in this thesis, both criteria use more than the second moment; accordingly, they are referred to as higher order statistics (HOS), in contrast to SOS. HOS has not been well considered in the field of acoustic beamforming, although Arai et al. showed the similarity between acoustic beamforming and BSS [2]. This thesis investigates new beamforming algorithms that take higher order statistics (HOS) into consideration. The new beamforming methods adjust the beamformer's weights based on one of the following criteria:
    • minimum mutual information of the two beamformers' outputs,
    • maximum negentropy of the beamformer's outputs, and
    • maximum kurtosis of the beamformer's outputs.
    As shown in this thesis, these algorithms do not suffer from signal cancellation. Notice that the new beamforming techniques can keep the distortionless constraint for the direction of interest, in contrast to the BSS algorithms. The effectiveness of the new techniques is finally demonstrated through a series of distant automatic speech recognition experiments on real data recorded with real sensors, unlike other work where signals artificially convolved with measured impulse responses are considered. Significant improvements are achieved by the beamforming algorithms proposed here.
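    The maximum-kurtosis criterion can be illustrated with a two-sensor toy example, assuming an invented mixing matrix and no distortionless constraint (the proposed beamformers do keep that constraint); it only shows why kurtosis singles out the super-Gaussian source:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
s_speech = rng.laplace(size=n)           # super-Gaussian "speech" source
s_noise = rng.standard_normal(n)         # Gaussian interference
A = np.array([[1.0, 0.5], [0.5, 1.0]])   # hypothetical mixing (2 mics)
x = A @ np.stack([s_speech, s_noise])

def excess_kurtosis(y):
    # Sample excess kurtosis: ~3 for Laplacian, ~0 for Gaussian signals.
    y = y - y.mean()
    return np.mean(y ** 4) / np.mean(y ** 2) ** 2 - 3.0

# Scan unit-norm weight vectors; kurtosis of the output peaks where the
# Gaussian interference is nulled, leaving the speech-like source.
thetas = np.linspace(-np.pi / 2, np.pi / 2, 361)
outs = [np.array([np.cos(th), np.sin(th)]) @ x for th in thetas]
best = max(outs, key=excess_kurtosis)

print(round(abs(np.corrcoef(best, s_speech)[0, 1]), 2))
```

    Because a sum of independent signals is closer to Gaussian than its most non-Gaussian component, maximizing kurtosis steers the null onto the interference rather than the desired speech, which is why these criteria sidestep signal cancellation.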

    Theory and Design of Spatial Active Noise Control Systems

    The concept of spatial active noise control (ANC) is to use a number of loudspeakers to generate anti-noise sound waves that cancel undesired acoustic noise over a spatial region. The acoustic noise hazards that exist in a variety of situations provide many potential applications for spatial ANC. However, with existing ANC techniques it is difficult to achieve satisfactory noise reduction over a spatial area, especially with a practical hardware setup. Therefore, this thesis explores various aspects of spatial ANC and seeks to develop algorithms and techniques that improve the performance and feasibility of spatial ANC in real-life applications. We use the spherical harmonic analysis technique as the basis for our research in this work. This technique provides an accurate representation of the spatial noise field and enables in-depth analysis of its characteristics. Incorporating this technique into the design of spatial ANC systems, we developed a series of algorithms and methods that optimize spatial ANC systems, towards both improving noise reduction performance and reducing system complexity. Several contributions of this work are: (i) the design of compact planar microphone array structures capable of recording 3D spatial sound fields, so that the noise field can be monitored with minimum physical intrusion into the quiet zone, (ii) the derivation of a direct-to-reverberant energy ratio (DRR) estimation algorithm that can be used to evaluate the reverberant characteristics of a noisy environment, (iii) the proposal of several methods to estimate and optimize the spatial noise reduction of an ANC system, including a new metric for measuring spatial noise energy level, and (iv) the design of an adaptive spatial ANC algorithm incorporating the spherical harmonic analysis technique. The combination of these contributions enables the design of compact, high-performing spatial ANC systems for various applications.
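    The underlying adaptive principle can be sketched with a single-channel FxLMS loop, assuming invented primary/secondary paths and a perfect secondary-path model; the thesis extends this idea to spatial, spherical-harmonic-domain control:

```python
import numpy as np

fs, n = 8000, 20000
t = np.arange(n) / fs
x = np.sin(2 * np.pi * 100 * t) + 0.5 * np.sin(2 * np.pi * 250 * t)  # reference

P = np.array([0.0, 0.0, 0.9, 0.4, 0.1])  # primary path (noise -> error mic)
S = np.array([0.0, 0.7, 0.3])            # secondary path (speaker -> error mic)
d = np.convolve(x, P)[:n]                # disturbance at the error microphone
fx = np.convolve(x, S)[:n]               # reference filtered by the S model

L, mu = 16, 0.01
w = np.zeros(L)                          # adaptive control filter
xbuf, ybuf, fxbuf = np.zeros(L), np.zeros(len(S)), np.zeros(L)
e = np.zeros(n)
for i in range(n):
    xbuf = np.roll(xbuf, 1); xbuf[0] = x[i]
    ybuf = np.roll(ybuf, 1); ybuf[0] = w @ xbuf   # anti-noise sample
    e[i] = d[i] - S @ ybuf                        # residual at the error mic
    fxbuf = np.roll(fxbuf, 1); fxbuf[0] = fx[i]
    w = w + mu * e[i] * fxbuf                     # FxLMS update

print(np.mean(e[:2000] ** 2), np.mean(e[-2000:] ** 2))
```

    The filtered-reference step compensates for the secondary path between loudspeaker and error sensor; in the spatial setting, each spherical harmonic mode gets its own such loop over the control region.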

    Array signal processing algorithms for localization and equalization in complex acoustic channels

    The reproduction of realistic soundscapes in consumer electronic applications has been a driving force behind the development of spatial audio signal processing techniques. In order to accurately reproduce or decompose a particular spatial sound field, being able to exploit or estimate the effects of the acoustic environment becomes essential. This requires both an understanding of the source of the complexity in the acoustic channel (the acoustic path between a source and a receiver) and the ability to characterize its spatial attributes. In this thesis, we explore how to exploit or overcome the effects of the acoustic channel for sound source localization and sound field reproduction. The behaviour of a typical acoustic channel can be visualized as a transformation of its free-field behaviour, due to scattering and reflections off the measurement apparatus and the surfaces in a room. These spatial effects can be modelled using the solutions to the acoustic wave equation, yet the physical nature of these scatterers typically results in complex behaviour with frequency. The first half of this thesis explores how to exploit this diversity in the frequency domain for sound source localization, a concept that has not been considered previously. We first extract down-converted subband signals from the broadband audio signal and collate these signals such that the spatial diversity is retained. A signal model is then developed to exploit the channel's spatial information using a signal subspace approach. We show that this concept can be applied to multi-sensor arrays on complex-shaped rigid bodies as well as to the special case of binaural localization. In both cases, an improvement in closely spaced source resolution is demonstrated over traditional techniques, through simulations and experiments using a KEMAR manikin. 
    The binaural analysis further indicates that human localization performance in certain spatial regions is limited by the lack of spatial diversity, as suggested by perceptual experiments in the literature. Finally, the possibility of exploiting known inter-subband correlated sources (e.g., speech) for localization in under-determined systems is demonstrated. The second half of this thesis considers reverberation control, where reverberation is modelled as a superposition of sound fields created by a number of spatially distributed sources. We consider the mode/wave-domain description of the sound field, and propose modelling the reverberant modes as linear transformations of the desired sound field modes. This is a novel concept, as we consider each mode transformation to be independent of the other modes. This model is then extended to sound field control and used to derive the compensation signals required at the loudspeakers to equalize the reverberation. We show that estimating the reverberant channel and controlling the sound field now becomes a single adaptive filtering problem in the mode domain, where the modes can be adapted independently. The performance of the proposed method is compared with existing adaptive and non-adaptive sound field control techniques through simulations. Finally, it is shown that an order-of-magnitude reduction in computational complexity can be achieved while maintaining performance comparable to existing adaptive control techniques.
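    The independent mode-transformation model reduces to one scalar adaptive problem per mode, which can be sketched as follows; the per-mode gains, the noise level and the mode count are invented for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 9                                    # (N + 1)^2 modes for order N = 2

# Unknown per-mode channel gains (direct + reverberant), bounded away from 0.
h_true = rng.uniform(0.5, 1.5, M) * np.exp(1j * rng.uniform(0, 2 * np.pi, M))

# Identify each gain with an independent scalar NLMS loop.
h_hat = np.zeros(M, dtype=complex)
mu = 0.5
for _ in range(300):
    a = rng.standard_normal(M) + 1j * rng.standard_normal(M)   # probe drive
    r = h_true * a + 0.01 * (rng.standard_normal(M)
                             + 1j * rng.standard_normal(M))    # observed modes
    h_hat += mu * (r - h_hat * a) * np.conj(a) / (np.abs(a) ** 2 + 1e-9)

# Pre-equalise the drive so the reproduced modes match the desired ones.
d = rng.standard_normal(M) + 1j * rng.standard_normal(M)
achieved = h_true * (d / h_hat)
print(np.max(np.abs(achieved - d)))      # residual mode error (small)
```

    Because the model couples no modes, the M scalar loops can run in parallel, which is the source of the complexity reduction reported above.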