
    Local sound field synthesis

    This thesis investigates the physical and perceptual properties of selected methods for (Local) Sound Field Synthesis ((L)SFS). In agreement with numerical sound field simulations, a specifically developed geometric model shows an increase in synthesis accuracy for LSFS compared with conventional SFS approaches. Different (L)SFS approaches are assessed in listening experiments, where LSFS performs at least as well as conventional methods for azimuthal sound source localisation and achieves a significant increase in timbral fidelity for distinct parametrisations.

    Wave Field Synthesis in a listening room

    This thesis investigates the influence of the listening room on sound fields synthesised by Wave Field Synthesis. Methods are developed that allow the spatial and timbral perception of Wave Field Synthesis in a reverberant environment to be investigated in listening experiments based on binaural synthesis and room acoustical simulation. The results can serve as guidelines for the design of listening rooms for Wave Field Synthesis.

    Bayesian Models for Multimodal Perception of 3D Structure and Motion

    In this text we will formalise a novel solution, the Bayesian Volumetric Map (BVM), as a framework for a metric, short-term, egocentric spatial memory for multimodal perception of 3D structure and motion. This solution will enable the implementation of top-down mechanisms that guide perceptual attention towards areas of high entropy/uncertainty, so as to promote active exploration of the environment by the robotic perceptual system. In the process, we will try to address the inherent challenges of visual, auditory and vestibular sensor fusion through the BVM. In fact, it is our belief that perceptual systems are unable to yield truly useful descriptions of their environment without resorting to a temporal series of sensor fusion processes operating on a short-term memory such as the BVM.
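    The entropy-driven attention mechanism described above can be illustrated with a toy occupancy grid: each cell holds a belief that it is occupied, and attention is steered to the cell whose belief is most uncertain. This is a minimal, hypothetical sketch of the idea, not the BVM implementation itself; the grid values and function names are illustrative.

```python
import numpy as np

def occupancy_entropy(p):
    """Shannon entropy (bits) of per-cell occupancy probabilities.

    Entropy peaks at p = 0.5 (maximal uncertainty) and vanishes as
    the belief approaches 0 or 1, so argmax-entropy picks the cell
    the system knows least about.
    """
    p = np.clip(np.asarray(p, dtype=float), 1e-12, 1 - 1e-12)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

# Toy 2x2 grid of occupancy beliefs; attention goes to the most
# uncertain cell (the one with belief 0.5).
grid = np.array([[0.05, 0.5],
                 [0.90, 0.7]])
H = occupancy_entropy(grid)
target = np.unravel_index(np.argmax(H), grid.shape)
```

    In a full system the same argmax would run over a 3D voxel grid updated by the fused visual, auditory and vestibular measurements.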

    Vision-Guided Robot Hearing

    Natural human-robot interaction (HRI) in complex and unpredictable environments is important and has many potential applications. While vision-based HRI has been thoroughly investigated, robot hearing and audio-based HRI are emerging research topics in robotics. In typical real-world scenarios, humans are at some distance from the robot, so the sensory (microphone) data are strongly impaired by background noise, reverberation and competing auditory sources. In this context, the detection and localization of speakers plays a key role that enables several tasks, such as improving the signal-to-noise ratio for speech recognition, speaker recognition, speaker tracking, etc. In this paper we address the problem of how to detect and localize people that are both seen and heard. We introduce a hybrid deterministic/probabilistic model. The deterministic component allows us to map 3D visual data onto a 1D auditory space. The probabilistic component of the model enables the visual features to guide the grouping of the auditory features in order to form audiovisual (AV) objects. The proposed model and the associated algorithms are implemented in real time (17 FPS) using a stereoscopic camera pair and two microphones embedded in the head of the humanoid robot NAO. We perform experiments with (i) synthetic data, (ii) publicly available data gathered with an audiovisual robotic head, and (iii) data acquired using the NAO robot. The results validate the approach and are an encouragement to investigate how vision and hearing could be further combined for robust HRI.
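    The deterministic 3D-to-1D mapping mentioned above can be sketched with the standard two-microphone geometry: a 3D point maps to the interaural time difference (ITD) given by the difference of its distances to the two microphones divided by the speed of sound. This is a minimal sketch of that kind of mapping under an assumed free-field geometry, not the paper's exact formulation; positions and names are illustrative.

```python
import numpy as np

def itd_from_position(p, mic_left, mic_right, c=343.0):
    """Map a 3D point to the 1D auditory space of interaural time
    differences: (distance to left mic - distance to right mic) / c.
    """
    p = np.asarray(p, dtype=float)
    d_left = np.linalg.norm(p - np.asarray(mic_left, dtype=float))
    d_right = np.linalg.norm(p - np.asarray(mic_right, dtype=float))
    return (d_left - d_right) / c

# A symmetric microphone pair 10 cm apart; a point straight ahead
# has zero ITD, a point off to the left a negative one.
ml, mr = [-0.05, 0.0, 0.0], [0.05, 0.0, 0.0]
tau_centre = itd_from_position([0.0, 2.0, 0.0], ml, mr)
tau_left = itd_from_position([-1.0, 0.0, 0.0], ml, mr)
```

    Projecting visual detections through such a mapping lets auditory observations (which live in ITD space) be grouped around them.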

    An optimised embedded artificial audition system for a mobile robot equipped with a microphone array

    In an uncontrolled environment, a robot must be able to interact with people autonomously. This autonomy must also include interaction through the human voice. When the interaction takes place at a distance of a few metres, phenomena such as reverberation and ambient noise must be taken into account to perform tasks such as speech or speaker recognition effectively. To that end, the robot must be able to localise, track and separate the sound sources present in its environment. The recent increase in processor computing power and the reduction in their energy consumption now make it possible to run these artificial audition systems on embedded hardware in real time. Robot audition is a relatively young field with two main artificial audition libraries: ManyEars and HARK. Until now, the number of microphones has generally been limited to eight, because the computational load grows rapidly as microphones are added. Moreover, these libraries can be difficult to use with robots of varied geometries, since they must be calibrated manually. This thesis presents the ODAS library, which addresses these difficulties. To make localisation and separation more robust for closed microphone arrays, ODAS introduces a directivity model for each microphone. A hierarchical search over space also reduces the amount of computation required. In addition, a measure of the uncertainty of the sound's time of arrival is introduced to adjust several parameters automatically, avoiding manual calibration of the system.
ODAS also proposes a new sound source tracking module that uses Kalman filters rather than particle filters. The results show that the proposed methods reduce the number of false detections during localisation, improve tracking robustness for multiple sound sources, and increase separation quality by 2.7 dB in the case of a minimum-variance beamformer. The required computation decreases by a factor of up to 4 for localisation and up to 30 for tracking compared with the ManyEars library. The sound source separation module exploits the geometry of the microphone array more effectively, without requiring the system to be measured and calibrated manually. Given the performance observed, the ODAS library also opens the door to applications in acoustic drone detection, localisation of outside noises for more efficient navigation of autonomous vehicles, hands-free home assistants, and integration into hearing aids.
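    The choice of Kalman filters over particle filters for source tracking can be illustrated with a minimal constant-velocity filter over a source's azimuth. This is a generic, hypothetical sketch of that kind of tracker, not ODAS's actual implementation or API; all parameters are illustrative.

```python
import numpy as np

class AzimuthKalman:
    """Constant-velocity Kalman filter over [azimuth, azimuth rate]."""

    def __init__(self, az0, dt=0.1, q=0.5, r=2.0):
        self.x = np.array([az0, 0.0])                 # angle (deg), rate (deg/s)
        self.P = np.eye(2) * 10.0                     # state covariance
        self.F = np.array([[1.0, dt], [0.0, 1.0]])    # motion model
        self.H = np.array([[1.0, 0.0]])               # we observe the angle only
        self.Q = np.eye(2) * q                        # process noise
        self.R = np.array([[r]])                      # measurement noise

    def step(self, z):
        # Predict the state forward one frame.
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        # Update with the new azimuth measurement z (degrees).
        y = z - self.H @ self.x                       # innovation
        S = self.H @ self.P @ self.H.T + self.R       # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)      # Kalman gain
        self.x = self.x + (K @ y).ravel()
        self.P = (np.eye(2) - K @ self.H) @ self.P
        return self.x[0]

# Track a source drifting from ~10 to ~20 degrees across five frames.
kf = AzimuthKalman(az0=10.0)
est = [kf.step(z) for z in [12.0, 14.1, 15.9, 18.2, 20.0]]
```

    Compared with a particle filter, the state here is just a 2-vector and a 2x2 covariance, which is part of why a Kalman-based tracker can be far cheaper per frame.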

    Three-Dimensional Geometry Inference of Convex and Non-Convex Rooms using Spatial Room Impulse Responses

    This thesis presents research focused on the problem of geometry inference for both convex- and non-convex-shaped rooms, through the analysis of spatial room impulse responses. Current geometry inference methods are only applicable to convex-shaped rooms, requiring between 6 and 78 discretely spaced measurement positions, and are only accurate under certain conditions, such as a first-order reflection for each boundary being identifiable across all, or some subset of, these measurements. This thesis proposes that by using compact microphone arrays capable of capturing spatiotemporal information, boundary locations, and hence room shape for both convex and non-convex cases, can be inferred, using only a sufficient number of measurement positions to ensure each boundary has a first-order reflection attributable to, and identifiable in, at least one measurement. To support this, three research areas are explored. Firstly, the accuracy of direction-of-arrival estimation for reflections in binaural room impulse responses is explored, using a state-of-the-art methodology based on binaural model fronted neural networks. This establishes whether a two-microphone array can produce accurate enough direction-of-arrival estimates for geometry inference. Secondly, a spherical microphone array based spatiotemporal decomposition workflow for analysing reflections in room impulse responses is explored. This establishes that simultaneously arriving reflections can be individually detected, relaxing constraints on measurement positions. Finally, a geometry inference method applicable to both convex and more complex non-convex shaped rooms is proposed. Therefore, this research expands the possible scenarios in which geometry inference can be successfully applied at a level of accuracy comparable to existing work, through the use of commonly used compact microphone arrays. Based on these results, future improvements to this approach are presented and discussed in detail.
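    The geometric core of boundary inference from first-order reflections can be sketched with the image-source model: a planar boundary is the perpendicular bisector of the segment joining a source and its mirror image, so once an image source has been localised, the plane follows directly. This is a minimal sketch of that single geometric step under the image-source assumption, not the thesis's full method; the coordinates are illustrative.

```python
import numpy as np

def boundary_plane(source, image_source):
    """Infer a planar boundary from a source and its first-order image.

    Returns (unit normal n, offset d) with the plane defined by
    n . x = d. The normal points from the source towards the image,
    and the plane passes through the midpoint between them.
    """
    source = np.asarray(source, dtype=float)
    image_source = np.asarray(image_source, dtype=float)
    n = image_source - source
    n = n / np.linalg.norm(n)              # unit normal of the boundary
    midpoint = 0.5 * (source + image_source)
    d = n @ midpoint                       # signed distance from the origin
    return n, d

# A source at the origin whose image lies at (0, 0, 6) implies a
# horizontal boundary at z = 3 (e.g. a ceiling 3 m up).
n, d = boundary_plane([0.0, 0.0, 0.0], [0.0, 0.0, 6.0])
```

    In practice the image-source position would itself be estimated from the reflection's direction of arrival and time of arrival at the array.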

    Proceedings of the EAA Spatial Audio Signal Processing symposium: SASP 2019


    Acoustic Source Localisation in constrained environments

    Acoustic Source Localisation (ASL) is a problem with real-world applications across multiple domains, from smart assistants to acoustic detection and tracking. And yet, despite the level of attention in recent years, a technique for rapid and robust ASL remains elusive, not least in the constrained environments in which such techniques are most likely to be deployed. In this work, we seek to address some of these current limitations by presenting improvements to ASL methods under three commonly encountered constraints: the number and configuration of sensors; the limited signal sampling potentially available; and the nature and volume of training data required to accurately estimate Direction of Arrival (DOA) when deploying a particular supervised machine learning technique. In regard to the number and configuration of sensors, we find that accuracy can be maintained at the level of the state-of-the-art Steered Response Power (SRP) method while reducing computation sixfold, based on direct optimisation of well-known ASL formulations. Moreover, we find that the circular microphone configuration is the least desirable, as it yields the highest localisation error. In regard to signal sampling, we demonstrate that the computer vision inspired algorithm presented in this work, which extracts selected keypoints from the signal spectrogram and uses them to select signal samples, outperforms an audio fingerprinting baseline while maintaining a compression ratio of 40:1.
In regard to the training data employed in machine learning ASL techniques, we show that the use of music training data yields an improvement of 19% against a noise data baseline while maintaining accuracy using only 25% of the training data, while training with speech as opposed to noise improves DOA estimation by an average of 17%, outperforming the Generalised Cross-Correlation technique by 125% in scenarios in which the test and training acoustic environments are matched. Heriot-Watt University James Watt Scholarship (JSW) in the School of Engineering & Physical Sciences.
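    The Generalised Cross-Correlation baseline mentioned above is usually used with the phase transform (GCC-PHAT), which whitens the cross-spectrum so that only phase, i.e. delay, information remains. A minimal self-contained sketch of GCC-PHAT time-delay estimation, assuming a simple two-channel free-field setup (not the thesis's implementation):

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the delay of `sig` relative to `ref` via GCC-PHAT.

    PHAT weighting divides the cross-spectrum by its magnitude,
    keeping only phase, which sharpens the correlation peak in
    mildly reverberant conditions. Returns the delay in seconds.
    """
    n = sig.shape[0] + ref.shape[0]
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    R = SIG * np.conj(REF)
    R /= np.abs(R) + 1e-15                 # PHAT: discard magnitude
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    # Re-centre so index `max_shift` corresponds to zero lag.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs

# A one-sample delayed copy of a noise burst should yield a delay
# of exactly one sample period.
fs = 16000
rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
y = np.concatenate(([0.0], x[:-1]))        # delay x by one sample
tau = gcc_phat(y, x, fs)
```

    A DOA then follows from the delay and the microphone spacing via `theta = arcsin(c * tau / d)` for a far-field source, which is the step the learned methods above aim to improve on.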

    Detection and localization of 3D audio-visual objects using unsupervised clustering

