    Spatial analysis of the sound field for parametric spatial audio reproduction using sparse microphone arrays

    In spatial audio capturing the aim is to store information about the sound field so that it can be reproduced without a perceptual difference to the original. The need for this arises in applications such as virtual reality and teleconferencing. Traditionally the sound field has been captured with a B-format microphone, but this is not always a feasible solution due to size and cost constraints. Alternatively, arrays of omnidirectional microphones can be utilized; these are often found in devices such as mobile phones. If the microphone array is sparse, i.e., the microphone spacings are relatively large, the analysis of the sound Direction of Arrival (DoA) becomes ambiguous at higher frequencies. This is due to spatial aliasing, a common problem in narrowband DoA estimation. In this thesis the spatial aliasing problem was examined and its effect on DoA estimation and on spatial sound synthesis with Directional Audio Coding (DirAC) was studied. The aim was to find methods for unambiguous narrowband DoA estimation. Current state-of-the-art methods can remove aliased estimates but are not capable of estimating the DoA with the optimal time-frequency resolution. In this thesis similar results were obtained with parameter extrapolation when only a single broadband source exists. The main contribution of this thesis was the development of a correlation-based method that utilizes prior, array-specific information on aliasing for each DoA and frequency. The correlation-based method was tested and found to be the best option for overcoming spatial aliasing: it was able to resolve the ambiguity even with multiple sources, or when the source's frequency content lies completely above the spatial aliasing frequency. In a listening test it was found that the correlation-based method can provide a major improvement in the quality of the DirAC-synthesized spatial image compared to an aliased estimator.
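
    As an illustration of the spatial aliasing problem described above (not the thesis's correlation-based method), the following Python sketch lists all narrowband DoA candidates for a two-microphone pair that are consistent with a single measured inter-channel phase; above the aliasing frequency c/(2d) more than one candidate appears. The spacing and source angle used here are hypothetical.

        import numpy as np

        C = 343.0  # speed of sound in air, m/s

        def aliasing_frequency(d):
            """Frequency above which a pair of omnidirectional microphones
            spaced d metres apart can no longer resolve the narrowband DoA
            unambiguously."""
            return C / (2.0 * d)

        def candidate_doas(phase_diff, f, d):
            """All broadside angles (radians) consistent with a measured
            inter-channel phase difference at frequency f for spacing d.
            Above c/(2d) several candidates exist: spatial aliasing."""
            k = 2.0 * np.pi * f / C
            candidates = []
            # The phase is only known modulo 2*pi, so test every plausible wrap.
            n_max = int(np.ceil(k * d / np.pi)) + 1
            for n in range(-n_max, n_max + 1):
                s = (phase_diff + 2.0 * np.pi * n) / (k * d)  # sin(theta)
                if -1.0 <= s <= 1.0:
                    candidates.append(np.arcsin(s))
            return candidates

        # Example: 10 cm spacing aliases above ~1.7 kHz; at 3 kHz a single
        # phase measurement maps to more than one possible direction.
        d = 0.10
        print(aliasing_frequency(d))                       # ~1715 Hz
        true_theta = np.deg2rad(40.0)
        f = 3000.0
        phase = 2 * np.pi * f / C * d * np.sin(true_theta)
        phase_wrapped = np.angle(np.exp(1j * phase))
        print(np.rad2deg(candidate_doas(phase_wrapped, f, d)))

    For the 10 cm spacing in this toy example, a source at 40 degrees and a phantom direction near -30 degrees are indistinguishable from a single narrowband phase reading, which is exactly the ambiguity the thesis addresses with array-specific aliasing patterns.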

    Surround by Sound: A Review of Spatial Audio Recording and Reproduction

    In this article, a systematic overview of various recording and reproduction techniques for spatial audio is presented. While binaural recording and rendering are designed to resemble the human two-ear auditory system and reproduce sounds specifically for a listener's two ears, soundfield recording and reproduction using a large number of microphones and loudspeakers replicate an acoustic scene within a region. These two fundamentally different types of techniques are discussed in the paper. A recently popular area, multi-zone reproduction, is also briefly reviewed. The paper concludes with a discussion of the current state of the field and open problems. The authors acknowledge National Natural Science Foundation of China (NSFC) grant No. 61671380 and Australian Research Council Discovery Scheme DE 150100363.

    Reconstruction Resolution Of Coherent Point Sources With Helmholtz Equation Least Squares

    The dissertation investigates the reconstruction of coherent point sources using the Helmholtz Equation Least Squares (HELS) method, based on measurements that violate the Nearfield Acoustical Holography (NAH) resolution guidelines. In HELS, the Helmholtz equation is solved by matching a series of localized spherical expansion functions to the measured pressures in the field. The expansion coefficients are solved for by least squares and used to reconstruct the pressures at the source surface. By approximating the pressure radiation with expansion functions, field and surface pressures can be synthesized, allowing a higher spatial resolution than previous-generation NAH methods such as Fourier Acoustics and Inverse Boundary Element Methods. The NAH guidelines dictate that spatial resolution decreases with increasing stand-off distance and decreasing Signal to Noise Ratio (SNR). Also, in methods other than HELS, the measurement spacing must not exceed the spacing derived from the Nyquist rate, to mitigate the risk of aliasing; HELS is not limited by the Nyquist rate due to its ability to synthesize field and surface points. The resolution capability of HELS is tested through numerical simulation and experimental testing. Besides HELS, a weighted variant, termed Modified HELS, is also tested. For comparison, Fourier Acoustics is used as a baseline with measurement spacing equal to and finer than the spacing used in the HELS simulations. Results show that both HELS and Fourier Acoustics reconstruct point sources at finer resolution than the NAH guidelines predict. The increased resolution is likely due to the use of point sources and its effect on the definition of SNR and the angular spectrum. However, HELS, and in particular Modified HELS, shows a significant increase in accuracy in comparison to Fourier Acoustics for the parameters tested. The main conclusion of this dissertation is that standard and Modified HELS are better tools than traditional NAH methods when reconstructing coherent point sources in violation of the NAH spatial resolution guidelines.
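
    The fitting step described above, matching localized spherical expansion functions to measured field pressures and re-evaluating them on the source surface, can be sketched as follows. This is a bare-bones illustration using an unweighted least-squares solution and arbitrary toy geometry, not the dissertation's HELS or Modified HELS implementation; all parameter values are placeholders.

        import numpy as np
        from scipy.special import spherical_jn, spherical_yn, sph_harm

        def basis_matrix(k, r, theta, phi, n_max):
            """Outgoing spherical wave functions h_n^(2)(kr) * Y_n^m(theta, phi)
            evaluated at the given points (spherical coordinates centred on the
            source); one column per (n, m) expansion term."""
            cols = []
            for n in range(n_max + 1):
                hn = spherical_jn(n, k * r) - 1j * spherical_yn(n, k * r)
                for m in range(-n, n + 1):
                    cols.append(hn * sph_harm(m, n, phi, theta))
            return np.column_stack(cols)

        def hels_reconstruct(k, r_m, th_m, ph_m, p_meas, r_s, th_s, ph_s, n_max):
            """Fit the expansion coefficients to the measured field pressures by
            least squares and re-evaluate the expansion on the source surface."""
            A = basis_matrix(k, r_m, th_m, ph_m, n_max)
            coeffs, *_ = np.linalg.lstsq(A, p_meas, rcond=None)
            return basis_matrix(k, r_s, th_s, ph_s, n_max) @ coeffs

        # Toy usage: a free-field monopole measured at 32 hologram points and
        # reconstructed at 16 points on a small notional source surface.
        rng = np.random.default_rng(0)
        k = 2.0 * np.pi * 1000.0 / 343.0              # wavenumber at 1 kHz
        r_m = np.full(32, 0.30)
        th_m = np.arccos(rng.uniform(-1.0, 1.0, 32))
        ph_m = rng.uniform(0.0, 2.0 * np.pi, 32)
        p_meas = np.exp(-1j * k * r_m) / r_m          # monopole field, constants dropped
        r_s = np.full(16, 0.05)
        th_s = np.arccos(rng.uniform(-1.0, 1.0, 16))
        ph_s = rng.uniform(0.0, 2.0 * np.pi, 16)
        p_surface = hels_reconstruct(k, r_m, th_m, ph_m, p_meas, r_s, th_s, ph_s, n_max=4)
        print(np.round(np.abs(p_surface), 2))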

    Wave Field Synthesis in a listening room

    This thesis investigates the influence of the listening room on sound fields synthesised by Wave Field Synthesis. Methods are developed that allow the spatial and timbral perception of Wave Field Synthesis in a reverberant environment to be investigated in listening experiments based on binaural synthesis and room acoustical simulation. The results can serve as guidelines for the design of listening rooms for Wave Field Synthesis.
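
    As background on the simulation approach mentioned above, the following minimal sketch renders a dry signal binaurally by convolving it with a pair of binaural room impulse responses (BRIRs); the BRIRs here are toy placeholders, not the thesis's simulated listening rooms or loudspeaker arrays.

        import numpy as np
        from scipy.signal import fftconvolve

        def binaural_synthesis(dry, brir_left, brir_right):
            """Render a dry (anechoic) signal for headphone playback by
            convolving it with left- and right-ear room impulse responses."""
            left = fftconvolve(dry, brir_left)
            right = fftconvolve(dry, brir_right)
            out = np.stack([left, right], axis=1)
            return out / np.max(np.abs(out))      # simple peak normalization

        # Hypothetical usage with toy impulse responses (a direct sound plus
        # one reflection per ear).
        fs = 44100
        dry = np.random.default_rng(0).standard_normal(fs)
        brir_l = np.zeros(2048); brir_l[0] = 1.0; brir_l[900] = 0.5
        brir_r = np.zeros(2048); brir_r[40] = 1.0; brir_r[950] = 0.4
        stereo = binaural_synthesis(dry, brir_l, brir_r)
        print(stereo.shape)    # (len(dry) + len(brir) - 1, 2)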

    Spatial sound reproduction with frequency band processing of B-format audio signals

    The increase of knowledge in the field of spatial hearing has given birth to various spatial audio reproduction technologies. These include efficient perceptual coding of multi-channel audio, channel conversion technologies, and universal audio formats with no restrictions to any specific loudspeaker setup. Directional Audio Coding (DirAC) extends the scope of universal audio reproduction to real sound environments by utilizing existing microphones for analysis and arbitrary loudspeaker setups for synthesis of the perceptually relevant properties of the sound field. Human spatial hearing functions on the basis of a multitude of cues, ranging from the differences between the sounds reaching the two ears to multimodal cues such as visual information. The goal of DirAC is to measure and synthesize those sound field properties from which the auditory cues arise, leaving only the multimodal cues out of scope. The particle velocity and the sound pressure measured at a single point enable the calculation of the sound field intensity and energy in frequency bands. From these, the direction of arrival and the sound field diffuseness can be derived. The fundamental assumption of DirAC is that the human auditory cues arise from these sound field properties together with the monaural spectral and temporal properties. Therefore, a successful re-synthesis of these properties is assumed to produce a spatial hearing experience identical to that of the original measurement space. A real-time, linear-phase filter bank version of DirAC was implemented. The reproduction quality of DirAC was shown to be excellent in formal listening tests when the number of loudspeakers is adequate and the microphone is ideal. The reproduction quality with a standard 5.0 setup and a Soundfield ST350 microphone was good. Additional experiments showed that the directional properties of the ST350 microphone collapse at frequencies above 1.5-3 kHz.
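
    A rough sketch of the DirAC analysis stage summarized above is given below, assuming first-order B-format input with the traditional -3 dB W-channel convention and using an STFT in place of the thesis's linear-phase filter bank; the constants and smoothing choices are illustrative, not those of the implemented software.

        import numpy as np
        from scipy.signal import stft

        def dirac_analysis(w, x, y, z, fs, nperseg=512, avg_frames=8):
            """Estimate the direction of arrival and the diffuseness per
            time-frequency bin from first-order B-format signals."""
            _, _, W = stft(w, fs, nperseg=nperseg)
            _, _, X = stft(x, fs, nperseg=nperseg)
            _, _, Y = stft(y, fs, nperseg=nperseg)
            _, _, Z = stft(z, fs, nperseg=nperseg)

            # Active intensity vector and energy density, up to common constants.
            I = np.stack([np.real(np.conj(W) * X),
                          np.real(np.conj(W) * Y),
                          np.real(np.conj(W) * Z)])          # (3, freq, time)
            E = 0.5 * (np.abs(X)**2 + np.abs(Y)**2 + np.abs(Z)**2) + np.abs(W)**2

            # The sound arrives from the direction opposite to the energy flow.
            azimuth = np.arctan2(-I[1], -I[0])
            elevation = np.arctan2(-I[2], np.hypot(I[0], I[1]))

            # Diffuseness compares the short-time averaged intensity magnitude
            # with the averaged energy: ~0 for a single plane wave, ~1 for a
            # diffuse field (constants valid for the assumed W convention).
            def smooth(a):
                kernel = np.ones(avg_frames) / avg_frames
                return np.apply_along_axis(np.convolve, -1, a, kernel, mode="same")

            I_avg = np.stack([smooth(I[i]) for i in range(3)])
            psi = 1.0 - np.sqrt(2.0) * np.linalg.norm(I_avg, axis=0) / (smooth(E) + 1e-12)
            return azimuth, elevation, np.clip(psi, 0.0, 1.0)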

    Quality assessment of spherical microphone array auralizations

    The thesis documents a scientific study on quality assessment and quality prediction in Virtual Acoustic Environments (VAEs) based on spherical microphone array data, using binaural synthesis for reproduction. In the experiments, predictive modeling is applied to estimate the influence of the array on the reproduction quality by relating data derived in perceptual experiments to the output of an auditory model. The experiments address various aspects of the array considered relevant in auralization applications: the influence of system errors as well as the influence of the array configuration employed. The system errors comprise spatial aliasing, measurement noise, and microphone positioning errors, while the array configuration is represented by the sound field order in terms of spherical harmonics, which defines the spatial resolution of the array. Based on array simulations, the experimental data comprise free-field sound fields and two shoebox-shaped rooms, one with weak and one with strong reverberation. Ten audio signals served as test material, e.g., orchestral and pop music, male and female singing voice, and single instruments such as castanets. In the perceptual experiments, quantitative methods are used to evaluate the impact of system errors, while a descriptive analysis assesses the array configuration using two quality attributes: Apparent Source Width (ASW) and Listener Envelopment (LEV), both of which are commonly used in concert hall acoustics to describe the spaciousness of a room. The results from the perceptual experiments are subsequently related to the technical data derived from the auditory model in order to build, train, and evaluate a variety of predictive models. Based on classification and regression approaches, these models are applied and investigated for automated quality assessment, in order to identify and categorize system errors as well as to estimate their perceptual strength. Moreover, the models allow the array's influence on ASW and LEV perception to be predicted and enable the classification of further sound field characteristics, such as the reflection properties of the simulated room or the sound field order used. The applied prediction models range from simple linear regression and decision trees to more complex models such as support vector machines and artificial neural networks. The results show that the developed prediction models perform well in their classification and regression tasks. Although their functionality is limited to the conditions underlying the conducted experiments, they can still provide a useful tool to assess basic quality-related aspects which are important when developing spherical microphone arrays for auralization applications.
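
    As an illustration of the predictive-modeling step described above (relating auditory-model output to listening-test data), the sketch below trains a support vector classifier to identify a system-error category and a support vector regressor to estimate perceptual strength, using scikit-learn and randomly generated placeholder data; the thesis's actual features, labels, and model choices are not reproduced here.

        import numpy as np
        from sklearn.model_selection import cross_val_score
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import SVC, SVR

        # Hypothetical data: one row per auralized stimulus, columns are
        # features from an auditory model; labels come from listening tests.
        rng = np.random.default_rng(1)
        features = rng.normal(size=(120, 12))          # auditory-model output
        error_type = rng.integers(0, 3, size=120)      # e.g. aliasing / noise / positioning
        strength = rng.uniform(0.0, 1.0, size=120)     # rated perceptual strength

        # Classification: which system error does a stimulus exhibit?
        clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
        print(cross_val_score(clf, features, error_type, cv=5).mean())

        # Regression: how strong is the perceived degradation?
        reg = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0))
        print(cross_val_score(reg, features, strength, cv=5, scoring="r2").mean())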

    Anthropometric Individualization of Head-Related Transfer Functions: Analysis and Modeling

    Human sound localization helps to pay attention to spatially separated speakers using interaural level and time differences as well as angle-dependent monaural spectral cues. In a monophonic teleconference, for instance, it is much more difficult to distinguish between different speakers due to the missing binaural cues. Spatial positioning of the speakers by means of binaural reproduction methods using head-related transfer functions (HRTFs) enhances speech comprehension. HRTFs are influenced by the torso, head, and ear geometry, as they describe the propagation path of the sound from a source to the ear canal entrance. Through this geometry dependence, the HRTF is direction- and subject-dependent. To enable a sufficient reproduction, individual HRTFs should be used; however, these HRTFs are tremendously difficult to measure. For this reason, this thesis proposes approaches to adapt HRTFs using the individual anthropometric dimensions of a user. Since localization at low frequencies is mainly influenced by the interaural time difference, two models to adapt this difference are developed and compared with existing models. Furthermore, two approaches to adapt the spectral cues at higher frequencies are studied, improved, and compared. Although the localization performance with individualized HRTFs is slightly worse than with individual HRTFs, it is still better than with non-individual HRTFs, while requiring far less measurement effort.
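
    The abstract mentions that the proposed interaural-time-difference models are compared with existing models; one widely used existing model is Woodworth's spherical-head formula, sketched below with a hypothetical individualization via the listener's head width. This is illustrative only and is not claimed to be one of the thesis's two proposed models.

        import numpy as np

        SPEED_OF_SOUND = 343.0  # m/s

        def woodworth_itd(azimuth_rad, head_radius_m):
            """Classical spherical-head (Woodworth) ITD model: the path-length
            difference between the two ears for a distant source at the given
            azimuth, individualized only through the listener's head radius."""
            a = head_radius_m
            theta = np.abs(azimuth_rad)
            return a / SPEED_OF_SOUND * (theta + np.sin(theta)) * np.sign(azimuth_rad)

        # Example: adapting the model with an individual head radius estimated
        # from anthropometry (here simply half of a hypothetical head width).
        head_width = 0.155                       # metres, measured on the listener
        itd = woodworth_itd(np.deg2rad(45.0), head_width / 2.0)
        print(f"{itd * 1e6:.0f} microseconds")   # roughly 340 us for this example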

    Proceedings of the EAA Joint Symposium on Auralization and Ambisonics 2014

    In consideration of the remarkable intensity of research in the field of Virtual Acoustics, including different areas such as sound field analysis and synthesis, spatial audio technologies, and room acoustical modeling and auralization, it seemed about time to organize a second international symposium following the model of the first EAA Auralization Symposium initiated in 2009 by the acoustics group of the former Helsinki University of Technology (now Aalto University). Additionally, research communities focused on different approaches to sound field synthesis, such as Ambisonics or Wave Field Synthesis, have in the meantime moved closer together by using increasingly consistent theoretical frameworks. Finally, the quality of virtual acoustic environments is often considered a result of all processing stages mentioned above, increasing the need for discussions on consistent strategies for evaluation. Thus, it seemed appropriate to integrate two of the most relevant communities, i.e., to combine the 2nd International Auralization Symposium with the 5th International Symposium on Ambisonics and Spherical Acoustics. The Symposia on Ambisonics, initiated in 2009 by the Institute of Electronic Music and Acoustics of the University of Music and Performing Arts in Graz, were traditionally dedicated to problems of spherical sound field analysis and re-synthesis, strategies for the exchange of Ambisonics-encoded audio material, and, more than other conferences in this area, the artistic application of spatial audio systems. This publication contains the official conference proceedings. It includes 29 manuscripts which have passed a three-stage peer review involving a board of about 70 international reviewers. Each contribution has already been published individually with a unique DOI on the DepositOnce digital repository of TU Berlin. Some conference contributions have been recommended for resubmission to Acta Acustica united with Acustica, possibly to appear in a Special Issue on Virtual Acoustics in late 2014; these are not published in this collection. European Acoustics Association.

    Study of time-frequency transforms for low-delay, high-quality audio coding

    In recent years there has been a phenomenal increase in the number of products and applications which make use of audio coding formats. Among the most successful audio coding schemes, MPEG-1 Layer III (mp3), MPEG-2 Advanced Audio Coding (AAC), and its evolution MPEG-4 High Efficiency Advanced Audio Coding (HE-AAC) can be cited. More recently, perceptual audio coding has been adapted to achieve coding at low delay, so as to become suitable for conversational applications. Traditionally, the use of a filter bank such as the Modified Discrete Cosine Transform (MDCT) is a central component of perceptual audio coding, and its adaptation to low-delay audio coding has become an important research topic. Low-delay transforms have been developed in order to retain the performance of standard audio coding while dramatically reducing the associated algorithmic delay. This work presents elements that allow the delay-reduction constraint to be better accommodated. Among the contributions is a low-delay block-switching tool which allows the direct transition between long and short transforms without the insertion of a transition window. The same principle has been extended to define new perfect-reconstruction conditions for the MDCT with relaxed constraints compared to the original definition. As a consequence, a seamless reconstruction method has been derived to increase the flexibility of transform coding schemes, with the possibility to select a transform for a frame independently of its neighbouring frames. Finally, based on this new approach, a new low-delay window design procedure has been derived to obtain an analytic definition for a new family of transforms, permitting high quality with a substantial reduction in coding delay. The performance of the proposed transforms has been thoroughly evaluated, and an evaluation framework involving an objective measurement of the optimal transform sequence is proposed; it confirms the relevance of the proposed transforms for audio coding. In addition, the new approaches have been successfully applied to recent standardisation work items, such as the low-delay audio codecs developed at MPEG (LD-AAC and ELD-AAC), and have been evaluated with numerous subjective tests, showing a significant improvement in quality for transient signals. The new low-delay window design has been adopted in G.718, a scalable speech and audio codec standardized by ITU-T, and has demonstrated its benefit in terms of delay reduction while maintaining the audio quality of a traditional MDCT.
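
    To make the filter-bank terminology concrete, the sketch below implements the standard MDCT with the conventional sine window, which satisfies the Princen-Bradley perfect-reconstruction condition; the thesis's contribution lies in relaxing exactly this kind of condition and in new low-delay window designs, which are not reproduced here. The frame length and test signal are arbitrary placeholders.

        import numpy as np

        def mdct(frame, window):
            """Forward MDCT of one frame of 2N samples, producing N coefficients.
            The window must satisfy the Princen-Bradley condition
            w[n]^2 + w[n + N]^2 = 1 for perfect reconstruction with 50 % overlap."""
            N = len(frame) // 2
            n = np.arange(2 * N)
            k = np.arange(N)[:, None]
            basis = np.cos(np.pi / N * (n + 0.5 + N / 2.0) * (k + 0.5))
            return basis @ (window * frame)

        def sine_window(N):
            """The standard sine window used in mp3/AAC; it fulfils the
            Princen-Bradley condition."""
            n = np.arange(2 * N)
            return np.sin(np.pi / (2 * N) * (n + 0.5))

        # Toy usage: transform one 50 %-overlapped frame of a test signal.
        N = 512
        x = np.random.default_rng(0).standard_normal(2 * N)
        coeffs = mdct(x, sine_window(N))
        print(coeffs.shape)   # (512,)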