117 research outputs found

    Proceedings of the EAA Spatial Audio Signal Processing Symposium: SASP 2019

    International audience

    Music in Virtual Space: Theories and Techniques for Sound Spatialization and Virtual Reality-Based Stage Performance

    This research explores virtual reality as a medium for live concert performance. I have realized compositions in which the individual performing on stage uses a VR head-mounted display, complemented by other performance controllers, to explore a composed virtual space. Movements and objects within the space are used to influence and control sound spatialization and diffusion, musical form, and sonic content. Audience members observe this in real time, watching the performer's journey through the virtual space on a screen while listening to spatialized audio on loudspeakers variable in number and position. The major artistic challenge I will explore through this activity is the relationship between virtual space and musical form. I will also explore and document the technical challenges of this activity, resulting in a shareable software tool called the Multi-source Ambisonic Spatialization Interface (MASI), which creates a bridge between VR technologies and associated software, ambisonic spatialization techniques, sound synthesis, and audio playback and effects, and establishes a unique workflow for working with sound in virtual space.
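
    MASI itself is not shown here, but the core operation such a tool builds on, encoding a mono source into first-order ambisonics from its direction in the virtual space, is compact. A minimal sketch in Python under the traditional B-format (Furse-Malham) convention; the function name is illustrative, not MASI's actual API:

        import numpy as np

        def encode_first_order(signal, azimuth, elevation):
            """Encode a mono signal into B-format (W, X, Y, Z).

            Angles in radians; W carries the 1/sqrt(2) weighting of the
            classic Furse-Malham convention.
            """
            w = signal / np.sqrt(2.0)
            x = signal * np.cos(azimuth) * np.cos(elevation)
            y = signal * np.sin(azimuth) * np.cos(elevation)
            z = signal * np.sin(elevation)
            return np.stack([w, x, y, z])

    A tracked object's position relative to the listener supplies the azimuth and elevation, and the resulting channels are decoded to whatever loudspeaker layout is available.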

    Proceedings of the EAA Joint Symposium on Auralization and Ambisonics 2014

    In consideration of the remarkable intensity of research in the field of Virtual Acoustics, including different areas such as sound field analysis and synthesis, spatial audio technologies, and room acoustical modeling and auralization, it seemed about time to organize a second international symposium following the model of the first EAA Auralization Symposium initiated in 2009 by the acoustics group of the former Helsinki University of Technology (now Aalto University). Additionally, research communities focused on different approaches to sound field synthesis, such as Ambisonics or Wave Field Synthesis, have in the meantime moved closer together by using increasingly consistent theoretical frameworks. Finally, the quality of virtual acoustic environments is often considered a result of all the processing stages mentioned above, increasing the need for discussions on consistent strategies for evaluation. Thus, it seemed appropriate to bring together two of the most relevant communities, i.e., to combine the 2nd International Auralization Symposium with the 5th International Symposium on Ambisonics and Spherical Acoustics. The Symposia on Ambisonics, initiated in 2009 by the Institute of Electronic Music and Acoustics of the University of Music and Performing Arts in Graz, were traditionally dedicated to problems of spherical sound field analysis and re-synthesis, strategies for the exchange of ambisonics-encoded audio material, and, more than other conferences in this area, the artistic application of spatial audio systems. This publication contains the official conference proceedings. It includes 29 manuscripts which have passed a three-stage peer review with a board of about 70 international reviewers involved in the process. Each contribution has already been published individually with a unique DOI on the DepositOnce digital repository of TU Berlin. Some conference contributions have been recommended for resubmission to Acta Acustica united with Acustica, to possibly appear in a Special Issue on Virtual Acoustics in late 2014; these are not published in this collection. (European Acoustics Association)

    Interactive auralization based on hybrid simulation methods and plane wave expansion

    The reconstruction and reproduction of sound fields have been extensively researched in recent decades, leading to an intuitive approach to estimating and evaluating the acoustic properties of enclosures. Applications of auralization can be found in acoustic design, subjective tests, virtual reality, and entertainment, among others. Different methodologies have been established to generate auralizations for room acoustics purposes, the most common being geometrical acoustics and methods based on the numerical solution of the wave equation to synthesize room impulse responses. The assumptions and limitations of each approach are well known and restrict their application to specific frequency bands; if the aim is to reconstruct the sound field accurately over an extended frequency range, these methodologies have to be combined. Furthermore, recent advances in computational power have made it possible to generate interactive atmospheres in which the user can interact with the environment. Although this feature expands the applications of auralization, it is currently based mainly on geometrical acoustics or interpolation methods. The present research addresses the generation of interactive broadband auralizations of enclosures using a combination of the finite element method and geometrical acoustics. Modelling parameters for both simulation methods are discussed, with emphasis on the assumptions made in each case. The predicted room impulse responses are then represented by means of a plane wave expansion, which in turn enables interactive features such as translation and rotation of the acoustic fields. An analytical expression is derived for translation in the plane wave domain, and the transformation of the plane wave representation into spherical harmonics is explored to allow the acoustic fields to be rotated. The effects of assuming plane wave propagation within small enclosures, and the consequences of using a finite number of plane waves to synthesize the sound fields, are discussed. Finally, an implementation of an interactive auralization system is considered for different reference cases. This methodology enables the aural impression of enclosures to be reconstructed in real time with higher accuracy at low frequencies than geometrical acoustics techniques alone. The plane wave expansion provides a convenient sound field representation in which the listener can interact with the acoustics of the enclosure, and the sound reconstruction can be performed with several sound reproduction techniques, extending the versatility of the proposed approach.
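
    The translation feature has a particularly simple form in the plane wave domain: if the field is a finite sum of plane waves, moving the evaluation point by a vector d only re-phases each coefficient. A sketch of the underlying identity (the thesis's exact derivation and conventions may differ):

        p(\mathbf{x}) = \sum_{n=1}^{N} A_n \, e^{i \mathbf{k}_n \cdot \mathbf{x}}
        \quad\Longrightarrow\quad
        p(\mathbf{x} + \mathbf{d}) = \sum_{n=1}^{N} \left( A_n \, e^{i \mathbf{k}_n \cdot \mathbf{d}} \right) e^{i \mathbf{k}_n \cdot \mathbf{x}}

    Rotation, by contrast, is more naturally expressed after transforming the plane wave coefficients into the spherical harmonic domain, where a rotation acts on each order separately.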

    Iceberg: a loudspeaker-based room auralization method for auditory research

    Depending on the acoustic scenario, people with hearing loss are challenged on a different scale than normal-hearing people to comprehend sound, especially speech. This happens especially during social interactions within a group, which often occur in environments with low signal-to-noise ratios. This communication disruption can create a barrier for people acquiring and developing communication skills as children or interacting with society as adults. Hearing loss compensation aims to provide an opportunity to restore the auditory part of socialization. Technological and academic efforts have progressed toward a better understanding of the human hearing system. Through constant efforts to present new algorithms, miniaturization, and new materials, constantly improving hardware with high-end software is being developed, with new features and solutions to broad and specific auditory challenges. The effort to deliver innovative solutions to the complex phenomena of hearing loss encompasses tests, verifications, and validation in various forms. As newer devices achieve their purpose, the tests need to become more sensitive, requiring conditions that can effectively assess the improvements. Hearing research requires many levels of realism, from pure-tone assessment in small soundproof booths to hundreds of loudspeakers combined with visual stimuli through projectors or head-mounted displays, light, and movement control. Hearing aids research commonly relies on loudspeaker setups to reproduce sound sources. In addition, auditory research can use well-known auralization techniques to generate sound signals. These signals can be encoded to carry more than sound pressure level information, adding spatial information about the environment where the sound event happened or was simulated. This work reviews physical acoustics, virtualization, and auralization concepts and their uses in listening effort research. This knowledge, combined with the experiments executed during the studies, aimed to provide a hybrid auralization method to be virtualized in four-loudspeaker setups. Auralization methods are techniques used to encode spatial information into sounds. The main methods were discussed and derived, observing their spatial sound characteristics and their trade-offs for auditory tests with one or two participants. Two well-known auralization techniques, Ambisonics and Vector-Based Amplitude Panning, were selected and compared through a calibrated virtualization setup with respect to spatial distortions in the binaural cues; they were chosen because they require only a small number of loudspeakers. Furthermore, the spatial cues were examined after adding a second listener to the virtualized sound field. The outcome reinforced the spatial localization literature on these techniques: Ambisonics is less spatially accurate but more immersive than Vector-Based Amplitude Panning. A study was then designed to observe changes in listening effort due to different signal-to-noise ratios and reverberation in a virtualized setup. This experiment aimed to produce the correct sound field via a virtualized setup and assess listening effort via subjective impression with a questionnaire, an objective physiological outcome from EEG, and behavioral performance on word recognition.
    Nine levels of degradation were imposed on speech signals over speech maskers, separated in the virtualized space through the first-order Ambisonics technique in a setup with 24 loudspeakers. A high correlation between participants' performance and their questionnaire responses was observed. The results showed that increased virtualized reverberation time negatively impacts speech intelligibility and listening effort. A new hybrid auralization method was then proposed, merging the two investigated techniques, whose spatial sound features are complementary. The method was derived from room acoustics concepts and a specific objective parameter of the room impulse response called Center Time. The binaural cues were verified in three different simulated rooms. As validation with test subjects was not possible due to the COVID-19 pandemic, a psychoacoustic model was implemented to estimate the spatial accuracy of the method within a four-loudspeaker setup. The same verification and model estimation were also performed with hearing aids introduced. The results showed that the hybrid method with four loudspeakers can be considered for audiological tests, within some limitations. The setup can provide binaural cues up to a maximum ambiguity angle of 30 degrees in the horizontal plane for a centered listener.
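
    Center Time, the room-impulse-response parameter the hybrid method is built around, has a standard definition (ISO 3382-1): it is the first moment of the squared impulse response p(t),

        t_s = \frac{\int_0^{\infty} t \, p^2(t) \, dt}{\int_0^{\infty} p^2(t) \, dt}

    Low values indicate energy concentrated early in the response; high values indicate a more reverberant field, which is presumably what makes it a useful control for blending the two techniques.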

    Movements in Binaural Space: Issues in HRTF Interpolation and Reverberation, with applications to Computer Music

    This thesis deals broadly with the topic of Binaural Audio. After a review of the literature, a reappraisal of the minimum-phase plus linear delay model for HRTF representation and interpolation is offered, together with a rigorous analysis of threshold-based phase unwrapping. The results and conclusions drawn from these analyses motivate the development of two novel methods for HRTF representation and interpolation: a Phase Truncation method that uses empirical data directly, and a Functional Model for phase based on the psychoacoustical nature of Interaural Time Differences. Both methods are validated; most significantly, both perform better than a minimum-phase method in subjective testing. The accurate, artefact-free dynamic source processing afforded by these methods is harnessed in a binaural reverberation model, based on an early-reflection image model and a Feedback Delay Network diffuse field with accurate interaural coherence. In turn, these flexible environmental processing algorithms are used in the development of a multi-channel binaural application, which allows multi-channel setups to be auditioned over headphones; both source and listener are dynamic in this paradigm, and a GUI is offered for intuitive use. HRTF processing is thus re-evaluated and updated after a review of accepted practice, novel solutions are presented and validated, and binaural reverberation is recognised as a crucial tool for convincing artificial spatialisation and developed on similar principles. Emphasis is placed on transparency of development practices, with the aim of wider dissemination and uptake of binaural technology.
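
    For reference, the minimum-phase plus linear delay model the thesis reappraises is typically implemented by reconstructing the minimum-phase HRIR from its magnitude spectrum via the real cepstrum and reintroducing the interaural time difference as a pure delay. A minimal numpy sketch under those standard assumptions (not the thesis's own Phase Truncation or Functional Model code):

        import numpy as np

        def minimum_phase(hrir):
            """Minimum-phase reconstruction via the real cepstrum."""
            n = len(hrir)
            log_mag = np.log(np.maximum(np.abs(np.fft.fft(hrir, n)), 1e-12))
            cep = np.fft.ifft(log_mag).real
            # Fold the cepstrum: keep quefrency 0, double positive quefrencies
            win = np.zeros(n)
            win[0] = 1.0
            win[1:(n + 1) // 2] = 2.0
            if n % 2 == 0:
                win[n // 2] = 1.0
            return np.fft.ifft(np.exp(np.fft.fft(cep * win))).real

        def apply_itd(hrir, delay_samples):
            """Reintroduce the interaural delay as an integer-sample shift
            (fractional delays would need interpolation)."""
            return np.concatenate([np.zeros(delay_samples), hrir])

    Interpolation between measured directions can then operate on magnitude spectra and ITDs separately, which is precisely where the phase-handling issues analysed in the thesis arise.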

    Spatial Sound Rendering – A Survey

    Simulating sound propagation and audio rendering can improve the sense of realism and immersion both in complex acoustic environments and in dynamic virtual scenes. In studies of sound auralization the focus has traditionally been on room acoustics modeling, but most of the same methods are also applicable to the construction of virtual environments such as those developed to facilitate computer gaming, cognitive research, and simulated training scenarios. This paper reviews state-of-the-art techniques based on acoustic principles that apply not only to real rooms but also to 3D virtual environments. The paper also highlights the need to expand the field of immersive sound to web-based browsing environments, because, despite the interest and many benefits, few developments seem to have taken place within this context. Moreover, the paper lists the most effective algorithms used for modelling spatial sound propagation and reports their advantages and disadvantages. Finally, the paper emphasizes the evaluation of the surveyed works.
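
    Among the geometrical-acoustics algorithms such a survey covers, the image-source method is the standard choice for early reflections. A minimal first-order sketch in Python for a rectangular (shoebox) room; the names and the single frequency-independent absorption coefficient are illustrative simplifications:

        import numpy as np

        SPEED_OF_SOUND = 343.0  # m/s

        def first_order_images(source, room_dims, absorption):
            """Reflect a source once in each of the six walls of a shoebox room.

            Returns (image_position, reflection_gain) pairs; the reflected
            amplitude is sqrt(1 - absorption) under the usual energy model.
            """
            gain = np.sqrt(1.0 - absorption)
            images = []
            for axis in range(3):
                for wall in (0.0, room_dims[axis]):
                    img = np.array(source, dtype=float)
                    img[axis] = 2.0 * wall - img[axis]  # mirror across the wall plane
                    images.append((img, gain))
            return images

        def tap(image, gain, receiver):
            """Delay (s) and 1/r amplitude of one image-source contribution."""
            r = np.linalg.norm(image - receiver)
            return r / SPEED_OF_SOUND, gain / max(r, 1e-6)

    Higher orders repeat the mirroring recursively; each (delay, amplitude) pair becomes one tap of the early part of the room impulse response.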

    The role that sound spatialization plays in improving performance in an interactive installation: study of the correlation between gesture and localization of sound sources in space

    The main objective of this research is to study the correlation between gesture and the localization of sound sources in space within the framework of interactive installations, based on theories of hearing and gesture. We chose an experimental method, developing an interactive installation with which we carried out three different experiments, in which a subject's hand is tracked by a Microsoft Kinect depth camera (motion capture) and a deictic gesture is used to trigger recorded music sounds and to identify their localization in the horizontal plane. We manipulate the direction of sound and measure the percentage of correct perceptual sound source localizations, obtained from the participants' responses in an Inquiry Mode Questionnaire, against the actual directions of the gesture and of the perceptual sound sources as reported by the software. Descriptive and inferential statistics are applied to the collected data. The main results show that it is easier to define the origin of a sound, and that auditory perception is more accurate, when its incidence is frontal in the horizontal plane, just as sound source localization theory predicts. Whereas 86.1% of all volunteers consider that their gesture coincides with the origin of the sound in experiment 1, in which using the gesture in a certain direction produces a sound from that direction, only 58.1% admit the same in experiment 3, in which the same gesture is used to identify the system-predetermined localization of a perceptual sound source over an angle of 260° around the subject; this lower percentage seems to be due to most sounds in experiment 3 being produced from lateral directions, taking the head position facing the camera as reference. At least 55.9% of all participants do not perceive that their gesture cannot coincide with the origin of the sound in experiment 2, since the sound is produced from the opposite surround direction. This seems to demonstrate that, when sounds are produced frontally or from the back and a person simultaneously has the task of controlling their motion with a deictic gesture, his or her ability to identify the origin of the sound generally diminishes, in addition to the already well-known reduced ability to identify it in the median plane if the head is not rotated.
    When we began this investigation, several studies addressed gesture and sound source localization in various ways: 1) independently of each other and/or in contexts other than interactive installations, as in Blauert (1997), Pulkki (1999), Pulkki & Karjalainen (2001), Pulkki (2001a), Bates et al. (2007), Hammershøi (2009), McNeill (1992), Coutaz & Crowley (1995), Choi (2000), Cadoz & Wanderley (2000), Nehaniv (2005), Campbell (2005), or Godøy & Leman (2010); 2) from a more technical point of view, as in Harada et al. (1992), Jensenius et al. (2006), Marshall et al. (2006), Schacher (2007), Neukom & Schacher (2008), Zelli (2009), Marshall et al. (2009), Bhuiyan & Picking (2009), or Schumacher & Bresson (2010); or 3) from a more artistic point of view, as in Bencina et al. (2008) or Grigoriou & Floros (2010). Very few studies, however, involved both subjects and analysed their relations jointly from a more perceptual point of view, as in Gröhn (2002), de Götzen (2004), or Marentakis et al. (2008). It is this last perspective that we decided to follow and explore here, applying a repeated-measures design. The target population included people with musical training and people with little or none, recruited as anonymous, unconstrained volunteers, mainly by e-mail to friends, to students of different areas, and to colleagues at the Escola de Artes da Universidade Católica Portuguesa (EA-UCP), the Escola Superior de Música e Artes do Espetáculo of the Instituto Politécnico do Porto, and the Academia de Música de Espinho, as well as by informative posters placed at the Universidade Católica some days before the experiments were run in the EA-UCP Motion Capture Laboratory.
    Most participants feel immediate control over the sound in experiments 1 and 2, although their self-estimated reaction times are considerably longer than the approximately 650 milliseconds a person needs to hear and react to a sound in our installation. The mean time needed to localize sounds with a deictic gesture in experiment 3 is about 10 seconds, much longer than the 3 seconds we had assumed, with participants making on average 2 attempts per sound and needing to hear each sound in full only once. When identifying the system-predetermined localizations of perceptual sound sources in the peripheral zone of the body, participants deviate from the true sound directions on average by 7.97° to the left and -7.19° to the right, a mean absolute deviation of 7.76°; the deviations obtained with the left hand (6.86° and -6.35°) and with the right hand (8.46° and -7.38°) are quite similar. Most volunteers estimate needing much longer than the 2 seconds we had assumed to understand each of the three experiments, and this estimate decreases from the first to the last experiment, apparently because of familiarization with the interactive system, deliberately induced by imposing the same sequence of experiments on every participant; even so, they consider that they understood each experiment quickly. Most volunteers interact easily with the installation and agree that the suggested gesture was appropriately selected for all three experiments. Participants judge the system's response to gesture as immediate, at around 1 second, which is consistent with the measured system latency of about 470 milliseconds, and most feel enveloped by the sound in our interactive installation using Ambisonics Equivalent Panning.
    We therefore conclude that there is a relatively high correlation between gesture and the localization of sound sources in space, but that it is not as perfect as it could be, owing to the limitations of the human auditory system and to the natural dependence of head movement on gesture. Sound spatialization thus seems to improve performance in an interactive installation, though moderately. Even so, we argue that a system like ours can be applied with advantage in domains as diverse as those we present as examples.
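
    The reported deviations are signed angular errors in the horizontal plane between the gesture azimuth and the true source azimuth. A small Python sketch of how such errors can be computed and aggregated (the sign convention is an assumption; the thesis may define it differently):

        import numpy as np

        def signed_azimuth_error(gesture_az_deg, source_az_deg):
            """Signed deviation wrapped to (-180, 180] degrees.

            Positive values are taken here to mean the gesture points to
            the left of the true source direction (assumed convention).
            """
            err = (gesture_az_deg - source_az_deg + 180.0) % 360.0 - 180.0
            return 180.0 if err == -180.0 else err

        # Mean absolute deviation over a set of (gesture, source) trials
        trials = [(12.0, 5.0), (-3.0, 4.0), (181.0, 175.0)]
        errors = [signed_azimuth_error(g, s) for g, s in trials]
        mad = np.mean(np.abs(errors))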