
    A systematic study of binaural reproduction systems through loudspeakers: A multiple stereo-dipole approach


    Experimental Sound Mixing for “The Well”, a Short Film Made for Tablets

    This article presents an overview of the use of binaural recording and experimental headphone mixing for a short film. Drawing loosely on theories of proxemics, the article illustrates how sound mixing can be used to create a unique subjective perspective. In particular, the authors sought to experiment with and to use the peculiarities of stereo headphone mixing and binaural sound to reinforce visual elements of a film designed for horizontal viewing on tablets.

    An investigation into the real-time manipulation and control of three-dimensional sound fields

    This thesis describes a system that can be used for decoding a three-dimensional audio recording over headphones or over two or more loudspeakers. A literature review of psychoacoustics and a review (both historical and current) of surround sound systems are carried out. The need for a platform-independent system is discussed, and a system based on an amalgamation of Ambisonic, binaural and transaural reproduction schemes is proposed. For this system to function optimally, each of the three schemes must provide the listener with the relevant psychoacoustic cues. The conversion from a five-speaker ITU array to a binaural decode is well documented, but pair-wise panning algorithms will not produce the correct lateralisation parameters at the ears of a centrally seated listener. Although Ambisonics has been well researched, no one has yet produced a psychoacoustically optimised decoder for the standard irregular five-speaker array specified by the ITU: the original theory proposed by Gerzon and Barton (1992), known as a Vienna decoder, was published, with example solutions, before the standard had been decided on. In this work, the original work of Gerzon and Barton (1992) is analysed and shown to be suboptimal, exhibiting a high/low-frequency decoder mismatch due to the method used to solve the set of non-linear simultaneous equations. A method based on the Tabu search algorithm is applied to the Vienna decoder problem; it is shown to provide superior results to those of Gerzon and Barton (1992) and is capable of producing multiple solutions to the problem. During the write-up of this report, Craven (2003) showed how 4th-order circular harmonics (as used in Ambisonics) can be used to create a frequency-independent panning law for the five-speaker ITU array, and this report also shows how the Tabu search algorithm can be used to optimise these decoders further.
A new method is then demonstrated using the Tabu search algorithm coupled with lateralisation parameters extracted from a binaural simulation of the Ambisonic system to be optimised (as these are the parameters that the Vienna system approximates). This method can be altered to take head rotations into account directly, which has been shown to be an important psychoacoustic parameter in the localisation of a sound source (Spikofski et al., 2001), and is also shown to be useful in differentiating between decoders optimised using the Tabu search form of the Vienna optimisations, as no objective measure had previously been suggested. Optimisations for both binaural and transaural reproduction are then discussed so as to maximise the performance of generic (i.e. non-individualised) HRTF data using inverse filtering methods, and a technique is shown that minimises the amount of frequency-dependent regularisation needed when calculating cross-talk cancellation filters.
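The Tabu search optimisation applied above to the Vienna decoder problem can be sketched in miniature. This is not the thesis's actual cost function or move set (the toy below minimises a quadratic error over integer gain pairs, and all names are ours), but it shows the short-term-memory mechanism that lets the search escape local minima and return multiple candidate solutions:

```python
def tabu_search(cost, start, neighbours, iters=200, tabu_len=10):
    """Minimal tabu search: greedy neighbourhood descent with a
    short-term memory (the tabu list) that forbids recently visited
    solutions, letting the search climb out of local minima."""
    current = best = start
    tabu = [start]
    for _ in range(iters):
        candidates = [n for n in neighbours(current) if n not in tabu]
        if not candidates:
            break
        current = min(candidates, key=cost)
        tabu.append(current)
        if len(tabu) > tabu_len:
            tabu.pop(0)                # forget the oldest visit
        if cost(current) < cost(best):
            best = current
    return best

# Toy stand-in for a decoder optimisation: find the integer "gain"
# pair minimising a quadratic error surface with minimum at (3, -2).
cost = lambda w: (w[0] - 3) ** 2 + (w[1] + 2) ** 2
neighbours = lambda w: [(w[0] + dx, w[1] + dy)
                        for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                        if (dx, dy) != (0, 0)]
print(tabu_search(cost, (0, 0), neighbours))  # (3, -2)
```

In the real problem the cost would measure the decoder's velocity/energy vector errors (or, in the new method, the lateralisation-parameter mismatch from the binaural simulation) rather than a quadratic toy surface.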

    Messaging in mobile augmented reality audio

    Asynchronous multi-user communication is typically done using text. In mobile use, however, text input can be slow and cumbersome, and attention on the device's display is required both when writing and when reading messages. A messaging application was developed to test the concept of sharing short messages between members of a group using recorded speech rather than text. These messages can be listened to as they arrive, or browsed through and listened to later. The application is intended to be used on a mobile augmented reality audio platform, allowing almost undisturbed perception of, and interaction with, the surrounding environment while communicating using audio messages. A small group of users tested the application on desktop and laptop computers.
The users found one of the biggest advantages over text-based communication to be the additional information carried by a spoken message, which is much more expressive than the same message written down. Compared with text chats, the users found it difficult to quickly browse through old messages and confusing to participate in several discussions at the same time.

    PERFORMANCE IMPROVEMENT OF MULTICHANNEL AUDIO BY GRAPHICS PROCESSING UNITS

    Multichannel acoustic signal processing has undergone major development in recent years due to the increased complexity of current audio processing applications. People want to collaborate through communication with the feeling of being together and sharing the same environment, which is what immersive audio schemes aim to provide. Several acoustic effects are involved: 3D spatial sound, room compensation, crosstalk cancellation and sound source localization, among others. However, achieving any of these effects in a real large-scale system requires high computing capacity, which represents a considerable limitation for real-time applications. The increase in computational capacity has historically been linked to the number of transistors in a chip. Nowadays, however, improvements in computational capacity come mainly from increasing the number of processing units, i.e. from expanding parallelism in computing. This is the case of Graphics Processing Units (GPUs), which now contain thousands of computing cores. GPUs were traditionally associated with graphics and image applications, but newer GPU programming environments, CUDA and OpenCL, have allowed most applications in fields beyond graphics to be computationally accelerated. This thesis aims to demonstrate that GPUs are fully valid tools for audio applications that require high computational resources. To this end, different applications in the field of audio processing are studied and implemented on GPUs. The manuscript also analyzes and solves possible limitations of each GPU-based implementation, from both the acoustic and the computational point of view.
The following problems are addressed. Most audio applications are based on massive filtering, so the first implementation undertaken is a fundamental operation in audio processing: the convolution. It was first developed as a computational kernel and afterwards used in an application that combines multiple convolutions concurrently: generalized crosstalk cancellation and equalization. The proposed implementation successfully manages two different and common situations: buffer sizes much larger than the filter size, and buffer sizes much smaller than the filter size. Two spatial audio applications that use the GPU as a co-processor were then developed on top of this massive multichannel filtering. The first deals with binaural audio; its main feature is the ability to synthesize sound sources at spatial positions not included in the HRTF database and to generate smooth movements of sound sources. Both features were designed after different objective and subjective tests. The performance, in terms of the number of sound sources that can be rendered in real time, was assessed on GPUs with different architectures. Similar performance was measured in a Wave Field Synthesis system (the second spatial audio application) composed of 96 loudspeakers; the proposed GPU-based implementation is able to reduce room effects during sound source rendering. A well-known approach for sound source localization in noisy and reverberant environments, the Steered Response Power with Phase Transform (SRP-PHAT) algorithm, is also addressed on a multi-GPU system. Since localization accuracy can be improved by using high-resolution spatial grids and a large number of microphones, accurate acoustic localization systems require high computational power. The solutions implemented in this thesis are evaluated from both the localization and the computational performance points of view, taking into account different acoustic environments, and always from a real-time implementation perspective.
Finally, the manuscript also addresses massive multichannel filtering when the filters present an Infinite Impulse Response (IIR). Two cases are analyzed: 1) IIR filters composed of multiple second-order sections, and 2) IIR filters that present an allpass response. The two cases are used to develop and accelerate two different applications: executing multiple equalizations in a WFS system, and reducing the dynamic range of an audio signal.
Belloch Rodríguez, J. A. (2014). Performance improvement of multichannel audio by Graphics Processing Units [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/40651
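The fundamental convolution kernel described above can be illustrated with a plain-Python overlap-add sketch; a GPU version would parallelise exactly this arithmetic across blocks, filters and channels. The function names and the block scheme are ours, chosen to show how a stream of input buffers is convolved with a filter whose length differs from the buffer size:

```python
def convolve(x, h):
    """Direct linear convolution: y[n] = sum_k h[k] * x[n - k]."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y

def overlap_add(blocks, h):
    """Overlap-add block convolution: each input block is convolved
    with the filter, and the tail that spills past the block boundary
    is carried over and summed into the next block's output."""
    tail = [0.0] * (len(h) - 1)
    out = []
    for x in blocks:
        y = convolve(x, h)
        for i in range(len(tail)):     # add the tail of the previous block
            y[i] += tail[i]
        out.extend(y[:len(x)])
        tail = y[len(x):]              # carry the new tail forward
    return out, tail
```

Splitting the signal into blocks this way gives the same result as one long convolution, which is what makes the buffer-versus-filter-size trade-off discussed above purely a matter of scheduling.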

    Binaural to multichannel audio upmix

    The increasing diversity of popular audio recording and playback systems gives reason to ensure that recordings made with any equipment, as well as any synthesised audio, can be reproduced for playback on all types of devices. In this thesis, a method is introduced for upmixing binaural audio into a multichannel format while preserving the correct spatial sensation. This type of upmix is required when a binaural recording is to be spatially reproduced over a multichannel loudspeaker setup, a scenario typical of, for example, prospective telepresence applications. In the upmix method, the sound source directions are first estimated from the binaural signal using the interaural time difference. The signal is then downmixed into a monophonic format, and the data given by the azimuth estimation is stored as side information. The monophonic signal is finally upmixed for an arbitrary multichannel loudspeaker setup by panning it on the basis of the spatial side information. The method, which thus effectively converts interaural time differences into interchannel level differences, employs and combines existing techniques for azimuth estimation and discrete panning. The method was tested in an informal listening test, as well as by adding spatial background noise to the samples before upmixing and evaluating its influence on the sound quality of the upmixed samples. The method was found to perform acceptably well in maintaining both the spatiality and the sound quality, considering that much development work remains to be done.
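The pipeline described above (ITD estimation, then repanning) can be sketched as follows. The cross-correlation estimate and the constant-power pan law are generic stand-ins for the specific azimuth-estimation and discrete-panning techniques the thesis combines, and the sine-law head model with a 0.18 m interaural width is our own simplifying assumption:

```python
import math

def estimate_itd(left, right, fs, max_lag=40):
    """Pick the cross-correlation lag (in samples) that best aligns
    the two ear signals, and convert it to seconds."""
    def xcorr(lag):
        lo, hi = max(0, lag), min(len(left), len(right) + lag)
        return sum(left[n] * right[n - lag] for n in range(lo, hi))
    best = max(range(-max_lag, max_lag + 1), key=xcorr)
    return best / fs

def itd_to_azimuth(itd, head_width=0.18, c=343.0):
    """Invert the simple sine-law model itd = head_width * sin(az) / c."""
    return math.asin(max(-1.0, min(1.0, itd * c / head_width)))

def pan_gains(azimuth, pair=(-math.pi / 6, math.pi / 6)):
    """Constant-power panning between one loudspeaker pair: the
    estimated direction becomes an interchannel level difference."""
    left_az, right_az = pair
    p = (azimuth - left_az) / (right_az - left_az)  # 0..1 across the pair
    p = max(0.0, min(1.0, p))
    return math.cos(p * math.pi / 2), math.sin(p * math.pi / 2)
```

A source estimated straight ahead (ITD near zero) lands at p = 0.5 and receives equal gains of 1/√2 on both loudspeakers of the pair, so the total radiated power stays constant as the source moves.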

    The creation of a binaural spatialization tool

    The main focus of the research presented within this thesis is, as the title suggests, binaural spatialization. Binaural technology and, especially, the binaural recording technique are not particularly recent. Nevertheless, interest in this technology has lately become substantial due to the increase in the calculation power of personal computers, which has started to allow the complete and accurate real-time simulation of three-dimensional sound fields over headphones. The goals of this body of research were chosen to provide elements of novelty and contributions to the state of the art in the field of binaural spatialization. A brief summary of these follows:

    • The development and implementation of a binaural spatialization technique with distance simulation, based on the individual simulation of the distance cues and on binaural reverb, in turn based on a weighted mix between the signals convolved with the different HRIR and BRIR sets;
    • The development and implementation of a characterization process for modifying a BRIR set in order to simulate environments with different characteristics in terms of frequency response and reverberation time;
    • The creation of a real-time and offline binaural spatialization application implementing the techniques cited above, including a set of multichannel(and Ambisonics)-to-binaural conversion tools;
    • A perceptual evaluation stage to verify the effectiveness, realism and quality of the techniques developed; and
    • The application and use of the developed tools within both scientific and artistic case studies.

In the following chapters, sections, and subsections, the research performed between January 2006 and March 2010 is described, outlining the different stages before, during, and after the development of the software platform, analysing the results of the perceptual evaluations, and drawing conclusions that could, in the future, serve as the starting point for new and innovative research projects.
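The first listed goal, distance simulation via a weighted mix of signals convolved with different impulse-response sets, can be sketched under our own naming, assuming equal-length responses and a simple linear crossfade (the thesis's actual weighting may differ):

```python
def convolve(x, h):
    """Direct linear convolution of a signal with an impulse response."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y

def binauralize(mono, near_pair, far_pair, distance):
    """Render a mono source binaurally by crossfading between the
    outputs of a 'near' and a 'far' (B)RIR set.  `near_pair` and
    `far_pair` are (left IR, right IR) tuples of equal length, and
    `distance` in [0, 1] moves the source from near (0) to far (1)."""
    w = max(0.0, min(1.0, distance))
    out = []
    for h_near, h_far in zip(near_pair, far_pair):   # left ear, right ear
        near = convolve(mono, h_near)
        far = convolve(mono, h_far)
        out.append([(1 - w) * a + w * b for a, b in zip(near, far)])
    return out  # [left ear signal, right ear signal]
```

With real HRIR/BRIR measurements, the far set carries more reverberant energy and less direct sound, so the crossfade weight doubles as a distance cue.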

    Quality-controlled audio-visual depth in stereoscopic 3D media

    BACKGROUND: The literature proposes several algorithms that produce “quality-controlled” stereoscopic depth in 3D films by limiting the stereoscopic depth to a defined depth budget. Like stereoscopic displays, spatial sound systems provide the listener with enhanced (auditory) depth cues, and are now commercially available in multiple forms. AIM: We investigate the implications of introducing auditory depth cues into quality-controlled 3D media by asking: “Is it important to quality-control audio-visual depth by considering audio-visual interactions when integrating stereoscopic display and spatial sound systems?” MOTIVATION: There are several reports in the literature of such “audio-visual interactions”, in which visual and auditory perception influence each other. We seek to answer our research question by investigating whether these audio-visual interactions could extend the depth budget used in quality-controlled 3D media. METHOD/CONCLUSIONS: The related literature is reviewed before presenting four novel experiments that build upon each other’s conclusions. In the first experiment, we show that content created with a stereoscopic depth budget produces measurable positive changes in audiences’ attitudes towards 3D films; these changes are repeatable across different locations, displays and content. In the second experiment, we calibrate an audio-visual display system and use it to measure the minimum audible depth difference. Our data are used to formulate recommendations for content designers and systems engineers, including the design of an auditory depth perception screening test. We then show that an audio-visual stimulus with a nearer auditory depth is perceived as nearer. We measure the impact of this effect on a relative depth judgement and investigate how the impact varies with audio-visual depth separation. Finally, the size of the cross-modal bias in depth is measured, from which we conclude that sound does have the potential to extend the depth budget by a small but perceivable amount.

    Proposing Factors Towards a Standardised Testing Environment for Binaural and 3D Sound Systems

    Binaural sound systems are a growing industry in spatial audio. For the first time, a method of defining and evaluating the efficiency of such systems is investigated. A testing and comparison methodology is proposed, based on the factors implicated in determining the perceived location of a sound. The proposed methodology provides quantitative and qualitative comparison methods to determine the function, and suggested application, of any given binaural sound system. A series of tests is conducted, and the results provide a foundation for proposing and creating a standardised testing environment.