33 research outputs found

    Messaging in mobile augmented reality audio

    Asynchronous multi-user communication is typically done using text. In mobile use, however, text input can be slow and cumbersome, and attention on the device's display is required both when writing and when reading messages. A messaging application was developed to test the concept of sharing short messages between members of groups using recorded speech rather than text. These messages can be listened to as they arrive, or browsed through and listened to later. The application is intended to be used on a mobile augmented reality audio platform, allowing almost undisturbed perception of and interaction with the surrounding environment while communicating using audio messages. A small group of users tested the application on desktop and laptop computers. The users found one of the biggest advantages over text-based communication to be the additional information carried by a spoken message, which is much more expressive than the same message in writing. Compared with text chats, the users found it difficult to quickly browse through old messages and confusing to participate in several discussions at the same time.

    Improvements in the Perceived Quality of Streaming and Binaural Rendering of Ambisonics

    With the increasing popularity of spatial audio content streaming and interactive binaural audio rendering, it is pertinent to study the quality of the critical components of such systems. This includes low-bitrate compression of Ambisonic scenes and binaural rendering schemes. This thesis presents a group of perceptual experiments focusing on these two elements of the Ambisonic delivery chain. The first group of experiments focused on the quality of low-bitrate compression of Ambisonics. The first study evaluated the perceived timbral quality degradation introduced by the Opus audio codec at different bitrate settings and Ambisonic orders. This experiment was conducted using multi-loudspeaker reproduction as well as binaural rendering. The second study was dedicated to auditory localisation performance in bitrate-compressed Ambisonic scenes reproduced over loudspeakers and binaurally using generic and individually measured HRTF sets. Finally, the third study extended the evaluated set of codec parameters by testing different channel mappings and various audio stimuli contexts. This study was conducted in VR using a purpose-built listening test framework. The comprehensive evaluation of the Opus codec led to a set of recommendations regarding optimal codec parameters. The second group of experiments focused on the evaluation of different methods for binaural rendering of Ambisonics. The first study in this group focused on the implementation of established methods for designing Ambisonic-to-binaural filters and their subsequent objective and subjective evaluation. The second study explored the concept of hybrid binaural rendering, combining anechoic filters with reverberant ones. Finally, addressing the problem of non-individual HRTFs used for spatial audio rendering, an XR-based method for acquiring individual HRTFs using a single loudspeaker was proposed. The conducted perceptual evaluations identified key areas where the Ambisonic delivery chain could be improved to provide a more satisfactory user experience.
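
    A common baseline among methods for binaural rendering of Ambisonics is the virtual-loudspeaker approach: decode the scene to a set of virtual loudspeaker feeds and convolve each feed with the HRIR measured at that loudspeaker's direction. The sketch below illustrates this for a first-order ACN/SN3D scene; it is only a minimal reference, and the loudspeaker layout and HRIR arrays are placeholders that would in practice come from a measured HRTF set.

```python
# Minimal sketch: first-order Ambisonics (ACN/SN3D) to binaural via a
# virtual-loudspeaker ("sampling") decoder. HRIRs are assumed to be given
# per virtual loudspeaker, e.g. loaded from a measured HRTF database.
import numpy as np
from scipy.signal import fftconvolve

def sh_first_order(az, el):
    # Real first-order spherical harmonics, ACN order (W, Y, Z, X),
    # SN3D normalisation; az/el in radians.
    return np.array([1.0,
                     np.sin(az) * np.cos(el),
                     np.sin(el),
                     np.cos(az) * np.cos(el)])

def ambi_to_binaural(ambi, speaker_dirs, hrirs):
    # ambi: (4, n) scene; speaker_dirs: list of (az, el) in radians;
    # hrirs: (n_speakers, hrir_len, 2) HRIR per virtual loudspeaker.
    n = ambi.shape[1]
    out = np.zeros((n + hrirs.shape[1] - 1, 2))
    for i, (az, el) in enumerate(speaker_dirs):
        # Sampling decoder: project the scene onto this direction.
        feed = sh_first_order(az, el) @ ambi / len(speaker_dirs)
        for ear in (0, 1):
            out[:, ear] += fftconvolve(feed, hrirs[i, :, ear])
    return out
```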

    Enabling technologies for audio augmented reality systems

    Audio augmented reality (AAR) refers to technology that embeds computer-generated auditory content into a user's real acoustic environment. An AAR system has specific requirements that set it apart from regular human-computer interfaces: an audio playback system to allow the simultaneous perception of real and virtual sounds; motion tracking to enable interactivity and location-awareness; the design and implementation of auditory display to deliver AAR content; and spatial rendering to display spatialised AAR content. This thesis presents a series of studies on enabling technologies to meet these requirements. A binaural headset with integrated microphones is assumed as the audio playback system, as it allows mobility and precise control over the ear input signals. Here, user position and orientation tracking methods are proposed that rely on speech signals recorded at the binaural headset microphones. To evaluate the proposed methods, the head orientations and positions of three conferees engaged in a discussion were tracked. The binaural microphones improved tracking performance substantially. The proposed methods are applicable to acoustic tracking with other forms of user-worn microphones. Results from a listening test investigating the effect of auditory display parameters on user performance are reported. The parameters studied were derived from the design choices to be made when implementing auditory display. The results indicate that users are able to detect a sound sample among distractors and estimate sample numerosity accurately with both speech and non-speech audio, if the samples are presented with adequate temporal separation. Whether or not samples were separated spatially had no effect on user performance. However, with spatially separated samples, users were able to detect a sample among distractors and simultaneously localise it. The results of this study are applicable to a variety of AAR applications that require conveying sample presence or numerosity. Spatial rendering is commonly implemented by convolving virtual sounds with head-related transfer functions (HRTFs). Here, a framework is proposed that interpolates HRTFs measured at arbitrary directions and distances. The framework employs Delaunay triangulation to group HRTFs into subsets suitable for interpolation and barycentric coordinates as interpolation weights. The proposed interpolation framework allows the real-time rendering of virtual sources in the near-field via HRTFs measured at various distances.
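
    The interpolation framework described above maps directly onto a few lines of SciPy. The sketch below is a simplified two-dimensional version operating on an (azimuth, elevation) grid; the grid and HRIR arrays are hypothetical inputs, and the thesis framework additionally handles the distance dimension.

```python
# Sketch of barycentric HRTF interpolation: Delaunay-triangulate the
# measurement grid, locate the triangle containing the query direction,
# and blend the three surrounding HRIRs with barycentric weights.
import numpy as np
from scipy.spatial import Delaunay

def interpolate_hrir(query, grid, hrirs):
    # query: (az, el); grid: (n, 2) measured directions; hrirs: (n, L).
    tri = Delaunay(grid)
    simplex = tri.find_simplex(np.asarray(query, dtype=float))
    if simplex < 0:
        raise ValueError("query direction outside the measurement grid")
    verts = tri.simplices[simplex]
    # Barycentric coordinates of the query point within the triangle.
    T = tri.transform[simplex]
    b = T[:2] @ (np.asarray(query) - T[2])
    weights = np.append(b, 1.0 - b.sum())
    # Weighted sum of the three enclosing measured responses.
    return weights @ hrirs[verts]
```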

    Custom architecture for multicore audio Beamforming systems

    The audio Beamforming (BF) technique utilizes microphone arrays to extract acoustic sources recorded in a noisy environment. In this article, we propose a new approach for rapid development of multicore BF systems. A review of the literature reveals that the majority of such experimental and commercial audio systems are based on desktop PCs, due to their high-level programming support and potential for rapid system development. However, these approaches introduce performance bottlenecks, excessive power consumption, and increased overall cost. Systems based on DSPs require very low power, but their performance is still limited. Custom hardware solutions alleviate the aforementioned drawbacks; however, designers primarily focus on performance optimization without providing a high-level interface for system control and testing. In order to address these problems, we propose a custom platform-independent architecture for reconfigurable audio BF systems. To evaluate our proposal, we implement our architecture as a heterogeneous multicore reconfigurable processor and map it onto FPGAs. Our approach combines the software flexibility of General-Purpose Processors (GPPs) with the computational power of multicore platforms. In order to evaluate our system, we compare it against a BF software application running on a low-power Atom 330, a mid-range Core2 Duo, and a high-end Core i3. Experimental results suggest that our proposed solution can extract up to 16 audio sources in real time under a 16-microphone setup. In contrast, under the same setup, the Atom 330 cannot extract any audio sources in real time, while the Core2 Duo and the Core i3 can process in real time only up to 4 and 6 sources respectively. Furthermore, a Virtex4-based BF system consumes more than an order of magnitude less energy than the aforementioned GPP-based approaches. © 2013 ACM
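
    For orientation, one common BF formulation, delay-and-sum, can be written as a short software reference; the article's contribution is a multicore hardware architecture accelerating this class of per-source computation, not this naive form. The linear array geometry and parameters below are purely illustrative.

```python
# Minimal delay-and-sum beamformer for a linear microphone array,
# steering via fractional-sample delays applied in the frequency domain.
import numpy as np

def delay_and_sum(mics, mic_x, angle_deg, fs, c=343.0):
    # mics: (n_mics, n_samples); mic_x: (n_mics,) positions in metres
    # along the array axis; angle_deg: steering angle from broadside.
    delays = mic_x * np.sin(np.deg2rad(angle_deg)) / c   # seconds per mic
    n = mics.shape[1]
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    spec = np.fft.rfft(mics, axis=1)
    # Advance each channel by its arrival delay (phase shift), then average.
    steered = spec * np.exp(2j * np.pi * freqs * delays[:, None])
    return np.fft.irfft(steered.mean(axis=0), n=n)
```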

    Surround by Sound: A Review of Spatial Audio Recording and Reproduction

    In this article, a systematic overview of various recording and reproduction techniques for spatial audio is presented. While binaural recording and rendering are designed to resemble the human two-ear auditory system and reproduce sounds specifically for a listener's two ears, soundfield recording and reproduction using a large number of microphones and loudspeakers replicate an acoustic scene within a region. These two fundamentally different types of techniques are discussed in this paper, along with a brief review of multi-zone reproduction, a recently popular area. The paper concludes with a discussion of the current state of the field and open problems. The authors acknowledge support from National Natural Science Foundation of China (NSFC) grant No. 61671380 and Australian Research Council Discovery Scheme DE 150100363.

    A system for room acoustic simulation for one's own voice

    The real-time simulation of room acoustical environments for one's own voice, using generic software, has been difficult until very recently due to the computational load involved: it requires real-time convolution of a person's voice with a potentially large number of long room impulse responses. This thesis presents a room acoustical simulation system with a software-based solution that performs real-time convolution with head tracking, simulating the effect of room acoustical environments on the sound of one's own voice using binaural technology. In order to gather data to implement head tracking in the system, human head movements are characterized while reading a text aloud. The rooms that are simulated with the system are actual rooms, characterized by measuring the room impulse response from the mouth to the ears of the same head (oral binaural room impulse response, OBRIR). By repeating this process at 2° increments in the yaw angle on the horizontal plane, the rooms are binaurally scanned around a given position to obtain a collection of OBRIRs, which is then used by the software-based convolution system. In the rooms that are simulated with the system, a person equipped with a near-mouth microphone and near-ear loudspeakers can speak or sing, and hear their voice as it would sound in the measured rooms, while physically being in an anechoic room. By continually updating the person's head orientation using head tracking, the corresponding OBRIR is chosen for convolution with their voice. The system described in this thesis achieves the low latency that is required to simulate nearby reflections, and it can perform convolution with long room impulse responses. The perceptual validity of the system is studied with two experiments involving human participants reading aloud a set text. The system presented in this thesis can be used to design experiments that study the various aspects of the auditory perception of the sound of one's own voice in room environments. The system can also be adapted to incorporate a module that enables listening to the sound of one's own voice in commercial applications such as architectural acoustic room simulation software, teleconferencing systems, virtual reality and gaming applications, etc.
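
    The core run-time step described above, selecting the OBRIR measured nearest to the tracked yaw angle and convolving the voice with it, can be sketched as follows. The array shapes are assumptions based on the 2° measurement grid, and a real-time implementation would use low-latency partitioned convolution with crossfading between responses rather than offline FFT convolution.

```python
# Sketch of the OBRIR lookup-and-convolve step: pick the binaural response
# measured at the yaw angle closest to the tracked head orientation and
# convolve the near-mouth microphone signal with it.
import numpy as np
from scipy.signal import fftconvolve

def simulate_own_voice(voice, yaw_deg, obrirs):
    # voice: (n,) mono mic signal; yaw_deg: tracked head yaw in degrees;
    # obrirs: (180, L, 2) responses measured every 2 degrees of yaw.
    idx = int(round((yaw_deg % 360.0) / 2.0)) % obrirs.shape[0]
    left = fftconvolve(voice, obrirs[idx, :, 0])
    right = fftconvolve(voice, obrirs[idx, :, 1])
    return np.stack([left, right], axis=1)
```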

    Methods and applications of mobile audio augmented reality

    In augmented reality, virtual objects are presented as if they were a part of the real world. In mobile audio augmented reality, sounds presented with headphones are perceived as if they originated from the surrounding environment. This thesis investigates potential applications of mobile audio augmented reality and the different methods that these applications need. The two main topics studied are distance presentation and spatial audio guidance. Reverberation is known to be an important factor affecting the perceived distance of sound sources. Here, a practical method for controlling the perceived distance of virtual sound sources is investigated, in which the temporal envelopes of binaural room impulse responses (BRIRs) are modified. In a listening test, speech sources were presented using these modified BRIRs. The results show that the perceived distance is controlled most effectively by modifying an early-to-late energy ratio with the first 50–100 ms of the BRIR included in the early energy. Presenting large distances in an audio augmented reality environment is difficult, since people underestimate the distances of distant sound sources, and very distant sound sources cannot even be heard. In a user study, the presentation of points of interest (POIs) outdoors using auditory distance cues was compared with a voice saying the distance in meters. The results suggest that distances should be given in meters if fairly accurate distance estimates are needed without prior training. With training, however, the user study participants were able to estimate the distances of the POIs fairly accurately based on the provided auditory distance cues, performing the task faster than when the distances were presented in meters. In addition to the presentation of POIs, another type of spatial audio guidance is investigated: using spatialized music to guide pedestrians and cyclists to their destination. Two forms of guidance, route and beacon guidance, were tested in different environments. The user studies showed that music guidance is a pleasant and effective aid for navigation. Both route and beacon guidance were effective methods, but suitable for different environments and circumstances. This thesis also investigates a mobile teleconferencing scenario, where participants can move freely from one location to another. With hear-through headphones, co-located participants can hear each other naturally. To avoid transmitting the speech of the participants to other participants in the same room – as this would be perceived as an echo – acoustic co-location detection is applied. In a user study, utilization of acoustic co-location detection was shown to improve the clarity of communication. Together, the studies presented in this thesis provide methods and guidelines for the development of mobile audio augmented reality applications.
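
    As an illustration of the BRIR manipulation reported most effective above, the sketch below rescales the early segment of a single-ear BRIR so that its early-to-late energy ratio reaches a target value, counting the first 80 ms as early energy (within the 50–100 ms range found above). The thesis modifies temporal envelopes; this simple two-segment gain is only an approximation of that idea.

```python
# Illustrative control of a BRIR's early-to-late energy ratio: scale the
# early part so that 10*log10(E_early / E_late) equals target_db.
import numpy as np

def set_early_to_late_ratio(brir, fs, target_db, early_ms=80.0):
    split = int(fs * early_ms / 1000.0)
    early, late = brir[:split].copy(), brir[split:]
    e_early = np.sum(early ** 2)
    e_late = np.sum(late ** 2)
    # Gain on the early segment that yields the requested energy ratio.
    gain = np.sqrt(10.0 ** (target_db / 10.0) * e_late / max(e_early, 1e-12))
    return np.concatenate([gain * early, late])
```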

    Adaptiiviset läpikuuluvuuskuulokkeet (Adaptive hear-through headphones)

    Hear-through equalization can be used to make a headset acoustically transparent, i.e. to produce a sound perception similar to perception without the headset. The headset must have microphones outside the earpieces to capture the ambient sounds, which are then reproduced through the headset transducers after equalization. The reproduced signal is called the hear-through signal. Equalization is needed because the headset affects the acoustics of the outer ear. In addition to the external microphones, the headset used in this study has additional internal microphones. Together, these microphones can be used to estimate the attenuation of the headset online and to detect a poor fit. Since a poor fit causes leaks and decreased attenuation, the combined effect of the leaked sound and the hear-through signal changes compared to a proper fit. Therefore, the isolation estimate is used to control the hear-through equalization in order to produce better acoustical transparency. Furthermore, the proposed adaptive hear-through algorithm includes manual controls for the equalizers and for the volume of the hear-through signal. The proposed algorithm is found to make the headset acoustically transparent. The equalization controls improve the performance of the headset when the fit is poor or when the volume of the hear-through signal is adjusted, by reducing the comb-filtering effect caused by the summation of the leaked sound and the hear-through signal inside the ear canal. The behavior of the proposed algorithm is demonstrated with an implemented Matlab simulator.
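
    The fit-detection step described above can be approximated by comparing spectra at an earpiece's external and internal microphones: a drop in the estimated attenuation indicates a leak. The sketch below shows only this measurement step; the control law mapping the estimate to equalizer settings is not detailed in the abstract and is not attempted here.

```python
# Sketch of the online isolation estimate from one earpiece's microphone
# pair: the ratio of external to internal power spectra approximates the
# passive attenuation of the headset (larger values = better isolation).
import numpy as np
from scipy.signal import welch

def isolation_estimate_db(ext_mic, int_mic, fs, nperseg=1024):
    f, p_ext = welch(ext_mic, fs=fs, nperseg=nperseg)
    _, p_int = welch(int_mic, fs=fs, nperseg=nperseg)
    atten_db = 10.0 * np.log10(p_ext / np.maximum(p_int, 1e-12))
    return f, atten_db
```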

    Beiträge zu breitbandigen Freisprechsystemen und ihrer Evaluation (Contributions to wideband hands-free systems and their evaluation)

    This work deals with the advancement of wideband hands-free systems (HFSs) for monophonic and stereophonic use cases, and makes innovative contributions to the corresponding field of quality evaluation. The proposed HFS approaches are based on frequency-domain adaptive filtering for system identification, making use of Kalman theory and state-space modeling. Functional enhancement modules are developed that improve one or more key quality aspects while aiming not to harm the others. These modules can be combined flexibly, depending on the needs at hand. The enhanced monophonic HFS is evaluated according to automotive ITU-T recommendations (Rec. P.1110/P.1130) to prove its efficacy. Furthermore, a novel methodology and technical framework are introduced to improve the prototyping and evaluation process of automotive hands-free and in-car communication (ICC) systems. The monophonic HFS in several configurations hereby acts as the device under test (DUT) and is thoroughly investigated, demonstrating the DUT's satisfactory performance as well as the advantages of the proposed development process. As current methods for evaluating HFSs in dynamic conditions often still lack flexibility, reproducibility, and accuracy, this work introduces "Car in a Box" (CiaB), a novel, improved system for this demanding task. CiaB enhances the development process by performing high-resolution system identification of dynamic electro-acoustical systems. The extracted dynamic impulse response trajectories can then be applied to arbitrary input signals in a synthesis operation, making a realistic dynamic auralization of a car cabin interior available for HFS evaluation. It is shown that this system improves evaluation flexibility at guaranteed reproducibility. In addition, the accuracy of evaluation methods can be increased, since exact, realistic impulse response trajectories are available as a "ground truth" reference. If CiaB is included in an automotive evaluation setup, no acoustical car interior prototype needs to be present at that stage of development. Hence, CiaB can ease the HFS development process: dynamic acoustic replicas of an arbitrary number of car cabin interiors can be provided to multiple developers simultaneously. With CiaB, developers of speech enhancement systems have an evaluation environment at hand that can adequately replace the real environment.
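
    The thesis derives its adaptive filters in the frequency domain from Kalman theory and state-space modeling; as a much simpler stand-in, the sketch below identifies an echo path with time-domain NLMS, the classical baseline that such state-space frequency-domain algorithms generalize. It is not the thesis's algorithm, only an illustration of adaptive system identification.

```python
# Time-domain NLMS identification of an echo path: x is the loudspeaker
# (far-end) signal, d the microphone signal containing its echo.
import numpy as np

def nlms_identify(x, d, filt_len=256, mu=0.5, eps=1e-8):
    w = np.zeros(filt_len)        # current echo-path estimate
    x_buf = np.zeros(filt_len)    # most recent input samples, newest first
    err = np.zeros(len(x))
    for n in range(len(x)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = x[n]
        y = w @ x_buf             # echo estimate from current filter
        err[n] = d[n] - y
        # Normalized LMS update: step size scaled by input power.
        w += mu * err[n] * x_buf / (x_buf @ x_buf + eps)
    return w, err
```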