
    Messaging in mobile augmented reality audio

    Asynchronous multi-user communication is typically done using text. In mobile use, however, text input can be slow and cumbersome, and attention on the display of the device is required both when writing and reading messages. A messaging application was developed to test the concept of sharing short messages between members of groups using recorded speech rather than text. These messages can be listened to as they arrive, or browsed through and listened to later. The application is intended to be used on a mobile augmented reality audio platform, allowing almost undisturbed perception of and interaction with the surrounding environment while communicating using audio messages. A small group of users tested the application on desktop and laptop computers. The users found one of the biggest advantages over text-based communication to be the additional information carried by a spoken message, which is much more expressive than the same message in writing. Compared with text chats, the users found it difficult to quickly browse through old messages and confusing to participate in several discussions at the same time.
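
    As a concrete illustration of the messaging model described above, the sketch below shows a minimal group channel in Python; the class name, method names, and playback stub are hypothetical and not taken from the thesis.

        from collections import deque

        class AudioMessageChannel:
            """Minimal sketch: spoken messages shared within a group can be
            played as they arrive or browsed and listened to later."""

            def __init__(self):
                self.history = deque()  # all received clips, kept for browsing

            def receive(self, clip, play_immediately=True):
                self.history.append(clip)
                if play_immediately:
                    self.play(clip)

            def browse(self, count=5):
                # Return the most recent messages for later listening.
                return list(self.history)[-count:]

            def play(self, clip):
                pass  # platform-specific audio playback, not shown here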

    Enabling technologies for audio augmented reality systems

    Audio augmented reality (AAR) refers to technology that embeds computer-generated auditory content into a user's real acoustic environment. An AAR system has specific requirements that set it apart from regular human-computer interfaces: an audio playback system to allow the simultaneous perception of real and virtual sounds; motion tracking to enable interactivity and location-awareness; the design and implementation of auditory display to deliver AAR content; and spatial rendering to display spatialised AAR content. This thesis presents a series of studies on enabling technologies to meet these requirements. A binaural headset with integrated microphones is assumed as the audio playback system, as it allows mobility and precise control over the ear input signals. Here, user position and orientation tracking methods are proposed that rely on speech signals recorded at the binaural headset microphones. To evaluate the proposed methods, the head orientations and positions of three conferees engaged in a discussion were tracked. The binaural microphones improved tracking performance substantially. The proposed methods are applicable to acoustic tracking with other forms of user-worn microphones. Results from a listening test investigating the effect of auditory display parameters on user performance are reported. The parameters studied were derived from the design choices to be made when implementing auditory display. The results indicate that users are able to detect a sound sample among distractors and estimate sample numerosity accurately with both speech and non-speech audio, if the samples are presented with adequate temporal separation. Whether or not samples were separated spatially had no effect on user performance. However, with spatially separated samples, users were able to detect a sample among distractors and simultaneously localise it. The results of this study are applicable to a variety of AAR applications that require conveying sample presence or numerosity. Spatial rendering is commonly implemented by convolving virtual sounds with head-related transfer functions (HRTFs). Here, a framework is proposed that interpolates HRTFs measured at arbitrary directions and distances. The framework employs Delaunay triangulation to group HRTFs into subsets suitable for interpolation and barycentric coordinates as interpolation weights. The proposed interpolation framework allows the real-time rendering of virtual sources in the near-field via HRTFs measured at various distances.
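
    A minimal sketch of the Delaunay-plus-barycentric interpolation idea is given below, restricted here to a fixed distance and an (azimuth, elevation) grid; the grid, HRIR length, and placeholder data are illustrative assumptions, not the thesis's measured set.

        import numpy as np
        from scipy.spatial import Delaunay

        # Hypothetical measurement grid: N directions, one HRIR of length
        # 256 per ear at each direction (placeholder random data).
        directions = np.array([[0, 0], [30, 0], [0, 30], [30, 30], [-30, 0]], float)
        hrirs = np.random.randn(len(directions), 2, 256)

        tri = Delaunay(directions)  # group measurements into triangles

        def interpolate_hrir(azimuth, elevation):
            """Barycentric interpolation of the HRIR at a query direction."""
            query = np.array([azimuth, elevation])
            simplex = tri.find_simplex(query)
            if simplex < 0:
                raise ValueError("query direction outside the measured grid")
            # The affine transform of the containing triangle yields two
            # barycentric coordinates; the third follows from the weights
            # summing to one.
            T = tri.transform[simplex]
            b = T[:2].dot(query - T[2])
            weights = np.append(b, 1.0 - b.sum())
            vertices = tri.simplices[simplex]
            # Weighted sum of the HRIRs at the triangle's corner directions.
            return np.einsum('i,ijk->jk', weights, hrirs[vertices])

        left_right = interpolate_hrir(15.0, 10.0)  # interpolated HRIR pair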

    Bioinspired auditory sound localisation for improving the signal to noise ratio of socially interactive robots

    In this paper we describe a bioinspired hybrid architecture for acoustic sound source localisation and tracking to increase the signal-to-noise ratio (SNR) between speaker and background sources for a socially interactive robot's speech recogniser system. The model presented incorporates the use of Interaural Time Difference for azimuth estimation and Recurrent Neural Networks for trajectory prediction. Results are then presented showing the difference in SNR between a localised and a non-localised speaker source, together with the corresponding speech recognition rates. From the results presented in this paper it can be seen that by orientating towards the sound source of interest the recognition rates of that source can be increased.
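
    To make the ITD-based azimuth estimation concrete, here is a minimal cross-correlation sketch; the sample rate, microphone spacing, and far-field model are illustrative assumptions rather than the paper's exact configuration.

        import numpy as np

        FS = 16000           # sample rate in Hz (assumption)
        MIC_DISTANCE = 0.15  # microphone spacing in metres (assumption)
        SPEED_OF_SOUND = 343.0

        def estimate_azimuth(left, right):
            """Estimate source azimuth from the interaural time difference."""
            # Cross-correlate the two channels; the lag of the correlation
            # peak gives the ITD in samples.
            corr = np.correlate(left, right, mode='full')
            lag = np.argmax(corr) - (len(right) - 1)
            itd = lag / FS
            # Far-field model: ITD = d * sin(azimuth) / c.
            s = np.clip(itd * SPEED_OF_SOUND / MIC_DISTANCE, -1.0, 1.0)
            return np.degrees(np.arcsin(s))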

    Video-aided model-based source separation in real reverberant rooms

    Source separation algorithms that utilize only audio data can perform poorly if multiple sources or reverberation are present. In this paper we therefore propose a video-aided model-based source separation algorithm for a two-channel reverberant recording in which the sources are assumed static. By exploiting cues from video, we first localize individual speech sources in the enclosure and then estimate their directions. The interaural spatial cues, the interaural phase difference and the interaural level difference, as well as the mixing vectors are probabilistically modeled. The models make use of the source direction information and are evaluated at discrete time-frequency points. The model parameters are refined with the well-known expectation-maximization (EM) algorithm. The algorithm outputs time-frequency masks that are used to reconstruct the individual sources. Simulation results show that by utilizing the visual modality the proposed algorithm can produce better time-frequency masks, thereby giving improved source estimates. We test the proposed algorithm in different scenarios, provide comparisons with other audio-only and audio-visual algorithms, and achieve improved performance on both synthetic and real data. We also include dereverberation-based pre-processing in our algorithm in order to suppress the late reverberant components from the observed stereo mixture and further enhance the overall output of the algorithm. This advantage makes our algorithm a suitable candidate for use in under-determined, highly reverberant settings where the performance of other audio-only and audio-visual methods is limited.
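
    The EM model itself is not reproduced here, but the final masking step is simple. The sketch below applies a given time-frequency mask to a stereo mixture via the STFT; the frame length and sample rate are illustrative assumptions.

        import numpy as np
        from scipy.signal import stft, istft

        def apply_tf_mask(left, right, mask, fs=16000, nperseg=1024):
            # STFT of each channel; mask holds one weight in [0, 1] per
            # time-frequency point (same shape as the STFT), e.g. as
            # output by an EM algorithm like the one described above.
            _, _, L = stft(left, fs=fs, nperseg=nperseg)
            _, _, R = stft(right, fs=fs, nperseg=nperseg)
            # Masking and inverse STFT reconstruct one source's contribution.
            _, est_left = istft(L * mask, fs=fs, nperseg=nperseg)
            _, est_right = istft(R * mask, fs=fs, nperseg=nperseg)
            return est_left, est_right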

    Studies on binaural and monaural signal analysis methods and applications

    Sound signals can contain a great deal of information about the environment and the sound sources present in it. This thesis presents novel contributions to the analysis of binaural and monaural sound signals. Some new applications are introduced in this work, but the emphasis is on analysis methods. The three main topics of the thesis are computational estimation of sound source distance, analysis of binaural room impulse responses, and applications intended for augmented reality audio. A novel method for binaural sound source distance estimation is proposed. The method is based on learning the coherence between the sounds entering the left and right ears. Comparisons to an earlier approach are also made. It is shown that these kinds of learning methods can correctly recognize the distance of a speech sound source in most cases. Methods for analyzing binaural room impulse responses are investigated. These methods are able to locate the early reflections in time and also to estimate their directions of arrival. This challenging problem could not be tackled completely, but this part of the work is an important step towards accurate estimation of the individual early reflections from a binaural room impulse response. As the third part of the thesis, applications of sound signal analysis are studied. The most notable contributions are a novel eyes-free user interface controlled by finger snaps and an investigation into the importance of features in audio surveillance. The results of this thesis are steps towards building machines that can obtain information about the surrounding environment based on sound. In particular, the research into sound source distance estimation serves as important basic research in this area. The applications presented could be valuable in future telecommunications scenarios, such as augmented reality audio.
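
    As a sketch of the coherence cue used for distance learning, the snippet below computes the magnitude-squared coherence between the ear signals; the sample rate and window length are illustrative assumptions, and the downstream learner is not shown.

        from scipy.signal import coherence

        def interaural_coherence(left, right, fs=16000, nperseg=1024):
            # Magnitude-squared coherence between the left- and right-ear
            # signals; in a room, distant sources tend to be less coherent
            # across the ears than nearby ones, which a learner can exploit.
            freqs, coh = coherence(left, right, fs=fs, nperseg=nperseg)
            return freqs, coh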

    Evaluation of an Augmented Reality Audio Headset and Mixer

    Augmented Reality Audio (ARA) is a concept defined as a real-time combination of the real and virtual auditory worlds; that is, the everyday sound environment can be extended with virtual sounds. The hardware studied in this work consists of a pair of headphones and a controlling unit, called an ARA mixer. The ARA headphones are composed of binaural earphone elements with integrated microphones. The ARA mixer provides all the connections and signal-processing electronics needed in ARA applications. The basic operating principle of the ARA headset is that the binaural microphones should relay the sound signals unaltered to the earphones in order to create an accurate copy of the surrounding sound environment. Unfortunately, the ARA headset introduces some alterations to the copied representation of the real sound environment, and the ARA mixer is therefore needed to equalize the headphones. Furthermore, the ARA mixer enables the addition of virtual sound objects. These can be embedded into the real environment either so that the user can distinguish them from the real sound environment or so that the user cannot tell the difference between real and virtual sounds. The aim of this thesis is to perform full-scale laboratory measurements and a usability evaluation of the ARA hardware. The objective is to collect technical data about the hardware and to understand how users perceive the usability of the ARA headset in everyday-life situations. With the gathered information it is possible to further improve the usability and sound quality of the ARA hardware.
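
    A minimal sketch of the mixer's signal path is shown below: the microphone signal is equalized with a compensation filter and a virtual sound object is added. The FIR filter, gain, and function name are placeholders for illustration; the actual ARA mixer is dedicated signal-processing electronics.

        import numpy as np
        from scipy.signal import lfilter

        def ara_mixer_output(mic_signal, virtual_signal, eq_fir, virtual_gain=1.0):
            """Both inputs are assumed to be equal-length 1-D arrays."""
            # Equalize the pseudo-acoustic microphone signal to compensate
            # for the alterations introduced by the headset...
            equalized = lfilter(eq_fir, [1.0], mic_signal)
            # ...then embed the virtual sound object into the sound scene.
            return equalized + virtual_gain * np.asarray(virtual_signal)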

    Sound Source Separation

    This is the author's accepted pre-print of the article, first published as G. Evangelista, S. Marchand, M. D. Plumbley and E. Vincent. Sound source separation. In U. Zölzer (ed.), DAFX: Digital Audio Effects, 2nd edition, Chapter 14, pp. 551-588. John Wiley & Sons, March 2011. ISBN 9781119991298. DOI: 10.1002/9781119991298.ch14

    Closed-loop sound source localization in neuromorphic systems

    Sound source localization (SSL) is used in applications such as industrial noise control, speech detection in mobile phones, speech enhancement in hearing aids, and many more. The newest video-conferencing setups use SSL: the position of a speaker is detected from the differences in the audio waves received by a microphone array, and after detection the camera focuses on the location of the speaker. The human brain is also able to detect the location of a speaker from auditory signals. It uses, among other cues, the difference in amplitude and arrival time of the sound wave at the two ears, called the interaural level and time difference. However, the substrate and computational primitives of our brain are different from classical digital computing. Due to its low power consumption of around 20 W and its real-time performance, the human brain has become a great source of inspiration for emerging technologies. One of these technologies is neuromorphic hardware, which implements the fundamental principles of brain computing identified to date using complementary metal-oxide-semiconductor technologies and new devices. In this work we propose the first neuromorphic closed-loop robotic system that uses the interaural time difference for SSL in real time. Our system can successfully locate sound sources such as human speech. In a closed-loop experiment, the robotic platform turned immediately towards the sound source with a turning velocity linearly proportional to the angle difference between the sound source and the binaural microphones. After this initial turn, the robotic platform remained oriented towards the sound source. Even though the system uses only very few resources of the available hardware, consumes around 1 W, and was tuned only by hand, meaning it does not contain any learning at all, it already reaches performance comparable to other neuromorphic approaches. The SSL system presented in this article brings us one step closer towards neuromorphic event-based systems for robotics and embodied computing.
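
    The closed-loop behaviour described above amounts to a simple proportional controller; the sketch below illustrates it, with the gain and velocity limit chosen arbitrarily rather than taken from the article.

        def turning_velocity(angle_error_deg, gain=0.5, max_velocity_deg_s=90.0):
            # Turning velocity linearly proportional to the angle between
            # the sound source and the binaural microphones, saturated to
            # keep the platform's motion bounded (gain and limit assumed).
            velocity = gain * angle_error_deg
            return max(-max_velocity_deg_s, min(max_velocity_deg_s, velocity))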