6 research outputs found

    Multi-channel spatialization systems for audio signals

    Synthetic head-related transfer functions (HRTFs) impose reprogrammable spatial cues on a plurality of audio input signals, for example multiple narrow-band audio communications signals received simultaneously. The HRTFs are generated and stored in interchangeable programmable read-only memories (PROMs), which hold both head-related transfer function impulse-response data and source-position information for a plurality of desired virtual source locations. The analog audio inputs are filtered and converted to digital signals, to which the synthetic head-related transfer functions are applied in the form of linear-phase finite impulse response filters. The outputs of the impulse-response filters are then reconverted to analog signals, filtered, mixed, and fed to a pair of headphones.
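The pipeline described — stored HRTF impulse responses driving linear-phase FIR filters, with the filtered outputs mixed into a headphone feed — amounts to convolving each input with a left/right impulse-response pair and summing. A minimal sketch in Python (NumPy standing in for the dedicated hardware; the signals and four-tap "HRIRs" are made-up toy data, not measured responses):

```python
import numpy as np

def render_binaural(mono, hrir_left, hrir_right):
    """Spatialize a mono signal by FIR-filtering it with a pair of
    head-related impulse responses (one per ear)."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right])

def mix_sources(signals, hrirs):
    """Render several sources at different virtual positions and mix
    them into a single stereo stream, as fed to the headphones."""
    n = max(len(s) + len(hl) - 1 for s, (hl, hr) in zip(signals, hrirs))
    out = np.zeros((2, n))
    for s, (hl, hr) in zip(signals, hrirs):
        rendered = render_binaural(s, hl, hr)
        out[:, :rendered.shape[1]] += rendered
    return out

# Toy example: two sources with mirrored, made-up 4-tap "HRIRs",
# i.e. one source biased to the left ear, the other to the right.
sig_a = np.ones(8)
sig_b = np.ones(8)
hrirs = [(np.array([1.0, 0.5, 0.0, 0.0]), np.array([0.2, 0.1, 0.0, 0.0])),
         (np.array([0.2, 0.1, 0.0, 0.0]), np.array([1.0, 0.5, 0.0, 0.0]))]
mix = mix_sources([sig_a, sig_b], hrirs)   # shape (2, 11): stereo output
```

Swapping the HRIR pair per source is the software analogue of swapping the interchangeable PROMs: the same filtering machinery relocates the virtual source.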

    Gesture interaction with spatial audio displays: Effects of target size and inter-target separation

    Presented at the 11th International Conference on Auditory Display (ICAD2005). This paper presents the results of an experiment comparing two spatial audio display segmentation techniques by investigating the relative salience of target width versus distance to target in a gesture-based spatial audio selection task. The first technique, MINIMAL, occupies as little of the display area as possible, with sounds placed as close to each other as possible. The second technique, MAXIMAL, occupies all the available display area: sounds are placed as far apart as possible and the display area assigned to each sound is allowed to grow. The ratio of distance to target to target width was kept constant in both displays to investigate the relative salience of the two factors in the sound selection task. Participants performed an orientation-based pointing task to select an audio display element in the presence of distracting sounds. Results show that the MAXIMAL strategy yields faster and more accurate interaction. Target width was found to have significantly more impact on selection time than distance to target. Time and accuracy results indicate that deictic gesture interaction with a spatial audio display is a robust and efficient interaction technique.
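Keeping the distance-to-width ratio constant is the standard way to equate nominal difficulty across displays: in Fitts'-law terms, the index of difficulty depends only on that ratio, so any time difference between MINIMAL and MAXIMAL must come from something else (here, target width). A small sketch of that reasoning (the numbers are illustrative, not the experiment's actual geometry):

```python
import math

def index_of_difficulty(distance, width):
    """Shannon formulation of Fitts' index of difficulty, in bits."""
    return math.log2(distance / width + 1)

# MINIMAL display: small angular separations and target widths.
id_minimal = index_of_difficulty(distance=10.0, width=2.0)
# MAXIMAL display: everything scaled by the same factor, so the
# distance-to-width ratio (and hence the nominal ID) is unchanged.
id_maximal = index_of_difficulty(distance=40.0, width=8.0)
assert math.isclose(id_minimal, id_maximal)
```

With IDs matched by construction, a reliable speed advantage for MAXIMAL isolates the contribution of absolute target size, which is the paper's point.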

    Providing Computer Telephony to Disabled Users with Voice Recognition


    Auditory perspective: perception, rendering, and applications

    In our appreciation of auditory environments, distance perception is as crucial as lateralization. Although research has been carried out on distance perception, modern auditory displays do not yet take advantage of it to provide additional information on the spatial layout of sound sources and, as a consequence, to enrich their content and quality. When designing a spatial auditory display, one must take into account the goal of the given application and the resources available in order to choose the optimal approach. In particular, rendering auditory perspective provides a hierarchical ordering of sound sources and allows the user's attention to be focused on the closest source. Besides, when visual data are no longer available, either because they are out of the visual field or the user is in the dark, or should be avoided to reduce the load on visual attention, auditory rendering must convey all the spatial information, including distance. The present research work aims at studying auditory depth (i.e. sound sources displayed straight ahead of the listener) in terms of perception, rendering, and applications in human-computer interaction.
First, an overview is given of the most important aspects of auditory distance perception. Investigations of depth perception are much more advanced in vision, since they have already found applications in computer graphics. It therefore seems natural to give the same information in the auditory domain to increase the degree of realism of the overall display. Depth perception may indeed be facilitated by combining visual and auditory cues. Relevant results from the literature on audio-visual interaction effects are reported, and two experiments were carried out on the perception of audio-visual depth. In particular, the influence of auditory cues on the perceived visual layering in depth was investigated. Results show that auditory intensity manipulation does not affect the perceived order in depth, most probably due to a lack of multisensory integration. Besides, the second experiment, which introduced a delay between the two auditory-visual stimuli, revealed an effect of the temporal order of the two visual stimuli. Among existing techniques for sound-source spatialization along the depth dimension, a previous study proposed the modeling of a virtual pipe, based on the exaggeration of reverberation in such an environment. The design strategy follows a physics-based modeling approach and makes use of a 3D rectangular Digital Waveguide Mesh (DWM), which had already shown its ability to simulate complex, large-scale acoustical environments. The 3D DWM proved too resource-consuming for real-time simulation of 3D environments of reasonable size. While downsampling may help to reduce the CPU load, a more efficient alternative is to use a 2D model, which consequently simulates a membrane. Although it sounds less natural than 3D simulations, the resulting bidimensional audio space presents similar properties, especially for depth rendering.
This research work has also shown that virtual acoustics makes it possible to shape depth perception and, in particular, to compensate for the usual compression of distance estimates. A trapezoidal bidimensional DWM is proposed as a virtual environment able to provide a linear relationship between perceived and physical distance. Three listening tests were conducted to assess the linearity. They also gave rise to a new test procedure, derived from the MUSHRA test, which is suitable for direct comparison of multiple distances. In particular, it reduces response variability compared with the direct magnitude estimation procedure. Real-time implementations of the rectangular 2D DWM have been realized as Max/MSP external objects. The first external renders in depth one or more static sound sources located at different distances from the listener, while the second external simulates a sound source moving along the depth dimension, i.e. an approaching/receding source. As an application of the first external, an audio-tactile interface for sound navigation has been proposed. The tactile interface includes a linear position sensor made of conductive material. The touch position on the ribbon is mapped onto the listening position on a rectangular virtual membrane, modeled by the 2D DWM and providing depth cues for four equally spaced sound sources. Furthermore, the knob of a MIDI controller controls the position of the mesh along the playlist, which allows a whole set of files to be browsed by moving the audio window resulting from the virtual membrane back and forth. Subjects involved in a user study succeeded in finding all the target files, and found the interface intuitive and entertaining. Another demonstration of the audio-tactile interface was realized using physics-based models of sounds: everyday sounds of "frying", "knocking" and "liquid dripping" were used, so that both sound creation and depth rendering are physics-based.
It is believed that this ecological approach provides an intuitive interaction. Finally, "DepThrow" is an audio game based on the use of the 2D DWM to render depth cues of a dynamic sound source. The game consists in throwing a virtual ball (modeled by a physics-based model of rolling sound) inside a virtual tube (modeled by a 2D DWM) which is open-ended and tilted. The goal is to make the ball roll as far as possible into the tube without letting it fall out at the far end. Demonstrated as a game, this prototype is also meant to be a tool for investigations into the perception of dynamic distance. Preliminary results of a listening test on the perception of distance motion in the virtual tube showed that the duration of the ball's movement influences the estimation of the distance reached by the rolling ball.
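The rectilinear 2D DWM at the core of this depth-rendering work is compact to state: each scattering junction is updated from its four neighbours, which for the homogeneous rectilinear mesh reduces to a standard finite-difference scheme for the 2D wave equation. A minimal sketch of one such membrane simulation; the mesh size, excitation and tap positions, and the simplistic zero-clamped (phase-inverting) boundaries are illustrative assumptions, not the thesis' actual parameters:

```python
import numpy as np

def dwm_step(p, p_prev):
    """One update of a 2D rectilinear digital waveguide mesh: each
    junction takes half the sum of its four neighbours minus its own
    value one step earlier.  Border junctions are clamped to zero,
    i.e. perfectly reflecting, phase-inverting boundaries."""
    p_next = np.zeros_like(p)
    p_next[1:-1, 1:-1] = 0.5 * (p[2:, 1:-1] + p[:-2, 1:-1] +
                                p[1:-1, 2:] + p[1:-1, :-2]) \
                         - p_prev[1:-1, 1:-1]
    return p_next

# Excite a small rectangular "membrane" with an impulse near one end
# and record the pressure at a tap near the other end: the tap hears
# a delayed direct arrival followed by boundary reflections, which is
# the mechanism exploited for depth cues.
nx, ny, steps = 32, 16, 100
p_prev = np.zeros((nx, ny))
p = np.zeros((nx, ny))
p[4, 8] = 1.0                       # impulse at the "source" junction
out = []
for _ in range(steps):
    p_next = dwm_step(p, p_prev)
    p_prev, p = p, p_next
    out.append(p[28, 8])            # far tap, 24 junctions away
```

Moving the tap (or the source) along the mesh changes the direct-to-reverberant balance at the output, which is the depth cue the externals render; the trapezoidal variant reshapes that relationship toward perceptual linearity.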

    Interactively skimming recorded speech

    Thesis (Ph.D.), Massachusetts Institute of Technology, Program in Media Arts & Sciences, 1994. Includes bibliographical references (p. 143-156). Barry Michael Arons.

    Comparison and combination of visual and audio rendering for the design of human-computer interfaces (from human factors to distortion-based presentation strategies)

    Although more and more sound and audiovisual data are available, the majority of interfaces for accessing them rely solely on a visual presentation. Many visualization techniques have been proposed that use simultaneous presentation of multiple documents, with distortions to highlight the most relevant information. We propose to define equivalent audio techniques for the presentation of several competing sound files, and to combine such audio and visual presentation strategies optimally for multimedia documents.
To better adapt these strategies to the user, we studied the attentional and perceptual processes involved in listening to and watching simultaneous audio-visual objects, focusing on the interactions between the two modalities. Combining the parameters of visual size and sound level, we extended the visual concept of the magnifying lens to the auditory and audiovisual modalities. Exploiting this concept, a navigation application for a video collection was developed. We compared our tool with another rendering mode, called Pan & Zoom, through a usability study. The results, especially the subjective ones, encourage further research into multimodal presentation strategies that combine audio rendering with the visual renderings already available. A second study concerned the identification of environmental sounds in a noisy environment in the presence of a visual context. The noise simulated the presence of multiple competing sounds, as would occur in an interface where several multimedia documents are presented together. The experimental results confirmed the advantage of multimodality under audio degradation. Moreover, beyond the primary goals of the thesis, this study confirmed the importance of semantic congruency between visual and auditory components for object recognition, and provided deeper knowledge about the auditory perception of environmental sounds. Finally, we investigated the attentional processes involved in searching for a specific object among many, especially the pop-out phenomenon whereby a salient object automatically attracts attention. In vision, a sharp object attracts attention among blurred objects, and some visual presentation strategies already exploit this parameter. We extended the concept of visual blur by analogy to the auditory and audiovisual modalities. A series of perceptual experiments confirmed that a sharp object among blurred objects attracts attention, regardless of the modality.
The identification and search processes are accelerated when the sharpness cue is applied to the target, but slowed when it is applied to a distractor, highlighting an involuntary guidance effect. Concerning crossmodal interaction, a redundant combination of audio and visual blur proved even more effective than a unimodal presentation. Results also indicate that an optimal combination does not necessarily require a distortion of both modalities.
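The magnifying-lens idea described above — a single distortion function driving both visual size and sound level, so that the audio and visual focus+context renderings stay congruent — can be sketched as follows. The falloff shape, radius, magnification and base values are illustrative assumptions, not the thesis' actual parameters:

```python
import math

def lens_weight(pos, focus, radius=3.0, magnification=4.0):
    """Distortion weight of a fisheye-style lens over a 1D list of
    documents: `magnification` at the focus, falling off smoothly
    (cosine profile) to 1 at the rim of the lens, 1 outside it."""
    d = abs(pos - focus)
    if d >= radius:
        return 1.0
    return 1.0 + (magnification - 1.0) * 0.5 * (1.0 + math.cos(math.pi * d / radius))

def render(playlist, focus):
    """Apply the SAME lens weight to visual size (pixels) and sound
    level (linear gain), keeping the two modalities congruent."""
    base_size, base_gain = 20.0, 0.2
    return [(doc,
             base_size * lens_weight(i, focus),
             base_gain * lens_weight(i, focus))
            for i, doc in enumerate(playlist)]

# Five documents with the lens focused on the third one.
views = render(["a", "b", "c", "d", "e"], focus=2)
```

Because both modalities share one weight function, moving the focus keeps the loudest source and the largest thumbnail aligned; the thesis' last result suggests the distortion need not always be applied to both modalities at once.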