6 research outputs found
Multi-channel spatialization systems for audio signals
Synthetic head-related transfer functions (HRTFs) for imposing reprogrammable spatial cues on a plurality of audio input signals, included, for example, in multiple narrow-band audio communications signals received simultaneously, are generated and stored in interchangeable programmable read-only memories (PROMs), which store both head-related transfer function impulse response data and source positional information for a plurality of desired virtual source locations. The analog audio inputs are filtered and converted to digital signals, from which synthetic head-related transfer functions are generated in the form of linear-phase finite impulse response (FIR) filters. The outputs of the impulse response filters are subsequently reconverted to analog signals, filtered, mixed, and fed to a pair of headphones.
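As a rough illustration of the FIR-based approach described in this abstract, the sketch below convolves a mono signal with a left/right pair of impulse responses. The HRIRs here are toy stand-ins (a pure interaural time and level difference), an assumption for demonstration, not measured or patented data:

```python
import numpy as np

def binaural_render(mono, hrir_left, hrir_right):
    """Convolve a mono signal with a left/right impulse-response pair,
    as a linear-phase FIR filtering stage would."""
    return np.stack([np.convolve(mono, hrir_left),
                     np.convolve(mono, hrir_right)], axis=1)

# Toy HRIRs (an assumption, not real HRTF data): a source on the right is
# modeled by a delayed, attenuated path to the far (left) ear only.
fs = 44100
itd_samples = int(0.0006 * fs)                 # ~0.6 ms interaural time difference
hrir_right = np.zeros(64); hrir_right[0] = 1.0           # near ear: direct path
hrir_left = np.zeros(64); hrir_left[itd_samples] = 0.5   # far ear: delayed, quieter

mono = np.random.default_rng(0).standard_normal(fs // 10)  # 100 ms of noise
stereo = binaural_render(mono, hrir_left, hrir_right)      # shape: (N + 63, 2)
```

In a real system the toy HRIRs would be replaced by measured or synthesized impulse responses selected per virtual source position, as the patent's PROM-stored data suggests.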
Gesture interaction with spatial audio displays: Effects of target size and inter-target separation
Presented at the 11th International Conference on Auditory Display (ICAD2005). This paper presents the results of an experiment comparing two spatial audio display segmentation techniques by investigating the relative salience of target width versus distance to target in a gesture-based spatial audio selection task. The first technique, MINIMAL, occupies as little of the display area as possible, with sounds placed as close to each other as possible. The second technique, MAXIMAL, occupies all the available display area: sounds are placed as far apart as possible and the display area assigned to each sound is allowed to grow. The ratio of distance to target to target width was kept constant in both displays to investigate the relative salience of the two factors in the sound selection task. Participants performed an orientation-based pointing task to select an audio display element in the presence of distracting sounds. Results show that the MAXIMAL strategy yields faster and more accurate interaction. Target width was found to have significantly more impact on time than distance to target. Time and accuracy results indicate that deictic gesture interaction with a spatial audio display is a robust and efficient interaction technique.
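The constant distance-to-width ratio used in the experiment corresponds to a constant Fitts-style index of difficulty, which the following sketch makes explicit (treating the task as Fitts-like is an assumption for illustration, not a claim of the paper):

```python
import math

def index_of_difficulty(distance, width):
    """Fitts-style index of difficulty, log2(2D/W). Keeping the D/W ratio
    constant, as the experiment did, leaves this quantity unchanged even
    when the display is scaled up or down."""
    return math.log2(2 * distance / width)

# A MINIMAL-style display (small, tightly packed) and a MAXIMAL-style display
# (scaled up 3x) with the same D/W ratio have equal difficulty indices:
id_minimal = index_of_difficulty(distance=30, width=10)
id_maximal = index_of_difficulty(distance=90, width=30)
```

This is why any performance difference between the two displays can be attributed to the segmentation strategy rather than to task difficulty.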
Auditory perspective: perception, rendering, and applications
In our appreciation of auditory environments, distance perception is as crucial as lateralization. Although research has been carried out on distance perception, modern auditory displays do not yet take advantage of it to provide additional information on the spatial layout of sound sources and thereby enrich their content and quality. When designing a spatial auditory display, one must take into account the goal of the given application and the available resources in order to choose the optimal approach. In particular, rendering auditory perspective provides a hierarchical ordering of sound sources and allows the user's attention to be focused on the closest source. Moreover, when visual data are no longer available, whether because they lie outside the visual field, because the user is in the dark, or because they should be avoided to reduce the load on visual attention, auditory rendering must convey all the spatial information, including distance. The present research studies auditory depth (i.e., sound sources displayed straight ahead of the listener) in terms of perception, rendering, and applications in human-computer interaction.
First, an overview is given of the most important aspects of auditory distance perception. Investigations of depth perception are much more advanced in vision, where they have already found applications in computer graphics. It therefore seems natural to provide the same information in the auditory domain to increase the degree of realism of the overall display. Depth perception may indeed be facilitated by combining visual and auditory cues. Relevant results from the literature on audio-visual interaction effects are reported, and two experiments on the perception of audio-visual depth are described. In particular, the influence of auditory cues on the perceived visual layering in depth was investigated. Results show that manipulating auditory intensity does not affect the perceived order in depth, most probably because of a lack of multisensory integration. The second experiment, which introduced a delay between the two audio-visual stimuli, revealed an effect of the temporal order of the two visual stimuli.
Among existing techniques for sound source spatialization along the depth dimension, a previous study proposed the model of a virtual pipe, based on the exaggeration of reverberation inside such an environment. The design strategy follows a physics-based modeling approach and makes use of a 3D rectangular Digital Waveguide Mesh (DWM), which had already demonstrated its ability to simulate complex, large-scale acoustic environments. The 3D DWM proved too resource-intensive for real-time simulation of 3D environments of reasonable size. While downsampling may help reduce the CPU load, a more efficient alternative is to use a 2D model, which consequently simulates a membrane. Although it sounds less natural than 3D simulations, the resulting two-dimensional audio space exhibits similar properties, especially for depth rendering.
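A minimal sketch of the rectilinear 2D DWM update described above, written in its equivalent finite-difference form (the mesh size, boundary treatment, and source/listener positions are illustrative choices, not taken from the thesis):

```python
import numpy as np

def dwm_step(v_curr, v_prev):
    """One update of a 2D rectilinear digital waveguide mesh in its
    FDTD-equivalent form: each junction takes half the sum of its four
    neighbours minus its own value two steps ago. Boundary nodes are
    held at zero, i.e. fully reflective with sign inversion."""
    v_next = np.zeros_like(v_curr)
    v_next[1:-1, 1:-1] = (0.5 * (v_curr[2:, 1:-1] + v_curr[:-2, 1:-1]
                                 + v_curr[1:-1, 2:] + v_curr[1:-1, :-2])
                          - v_prev[1:-1, 1:-1])
    return v_next

# Excite the membrane with an impulse and record the signal at a listener
# node further along the depth dimension of the mesh.
nx, ny, steps = 40, 20, 200
v_prev = np.zeros((nx, ny))
v_curr = np.zeros((nx, ny))
v_curr[5, 10] = 1.0                      # source position (illustrative)
out = []
for _ in range(steps):
    v_next = dwm_step(v_curr, v_prev)
    out.append(v_next[30, 10])           # listener position (illustrative)
    v_prev, v_curr = v_curr, v_next
```

Moving the listener node farther from the source lengthens the direct-path delay and increases the proportion of reflected energy in `out`, which is the basic mechanism behind the depth cues discussed here.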
This research also shows that virtual acoustics makes it possible to shape distance perception and, in particular, to compensate for the well-known compression of subjective distance estimates. To this end, a trapezoidal two-dimensional DWM is proposed as a virtual environment able to provide a linear relationship between perceived and physical distance. Three listening tests were conducted to assess this linearity. They also gave rise to a new test procedure, derived from the MUSHRA test, that is suitable for direct comparison of multiple distances; in particular, it reduces response variability compared with the direct magnitude estimation procedure.
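The compensation idea can be illustrated with a compressive power law for auditory distance estimates: pre-warp the rendered distance so that the compressed percept comes out linear. The functional form and parameter values below are illustrative assumptions, not the thesis's actual model:

```python
def perceived_distance(d, k=1.3, a=0.5):
    """Compressive power law d' = k * d**a, with exponent a < 1, of the
    kind often reported for auditory distance estimates (illustrative
    parameter values)."""
    return k * d ** a

def prewarp(d_target, k=1.3, a=0.5):
    """Physical distance to render so that, after compression by the
    power law above, the percept equals d_target."""
    return (d_target / k) ** (1.0 / a)

# Rendering at prewarp(d) yields a percept of d, i.e. a linear mapping:
check = [perceived_distance(prewarp(d)) for d in (1.0, 2.0, 4.0)]
```

The trapezoidal mesh plays an analogous role acoustically: its geometry exaggerates the distance cues so that subjective estimates grow linearly with physical distance.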
Real-time implementations of the rectangular 2D DWM have been realized as Max/MSP external objects. The first external renders in depth one or more static sound sources located at different distances from the listener, while the second simulates a single sound source moving along the depth dimension, i.e., an approaching or receding source.
As an application of the first external, an audio-tactile interface for sound navigation has been proposed. The tactile interface includes a linear position sensor made of conductive material. The touch position on the ribbon is mapped onto the listening position on a rectangular virtual membrane, modeled by the 2D DWM, which provides depth cues for four equally spaced sound sources. Additionally, the knob of a MIDI controller moves the position of the mesh along the playlist, allowing the user to browse a whole set of files by sliding the audio window formed by the virtual membrane back and forth. Subjects involved in a user study succeeded in finding all the target files, and found the interface intuitive and entertaining. A further demonstration of the audio-tactile interface was realized using physics-based sound models. Everyday sounds of "frying", "knocking", and "liquid dripping" were used so that both the sound synthesis and the depth rendering are physics-based, on the hypothesis that this ecological approach provides an intuitive interaction.
Finally, "DepThrow" is an audio game based on the use of the 2D DWM to render depth cues of a dynamic sound source. The game consists in throwing a virtual ball (modeled by a physics-based model of rolling sounds) inside a virtual tube (modeled by a 2D DWM) that is open-ended and tilted. The goal is to make the ball roll as far as possible into the tube without letting it fall out at the far end. Demonstrated as a game, this prototype is also meant to be a tool for investigating the perception of dynamic distance. Preliminary results of a listening test on the perception of distance motion in the virtual tube showed that the duration of the ball's movement influences the estimate of the distance reached by the rolling ball.
Interactively skimming recorded speech
Thesis (Ph.D.), Massachusetts Institute of Technology, Program in Media Arts & Sciences, 1994. Includes bibliographical references (p. 143-156). Barry Michael Arons, Ph.D.
Comparison and combination of visual and auditory renderings for the design of human-computer interfaces (from human factors to distortion-based presentation strategies)
Although more and more sound and audiovisual data are available, the majority of access interfaces are based solely on visual presentation. Many visualization techniques have been proposed that use simultaneous presentation of multiple documents together with distortions that highlight the most relevant information. We propose to define equivalent audio techniques for the presentation of several competing sound files, and to combine such audio and visual presentation strategies optimally for multimedia documents.
To better adapt these strategies to the user, we studied the attentional and perceptual processes involved in listening to and watching simultaneous audio-visual objects, focusing on the interactions between the two modalities.

Combining the parameters of visual size and sound level, we extended the visual concept of a magnifying lens, as used in visual focus+context methods, to the auditory and audiovisual modalities. Exploiting this concept, a navigation application for a video collection was developed. We compared our tool with another rendering mode, Pan & Zoom, in a usability study. The results, especially the subjective ones, encourage further research on multimodal presentation strategies that add an audio rendering to the visual renderings already available.

A second study concerned the identification of environmental sounds in noise in the presence of a visual context. The noise simulated the presence of multiple competing sound sources, as would occur in an interface where several multimedia documents are presented together. The experimental results confirmed the advantage of multimodality under audio degradation. Moreover, beyond the primary goals of the thesis, this study confirmed the importance of semantic congruency between the visual and auditory components for object recognition and deepened our knowledge of the auditory perception of environmental sounds.

Finally, we investigated the attentional processes involved in searching for a specific object among many, especially the pop-out phenomenon whereby a salient object automatically attracts attention. In vision, a sharp object among blurred objects attracts attention, and some visual presentation strategies already exploit this parameter. We extended the concept of visual blur to the auditory and audiovisual modalities by analogy. A series of experiments confirmed that a sharp object among blurred objects attracts attention, regardless of the modality.
Search and identification are then accelerated when the sharpness cue is applied to the target, but slowed when it is applied to a distractor, highlighting an involuntary guidance effect. Concerning crossmodal interaction, a redundant combination of audio and visual blur proved even more effective than a unimodal presentation. The results also indicate that an optimal combination does not necessarily require applying a distortion to both modalities.
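The audio counterpart of the magnifying lens described in this abstract can be sketched as a gain profile that boosts sounds near the focus and attenuates the context. The function shape and parameter values here are hypothetical illustrations, not taken from the thesis:

```python
import math

def lens_gain_db(distance_to_focus, radius=1.0, boost_db=6.0, cut_db=-12.0):
    """Focus+context audio lens (illustrative): full boost at the focus,
    falling along a raised-cosine curve to the context attenuation at and
    beyond the lens radius."""
    t = min(distance_to_focus / radius, 1.0)   # 0 at focus, 1 at/after the edge
    w = 0.5 * (1 + math.cos(math.pi * t))      # smooth 1 -> 0 falloff
    return cut_db + w * (boost_db - cut_db)

# The focused sound is amplified, distant sounds are pushed into the context:
gains = [lens_gain_db(d) for d in (0.0, 0.5, 2.0)]
```

Applied across several concurrently playing files, such a profile makes the focused document perceptually dominant while keeping the others audible as context, mirroring the visual fisheye distortion.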