Spherical microphone array acoustic rake receivers

Author: Javed HA
Moore AH
Naylor PA
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 21/12/2015

Several signal-independent acoustic rake receivers are proposed for speech dereverberation using spherical microphone arrays. The proposed rake designs take advantage of multipath propagation by separately capturing early reflections and combining them with the direct path. We investigate several approaches to combining reflections with the direct-path source signal, including the development of beam patterns that point nulls at all preceding reflections. The proposed designs are tested in experimental simulations, and their dereverberation performance is evaluated using objective measures. For the tested configuration, the proposed designs achieve higher levels of dereverberation than conventional signal-independent beamforming systems, with up to 3.6 dB improvement in the direct-to-reverberant ratio over the plane-wave decomposition beamformer.

Spiral - Imperial College Digital Repository
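
The entry above reports its gain as a change in the direct-to-reverberant ratio (DRR). As a rough illustration of that measure only, and not the authors' evaluation code, the sketch below computes a DRR from an impulse response. The function name, the choice of the largest-magnitude sample as the direct path, and the 2.5 ms window are all assumptions made for this example.

```python
import math


def direct_to_reverberant_ratio_db(rir, fs, window_ms=2.5):
    """Return the DRR in dB for an impulse response given as a list of samples.

    Assumptions (not from the source): the direct path is the sample with the
    largest magnitude, and every sample within +/- window_ms of it counts as
    direct energy. All remaining samples count as reverberant energy.
    """
    if not rir:
        raise ValueError("impulse response is empty")
    # Index of the strongest sample, taken here as the direct-path arrival.
    peak = max(range(len(rir)), key=lambda n: abs(rir[n]))
    half = int(round(window_ms * 1e-3 * fs))
    lo, hi = max(0, peak - half), min(len(rir), peak + half + 1)
    direct = sum(x * x for x in rir[lo:hi])
    reverb = sum(x * x for x in rir[:lo]) + sum(x * x for x in rir[hi:])
    if reverb == 0.0:
        return math.inf  # no energy outside the direct-path window
    return 10.0 * math.log10(direct / reverb)
```

Under these assumptions, an improvement such as the one quoted would be the difference between the DRR values of two beamformer output responses, each computed this way.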

Technical aspects of a demonstration tape for three-dimensional sound displays

Author: Begault Durand R.
Wenzel Elizabeth M.

This document was developed to accompany an audio cassette that demonstrates work in three-dimensional auditory displays, developed at the Ames Research Center Aerospace Human Factors Division. It provides a text version of the audio material, and covers the theoretical and technical issues of spatial auditory displays in greater depth than on the cassette. The technical procedures used in the production of the audio demonstration are documented, including the methods for simulating rotorcraft radio communication, synthesizing auditory icons, and using the Convolvotron, a real-time spatialization device.

NASA Technical Reports Server

Egocentric Auditory Attention Localization in Conversations

Author: Ithapu Vamsi Krishna
Jiang Hao
Rehg James M.
Ryan Fiona
Shukla Abhinav
Publication date: 28/03/2023

In a noisy conversation environment such as a dinner party, people often exhibit selective auditory attention, or the ability to focus on a particular speaker while tuning out others. Recognizing who somebody is listening to in a conversation is essential for developing technologies that can understand social behavior and devices that can augment human hearing by amplifying particular sound sources. The computer vision and audio research communities have made great strides towards recognizing sound sources and speakers in scenes. In this work, we take a step further by focusing on the problem of localizing auditory attention targets in egocentric video, or detecting who in a camera wearer's field of view they are listening to. To tackle the new and challenging Selective Auditory Attention Localization problem, we propose an end-to-end deep learning approach that uses egocentric video and multichannel audio to predict the heatmap of the camera wearer's auditory attention. Our approach leverages spatiotemporal audiovisual features and holistic reasoning about the scene to make predictions, and outperforms a set of baselines on a challenging multi-speaker conversation dataset. Project page: https://fkryan.github.io/saa

arXiv.org e-Print Archive

Cross-modal extinction in a boy with severely autistic behaviour and high verbal intelligence

Author: Adini Y
Akshoomoff N
Belmonte MK
Bonneh YS
Houde JF
Iversen PE
Kenet T
Merzenich MM
Moore CI
Pei F
Simon HJ
Publication venue: 'Informa UK Limited'
Publication date: 23/07/2008

Anecdotal reports from individuals with autism suggest a loss of awareness to stimuli from one modality in the presence of stimuli from another. Here we document such a case in a detailed study of T.M., a 13-year-old boy with autism in whom significant autistic behaviors are combined with an uneven IQ profile of superior verbal and low performance abilities. Although T.M.'s speech is often unintelligible and his behavior is dominated by motor stereotypies and impulsivity, he can communicate by typing or pointing independently within a letter board. A series of experiments using simple and highly salient visual, auditory, and tactile stimuli demonstrated a hierarchy of cross-modal extinction, in which auditory information extinguished other modalities at various levels of processing. T.M. also showed deficits in shifting and sustaining attention. These results provide evidence for mono-channel perception in autism and suggest a general pattern of winner-takes-all processing in which a stronger stimulus-driven representation dominates behavior, extinguishing weaker representations.

Nottingham Trent Institutional Repository (IRep)

Real-time Microphone Array Processing for Sound-field Analysis and Perceptually Motivated Reproduction

Author: McCormack Leo
Publication date: 11/12/2017

This thesis details real-time implementations of sound-field analysis and perceptually motivated reproduction methods for visualisation and auralisation purposes. For visualisation, various methods for depicting the relative distribution of sound energy from one point in space are investigated and contrasted, including a novel reformulation of the cross-pattern coherence (CroPaC) algorithm, which integrates a new side-lobe suppression technique. For auralisation, listening tests were conducted to compare ambisonics reproduction with a novel headphone formulation of the directional audio coding (DirAC) method. The results indicate that the side-lobe suppressed CroPaC method offers greater spatial selectivity in reverberant conditions than other popular approaches, and that the new DirAC formulation yields higher perceived spatial accuracy than the ambisonics method.

Aaltodoc Publication Archive

Cognitive performance in open-plan office acoustic simulations: Effects of room acoustics and semantics but not spatial separation of sound sources

Author: Fels Janina
Georgi Markus
Klatte Maria
Leist Larissa
Schlittmeier Sabine J.
Yadav Manuj
Publication date: 08/08/2023

The irrelevant sound effect (ISE) characterizes short-term memory performance impairment during irrelevant sounds relative to quiet. Irrelevant sound presentation in most laboratory-based ISE studies has been rather limited in its ability to represent complex scenarios such as open-plan offices (OPOs), and few studies have considered serial recall of heard information. This paper investigates ISE using an auditory-verbal serial recall task, wherein performance was evaluated for relevant factors in simulating OPO acoustics: the irrelevant sounds including the semanticity of speech, reproduction methods over headphones, and room acoustics. Results (Experiments 1 and 2) show that ISE was exhibited in most conditions with anechoic (irrelevant) nonspeech sounds with/without speech, but the effect was substantially higher with meaningful speech compared to foreign speech, suggesting a semantic effect. Performance differences in conditions with diotic and binaural reproductions were not statistically robust, suggesting a limited role of spatial separation of sources. In Experiment 3, a statistically robust ISE was exhibited for binaural room acoustic conditions with mid-frequency reverberation times, T30 (s) = 0.4, 0.8, 1.1, suggesting cognitive impairment regardless of sound absorption representative of OPOs. Performance differences in T30 = 0.4 s relative to T30 = 0.8 and 1.1 s conditions were statistically robust. This emphasizes the benefits for cognitive performance with increased sound absorption, reinforcing extant room acoustic design recommendations. Performance differences in T30 = 0.8 s vs. 1.1 s were not statistically robust. Collectively, these results suggest that certain findings from ISE studies with idiosyncratic acoustics may not translate well to complex OPO acoustic environments.

arXiv.org e-Print Archive

Meta-analyses support a taxonomic model for representations of different categories of audio-visual interaction events in the human brain

Author: Brefczynski-Lewis Julie A.
Csonka Matt
Frum Chris
Lewis James W.
Mardmomen Nadia
Webster Paula J.
Publication venue: The Research Repository @ WVU
Publication date: 01/01/2021

Our ability to perceive meaningful action events involving objects, people and other animate agents is characterized in part by an interplay of visual and auditory sensory processing and their cross-modal interactions. However, this multisensory ability can be altered or dysfunctional in some hearing and sighted individuals, and in some clinical populations. The present meta-analysis sought to test current hypotheses regarding neurobiological architectures that may mediate audio-visual multisensory processing. Reported coordinates from 82 neuroimaging studies (137 experiments) that revealed some form of audio-visual interaction in discrete brain regions were compiled, converted to a common coordinate space, and then organized along specific categorical dimensions to generate activation likelihood estimate (ALE) brain maps and various contrasts of those derived maps. The results revealed brain regions (cortical “hubs”) preferentially involved in multisensory processing along different stimulus category dimensions, including (1) living versus non-living audio-visual events, (2) audio-visual events involving vocalizations versus actions by living sources, (3) emotionally valent events, and (4) dynamic-visual versus static-visual audio-visual stimuli. These meta-analysis results are discussed in the context of neurocomputational theories of semantic knowledge representations and perception, and the brain volumes of interest are available for download to facilitate data interpretation for future neuroimaging studies.

The Research Repository @ WVU (West Virginia University)

Multisensory perception of affect, its time course and its neural basis

Author: de Gelder B.
Vroomen J.
Publication venue: 'WARC Limited'
Publication date: 01/01/2004

Tilburg University Repository

Multisensory Motion Perception in 3–4 Month-Old Infants

Author: Ashmead
Baart
Bahrick
Bahrick
Bahrick
Bahrick
Bahrick
Bahrick
Bertenthal
Bremner
Bremner
Csibra
De Hevia
Dolscheid
Filippetti
Gogate
Gogate
Johnson
Kellman
Lewkowicz
Lewkowicz
Lewkowicz
Lewkowicz
Nava
Otsuka
Otsuka
Parise
Pickens
Pitteri
Rochat
Rose
Rusconi
Sann
Schlack
Shepard
Streri
Streri
Streri
Tomalski
Walker
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2017

Human infants begin very early in life to take advantage of multisensory information by extracting the invariant amodal information that is conveyed redundantly by multiple senses. Here we addressed the question as to whether infants can bind multisensory moving stimuli, and whether this occurs even if the motion produced by the stimuli is only illusory. Three- to 4-month-old infants were presented with two bimodal pairings: visuo-tactile and audio-visual. Visuo-tactile pairings consisted of apparently vertically moving bars (the Barber Pole illusion) moving in either the same or opposite direction with a concurrent tactile stimulus consisting of strokes given on the infant’s back. Audio-visual pairings consisted of the Barber Pole illusion in its visual and auditory version, the latter giving the impression of a continuous rising or ascending pitch. We found that infants were able to discriminate congruently (same direction) vs. incongruently moving (opposite direction) pairs irrespective of modality (Experiment 1). Importantly, we also found that congruently moving visuo-tactile and audio-visual stimuli were preferred over incongruently moving bimodal stimuli (Experiment 2). Our findings suggest that very young infants are able to extract motion as an amodal component and use it to match stimuli that only apparently move in the same direction.

Crossref

Archivio istituzionale della ricerca - Università di Padova

Optimality and limitations of audio-visual integration for cognitive systems

Author: Boyce William
Lindsay Anthony
Rano Inaki
Wong-Lin KongFatt
Zgonnikov Arkady
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2020

Multimodal integration is an important process in perceptual decision-making. In humans, this process has often been shown to be statistically optimal, or near optimal: sensory information is combined in a fashion that minimizes the average error in perceptual representation of stimuli. However, sometimes there are costs that come with the optimization, manifesting as illusory percepts. We review audio-visual facilitations and illusions that are products of multisensory integration, and the computational models that account for these phenomena. In particular, the same optimal computational model can lead to illusory percepts, and we suggest that more studies are needed to detect and mitigate these illusions as artifacts in artificial cognitive systems. We provide cautionary considerations when designing artificial cognitive systems with the view of avoiding such artifacts. Finally, we suggest avenues of research toward solutions to potential pitfalls in system design. We conclude that detailed understanding of multisensory integration and the mechanisms behind audio-visual illusions can benefit the design of artificial cognitive systems.

TU Delft Repository

Ulster University's Research Portal
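
The last entry describes integration that minimizes average error. One common textbook formalization of that idea is inverse-variance (maximum-likelihood) weighting of independent Gaussian cues. The sketch below illustrates that generic model only; it is not taken from the review itself, and the function name and inputs are assumptions made for this example.

```python
def combine_cues(estimates, variances):
    """Fuse independent noisy estimates by inverse-variance weighting.

    Assumption (not from the source): each cue is an unbiased estimate with
    independent Gaussian noise of known, positive variance. Returns the fused
    estimate and its variance.
    """
    if len(estimates) != len(variances) or not estimates:
        raise ValueError("need one positive variance per estimate")
    # A cue's precision (1 / variance) is its reliability and sets its weight.
    precisions = [1.0 / v for v in variances]
    total = sum(precisions)
    fused = sum(p * x for p, x in zip(precisions, estimates)) / total
    # The fused variance is never larger than the smallest input variance.
    return fused, 1.0 / total
```

For example, a visual estimate of 0 degrees with variance 1 and an auditory estimate of 10 degrees with variance 4 fuse to 2 degrees with variance 0.8. The result sits closer to the more reliable cue, which is how a model of this kind can reduce average error while still yielding a biased percept when the two cues conflict.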