1,594 research outputs found

    Acoustic Speaker Localization with Strong Reverberation and Adaptive Feature Filtering with a Bayes RFS Framework

    The thesis investigates the challenges of speaker localization in the presence of strong reverberation, multi-speaker tracking, and multi-feature multi-speaker state filtering, using sound recordings from microphones. Novel reverberation-robust speaker localization algorithms are derived from the signal and room acoustics models. A multi-speaker tracking filter and a multi-feature multi-speaker state filter are developed based upon the generalized labeled multi-Bernoulli random finite set framework. Experiments and comparative studies have verified and demonstrated the benefits of the proposed methods.

    3D Time-Based Aural Data Representation Using D4 Library’s Layer Based Amplitude Panning Algorithm

    Presented at the 22nd International Conference on Auditory Display (ICAD-2016). This paper introduces a new Layer Based Amplitude Panning algorithm and the supporting D4 library of rapid prototyping tools for 3D time-based data representation using sound. The algorithm is designed to scale and support a broad array of configurations, with particular focus on High Density Loudspeaker Arrays (HDLAs). The supporting rapid prototyping tools are designed to leverage oculocentric strategies for importing, editing, and rendering data, offering an array of innovative approaches to spatial data editing and representation through the use of sound in HDLA scenarios. The ensuing D4 ecosystem aims to address the shortcomings of existing approaches to spatial aural representation of data and offers unique opportunities for furthering research in spatial data audification and sonification, as well as transportable and scalable spatial media creation and production.

    Efficacy of Multichannel Audio Versus Stereo in Word Recall

    This study reports on an experiment testing the efficacy of multichannel audio compared to stereo, or binaural, audio in terms of word recall. When asked to single out and recall words from multiple others, subjects can focus on and recall no more than one at a time, and perform much worse when more than two words are played at once. Subjects recalled words with an accuracy of about 70%, and displayed increased caution and less confidence when presented with a complicated test prior to an easier one.

    A comparison of two auditory front-end models for horizontal localization of concurrent speakers in adverse acoustic scenarios

    Ears are complex instruments which help humans understand what is happening around them. By using two ears, a person can focus attention on a specific sound source. The first auditory models appeared in the literature in the previous century; nowadays, new approaches extend previous findings. Extensive research has been carried out through the years, but many details of auditory processing remain unclear. In this thesis, two auditory models will be analyzed and compared.

    A binaural grouping model for predicting speech intelligibility in multitalker environments

    Spatially separating speech maskers from target speech often leads to a large intelligibility improvement. Modeling this phenomenon has long been of interest to binaural-hearing researchers for uncovering brain mechanisms and for improving signal-processing algorithms in hearing-assistive devices. Much of the previous binaural modeling work focused on the unmasking enabled by binaural cues at the periphery, and little quantitative modeling has been directed toward the grouping or source-separation benefits of binaural processing. In this article, we propose a binaural model that focuses on grouping, specifically on the selection of time-frequency units that are dominated by signals from the direction of the target. The proposed model uses Equalization-Cancellation (EC) processing with a binary decision rule to estimate a time-frequency binary mask. EC processing is carried out to cancel the target signal, and the energy change between the EC input and output is used as a feature that reflects target dominance in each time-frequency unit. The processing in the proposed model requires few computational resources and is straightforward to implement. In combination with the Coherence-based Speech Intelligibility Index, the model is applied to predict the speech intelligibility data measured by Marrone et al. The predicted speech reception threshold matches the pattern of the measured data well, even though the predicted intelligibility improvements relative to the colocated condition are larger than some of the measured data, which may reflect the lack of internal noise in this initial version of the model. R01 DC000100 - NIDCD NIH HH
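
    The EC-based masking idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `ec_binary_mask`, the parameters `target_itd_s`, `target_gain`, and `threshold_db`, the 3 dB default threshold, and the choice of the mean per-channel energy as the "EC input" energy are all assumptions made for illustration. The sketch assumes the left and right ear signals are already available as complex time-frequency arrays and that the target's interaural delay and gain are known.

```python
import numpy as np

def ec_binary_mask(left_tf, right_tf, freqs_hz,
                   target_itd_s=0.0, target_gain=1.0,
                   threshold_db=3.0, eps=1e-12):
    """Estimate a time-frequency binary mask via Equalization-Cancellation.

    left_tf, right_tf : complex arrays, shape (n_freqs, n_frames)
    freqs_hz          : center frequency of each row, shape (n_freqs,)
    target_itd_s      : assumed delay of the target at the right ear (seconds)
    target_gain       : assumed right/left amplitude ratio of the target
    threshold_db      : hypothetical decision threshold on the energy drop
    """
    freqs_hz = np.asarray(freqs_hz, dtype=float)

    # Equalization: undo the target's assumed delay and gain on the right
    # channel so the target component matches the left channel.
    phase = np.exp(2j * np.pi * freqs_hz[:, None] * target_itd_s)
    right_eq = right_tf * phase / target_gain

    # Cancellation: subtracting the equalized channels removes the target.
    residual = left_tf - right_eq

    # Energy change between EC input and output, per time-frequency unit.
    # Assumption: input energy is the mean of the two channel energies.
    energy_in = 0.5 * (np.abs(left_tf) ** 2 + np.abs(right_eq) ** 2)
    energy_out = np.abs(residual) ** 2
    drop_db = 10.0 * np.log10((energy_in + eps) / (energy_out + eps))

    # Binary decision: a large drop means cancellation removed most of the
    # energy, so the unit is treated as target-dominated.
    mask = drop_db > threshold_db
    return mask, drop_db
```

    In this sketch, units where the target dominates lose most of their energy after cancellation and are marked `True`, while units dominated by a source from another direction show little or no drop. A real model would presumably tune the threshold and the equalization parameters per frequency band; the abstract does not specify them.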