12 research outputs found

    PAMBOX: A Python auditory modeling toolbox

    No full text
    Poster presented at EuroScipy 2014.

    Toolboxes for modeling auditory perception have a surprisingly long history, starting with the Auditory Toolbox, first written by Malcolm Slaney for Mathematica in 1993 and ported to Matlab in 1998. Here we present the Python Auditory Modeling Toolbox (PAMBOX), an open-source Python package for auditory modeling. The goal of the toolbox is to provide a collection of components that can be easily combined and extended to solve auditory modeling problems.

    PAMBOX contains code for modeling cochlear filtering, envelope extraction, and modulation processing. The toolbox also includes speech intelligibility models, which are commonly used to predict how well speech is understood in a given situation, such as in the presence of noise or reverberation. The intelligibility models use a simple and consistent "predict" API, inspired by scikit-learn's "fit and predict" API, which simplifies comparisons across models. PAMBOX also includes a framework for running intelligibility experiments that is compatible with IPython.parallel.

    Models that are not original to PAMBOX are validated against their original implementations, where available. PAMBOX is built on NumPy, SciPy, and Pandas, and is distributed under the Modified BSD License.
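    The abstract describes the interface only in words. As a rough illustration, a scikit-learn-inspired "predict" interface for intelligibility models might look like the sketch below; the class name, method signature, and output keys are assumptions for illustration, not PAMBOX's documented API.

        import numpy as np
        from scipy.signal import hilbert

        class ToyIntelligibilityModel:
            """Hypothetical model following a scikit-learn-style
            'predict' convention: one method with consistent inputs
            and outputs, so different models can be swapped freely."""

            def predict(self, clean, mixture, fs):
                # Crude proxy for intelligibility: envelope power of
                # the clean speech relative to that of the residual.
                env_speech = np.abs(hilbert(clean))
                env_noise = np.abs(hilbert(mixture - clean))
                snr_env_db = 10 * np.log10(np.mean(env_speech ** 2)
                                           / np.mean(env_noise ** 2))
                return {"snr_env_db": snr_env_db}

        # A consistent interface makes cross-model comparison a loop:
        fs = 22050
        t = np.arange(fs) / fs
        clean = np.sin(2 * np.pi * 440 * t)
        mixture = clean + 0.3 * np.random.randn(fs)
        for model in [ToyIntelligibilityModel()]:
            print(type(model).__name__, model.predict(clean, mixture, fs))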

    The role of across-frequency envelope processing for speech intelligibility

    No full text
    Poster presented at the 21st International Congress on Acoustics in Montreal, in June 2013.

    Speech intelligibility models consist of a preprocessing part that transforms the stimuli into an internal (auditory) representation, and a decision metric that quantifies the effects of the transmission channel, speech interferers, and auditory processing on speech intelligibility. Here, two recent speech intelligibility models, the spectro-temporal modulation index (STMI; Elhilali et al., 2003) and the speech-based envelope power spectrum model (sEPSM; Jørgensen and Dau, 2011), were evaluated in conditions of noisy speech subjected to reverberation and to nonlinear distortions through either a phase-jitter process or noise reduction via spectral subtraction. The contributions of the individual preprocessing stages and the role of the decision metrics were analyzed in the different experimental conditions. It is demonstrated that an explicit across-frequency envelope processing stage, as assumed in the STMI, together with a metric based on the envelope power signal-to-noise ratio, as assumed in the sEPSM, is required to account for all three conditions. However, a simple across audio-frequency mechanism combined with a purely temporal modulation filterbank appears to be sufficient to describe the data, i.e., a joint two-dimensional modulation filterbank might not be required.
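    The sEPSM decision metric referenced here is the envelope power signal-to-noise ratio: the envelope power of the noise alone is subtracted from that of the noisy speech and divided by the noise envelope power. A minimal single-channel sketch of that idea follows, with the modulation filterbank omitted; the function names are illustrative, not the published implementation.

        import numpy as np
        from scipy.signal import hilbert

        def envelope_power(x):
            """AC-coupled envelope power, normalized by the squared
            mean (DC) envelope, as in envelope power spectrum models."""
            env = np.abs(hilbert(x))   # Hilbert envelope
            ac = env - env.mean()      # remove the DC component
            return np.mean(ac ** 2) / env.mean() ** 2

        def snr_env(noisy_speech, noise_alone):
            """Single-channel envelope-power SNR. The metric needs
            separate access to the noisy speech and to the noise
            alone, which matters for the 'implicit streaming' point
            raised in the masking-release abstract below."""
            p_mix = envelope_power(noisy_speech)
            p_noise = envelope_power(noise_alone)
            # Speech envelope power estimated as mix minus noise,
            # floored to keep the ratio positive.
            p_speech = max(p_mix - p_noise, 1e-8)
            return p_speech / p_noise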

    Predicting masking release of lateralized speech

    No full text
    Poster presented at ISAAR 2015, in Nyborg, DK.

    Locsei et al. [2015, Speech in Noise Workshop, Copenhagen, p. 46] measured speech reception thresholds (SRTs) in anechoic conditions where the target speech and the maskers were lateralized using interaural time delays. The maskers were speech-shaped noise (SSN) and reversed babble (RB) with two, four, or eight talkers. For a given interferer type, the number of maskers presented on the target's side was varied, such that none, some, or all maskers were presented on the same side as the target. In general, SRTs did not vary significantly when at least one masker was presented on the same side as the target. The largest masking release (MR) was observed when all maskers were on the opposite side from the target. The data could be accounted for using a binaural extension of the sEPSM model [Jørgensen and Dau, J. Acoust. Soc. Am. 130(3), 1475–1487], which uses a short-term equalization-cancellation process to model binaural unmasking. The modeling results suggest that, in these conditions, explicit top-down processing, such as streaming, is not required and that the MR can be fully accounted for by bottom-up processes alone. However, the model's independent access to the noisy speech and to the noise alone could be considered a form of implicit streaming and should therefore be taken into account when describing such models as purely "bottom-up".
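    The equalization-cancellation step mentioned above equalizes one ear's signal in delay and gain so that it best matches the other ear, then subtracts it, suppressing an interaurally coherent masker. A toy full-signal sketch of the idea follows; the binaural sEPSM applies it in short-term frames within auditory channels, and all names here are illustrative.

        import numpy as np

        def ec_cancel(left, right, fs, max_delay_ms=0.7):
            """Toy equalization-cancellation: search interaural
            delays, scale the delayed right channel to best match
            the left ('equalization'), and keep the residual with
            the least power, i.e., the delay at which the dominant
            (masker) component cancels best."""
            max_lag = int(max_delay_ms * 1e-3 * fs)
            best_residual, best_power = None, np.inf
            for lag in range(-max_lag, max_lag + 1):
                shifted = np.roll(right, lag)  # circular shift: a simplification
                # Least-squares gain before cancelling
                gain = np.dot(left, shifted) / (np.dot(shifted, shifted) + 1e-12)
                residual = left - gain * shifted
                power = np.mean(residual ** 2)
                if power < best_power:
                    best_power, best_residual = power, residual
            return best_residual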