30 research outputs found
Annoyance of helicopter-like sounds in urban background noise
Scenarios of urban air mobility envisage electric vertical take-off and landing aircraft (eVTOLs) operating within cities. Rotorcraft sounds are typically characterised by short bursts of noise, although eVTOLs offer more opportunities for a quieter sound design. We asked participants to compare the annoyance of a reference sequence of noise bursts with a burst duration of 20 ms with that of a test sequence for which the burst duration was 1 or 5 ms. There were 20 bursts/s. A two-interval, two-alternative forced-choice task and a 1-up/1-down procedure were used. Both sequences were played in background noise that had either the same root-mean-square (RMS) level as the sequence of bursts or 10 dB less. The results were similar to those for loudness: on average, sequences with 1-ms bursts needed 6-8 dB less RMS level to sound equally annoying as the 20-ms bursts, and sequences with 5-ms bursts needed 2-4 dB less. This suggests that psychoacoustic annoyance is mainly explained by loudness and that the RMS level is an insufficient descriptor. Comparing the two background noise levels, the level difference for equal annoyance between short and 20-ms bursts was 1.5 dB larger in the louder background, a difference that was statistically significant.
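As a rough illustration of the stimuli, the sketch below generates a sequence of noise bursts at 20 bursts/s and scales it to a target RMS level. The sampling rate, target level, and use of Gaussian noise are assumptions for illustration, not details taken from the abstract:

```python
import numpy as np

def burst_sequence(burst_ms, rate_hz=20, dur_s=1.0, fs=44100, target_rms=0.05):
    """Sequence of noise bursts (hypothetical stimulus sketch).

    burst_ms: duration of each burst in milliseconds (1, 5, or 20 in the study).
    rate_hz:  bursts per second (20/s in the study).
    """
    rng = np.random.default_rng(0)
    n = int(dur_s * fs)
    x = np.zeros(n)
    burst_len = int(burst_ms * 1e-3 * fs)
    period = int(fs / rate_hz)
    for start in range(0, n - burst_len, period):
        x[start:start + burst_len] = rng.standard_normal(burst_len)
    # Scale so the whole sequence has the requested overall RMS,
    # so sequences with different burst durations are RMS-matched.
    x *= target_rms / np.sqrt(np.mean(x ** 2))
    return x

seq = burst_sequence(5)
```

Equating the overall RMS is what makes the study's comparison meaningful: at equal RMS, shorter bursts concentrate the same energy into less time, so their peak levels are higher.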
Estimation of auditory filter shapes across frequencies using machine learning
When fitting a hearing aid, the level-dependent gain prescribed at each frequency is usually based on the hearing loss at that frequency. This often results in reasonable fittings for a typical cochlear hearing loss, but may fail when the individual frequency selectivity and/or loudness growth are different from what would be typical for that hearing loss. Individualised fitting based on measures of frequency selectivity might be useful in improving a fitting, for example by reducing across-channel masking. A popular measure of frequency selectivity is the notched-noise method, but this test is time-consuming. To reduce testing time, Shen and Richards (2013) proposed an efficient machine-learning test that determines the slope of the skirts of the auditory filter (p), its minimum response for wide notches (r), and detection efficiency (K). However, their test did not determine asymmetries in the auditory filter, which are important to consider during fitting to reduce across-channel masking.
The test proposed here provides a time-efficient way of estimating the auditory filter shape and asymmetry as a function of center frequency. The noise level required for threshold is estimated for a tone with frequency fs presented at 15 dB SL in nine symmetric or asymmetric notched noises with notch edge frequencies between 0.6 and 1.4 fs. Using only narrow to medium notch widths provides good information about the tip of the auditory filter, which is of most importance in determining across-channel masking for speech-like signals (but the tail is not well defined). The nine thresholds for a given fs can be used to fit an auditory filter model with three parameters: the slopes of the lower and upper sides (pl, pu) and K. In practice, these model parameters are estimated as a continuous function of fs, and fs is varied across trials over the range 0.5-4 kHz. The stimulus parameters on a given trial (fs, notch condition, noise level) are chosen to maximally reduce the uncertainty in the model parameters, exploiting the covariance between thresholds for adjacent values of fs.
Six subjects have been tested so far. The whole procedure took about 45 minutes per ear. The lower slopes typically corresponded to the values expected from the audiogram and a cochlear hearing loss. The upper slopes were steeper in some cases, although not necessarily across the whole frequency range.
Reference
Shen, Y., and Richards, V. M. (2013). "Bayesian adaptive estimation of the auditory filter," J. Acoust. Soc. Am. 134, 1134-1145.
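A common parametric form for the auditory filter in notched-noise work is the rounded-exponential (roex) shape. The sketch below assumes a simplified roex filter with independent lower and upper slopes pl and pu, omitting the tail parameter r, since the abstract notes that the narrow-notch data do not define the tail well:

```python
import numpy as np

def roex_filter(f, fc, pl, pu):
    """Simplified roex auditory filter with asymmetric slopes (sketch).

    f:  evaluation frequency (Hz); fc: centre frequency (Hz).
    pl: slope of the lower skirt; pu: slope of the upper skirt.
    Tail parameter r is omitted (not constrained by narrow notches).
    """
    g = (f - fc) / fc            # normalised deviation from centre frequency
    p = np.where(g < 0, pl, pu)  # lower slope below fc, upper slope above
    return (1 + p * np.abs(g)) * np.exp(-p * np.abs(g))
```

With pu > pl, the filter passes less of a masker above fc than of one equally far below it, which is the kind of asymmetry the proposed test is designed to measure.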
Gaussian Processes for hearing threshold estimation using Auditory Brainstem Responses
The Auditory Brainstem Response (ABR) plays an important role in diagnosing and managing hearing loss, but can be challenging and time-consuming to measure. Test times are especially long when multiple ABR measurements are needed, e.g., when estimating hearing threshold at a range of frequencies. While many detection methods have been developed to reduce ABR test times, the majority were designed to detect the ABR at a single stimulus level and do not consider correlations in ABR waveforms across levels. These correlations hold valuable information and can be exploited for more efficient hearing threshold estimation. This was achieved in the current work using a Gaussian Process (GP), i.e., a Bayesian approach to non-linear regression. The function to estimate with the GP was the ABR's amplitude across stimulus levels, from which hearing threshold was ultimately inferred. Active learning rules were also designed to automatically adjust the stimulus level and efficiently locate hearing threshold. Simulation results show test time reductions of up to 50% for the GP compared to a sequentially applied Hotelling's T² test, which does not consider correlations across ABR waveforms. A case study was also included to briefly assess the GP approach in ABR data from an adult volunteer.
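The idea of exploiting correlations across stimulus levels can be sketched with a plain numpy GP regression. The kernel hyperparameters, the simulated amplitude-growth data, and the simple criterion-crossing rule below are all illustrative assumptions standing in for the paper's actual inference and active-learning scheme:

```python
import numpy as np

def rbf(a, b, ell=15.0, var=0.05):
    """Squared-exponential kernel: encodes the assumption that ABR
    amplitudes at nearby stimulus levels are correlated."""
    return var * np.exp(-0.5 * ((a[:, None] - b[None, :]) / ell) ** 2)

# Simulated amplitude growth: ~0 below a 30-dB threshold, linear above.
rng = np.random.default_rng(1)
levels = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0])
obs = np.maximum(0.0, 0.01 * (levels - 30.0)) + rng.normal(0, 0.02, 7)

# GP posterior mean of the amplitude function over a fine grid of levels.
noise = 0.02 ** 2
K = rbf(levels, levels) + noise * np.eye(7)
grid = np.linspace(0.0, 70.0, 141)
mean = rbf(grid, levels) @ np.linalg.solve(K, obs)

# Read off threshold as the lowest level whose predicted amplitude
# exceeds a criterion (a crude stand-in for the paper's inference rule).
criterion = 0.05
above = grid[mean > criterion]
threshold = float(above.min()) if above.size else None
```

Because the GP pools information across levels, a few measurements constrain the whole amplitude-vs-level function, which is what allows active learning to place the next stimulus level where it is most informative.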
Loudness of ramped and damped sounds that are temporally shifted across ears
© 2019 Proceedings of the International Congress on Acoustics. All rights reserved. In a previous study, we showed that amplitude-modulated sounds are louder when their modulation is out of phase across the two ears than when it is in phase. The level difference required for equal loudness (LDEL) of sounds with diotic presentation and an interaural modulation phase difference of 180° was about 2 dB. This could be explained by a loudness model in which binaural summation lags behind binaural inhibition. The present study investigated the binaural loudness of ramped and damped sounds in a similar manner. Stimuli consisted of trains of 1000-Hz tone pulses with linear rise and fall times in ratios of 1:10 (damped sounds) or 10:1 (ramped sounds). Stimuli contained 28 55-ms pulses, 14 110-ms pulses, or 7 220-ms pulses, resulting in a stimulus duration of 1540 ms plus half the pulse duration for the interaurally shifted stimuli. The LDEL between diotic and interaurally shifted stimuli was close to 0 dB for all of these conditions. For a single 220-ms pulse, the LDEL was 1.4 dB for damped sounds and 3.0 dB for ramped sounds, the diotic sounds being louder. The difference between a single pulse and a pulse train suggests differences between short-term and long-term loudness judgments.
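The pulse stimuli can be sketched as follows. The 1000-Hz carrier and the 1:10 versus 10:1 rise:fall ratios come from the abstract; the sampling rate is an assumption:

```python
import numpy as np

def tone_pulse(dur_ms, rise_ms, fall_ms, f=1000.0, fs=44100):
    """1000-Hz tone pulse with linear rise and fall (stimulus sketch).
    A 1:10 rise:fall ratio gives a damped pulse, 10:1 a ramped pulse.
    """
    n = int(dur_ms * 1e-3 * fs)
    t = np.arange(n) / fs
    env = np.ones(n)
    nr = int(rise_ms * 1e-3 * fs)
    nf = int(fall_ms * 1e-3 * fs)
    env[:nr] = np.linspace(0.0, 1.0, nr)      # linear rise
    env[n - nf:] = np.linspace(1.0, 0.0, nf)  # linear fall
    return env * np.sin(2 * np.pi * f * t)

# Single 220-ms pulses: 20-ms rise / 200-ms fall (damped), and the reverse.
damped = tone_pulse(220, 20, 200)
ramped = tone_pulse(220, 200, 20)
```

Note that a ramped pulse is simply the time-reversal of the corresponding damped pulse, so the two have (essentially) equal RMS levels; any loudness difference between them is therefore a temporal-asymmetry effect rather than an energy effect.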
Discrimination of amplitude-modulation depth by subjects with normal and impaired hearing
The loudness recruitment associated with cochlear hearing loss increases the perceived amount of amplitude modulation (AM), called "fluctuation strength." For normal-hearing (NH) subjects, fluctuation strength "saturates" when the AM depth is high. If such saturation occurs for hearing-impaired (HI) subjects, they may show poorer AM depth discrimination than NH subjects when the reference AM depth is high. To test this hypothesis, AM depth discrimination of a 4-kHz sinusoidal carrier, modulated at a rate of 4 or 16 Hz, was measured in a two-alternative forced-choice task for reference modulation depths of 0.5, 0.6, and 0.7. AM detection was assessed using a reference modulation depth of 0. Ten older HI subjects, and five young and five older NH subjects, were tested. Psychometric functions were measured using five target modulation depths for each reference depth. For AM depth discrimination, the HI subjects performed more poorly than the NH subjects, both at 30 dB sensation level (SL) and 75 dB sound pressure level (SPL). However, for AM detection, the HI subjects performed better than the NH subjects at 30 dB SL; there was no significant difference between the HI and NH groups at 75 dB SPL. The results for the NH subjects were not affected by age. This work was supported by the Engineering and Physical Sciences Research Council (UK, Grant No. RG78536).
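The AM stimuli follow the standard sinusoidal-modulation form; the sketch below assumes that form together with the abstract's carrier and modulator frequencies (level calibration and onset/offset gating are omitted):

```python
import numpy as np

def am_tone(m, fm=4.0, fc=4000.0, dur=0.5, fs=44100):
    """Sinusoidally amplitude-modulated tone:
    (1 + m*sin(2*pi*fm*t)) * sin(2*pi*fc*t).

    m is the modulation depth: m = 0 gives an unmodulated carrier
    (the reference in the AM detection condition); the discrimination
    conditions used reference depths of 0.5, 0.6, and 0.7.
    """
    t = np.arange(int(dur * fs)) / fs
    return (1 + m * np.sin(2 * np.pi * fm * t)) * np.sin(2 * np.pi * fc * t)
```

In a discrimination trial, the listener's task is to pick the interval whose depth exceeds the reference, e.g. am_tone(0.5) versus am_tone(0.6).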
Development of a Deep Neural Network for Speeding Up a Model of Loudness for Time-Varying Sounds
The "time-varying loudness" (TVL) model of Glasberg and Moore calculates "instantaneous loudness" every 1 ms, and this is used to generate predictions of short-term loudness, the loudness of a short segment of sound, such as a word in a sentence, and of long-term loudness, the loudness of a longer segment of sound, such as a whole sentence. The calculation of instantaneous loudness is computationally intensive and real-time implementation of the TVL model is difficult. To speed up the computation, a deep neural network (DNN) was trained to predict instantaneous loudness using a large database of speech sounds and artificial sounds (tones alone and tones in white or pink noise), with the predictions of the TVL model as a reference (providing the "correct" answer, specifically the loudness level in phons). A multilayer perceptron with three hidden layers was found to be sufficient, with more complex DNN architectures not yielding higher accuracy. After training, the deviations between the predictions of the TVL model and the predictions of the DNN were typically less than 0.5 phons, even for types of sounds that were not used for training (music, rain, animal sounds, and washing machine). The DNN calculates instantaneous loudness over 100 times more quickly than the TVL model. Possible applications of the DNN are discussed.
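The DNN's structure, a multilayer perceptron with three hidden layers, can be sketched with a plain numpy forward pass. The layer widths, ReLU activations, and input features below are illustrative assumptions, not details from the abstract:

```python
import numpy as np

def mlp_forward(x, weights, biases):
    """Forward pass of a multilayer perceptron (structural sketch):
    hidden layers with ReLU, a linear output for loudness level in phons."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(0.0, h @ W + b)   # hidden layer: ReLU
    return h @ weights[-1] + biases[-1]  # output layer: linear

# Illustrative sizes: a per-frame feature vector in, one instantaneous-
# loudness value out; the actual input representation is an assumption.
rng = np.random.default_rng(0)
sizes = [64, 128, 128, 128, 1]  # input, three hidden layers, output
weights = [rng.normal(0, 0.1, (a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(b) for b in sizes[1:]]

frames = rng.normal(size=(10, 64))  # ten 1-ms frames of input features
phons = mlp_forward(frames, weights, biases)
```

The speed advantage comes from the fact that such a forward pass is a handful of small matrix multiplications per 1-ms frame, whereas the TVL model's instantaneous-loudness stage involves a full auditory-filterbank and excitation-pattern computation.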
Tonotopic representation of loudness in the human cortex
A prominent feature of the auditory system is that neurons show tuning to audio frequency; each neuron has a characteristic frequency (CF) to which it is most sensitive. Furthermore, there is an orderly mapping of CF to position, which is called tonotopic organization and which is observed at many levels of the auditory system. In a previous study (Thwaites et al., 2016) we examined cortical entrainment to two auditory transforms predicted by a model of loudness, instantaneous loudness and short-term loudness, using speech as the input signal. The model is based on the assumption that neural activity is combined across CFs (i.e. across frequency channels) before the transform to short-term loudness. However, it is also possible that short-term loudness is determined on a channel-specific basis. Here we tested these possibilities by assessing neural entrainment to the overall and channel-specific instantaneous loudness and the overall and channel-specific short-term loudness. The results showed entrainment to channel-specific instantaneous loudness at latencies of 45 and 100 ms (bilaterally, in and around Heschl's gyrus). There was entrainment to overall instantaneous loudness at 165 ms in dorso-lateral sulcus (DLS). Entrainment to overall short-term loudness occurred primarily at 275 ms, bilaterally in DLS and superior temporal sulcus. There was only weak evidence for entrainment to channel-specific short-term loudness. This work was supported by an ERC Advanced Grant (230570, "Neurolex") to WMW, by MRC Cognition and Brain Sciences Unit (CBU) funding to WMW (U.1055.04.002.00001.01), and by EPSRC grant RG78536 to JS and BM.