1,095 research outputs found

    A binaural grouping model for predicting speech intelligibility in multitalker environments

    Spatially separating speech maskers from target speech often leads to a large intelligibility improvement. Modeling this phenomenon has long been of interest to binaural-hearing researchers for uncovering brain mechanisms and for improving signal-processing algorithms in hearing-assistive devices. Much of the previous binaural modeling work focused on the unmasking enabled by binaural cues at the periphery, and little quantitative modeling has been directed toward the grouping or source-separation benefits of binaural processing. In this article, we propose a binaural model that focuses on grouping, specifically on the selection of time-frequency units that are dominated by signals from the direction of the target. The proposed model uses Equalization-Cancellation (EC) processing with a binary decision rule to estimate a time-frequency binary mask. EC processing is carried out to cancel the target signal, and the energy change between the EC input and output is used as a feature that reflects target dominance in each time-frequency unit. The processing in the proposed model requires few computational resources and is straightforward to implement. In combination with the Coherence-based Speech Intelligibility Index, the model is applied to predict the speech intelligibility data measured by Marrone et al. The predicted speech reception threshold matches the pattern of the measured data well, even though the predicted intelligibility improvements relative to the colocated condition are larger than some of the measured data, which may reflect the lack of internal noise in this initial version of the model. R01 DC000100 - NIDCD NIH HH
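The mask estimation described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the target is straight ahead, so the equalization step is the identity and cancellation reduces to subtracting the right-ear signal from the left-ear signal. The function names, the input-energy definition, and the 3 dB threshold are hypothetical choices made only for illustration.

```python
import math

def ec_energy_drop_db(left, right, eps=1e-12):
    """Energy change (dB) between EC input and output for one T-F unit.

    Assumption: the target is straight ahead, so equalization is the
    identity and cancellation is simply left minus right.
    """
    # Input energy: mean of the two ear-signal energies (illustrative choice).
    e_in = sum(l * l + r * r for l, r in zip(left, right)) / 2.0
    # Output energy: what remains after cancelling the target direction.
    e_out = sum((l - r) ** 2 for l, r in zip(left, right))
    return 10.0 * math.log10((e_in + eps) / (e_out + eps))

def binary_mask(units, threshold_db=3.0):
    """Binary decision rule: 1 where cancelling the target removes a lot of energy.

    `units` is a list of rows; each entry is a (left, right) pair of sample lists.
    `threshold_db` is a hypothetical value, not taken from the article.
    """
    return [
        [1 if ec_energy_drop_db(l, r) > threshold_db else 0 for (l, r) in row]
        for row in units
    ]
```

A unit whose two ear signals are identical (target-dominated under the assumption above) is cancelled almost completely and is kept, while a unit with strongly differing ear signals is rejected.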

    Acoustic source separation based on target equalization-cancellation

    Normal-hearing listeners are good at focusing on the target talker while ignoring the interferers in a multi-talker environment. Therefore, efforts have been devoted to building psychoacoustic models to understand binaural processing in multi-talker environments and to developing bio-inspired source separation algorithms for hearing-assistive devices. This thesis presents a target-Equalization-Cancellation (target-EC) approach to the source separation problem. The idea of the target-EC approach is to use the energy change before and after cancelling the target to estimate a time-frequency (T-F) mask in which each entry estimates the strength of the target signal in the original mixture. Once the mask is calculated, it is applied to the original mixture to preserve the target-dominant T-F units and to suppress the interferer-dominant T-F units. On the psychoacoustic modeling side, when the output of the target-EC approach is evaluated with the Coherence-based Speech Intelligibility Index (CSII), the predicted binaural advantage closely matches the pattern of the measured data. On the application side, the performance of the target-EC source separation algorithm was evaluated by psychoacoustic measurements using both a closed-set speech corpus and an open-set speech corpus, and it was shown that the target-EC cue is a better cue for source separation than the interaural difference cues.

    Subjective listening experiments for annoyance investigation

    Noise limits and guidelines that consider only the sound pressure level or the loudness of noises are not efficient in protecting people from all the adverse effects of noise. Other physical characteristics, e.g., tonality, modulation, and frequency content, should also be considered, especially when the noise level is low and it cannot cause hearing risk, but might lead to annoyance and disturbance. Annoying noises have an impact on health and well-being, but this impact and its relationship with the physical properties have not been sufficiently studied. Subjective annoyance caused by noises like those we experience in living spaces and offices should be further investigated via psychoacoustic laboratory experiments. The primary aim of this work was to develop a systematic, effective, and reliable methodology to perform this type of psychoacoustic test. The secondary aim was to investigate the objective metrics that best predict subjective annoyance in four typical noise conditions: ventilation noise in office spaces, traffic noise in homes, neighbors’ noise in homes, and noises with tonal components in homes. The main result was the development of the methodology, which in turn enabled us to define our own standards and guidelines. Furthermore, we identified the objective metrics that best correlated with subjective annoyance in each one of the four studied noise situations. In offices, five metrics predicted subjective ratings reasonably well. Noise with sound energy at higher frequencies was less tolerated. Noise with a slope of -7 dB per octave band increment resulted in the highest satisfaction. In dwellings, related to neighbors’ living sounds, four metrics of airborne sound insulation performed well to predict annoyance. We demonstrated that 50–80 Hz bands should not be included in the objective rating. 
In dwellings, related to five types of traffic noise transmitted through façade elements, one metric, Rw+C50–3150, performed significantly better than the others. The last experiment proved that tonality is not properly considered in current standards and noise guidelines. The performed psychoacoustic research demonstrated that physical properties other than the sound pressure level should be considered when assessing noise annoyance, and it provided evidence for the objective metrics that would make noise guidelines more efficient with respect to health protection. Subjective listening experiments for investigating annoyance. Noise limits and guidelines protect people from the harmful effects of noise, but they mostly take into account only the sound pressure level or loudness of the noise. Other physical characteristics, such as tonality, modulation, and frequency content, which clearly affect subjective experience and annoyance, are often ignored. Annoying sounds may comply with the law despite their negative effects, because their sound pressure level does not exceed any noise limit. The subjective annoyance caused by noise in living spaces and offices should be studied in more detail through psychoacoustic laboratory experiments. The primary aim of the work was to develop a systematic, effective, and reliable method for carrying out this type of psychoacoustic test. In addition, the work examined which objective metrics other than sound pressure level or loudness best predict subjective annoyance and disturbance. Four typical noise conditions were studied: ventilation noise in offices, urban traffic noise in homes, neighbors’ noise in homes, and noise containing tonal components. The main result was the development of the methodology, which made it possible to define our own standards and guidelines. 
In addition, the objective metrics that correlated best with subjective annoyance in each of the four studied noise situations were identified. In offices, five metrics predicted the subjective ratings reasonably well. Noise with sound energy at higher frequencies was tolerated less. In dwellings, when living sounds originate in a neighbor’s dwelling, four airborne sound insulation metrics predicted the residents’ subjective annoyance well. It was shown that the 50–80 Hz bands should not be included in the objective rating. Also in dwellings, for five different traffic noises transmitted indoors through façade elements, one metric, Rw+C50–3150, performed considerably better than the others. The last experiment showed that tonality is not properly taken into account in current standards and noise guidelines. This research showed that properly conducted psychoacoustic experiments provide qualitative and quantitative information on subjective annoyance, and that on the basis of this information objective metrics can be defined that would make guideline values more effective in protecting against the harmful effects of noise.
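The abstract reports that office noise with a slope of -7 dB per octave band was rated most satisfactory. As a small illustration of what such a slope means, the sketch below fits a least-squares line to octave-band levels. The function name and the example levels are hypothetical, and the thesis may define the slope differently.

```python
def slope_db_per_octave(band_levels_db):
    """Least-squares slope of levels (dB) across consecutive octave bands.

    `band_levels_db` lists one level per octave band, ordered from the
    lowest to the highest band; the band index serves as the x-axis.
    """
    n = len(band_levels_db)
    if n < 2:
        raise ValueError("need at least two octave bands")
    mean_x = (n - 1) / 2.0
    mean_y = sum(band_levels_db) / n
    num = sum((k - mean_x) * (y - mean_y) for k, y in enumerate(band_levels_db))
    den = sum((k - mean_x) ** 2 for k in range(n))
    return num / den

# Hypothetical levels falling by 7 dB from each octave band to the next.
example_levels = [60.0, 53.0, 46.0, 39.0, 32.0]
example_slope = slope_db_per_octave(example_levels)
```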

    Determination and evaluation of clinically efficient stopping criteria for the multiple auditory steady-state response technique

    Background: Although the auditory steady-state response (ASSR) technique utilizes objective statistical detection algorithms to estimate behavioural hearing thresholds, the audiologist still has to decide when to terminate ASSR recordings, which reintroduces a certain degree of subjectivity. Aims: The present study aimed at establishing clinically efficient stopping criteria for a multiple 80-Hz ASSR system. Methods: In Experiment 1, data of 31 normal hearing subjects were analyzed off-line to propose stopping rules. Consequently, ASSR recordings are stopped when (1) all 8 responses reach significance and significance is maintained for 8 consecutive sweeps; (2) the mean noise levels are ≤ 4 nV (if at this “≤ 4-nV” criterion, p-values are between 0.05 and 0.1, measurements are extended only once by 8 sweeps); or (3) a maximum of 48 sweeps is attained. In Experiment 2, these stopping criteria were applied to 10 normal hearing and 10 hearing-impaired adults to assess their efficiency. Results: The application of these stopping rules resulted in ASSR threshold values that were comparable to other multiple-ASSR research with normal hearing and hearing-impaired adults. Furthermore, in 80% of the cases, ASSR thresholds could be obtained within a time-frame of 1 hour. Investigating the significant response-amplitudes of the hearing-impaired adults through cumulative curves indicated that probably a higher noise-stop criterion than “≤ 4 nV” can be used. Conclusions: The proposed stopping rules can be used in adults to determine accurate ASSR thresholds within an acceptable time-frame of about 1 hour. However, additional research with infants and adults with varying degrees and configurations of hearing loss is needed to optimize these criteria.
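The three stopping rules can be expressed as a small decision function. The sketch below is one possible reading of the abstract, not the authors' software: the 0.05 significance level is inferred from the quoted p-value range, the noise level is assumed to be a single mean value in nV, and all names are hypothetical.

```python
ALPHA = 0.05          # assumed significance level (inferred from the 0.05-0.1 range)
BORDERLINE_MAX = 0.1  # upper bound of the borderline p-value range
HOLD_SWEEPS = 8       # significance must be maintained for 8 consecutive sweeps
NOISE_STOP_NV = 4.0   # noise-stop criterion in nV
MAX_SWEEPS = 48       # hard limit on the number of sweeps

def stop_decision(p_history, mean_noise_nv, already_extended):
    """Return 'stop', 'extend' (record 8 more sweeps), or 'continue'.

    `p_history` holds one list of p-values (one per response) for every
    sweep recorded so far.
    """
    # Rule 1: all responses significant over the last 8 consecutive sweeps.
    recent = p_history[-HOLD_SWEEPS:]
    if len(recent) == HOLD_SWEEPS and all(p < ALPHA for sweep in recent for p in sweep):
        return "stop"
    # Rule 2: noise is low enough; extend once if any response is borderline.
    if p_history and mean_noise_nv <= NOISE_STOP_NV:
        borderline = any(ALPHA <= p <= BORDERLINE_MAX for p in p_history[-1])
        if borderline and not already_extended:
            return "extend"
        return "stop"
    # Rule 3: maximum number of sweeps reached.
    if len(p_history) >= MAX_SWEEPS:
        return "stop"
    return "continue"
```

The caller is expected to track whether the single 8-sweep extension has already been granted and to pass that in as `already_extended`.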

    Investigating computational models of perceptual attack time

    The perceptual attack time (PAT) is the compensation for differing attack components of sounds, in the case of seeking a perceptually isochronous presentation of sounds. It has applications in scheduling and is related to, but not necessarily the same as, the moment of perceptual onset. This paper describes a computational investigation of PAT over a set of 25 synthesised stimuli, and a larger database of 100 sounds equally divided into synthesised and ecological. Ground truth PATs for modeling were obtained by the alternating presentation paradigm, where subjects adjusted the relative start time of a reference click and the sound to be judged. Whilst fitting experimental data from the 25 sound set was plausible, difficulties with existing models were found in the case of the larger test set. A pragmatic solution was obtained using a neural net architecture. In general, learnt schema of sound classification may be implicated in resolving the multiple detection cues evoked by complex sounds

    Human response to aircraft noise

    The human auditory system and the perception of sound are discussed. The major concentration is on the annoyance response and methods for relating the physical characteristics of sound to those psychosociological attributes associated with human response. Results selected from the extensive laboratory and field research conducted on human response to aircraft noise over the past several decades are presented along with discussions of the methodology commonly used in conducting that research. Finally, some of the more common criteria, regulations, and recommended practices for the control or limitation of aircraft noise are examined in light of the research findings on human response.

    Calculation of Unsteady Loudness in the Presence of Gaps Through Application of the Multiple Look Theory

    Experimental studies have shown that for short gaps between 2 and 5 ms, the perceived loudness is higher than for uninterrupted noise presented to the ear. Other studies have also shown that the present temporal integration models for the calculation of time-varying loudness do not adequately account for short-duration phenomena. It has been proposed that the multiple look approach is a more applicable method for describing these short-term circumstances. This approach breaks a sound into small durations, or looks, with a length of 1 ms, which allows for the intelligent processing of the looks and decision making depending on the nature of the stimulus. However, present technologies (i.e. FFT) are not adequate to deal with short-duration sounds across the entire frequency spectrum. A compromise approach is taken here to account for perceived loudness levels for sounds in the presence of gaps while using an integration model. This approach is referred to as a multiple look gap adjustment model. A model and software code were developed to take a recorded sound presented to the ear and process it into individual looks, which are then examined for the presence of gaps ranging in length from 1 to 10 ms. If gaps are found, an appropriate gap adjustment is applied to the sound. The modified stimulus is subsequently evaluated for loudness level using a model which relies on temporal integration. The multiple look model was tested using several sounds, including mechanical and speech sounds, and was found to perform as intended. While recommendations for improvement and further study are included, the application of the model has shown particular merit for perceptual analysis of sounds involving speech.
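The look-splitting and gap-search steps described in this abstract can be sketched as follows. This is an illustrative reading only: the silence criterion (an RMS threshold per look) and all names are assumptions, and the gap adjustment itself is not shown because the abstract does not specify it.

```python
import math

def find_gaps(samples, sample_rate, silence_rms=1e-3, min_ms=1, max_ms=10):
    """Split a signal into 1-ms looks and return gaps as (start_ms, length_ms).

    A gap is a run of quiet looks, min_ms to max_ms long, with sound on both
    sides. `silence_rms` is a hypothetical threshold for calling a look quiet.
    """
    look_len = sample_rate // 1000  # samples per 1-ms look
    if look_len < 1:
        raise ValueError("sample rate must be at least 1000 Hz")
    n_looks = len(samples) // look_len
    quiet = []
    for k in range(n_looks):
        look = samples[k * look_len:(k + 1) * look_len]
        rms = math.sqrt(sum(x * x for x in look) / look_len)
        quiet.append(rms < silence_rms)

    gaps = []
    start = None
    for k, is_quiet in enumerate(quiet):
        if is_quiet and start is None:
            start = k
        elif not is_quiet and start is not None:
            length = k - start
            # Leading silence (start == 0) is not a gap between sounds.
            if start > 0 and min_ms <= length <= max_ms:
                gaps.append((start, length))
            start = None
    return gaps
```

Silence that runs to the end of the signal is never closed by a following sound, so it is not reported as a gap.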

    Temporal Filterbanks in Cochlear Implant Hearing and Deep Learning Simulations

    The masking phenomenon has been used to investigate cochlear excitation patterns and has even motivated audio coding formats for compression and speech processing. For example, cochlear implants rely on masking estimates to filter incoming sound signals onto an array. Historically, the critical band theory has been the mainstay of psychoacoustic theory. However, masked threshold shifts in cochlear implant users show a discrepancy between the observed critical bandwidths, suggesting separate roles for place location and temporal firing patterns. In this chapter, we will compare discrimination tasks in the spectral domain (e.g., power spectrum models) and the temporal domain (e.g., temporal envelope) to introduce new concepts such as profile analysis, temporal critical bands, and transition bandwidths. These recent findings violate the fundamental assumptions of the critical band theory and could explain why the masking curves of cochlear implant users display spatial and temporal characteristics that are quite unlike those of acoustic stimulation. To provide further insight, we also describe a novel analytic tool based on deep neural networks. This deep learning system can simulate many aspects of the auditory system, and will be used to compute the efficiency of spectral filterbanks (referred to as “FBANK”) and temporal filterbanks (referred to as “TBANK”).

    Optimizing Stimulation Strategies in Cochlear Implants for Music Listening

    Most cochlear implant (CI) strategies are optimized for speech characteristics, while music enjoyment is significantly below normal hearing performance. In this thesis, electrical stimulation strategies in CIs are analyzed for music input. A simulation chain consisting of two parallel paths, simulating normal hearing conditions and electrical hearing respectively, is utilized. One thesis objective is to configure and develop the sound processor of the CI chain to analyze different compression and channel-selection strategies to optimally capture the characteristics of music signals. A new set of knee points (KPs) for the compression function is investigated together with clustering of frequency bands. The N-of-M electrode selection strategy models the effect of a psychoacoustic masking threshold. In order to evaluate the performance of the CI model, the normal hearing model is considered a true reference. Similarity between the resulting neurograms of the respective models is measured using the image analysis method Neurogram Similarity Index Measure (NSIM). The validation and resolution of NSIM is another objective of the thesis. Results indicate that NSIM is sensitive to no-activity regions in the neurograms and has difficulties capturing small CI changes, i.e. compression settings. Further verification of the model setup is suggested together with investigating an alternative optimal electric hearing reference and/or objective similarity measure.
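For readers unfamiliar with N-of-M selection, the sketch below shows the plain maximum-selection form: in each analysis frame, only the N channels with the largest envelope values out of M are stimulated. The thesis may use a different variant, for instance one that incorporates a masking threshold, and the function name and example values here are hypothetical.

```python
def n_of_m(envelopes, n):
    """Keep the n largest of the M channel envelopes; zero the rest.

    `envelopes` holds one non-negative envelope value per channel for a
    single analysis frame. Ties are resolved in favour of lower indices.
    """
    if not 0 <= n <= len(envelopes):
        raise ValueError("n must be between 0 and the number of channels")
    ranked = sorted(range(len(envelopes)), key=lambda c: envelopes[c], reverse=True)
    keep = set(ranked[:n])
    return [e if c in keep else 0.0 for c, e in enumerate(envelopes)]

# Hypothetical frame with M = 4 channels, selecting N = 2.
selected = n_of_m([0.1, 0.9, 0.3, 0.7], 2)
```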