
    The impact of spectrally asynchronous delay on the intelligibility of conversational speech

    Conversationally spoken speech is replete with rapidly changing and complex acoustic cues that listeners must hear, process, and encode to meaning. For many hearing-impaired listeners, a hearing aid is necessary to hear these spectral and temporal acoustic cues of speech. For listeners with mild-to-moderate high frequency sensorineural hearing loss, open-fit digital signal processing (DSP) hearing aids are the most common amplification option. Open-fit DSP hearing aids introduce a spectrally asynchronous delay into the acoustic signal: audible low frequency information passes to the eardrum unimpeded, while the aid delivers amplified high frequency sound whose onset is delayed relative to the natural pathway. These spectrally asynchronous delays may disrupt the natural acoustic pattern of speech. The primary goal of this study is to measure the effect of spectrally asynchronous delay on the intelligibility of conversational speech for normal-hearing and hearing-impaired listeners. A group of normal-hearing listeners (n = 25) and a group of listeners with mild-to-moderate high frequency sensorineural hearing loss (n = 25) participated in this study. The acoustic stimuli were 200 conversationally spoken recordings of the low-predictability sentences from the revised Speech Perception in Noise test (R-SPIN). These 200 sentences were modified to control for audibility in the hearing-impaired group and to delay the acoustic energy above 2 kHz by 0 ms (control), 4 ms, 8 ms, or 32 ms relative to the low frequency energy. The data were analyzed to determine the effect of each of the four delay conditions on the intelligibility of the final key word of each sentence. Normal-hearing listeners were minimally affected by the asynchronous delay. The hearing-impaired listeners, however, were deleteriously affected by increasing amounts of spectrally asynchronous delay. Although the hearing-impaired listeners performed well overall in their perception of conversationally spoken speech in quiet, the intelligibility of conversationally spoken sentences decreased significantly when the delay was 4 ms or greater. Hearing aid manufacturers therefore need to restrict the amount of delay introduced by DSP so that it does not distort the acoustic patterns of conversational speech.
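    The stimulus manipulation described above (delaying the energy above 2 kHz relative to the low frequencies) can be sketched as follows. This is an illustrative reconstruction, not the study's actual processing chain; the crossover filter design, filter order, and sample rate are assumptions.

        # Sketch: impose a spectrally asynchronous delay by splitting the signal
        # at 2 kHz and delaying the high-frequency band before recombining.
        import numpy as np
        from scipy.signal import butter, sosfiltfilt

        def asynchronous_delay(signal, fs, delay_ms, crossover_hz=2000.0, order=4):
            """Delay energy above crossover_hz by delay_ms relative to the low band."""
            sos_lo = butter(order, crossover_hz, btype="lowpass", fs=fs, output="sos")
            sos_hi = butter(order, crossover_hz, btype="highpass", fs=fs, output="sos")
            low = sosfiltfilt(sos_lo, signal)
            high = sosfiltfilt(sos_hi, signal)
            n = int(round(delay_ms * 1e-3 * fs))          # delay in whole samples
            high_delayed = np.concatenate([np.zeros(n), high])[:len(signal)]
            return low + high_delayed

        # Example: the four delay conditions used in the study (0, 4, 8, 32 ms).
        # processed = {d: asynchronous_delay(x, fs=44100, delay_ms=d) for d in (0, 4, 8, 32)}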

    The acoustics of place of articulation in English plosives

    PhD thesis. This thesis investigates certain aspects of the acoustics of plosives' place of articulation that have not been addressed by most previous studies, namely:
    1. To test the performance of a technique for collapsing F2onset and F2mid into a single attribute, termed F2R. Results: F2R distinguishes place with effectively the same accuracy as F2onset+F2mid, remaining within ±1 percentage point of F2onset+F2mid at its strongest over most of the conditions examined.
    2. To compare the strength of burst-based attributes at distinguishing place of articulation with and without normalization by individual speaker. Results: Lobanov normalization on average boosted the classification accuracy of individual attributes by 1.4 percentage points, but this modest improvement shrank or disappeared when the normalized attributes were combined into a single classification.
    3. To examine the effect of different spectral representations (Hz-dB, Bark-phon, and Bark-sone) on the accuracy of the burst attributes. Results: the findings are mixed but mostly suggest that the choice between these representations is not a major factor in the classification accuracy of the attributes (mean difference of 1 to 1.5 percentage points); the choice of frequency region in the burst (mid versus high) is a far more important factor (13 percentage-point difference in mean classification accuracy).
    4. To compare the performance of some traditional phonetic burst attributes with the first 12 coefficients of the discrete cosine transform (DCT). The motivation for this comparison is that phonetic science has a long tradition of developing burst attributes tailored to the specific task of extracting place-of-articulation information from the burst, whereas automatic speech recognition (ASR) has long used attributes that are theoretically expected to capture more of the variance in the burst. Results: the DCT coefficients yielded a higher burst classification accuracy than the traditional phonetic attributes, by 3 percentage points.
    Funder: Economic and Social Research Council.
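    Two of the computations named above, per-speaker Lobanov normalization of an acoustic attribute and the first 12 DCT coefficients of a burst spectrum, can be sketched as below. The data layout and spectral pre-processing are assumptions for illustration; the thesis's exact attribute definitions are not reproduced here.

        # Sketch: Lobanov (per-speaker z-score) normalization and burst DCT coefficients.
        import numpy as np
        from scipy.fft import dct

        def lobanov_normalize(values, speaker_ids):
            """Z-score each attribute value within its own speaker."""
            values = np.asarray(values, dtype=float)
            speaker_ids = np.asarray(speaker_ids)
            out = np.empty_like(values)
            for spk in np.unique(speaker_ids):
                mask = speaker_ids == spk
                out[mask] = (values[mask] - values[mask].mean()) / values[mask].std(ddof=1)
            return out

        def burst_dct_coefficients(burst_spectrum_db, n_coeffs=12):
            """First n_coeffs DCT-II coefficients of a burst spectrum (e.g. levels in dB)."""
            return dct(np.asarray(burst_spectrum_db, dtype=float), type=2, norm="ortho")[:n_coeffs]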

    On the effects of masking of perceptual cues in hearing-impaired ears

    One of the goals of the Human Speech Recognition (HSR) group is to understand the strategy of the hearing-impaired (HI) ear in detecting consonants. It has been uniformly assumed that audibility is the main factor in speech perception for both normal-hearing (NH) and HI listeners (Zurek and Delhorne, 1987). Based on an entropy measure, Trevino and Allen (2013) have shown that, at the most comfortable level (MCL), audibility is not the main issue for the HI ear. This observation is counter-intuitive. Within this research group, we hope to find answers to the following questions: What is the strategy of each HI ear in detecting consonants? How can we determine the subject's strategy? From the 3DDS findings of perceptual cues (Li and Allen, 2011; Li et al., 2012), results from two perceptual masking experiments (Li and Allen, 2011; Kapoor and Allen, 2012), and analysis of work by Han (2011) and Trevino and Allen (2013), we characterize the errors made by an HI ear in terms of up to four strategies:
    S1: The frequency of the consonant's primary cue is varied by changing the vowels, which slightly moves the cue frequency.
    S2: The conflicting cues are varied. Different tokens of the same consonant have different confusions, due to conflicting cues.
    S3: The masking of the primary cue is varied. The primary cue for many tokens of the same consonant-vowel is highly correlated with the NH SNR90.
    S4: The number of conflicting cues is varied, as measured by the error entropy. The entropy of a token tells us something about the number of conflicting cues and/or about the ambiguity of the primary cue (see the sketch below).
    In this research we focus on one strategy, the masking of the primary cue in HI ears, and expect it to point toward a broader generalization. An extension of three consonant identification experiments is proposed, derived from Miller and Nicely (1955), Li and Allen (2011), and Kapoor and Allen (2012). Both Li and Kapoor showed that masking of the primary cue and/or removing the conflicting cues can improve speech perception for NH ears. To determine the strategy of the HI ear in detecting consonants, we study consonant group error patterns. If we can establish error generalizability in the HI ears, we will gain insight into that ear's decoding strategy.
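    The error entropy mentioned above can be illustrated with a short sketch. The exact definition used by Trevino and Allen (2013) may differ in detail (for example, whether the correct response is included in the distribution); this is a plain Shannon entropy over a token's response counts.

        # Sketch: Shannon entropy (in bits) of one token's response distribution,
        # taken from a row of a consonant confusion matrix.
        import numpy as np

        def response_entropy(confusion_row):
            counts = np.asarray(confusion_row, dtype=float)
            p = counts / counts.sum()
            p = p[p > 0]                      # drop responses that never occurred
            return float(-(p * np.log2(p)).sum())

        # Example: a token heard as /ba/ 14 times, /va/ 4 times, /da/ 2 times gives
        # response_entropy([14, 4, 2]) ≈ 1.16 bits; a token always heard the same
        # way gives 0 bits.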

    An exploration of the rhythm of Malay

    In recent years there has been a surge of interest in speech rhythm. However, we still lack a clear understanding of the nature of rhythm and of rhythmic differences across languages. Various metrics have been proposed as means for measuring rhythm at the phonetic level and making typological comparisons between languages (Ramus et al., 1999; Grabe & Low, 2002; Dellwo, 2006), but debate is ongoing about the extent to which these metrics capture the rhythmic basis of speech (Arvaniti, 2009; Fletcher, in press). Furthermore, cross-linguistic studies of rhythm have covered a relatively small number of languages, and research on previously unclassified languages is necessary to fully develop the typology of rhythm. This study examines the rhythmic features of Malay, for which, to date, relatively little work has been carried out on aspects of rhythm and timing. The material for the analysis comprised 10 sentences produced by 20 speakers of standard Malay (10 males and 10 females). The recordings were first analysed using the rhythm metrics proposed by Ramus et al. (1999) and Grabe & Low (2002). These metrics (∆C, %V, rPVI, nPVI) are based on durational measurements of vocalic and consonantal intervals. The results indicated that Malay clustered with other so-called syllable-timed languages like French and Spanish on the basis of all metrics. However, underlying the overall findings for these metrics there was a large degree of variability in values across speakers and sentences, with some speakers having values in the range typical of stress-timed languages like English. Further analysis was carried out in light of Fletcher's (in press) argument that measurements based on duration do not wholly reflect speech rhythm, as there are many other factors that can influence the values of consonantal and vocalic intervals, and Arvaniti's (2009) suggestion that other features of speech should also be considered in the description of rhythm to discover what contributes to listeners' perception of regularity. Spectrographic analysis of the Malay recordings brought to light two parameters that displayed consistency and regularity for all speakers and sentences: the duration of individual vowels and the duration of intervals between intensity minima. This poster presents the results of these investigations and points to connections between the features that seem to be consistently regulated in the timing of Malay connected speech and aspects of Malay phonology. The results are discussed in light of the current debate on descriptions of rhythm.
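    The metrics named above have standard definitions in the cited papers; a minimal sketch, assuming that lists of vocalic and consonantal interval durations (in seconds) have already been measured, is given below.

        # Sketch: %V, Delta-C (Ramus et al., 1999) and rPVI, nPVI (Grabe & Low, 2002)
        # computed from measured interval durations.
        import numpy as np

        def percent_v(vocalic, consonantal):
            """%V: proportion of total duration that is vocalic."""
            v, c = np.sum(vocalic), np.sum(consonantal)
            return 100.0 * v / (v + c)

        def delta_c(consonantal):
            """Delta-C: standard deviation of consonantal interval durations."""
            return float(np.std(consonantal, ddof=1))

        def rpvi(intervals):
            """Raw PVI: mean absolute difference between successive intervals."""
            d = np.asarray(intervals, dtype=float)
            return float(np.mean(np.abs(np.diff(d))))

        def npvi(intervals):
            """Normalized PVI: as rPVI, but normalized by the local mean duration."""
            d = np.asarray(intervals, dtype=float)
            return float(100.0 * np.mean(np.abs(np.diff(d)) / ((d[1:] + d[:-1]) / 2.0)))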

    Effects of NAL-R amplification on consonant speech perception in hearing-impaired listeners

    This thesis investigates speech perception in hearing-impaired (HI) subjects. Psychoacoustic experiments in different conditions were undertaken. In particular, two consonant-vowel (CV) identification experiments in masking noise were conducted at various signal-to-noise ratios (SNRs) with 16 HI ears. In one experiment the CVs were presented with a uniform gain; in the other, a spectral compensation for the individual hearing loss (NAL-R) was provided. In both gain conditions the subjects were instructed to adjust the presentation level to their most comfortable loudness (MCL), contrary to the common approach of setting the presentation level from the pure tone thresholds (PTTs) and the long-term average speech spectrum (LTASS) (Zurek and Delhorne, 1987; Posner and Ventry, 1977). The data demonstrated that the MCL approach led to consistent responses in all subjects. Based on these results, a more rigorous definition of audibility based on entropy and the Miller and Nicely (1955) confusion groups is proposed. Furthermore, the effectiveness of NAL-R for CV perception was investigated by comparing the confusion matrices of the two experiments. In general, the error and entropy decreased with NAL-R: the average error decreased from 20.1% (σ = 3.7) to 16.3% (σ = 2.8). It was also shown that, with the help of NAL-R, the tested ears became more consistent in their responses for a given token. However, for 15.1% of the token-ear pairs (TEPs), the entropy and error increased with NAL-R; these TEPs involved all ears and a large variety of tokens. A method based on the Hellinger distance (HD) was introduced that enabled comparison of rows of confusion matrices and calculation of distances between responses. With this method, the highly individual problems of these 15.1% of TEPs were further investigated and compared with the results obtained in normal-hearing subjects. In conclusion, it is argued that speech testing, using the methods and experiments described in this thesis, can deliver valuable and reliable information about individual hearing loss that goes beyond what can be achieved with pure tone thresholds.
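    The Hellinger distance between two confusion-matrix rows, as used above, has a standard form; a minimal sketch follows, with the row normalization and input layout as assumptions.

        # Sketch: Hellinger distance between two confusion-matrix rows
        # (each row is a vector of response counts for one token-ear pair).
        import numpy as np

        def hellinger_distance(row_p, row_q):
            p = np.asarray(row_p, dtype=float)
            q = np.asarray(row_q, dtype=float)
            p = p / p.sum()                   # convert counts to probabilities
            q = q / q.sum()
            return float(np.sqrt(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)) / np.sqrt(2.0))

        # Identical response distributions give 0; fully disjoint ones give 1, e.g.
        # hellinger_distance([10, 0, 0], [0, 5, 5]) == 1.0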

    The role of sound offsets in auditory temporal processing and perception

    Recent neurobiological studies indicate that sound-offset responses are distinct from sound-onset responses in their underlying neural mechanisms, temporal processing pathways, and roles in auditory perception. In this work, I investigate the role of sound offsets and the effect of reduced sensitivity to offsets on auditory perception in humans. The implications of a 'sound-offset deficit' for speech-in-noise perception are investigated, based on a biologically motivated mathematical model with independent channels for onset and offset detection. Sound offsets are important in recognising, distinguishing and grouping sounds. They are also likely to play a role in perceiving consonants that lie in the troughs of amplitude fluctuations in speech. The influence of offsets on the discriminability of model outputs for 48 nonsense vowel-consonant-vowel (VCV) speech stimuli in varying levels of multi-talker babble noise (-12, -6, 0, 6, 12 dB SNR) was assessed and led to predictions that correspond to known phonetic categories. This work therefore suggests that variability in offset salience alone can explain the rank order of the consonants most affected in noisy situations. A novel psychophysical test battery for offset sensitivity was devised and assessed, followed by a study to find an electrophysiological correlate. The findings suggest that individual differences in sound-offset sensitivity may be a factor contributing to inter-subject variation in speech-in-noise discrimination ability. The most promising of these measures can be used to test between-population differences in offset sensitivity, with more support for the objective than for the psychophysical measures. In the electrophysiological study, offset responses in a duration-discrimination paradigm were found to be modulated by attention when compared with onset responses. Overall, this thesis shows for the first time that the onset-offset dichotomy in the auditory system, previously explored in physiological studies, is also evident in human studies for both simple and complex speech sounds.
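    The idea of independent onset and offset channels can be illustrated with a deliberately simplified sketch: detect rises and falls of the amplitude envelope separately by half-wave rectifying its derivative. This is an illustrative assumption only, not the specific model developed in the thesis.

        # Sketch: crude onset and offset channels from a smoothed amplitude envelope.
        import numpy as np
        from scipy.signal import hilbert, butter, sosfiltfilt

        def onset_offset_channels(signal, fs, env_cutoff_hz=32.0):
            envelope = np.abs(hilbert(signal))                        # amplitude envelope
            sos = butter(2, env_cutoff_hz, btype="lowpass", fs=fs, output="sos")
            envelope = sosfiltfilt(sos, envelope)                     # smooth the envelope
            d = np.diff(envelope, prepend=envelope[0])                # envelope derivative
            onset = np.maximum(d, 0.0)                                # responds to rises
            offset = np.maximum(-d, 0.0)                              # responds to falls
            return onset, offset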