
    ARU speech corpus (University of Liverpool)

    This corpus comprises single-channel recordings of IEEE (Harvard) sentences (IEEE, 1969) spoken by twelve adult native British English speakers in anechoic conditions. Reference: IEEE (1969). Recommended practice for speech quality measurements. IEEE Transactions on Audio and Electroacoustics, 17(3), 227-246. Principal Investigator: Professor Carl Hopkins.

    Effect of Training and Level of External Auditory Feedback on the Singing Voice: Pitch Inaccuracy.

    Background: One of the most important aspects of singing is the control of fundamental frequency. Objectives: The effects on pitch inaccuracy, defined as the distance in cents in equally tempered tuning between the reference note and the sung note, of the following conditions were evaluated: (1) level of external feedback; (2) tempo (slow or fast); (3) articulation (legato or staccato); (4) tessitura (low, medium, or high); and (5) semi-phrase direction (ascending or descending). Methods: The subjects were 10 nonprofessional singers and 10 classically trained professional or semi-professional singers (10 men and 10 women). Subjects sang one-octave-and-a-fifth arpeggi with three different levels of external auditory feedback, two tempi, and two articulations (legato or staccato). Results: Inaccuracy was greatest in the descending semi-phrase arpeggi produced at a fast tempo and with a staccato articulation, especially for nonprofessional singers. The magnitude of inaccuracy was also relatively large in the high tessitura relative to the low and the medium tessitura for such singers. Contrary to predictions, when external auditory feedback was strongly attenuated by the hearing protectors, nonprofessional singers showed greater pitch accuracy than in the other external feedback conditions. This finding indicates the importance of internal auditory feedback in pitch control. Conclusions: With an increase in training, the singer's pitch inaccuracy decreases.
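
    The inaccuracy measure used in this study, the distance in cents between the sung note and its equal-tempered reference, is straightforward to compute. A minimal sketch follows; the function name and example frequencies are illustrative, not taken from the paper.

```python
import math

def pitch_error_cents(f_sung: float, f_ref: float) -> float:
    """Signed deviation of a sung note from its reference note, in cents
    (100 cents = one equal-tempered semitone, 1200 cents = one octave)."""
    return 1200.0 * math.log2(f_sung / f_ref)

# Example: a note sung at 445 Hz against an A4 reference of 440 Hz
# is roughly +19.6 cents sharp.
print(pitch_error_cents(445.0, 440.0))
```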

    Evaluation of the starting point of the Lombard Effect.

    Speakers increase their vocal effort when their communication is disturbed by noise, an adaptation termed the Lombard effect. The aim of the present study was to determine whether this effect has a starting point. Hence, the effects of noise at levels between 20 and 65 dB(A) on vocal effort (quantified by sound pressure level) and on both perceived noise disturbance and perceived vocal discomfort were evaluated. Results indicate that there is a Lombard-effect change-point at a background noise level (Ln) of 43.3 dB(A). This change-point is preceded by the onset of perceived noise disturbance and followed by a marked increase in perceived vocal discomfort.
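
    A change-point of this kind is commonly estimated with a continuous two-segment ("broken-stick") linear fit of vocal sound pressure level against noise level. The sketch below is a generic illustration of that approach, not the authors' analysis code, and the simulated data are invented.

```python
import numpy as np

def fit_change_point(x, y, candidates):
    """Grid-search a continuous two-segment linear model
    y = a + b*x + d*max(0, x - c), returning the change-point c
    that minimises the residual sum of squares."""
    best_c, best_sse = None, np.inf
    for c in candidates:
        X = np.column_stack([np.ones_like(x), x, np.maximum(0.0, x - c)])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        sse = float(np.sum((y - X @ coef) ** 2))
        if sse < best_sse:
            best_c, best_sse = c, sse
    return best_c

# Invented data: vocal SPL roughly flat below a 43.3 dB(A) noise level,
# then rising ~0.5 dB per dB of noise above it.
rng = np.random.default_rng(0)
noise = np.linspace(20.0, 65.0, 200)
spl = 60.0 + 0.5 * np.maximum(0.0, noise - 43.3) + rng.normal(0.0, 0.5, noise.size)
print(fit_change_point(noise, spl, candidates=np.arange(25.0, 60.0, 0.1)))
```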

    Overview of the 2023 ICASSP SP Clarity Challenge: Speech Enhancement for Hearing Aids

    This paper reports on the design and outcomes of the ICASSP SP Clarity Challenge: Speech Enhancement for Hearing Aids. The scenario was a listener attending to a target speaker in a noisy domestic environment, with multiple interferers and head rotation by the listener. The challenge extended the second Clarity Enhancement Challenge (CEC2) by fixing the amplification stage of the hearing aid; evaluating with a combined metric for speech intelligibility and quality; and providing two evaluation sets, one based on simulation and the other on real-room measurements. Five teams improved on the baseline system for the simulated evaluation set, but performance on the measured evaluation set was much poorer. Investigations are ongoing to determine the exact cause of the mismatch between the simulated and measured data sets; suggested causes include transducer noise in the measurements, lower-order Ambisonics limiting the ability of systems to exploit binaural cues, and differences between real and simulated room impulse responses.

    The 2nd Clarity Prediction Challenge: A machine learning challenge for hearing aid intelligibility prediction

    This paper reports on the design and outcomes of the 2nd Clarity Prediction Challenge (CPC2) for predicting the intelligibility of hearing-aid-processed signals heard by individuals with a hearing impairment. The challenge was designed to promote new approaches for estimating the intelligibility of hearing aid signals that can be used in future hearing aid algorithm development. It extends an earlier round (CPC1, 2022) in a number of critical directions, including a larger dataset drawn from new speech intelligibility listening experiments, a greater degree of variability in the test materials, and a design that requires prediction systems to generalise to unseen algorithms and listeners. This paper provides a full description of the new publicly available CPC2 dataset, the CPC2 challenge design, and the baseline systems. The challenge attracted 12 systems from 9 research teams. The systems are reviewed, their performance is analysed, and conclusions are presented with reference to the progress made since the earlier CPC1 challenge. In particular, it is shown that reference-free, non-intrusive systems based on pre-trained large acoustic models can perform well in this context.
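
    The closing observation about reference-free systems can be illustrated with a generic sketch: pool frame-level embeddings from a pre-trained acoustic model and map them to an intelligibility score with a small regression head. The encoder choice, pooling, and head below are illustrative assumptions, not the CPC2 baseline or any entrant's system.

```python
import torch
import torch.nn as nn
from transformers import Wav2Vec2Model

class NonIntrusivePredictor(nn.Module):
    """Sketch of a non-intrusive intelligibility predictor: a frozen
    pre-trained encoder, mean pooling, and a small regression head."""

    def __init__(self, encoder_name: str = "facebook/wav2vec2-base"):
        super().__init__()
        self.encoder = Wav2Vec2Model.from_pretrained(encoder_name)
        for p in self.encoder.parameters():  # keep the acoustic model frozen
            p.requires_grad = False
        dim = self.encoder.config.hidden_size
        self.head = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, samples) at 16 kHz; no clean reference is required.
        frames = self.encoder(waveform).last_hidden_state  # (batch, frames, dim)
        pooled = frames.mean(dim=1)
        return torch.sigmoid(self.head(pooled)).squeeze(-1)  # score in [0, 1]
```

    In practice only the head (or a light fine-tuning of the encoder) would be fitted to listener intelligibility scores.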

    Muddy, muddled, or muffled? Understanding the perception of audio quality in music by hearing aid users

    Introduction: Previous work on audio quality evaluation has demonstrated a developing convergence of the key perceptual attributes underlying judgments of quality, such as timbral, spatial, and technical attributes. However, across existing research there remains a limited understanding of the crucial perceptual attributes that inform audio quality evaluation for people with hearing loss, and for those who use hearing aids. This is especially the case with music, given the unique problems it presents in contrast to human speech. Method: This paper presents a sensory evaluation study utilising descriptive analysis methods, in which a panel of hearing aid users collaborated, through consensus, to identify the most important perceptual attributes of music audio quality and to develop a series of rating scales for future listening tests. Participants (N = 12), with hearing loss ranging from mild to severe, first completed an online elicitation task, providing single-word terms to describe the audio quality of original and processed music samples; each participant completed the task twice, once with hearing aids and once without. Participants were then guided in discussing these raw terms across three focus groups, in which they reduced the term space, identified important perceptual groupings of terms, and developed perceptual attributes from these groups (including a rating scale and definition for each). Results: Seven key perceptual dimensions were found to underlie music audio quality (clarity, harshness, distortion, spaciousness, treble strength, middle strength, and bass strength), alongside an overall music audio quality attribute and possible alternative frequency-balance attributes. Discussion: We outline how these perceptual attributes align with the extant literature, how attribute rating instruments might be used in future work, and the importance of better understanding the music listening difficulties of people with varied profiles of hearing loss.

    Speech produced in noise: Relationship between listening difficulty and acoustic and durational parameters.

    Conversational speech produced in noise can be characterised by increased intelligibility relative to such speech produced in quiet. Listening difficulty (LD) is a metric that can be used to evaluate speech transmission performance more sensitively than intelligibility scores in situations in which performance is likely to be high. The objectives of the present study were to evaluate the LD of speech produced in different noise and style conditions, to evaluate the spectral and durational speech modifications associated with these conditions, and to determine whether any of the spectral and durational parameters predicted LD. Nineteen subjects were instructed to speak at normal and loud volumes in the presence of background noise at 40.5 dB(A) and babble noise at 61 dB(A). The speech signals were amplitude-normalised, combined with pink noise to obtain a signal-to-noise ratio of −6 dB, and presented to twenty raters who judged their LD. Vowel duration, fundamental frequency, and the proportion of spectral energy in high versus low frequencies increased with the noise level within both styles. LD was lowest when the speech was produced in the presence of high-level noise and at a loud volume, indicating improved intelligibility. Spectrum balance was observed to predict LD.
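
    The mixing step described above, combining amplitude-normalised speech with noise at a fixed signal-to-noise ratio, is standard practice. A minimal sketch follows, assuming an RMS-based SNR definition; the signals are invented stand-ins, not the study's material.

```python
import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale `noise` so the speech-to-noise ratio of the mixture equals
    `snr_db` (computed from RMS levels), then add it to `speech`."""
    noise = noise[: len(speech)]
    rms = lambda x: np.sqrt(np.mean(x ** 2))
    gain = rms(speech) / (rms(noise) * 10.0 ** (snr_db / 20.0))
    return speech + gain * noise

# Invented signals: a speech-like surrogate and a crude low-frequency-weighted
# noise stand-in (not true pink noise), mixed at the study's -6 dB SNR.
rng = np.random.default_rng(1)
speech = rng.normal(0.0, 0.1, 48000)
noise = np.cumsum(rng.normal(0.0, 0.01, 48000))
mixture = mix_at_snr(speech, noise, snr_db=-6.0)
```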

    An acoustic study of coarticulation: consonant-vowel and vowel-to-vowel coarticulation in four Australian languages

    © 2012 Dr. N. Simone Graetzer. Acoustic phonetic experiments were conducted with the aim of describing spatial coarticulation in consonants and vowels in four Australian languages: Arrernte, Burarra, Gupapuyngu and Warlpiri. Interactions were examined between coarticulation and factors such as consonant place of articulation (the location of the point of maximal consonantal constriction in the vocal tract), the position of the consonant relative to the vowel (preceding or following), prosodic prominence and language. The principal motivation was to contribute to the experimental literature on coarticulation in Australian languages, given their unusual phonological characteristics. The results of acoustic measurements show that in stop consonant and vowel production, there are systematic contrasts between consonant places of articulation, especially between peripheral (i.e., bilabial and dorso-velar) and non-peripheral categories, and there are clearly discernible consonant place-dependent differences in the degree of vowel-to-consonant and consonant-to-vowel coarticulation. Additionally, consonant place of articulation is seen to strongly modulate vowel-to-vowel coarticulation. As observed in other languages, such as Catalan, Italian and German, the degree of vowel-to-consonant coarticulation is seen to vary inversely with the degree of consonantal articulatory constraint (i.e., degree of tongue dorsum raising), as does the degree of segmental context-sensitivity. However, findings reported in this dissertation suggest that, unlike results reported previously for European languages such as English, anticipatory vowel-to-consonant coarticulation tends to exceed carryover coarticulation in these languages. With regard to prosodic effects on coarticulation, it appears that prominent vowels do not typically undergo localised hyper-articulation or acoustical expansion as in English, Dutch and German. It is concluded that these results support the view that the maintenance of consonant place of articulation distinctions is pre-eminent in Australian languages. The analyses that are presented contribute to an understanding of the role of consonant place of articulation in coarticulation and, more generally, of the relationship between the acoustics and the biomechanics of speech.

    Comparison of ideal mask-based speech enhancement algorithms for speech mixed with white noise at low mixture signal-to-noise ratios

    The literature shows that the intelligibility of noisy speech can be improved by applying an ideal binary or soft gain mask in the time-frequency domain for signal-to-noise ratios (SNRs) between −10 and +10 dB. In this study, two mask-based algorithms are compared when applied to speech mixed with white Gaussian noise (WGN) at lower SNRs, that is, SNRs from −29 to −5 dB. These comprise an Ideal Binary Mask (IBM) with a Local Criterion (LC) set to 0 dB and an Ideal Ratio Mask (IRM). The performance of three intrusive Short-Time Objective Intelligibility (STOI) variants (STOI, STOI+, and Extended Short-Time Objective Intelligibility, ESTOI) is compared with that of other monaural intelligibility metrics that can be used before and after mask-based processing. The results show that IRMs can be used to obtain near-maximal speech intelligibility (>90% for sentence material) even at very low mixture SNRs, while IBMs with LC = 0 provide limited intelligibility gains for SNRs below −14 dB. It is also shown that, unlike STOI, STOI+ and ESTOI are suitable metrics for speech mixed with WGN at low SNRs and processed by IBMs with LC = 0, even when the speech is high-pass filtered to flatten the spectral tilt before masking.
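
    Both masks have standard definitions and can be computed whenever the speech and noise components of a mixture are known. The sketch below uses common choices (sampling rate, frame length, and the squared-magnitude IRM form) that are assumptions rather than the paper's exact settings.

```python
import numpy as np
from scipy.signal import stft, istft

def ideal_masks(speech, noise, fs=16000, nperseg=512):
    """Compute an IBM (Local Criterion LC = 0 dB) and an IRM from the
    known speech and noise components of a mixture."""
    _, _, S = stft(speech, fs=fs, nperseg=nperseg)
    _, _, N = stft(noise, fs=fs, nperseg=nperseg)
    eps = 1e-12
    local_snr_db = 20.0 * np.log10((np.abs(S) + eps) / (np.abs(N) + eps))
    ibm = (local_snr_db > 0.0).astype(float)  # keep T-F units with local SNR > LC
    irm = np.abs(S) ** 2 / (np.abs(S) ** 2 + np.abs(N) ** 2 + eps)  # soft gain
    return ibm, irm

def apply_mask(mixture, mask, fs=16000, nperseg=512):
    """Apply a time-frequency mask to the mixture and resynthesise the signal."""
    _, _, X = stft(mixture, fs=fs, nperseg=nperseg)
    _, enhanced = istft(X * mask, fs=fs, nperseg=nperseg)
    return enhanced
```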

    Clarity-2021 challenges: machine learning challenges for advancing hearing aid processing

    In recent years, rapid advances in speech technology have been made possible by machine learning challenges such as CHiME, REVERB, Blizzard, and Hurricane. In the Clarity project, the machine learning approach is applied to the problem of hearing aid processing of speech-in-noise, where current technology for enhancing the speech signal for the hearing aid wearer is often ineffective. The scenario is a (simulated) cuboid-shaped living room in which there is a single listener, a single target speaker, and a single interferer, which is either a competing talker or domestic noise. All sources are static, the target is always within ±30° azimuth of the listener and at the same elevation, and the interferer is an omnidirectional point source at the same elevation. The target speech comes from an open-source, 40-speaker British English speech database collected for this purpose. This paper provides a baseline description of the round-one Clarity challenges for both enhancement (CEC1) and prediction (CPC1). To the authors' knowledge, these are the first machine learning challenges to consider the problem of hearing aid speech signal processing.