
    Obtaining prominence judgments from naïve listeners – Influence of rating scales, linguistic levels and normalisation

    A frequently replicated finding is that higher-frequency words tend to be shorter and contain more strongly reduced vowels. However, little is known about potential differences in the articulatory gestures for high- vs. low-frequency words. The present study used electromagnetic articulography to investigate the production of two German vowels, [i] and [a], embedded in high- and low-frequency words. We found that word frequency affected the production of [i] and [a] differently at both the temporal and the gestural level. Higher frequency of use predicted longer acoustic durations for long vowels, shorter durations for short vowels, articulatory trajectories with greater tongue height for [i], and more pronounced downward articulatory trajectories for [a]. These results show that the phonological contrast between short and long vowels is learned better with experience, and they challenge both the Smooth Signal Redundancy Hypothesis and current theories of German phonology.

    Obtaining prominence judgments from naïve listeners – Influence of rating scales, linguistic levels and normalisation

    Arnold D, Wagner P, Möbius B. Obtaining prominence judgments from naïve listeners – Influence of rating scales, linguistic levels and normalisation. In: Proceedings of Interspeech 2012. 2012

    L’individualità del parlante nelle scienze fonetiche: applicazioni tecnologiche e forensi (Speaker individuality in the phonetic sciences: technological and forensic applications)


    Synthesising prosody with insufficient context

    Prosody is a key component in human spoken communication, signalling emotion, attitude, information structure, intention, and other communicative functions through perceived variation in intonation, loudness, timing, and voice quality. However, the prosody in text-to-speech (TTS) systems is often monotonous and adds no additional meaning to the text. Synthesising prosody is difficult for several reasons; I focus on three challenges. First, prosody is embedded in the speech signal, making it hard to model with machine learning. Second, there is no clear orthography for prosody, meaning it is underspecified in the input text and difficult to control directly. Third, and most importantly, prosody is determined by the context of a speech act, which TTS systems do not, and will never, have complete access to. Without the context, we cannot say whether prosody is appropriate or inappropriate. Context is wide-ranging, but state-of-the-art TTS acoustic models only have access to phonetic information and limited structural information. Unfortunately, most context is either difficult, expensive, or impossible to collect. Thus, fully specified prosodic context will never exist. Given that context is insufficient, prosody synthesis is a one-to-many generative task: it requires the ability to produce multiple renditions. To provide this ability, I propose methods for prosody control in TTS, using either explicit prosody features, such as F0 and duration, or learnt prosody representations disentangled from the acoustics. I demonstrate that without control of the prosodic variability in speech, TTS will produce average prosody, i.e. flat and monotonous prosody. This thesis explores different options for operating these control mechanisms. Random sampling of a learnt distribution of prosody produces more varied and realistic prosody. Alternatively, a human-in-the-loop can operate the control mechanism, using their intuition to choose appropriate prosody.
To improve the effectiveness of human-driven control, I design two novel approaches to make control mechanisms more human-interpretable. Finally, it is important to take advantage of additional context as it becomes available. I present a novel framework that can incorporate arbitrary additional context, and demonstrate my state-of-the-art context-aware model of prosody using a pre-trained and fine-tuned language model. This thesis demonstrates empirically that appropriate prosody can be synthesised with insufficient context by accounting for unexplained prosodic variation.

    Proceedings of the VIIth GSCP International Conference

    The 7th International Conference of the Gruppo di Studi sulla Comunicazione Parlata, dedicated to the memory of Claire Blanche-Benveniste, chose Speech and Corpora as its main theme. The wide international origin of the 235 authors, from 21 countries and 95 institutions, led to papers on many different languages. The 89 papers in this volume reflect the themes of the conference: spoken corpus compilation and annotation, together with the related technological fields; the relation between prosody and pragmatics; speech pathologies; and various papers on phonetics, speech and linguistic analysis, pragmatics and sociolinguistics. Many papers are also dedicated to speech and second language studies. The online publication with FUP allows direct access to the sound and video files linked to the papers (when downloaded).

    Vocal Biomarkers of Clinical Depression: Working Towards an Integrated Model of Depression and Speech

    Speech output has long been considered a sensitive marker of a person’s mental state. It has previously been examined as a possible biomarker for diagnosis and treatment response in certain mental health conditions, including clinical depression. To date, it has been difficult to draw robust conclusions from past results because of diversity in samples, speech material, investigated parameters, and analytical methods. In this exploratory study of speech in clinically depressed individuals, articulatory and phonatory behaviours are examined in relation to psychomotor symptom profiles and overall symptom severity. A systematic review situated the study within the existing body of knowledge on the effects of depression on speech and informed the experimental setup. Examinations of vowel space, monophthong, and diphthong productions, as well as a multivariate acoustic analysis of other speech parameters (e.g., F0 range, perturbation measures, composite measures), are undertaken with the goal of creating a working model of the effects of depression on speech. Initial results demonstrate that overall vowel space area did not differ between depressed and healthy speakers, but closer inspection revealed more specific deficits in depressed patients along the first formant (F1) axis. Speakers with depression were more likely to produce centralised vowels along F1 than along F2, and this was more pronounced for low-front vowels, which are more complex given the degree of tongue-jaw coupling required for their production. This pattern was seen in both monophthong and diphthong productions.
Other articulatory and phonatory measures were also inspected in a factor analysis, suggesting additional vocal biomarkers for consideration in the diagnosis and treatment assessment of depression, including aperiodicity measures (e.g., higher shimmer and jitter), changes in spectral slope and tilt, and additive noise measures such as increased harmonics-to-noise ratio. Intonation was also affected by diagnostic status, but only for specific speech tasks. These results suggest that laryngeal and articulatory control is reduced by depression. The findings support the clinical utility of combining Ellgring and Scherer’s (1996) psychomotor retardation and social-emotional hypotheses to explain the effects of depression on speech; together, these hypotheses suggest that the observed changes are due to a combination of cognitive, psycho-physiological and motoric mechanisms. Ultimately, depressive speech can be modelled along a continuum from hypo- to hyper-speech, where depressed individuals are able to assess communicative situations and speech requirements, and then engage in the minimum amount of motoric output necessary to convey their message. As speakers’ depressive symptoms fluctuate over the course of their disorder, they move along the hypo-hyper-speech continuum and their speech is affected accordingly. Recommendations for future clinical investigations of the effects of depression on speech are also presented, including suggestions for recording and reporting standards. The results contribute to cross-disciplinary research on speech analysis spanning psychiatry, computer science, and speech science.

    Attitudes towards varieties of English by non-native and native speakers: A comparative view from Taiwan and the UK

    Attitudes towards varieties of English have long been at the forefront of sociolinguistic research. Whilst most of these studies have concentrated on native varieties of English, research has more recently turned to the non-native varieties that arose as English became a global lingua franca. Research has demonstrated that whilst native varieties are generally viewed as being of higher status, non-native varieties are sometimes evaluated more positively in terms of social attractiveness, or ‘solidarity’. However, non-native speakers have now begun to outnumber native English speakers, so attitudes towards these speakers may be changing. This study contributes to research on attitudes towards native and non-native varieties of English through a comparative investigation of the attitudes of 317 Taiwanese nationals living in Taiwan and 147 British nationals living in the UK towards different English accents. Online questionnaires using both direct (e.g., Likert scales and multiple-choice questions) and indirect (e.g., verbal guise test) methods were employed to examine Taiwanese and British attitudes towards varieties of English. The study examined seven varieties categorised according to Kachru’s (1992a) three concentric circles: the Inner Circle (Australian English, General American English and Standard Southern British English), the Outer Circle (Indian English) and the Expanding Circle (Japanese English, Spanish English and Taiwanese English). Four key findings emerge from the study. First, both direct and indirect evaluation techniques demonstrate that Taiwanese and British respondents largely favour varieties of the Inner and Outer Circles over those of the Expanding Circle. Second, the indirect attitude measurements of the verbal guise test demonstrate that both groups prefer General American English in terms of both status and solidarity.
Third, a number of social variables (e.g., gender, occupation) had a significant effect on speaker evaluations. Fourth, although Taiwanese and British participants were highly capable of distinguishing whether a speaker was native or non-native, there were generally no significant correlations between participants’ ability to identify different English varieties and their having a favourable attitude towards them. Overall, the findings demonstrated that Taiwanese and British people predominantly share similar attitudes towards varieties of English. Nevertheless, when the effects of the social variables and speaker identifications are considered, native and non-native speakers’ perceptions of different varieties of English may differ. These findings contribute to the understanding of the similarities and differences between native and non-native speakers’ attitudes towards varieties of English in an increasingly globalised world with a growing population of non-native speakers of English.

    A reliable past or a reliable pest? Testing canonical stimuli in speech perception research

    A growing body of research is exploring second language (L2) learners’ perception of vowel contrasts. Conventionally, researchers have estimated how well listeners differentiate between L2 vowels using isolated words (or syllables) in a fixed consonantal frame, such as b-vowel-t (e.g., beat-bit). However, little research has systematically examined how well results generalise beyond isolated frames, or whether more phonologically and sententially diverse listening prompt types are suitable for assessing L2 vowel perception. To address this gap, two studies investigated the effects of using b-vowel-t and more diverse prompt types to assess intermediate-to-advanced adult L2 perception of the English /i/-/ɪ/ and /ɛ/-/æ/ vowel pairs. Prompt performance was measured for internal consistency, congruence with the Perceptual Assimilation Model for L2 speech learning (PAM-L2; Best & Tyler, 2007), and listeners’ subjective experiences with each prompt type. Mixed-effects modelling investigated the predictive power of b-vowel-t performance for performance on more diverse prompt types. Study 1 explored prompt performance using closed-set, forced-choice tasks with first language (L1) Mandarin and Korean listeners. Study 2 investigated the effect of Mandarin and Spanish L1 listeners’ target word familiarity and associations with sentence prompts using transcription-response tasks and self-report surveys. Both studies found that diverse prompts had adequate internal consistency and aligned with PAM-L2 predictions. Performance on b-vowel-t prompts generalised poorly to diverse prompts and accorded less with PAM-L2 predictions. Participants’ survey ratings indicated that more diverse prompt types were more demanding; however, this did not always correspond to lower performance. Collectively, the results indicate the utility of employing prompts beyond isolated words in a fixed consonantal frame for both laboratory and at-home administrations.
These findings contribute to the vowel perception literature by evaluating and extending the scope of prompts that may be used.