
    Articulatory Consequences of Vocal Effort Elicitation Method

    Articulatory features from two datasets, Slovak and Swedish, were compared to see whether different methods of eliciting loud speech (ambient noise vs. a visually presented loudness target) result in different articulatory behavior. The features studied were temporal and kinematic characteristics of lip separation within the closing and opening gestures of bilabial consonants, and of the tongue body movement from /i/ to /a/ through a bilabial consonant. The results indicate greater hyperarticulation in speech elicited with a visually presented target. While individual articulatory strategies are evident, the speaker groups agree on increasing the kinematic features equally within each gesture in response to increased vocal effort. Another shared strategy is keeping the tongue response to a minimum, presumably to preserve the acoustic prerequisites for adequate vowel identity. While the visually presented loudness target elicits a larger span of vocal effort, the two elicitation methods achieve comparable consistency within each loudness condition. Peer reviewed.

    Recognizing Speech in a Novel Accent: The Motor Theory of Speech Perception Reframed

    The motor theory of speech perception holds that we perceive the speech of another in terms of a motor representation of that speech. However, when we have learned to recognize a foreign accent, it seems plausible that recognizing a word rarely involves reconstructing the speech gestures of the speaker rather than those of the listener. To better assess the motor theory in light of this observation, we proceed in three stages. Part 1 places the motor theory of speech perception in a larger framework based on our earlier models of the adaptive formation of mirror neurons for grasping and on viewing extensions of that mirror system as part of a larger system for neuro-linguistic processing, augmented by the present consideration of recognizing speech in a novel accent. Part 2 then offers a novel computational model of how a listener comes to understand the speech of someone speaking the listener's native language with a foreign accent. The core tenet of the model is that the listener uses hypotheses about the word the speaker is currently uttering to update probabilities linking the sounds produced by the speaker to phonemes in the listener's native-language repertoire. On average, this improves the recognition of later words. The model is neutral regarding the nature of the representations it uses (motor vs. auditory). It serves as a reference point for the discussion in Part 3, which proposes a dual-stream neuro-linguistic architecture to revisit claims for and against the motor theory of speech perception and the relevance of mirror neurons, and extracts some implications for the reframing of the motor theory.
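    The core tenet of the Part 2 model can be illustrated with a minimal, hypothetical Python sketch. It is not the authors' model: the class name `AccentAdapter`, the additive smoothing, the identity prior (the listener initially takes each heard sound to be the native phoneme of the same name), and the one-to-one alignment of sounds to phonemes are all assumptions made for illustration. The sketch only shows how confirming a word hypothesis shifts sound-to-phoneme probabilities so that a later utterance is recognized differently.

```python
from collections import defaultdict


class AccentAdapter:
    """Hypothetical sketch: learn P(native phoneme | speaker sound) from word hypotheses."""

    def __init__(self, phonemes, prior=1.0, identity=2.0):
        self.phonemes = list(phonemes)
        self.prior = prior  # additive (Laplace-style) smoothing pseudo-count
        # counts[sound][phoneme]: evidence that a heard sound realizes a native phoneme
        self.counts = defaultdict(lambda: defaultdict(float))
        # Assumed starting bias: a heard sound is taken to be the native phoneme of the same name
        for ph in self.phonemes:
            self.counts[ph][ph] = identity

    def prob(self, sound, phoneme):
        """Smoothed estimate of P(phoneme | sound) over the listener's repertoire."""
        row = self.counts[sound]
        total = sum(row.values()) + self.prior * len(self.phonemes)
        return (row[phoneme] + self.prior) / total

    def score(self, sounds, word):
        """Likelihood of a candidate word, assuming a one-to-one sound-to-phoneme alignment."""
        if len(sounds) != len(word):
            return 0.0
        p = 1.0
        for s, ph in zip(sounds, word):
            p *= self.prob(s, ph)
        return p

    def recognize(self, sounds, lexicon):
        """Return the lexicon entry with the highest likelihood."""
        return max(lexicon, key=lambda w: self.score(sounds, w))

    def update(self, sounds, hypothesized_word):
        """Strengthen sound-to-phoneme links using the current word hypothesis."""
        if len(sounds) != len(hypothesized_word):
            return  # alignment is undefined in this simplified sketch
        for s, ph in zip(sounds, hypothesized_word):
            self.counts[s][ph] += 1.0


if __name__ == "__main__":
    phonemes = ["a", "e", "b", "d", "p", "t"]
    lexicon = [("b", "a", "d"), ("b", "e", "d"), ("p", "a", "t")]
    listener = AccentAdapter(phonemes)
    heard = ("b", "e", "d")  # accented speaker produces [e] where the listener has /a/
    print(listener.recognize(heard, lexicon))  # ('b', 'e', 'd') before adaptation
    for _ in range(3):
        listener.update(heard, ("b", "a", "d"))  # context supports the word "bad"
    print(listener.recognize(heard, lexicon))  # ('b', 'a', 'd') after adaptation
```

    Because the update only counts sound-phoneme co-occurrences, the sketch is, like the model described above, neutral about whether the underlying representations are motor or auditory.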

    The listening talker: A review of human and algorithmic context-induced modifications of speech

    Speech output technology is finding widespread application, including in scenarios where intelligibility might be compromised, at least for some listeners, by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns in response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. Consequently, the review provides a roadmap for future work in improving the robustness of speech output.

    VOCAL BIOMARKERS OF CLINICAL DEPRESSION: WORKING TOWARDS AN INTEGRATED MODEL OF DEPRESSION AND SPEECH

    Speech output has long been considered a sensitive marker of a person’s mental state. It has previously been examined as a possible biomarker for diagnosis and treatment response in certain mental health conditions, including clinical depression. To date, it has been difficult to draw robust conclusions from past results due to diversity in samples, speech material, investigated parameters, and analytical methods. Within this exploratory study of speech in clinically depressed individuals, articulatory and phonatory behaviours are examined in relation to psychomotor symptom profiles and overall symptom severity. A systematic review summarised the existing body of knowledge on the effects of depression on speech and situated the experimental setup within this body of work. Examinations of vowel space, monophthong, and diphthong productions, as well as a multivariate acoustic analysis of other speech parameters (e.g., F0 range, perturbation measures, and composite measures), are undertaken with the goal of creating a working model of the effects of depression on speech. Initial results demonstrate that overall vowel space area did not differ between depressed and healthy speakers, but closer inspection revealed more specific deficits in depressed patients along the first formant (F1) axis. Speakers with depression were more likely to produce vowels centralised along F1 than along F2, and this was more pronounced for low-front vowels, which are more complex given the degree of tongue-jaw coupling required for their production. This pattern was seen in both monophthong and diphthong productions.
    Other articulatory and phonatory measures were also inspected in a factor analysis, suggesting additional vocal biomarkers for consideration in the diagnosis and treatment assessment of depression, including aperiodicity measures (e.g., higher shimmer and jitter), changes in spectral slope and tilt, and additive noise measures such as increased harmonics-to-noise ratio. Intonation was also affected by diagnostic status, but only for specific speech tasks. These results suggest that laryngeal and articulatory control is reduced by depression. Findings support the clinical utility of combining Ellgring and Scherer’s (1996) psychomotor retardation and social-emotional hypotheses to explain the effects of depression on speech; together, these hypotheses attribute the observed changes to a combination of cognitive, psycho-physiological and motoric mechanisms. Ultimately, depressive speech can be modelled along a continuum of hypo- to hyper-speech, where depressed individuals assess communicative situations and speech requirements and then engage in the minimum amount of motoric output necessary to convey their message. As speakers' depressive symptoms fluctuate over the course of their disorder, they move along the hypo-hyper-speech continuum and their speech is affected accordingly. Recommendations for future clinical investigations of the effects of depression on speech are also presented, including suggestions for recording and reporting standards. The results contribute to cross-disciplinary research on speech analysis spanning psychiatry, computer science, and speech science.
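    The vowel space finding above (no overall difference in area, but centralisation along F1) can be made concrete with a small, hypothetical Python sketch. The abstract does not say how area was computed; here it is assumed to be the area of the polygon spanned by corner vowels in the F1/F2 plane, obtained with the shoelace formula, and the function names are illustrative only. A separate helper reports the spread along each axis, which shows how, in principle, compression along F1 can be offset by expansion along F2 and leave the total area unchanged.

```python
def vowel_space_area(vertices):
    """Shoelace-formula area (Hz^2) of the polygon spanned by corner vowels.

    vertices: (F1, F2) pairs in Hz, listed in perimeter order (e.g. /i/, /a/, /u/).
    """
    n = len(vertices)
    if n < 3:
        raise ValueError("at least three vowels are needed to span an area")
    twice_area = 0.0
    for k in range(n):
        f1_a, f2_a = vertices[k]
        f1_b, f2_b = vertices[(k + 1) % n]  # wrap around to close the polygon
        # Cross-product term with F2 as the x-axis and F1 as the y-axis
        twice_area += f2_a * f1_b - f2_b * f1_a
    return abs(twice_area) / 2.0


def formant_ranges(vertices):
    """Spread of the vowel set along each axis: (F1 range, F2 range) in Hz."""
    f1 = [v[0] for v in vertices]
    f2 = [v[1] for v in vertices]
    return max(f1) - min(f1), max(f2) - min(f2)


if __name__ == "__main__":
    # Invented (F1, F2) values in Hz for /i/, /a/, /u/, for illustration only (not study data)
    corners = [(300, 2300), (800, 1300), (300, 800)]
    print(vowel_space_area(corners))  # 375000.0
    print(formant_ranges(corners))  # (500, 1500)
```

    Reporting per-axis ranges alongside the area is one simple way to expose the kind of F1-specific centralisation that a single area value can hide.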

    Allophonic Variation in the Spanish Sibilant Fricative

    In Spanish, the phoneme /s/ has two variants: [z] occurs in the coda before a voiced consonant, and [s] occurs elsewhere. However, recent research has revealed irregular voicing patterns with regard to this phone. This dissertation examines two of these allophonic variations. It first investigates how speech rate and speech formality contribute to the gradient and variable nature of the voicing assimilation rule. Next, it explores possible intervocalic /s/ voicing in Highland Colombian Spanish. In accordance with other studies, the results showed partial voicing of coda /s/ before voiced consonants (25%-80% voiced frication noise). Furthermore, there was scarce evidence of intervocalic /s/ voicing in the Colombian data (3%-35% voiced frication noise). Both studies led to the same conclusion: gestural blending is a prominent and frequently occurring process in Spanish. In both cases, the vocal folds begin to vibrate in anticipation of the following sound (either a voiced consonant or a vowel) before the constriction needed to produce the fricative has ended. The data revealed a significant correlation between speech rate and the degree to which adjacent segments overlap. However, gestural overlap does not appear to be a function of speech formality. In addition to the two factors tested (speech rate and speech formality), this dissertation also proposes other factors that may affect the degree of segmental overlap, such as position within the syllable (onset versus coda) and the type of following segment (vowel versus consonant).

    Towards a clinical assessment of acquired speech dyspraxia

    No standardised assessment exists for the recognition and quantification of acquired speech dyspraxia (also called apraxia of speech, AS). This thesis aims to work towards the development of such an assessment based on perceptual features. A review of features previously claimed to characterise AS and to differentiate it from other acquired pronunciation problems (dysarthrias; phonemic paraphasia, PP) produced negative results, and the reasons for this are explored. A reconceptualisation of AS is attempted based on physical studies of AS, PP and the dysarthrias; on their position and relationship within coalitional models of speech production; and on comparison with normal action control and other dyspraxias. Contrary to the view of many, it is concluded that AS and PP are dyspraxias (albeit of different types). However, because of the interactive nature of speech-language production and the behaviour of the vocal tract as a functional whole, AS is unlikely to be distinguishable in an absolute fashion on the basis of single speech characteristics. Rather, it is predicted that pronunciation-disordered groups will differ relatively in total error profiles and in susceptibility to associated effects (variability; propositionality; struggle; length-complexity; latency-utterance times). Using a prototype battery and refined error transcription and analysis procedures, a series of studies tests these predictions on three groups: speakers with spastic dysarthria (n = 6), and speakers with AS and PP without (n = 12) and with (n = 12) dysphasia. The main conclusions do not support the error profile hypotheses in any straightforward manner. Length-complexity effects and latency-utterance times fail to consistently separate the groups. Variability, propositionality and struggle proved the most reliable indicators. Error profiles remain the closest indicators of speakers' intelligibility and therapeutic goals. The thesis argues for a single-case approach to differential diagnosis and for alternative statistical analyses to capture individual and group differences.
    Suggestions for changes to the prototype clinical battery and data management to achieve optimal speaker differentiation conclude the work.

    Multi‐speaker experimental designs: Methodological considerations

    Research on language use has become increasingly interested in the multimodal and interactional aspects of language; theoretical models of dialogue such as Communication Accommodation Theory and the Interactive Alignment Model are examples of this. In addition, researchers have started to give more consideration to the relationship between physiological processes and language use. This article aims to contribute to the advancement of studies of physiological and/or multimodal language use in naturalistic settings. It does so by providing methodological recommendations for such multi-speaker experimental designs. It covers (a) speaker preparation and logistics, (b) experimental tasks, and (c) data synchronisation and post-processing. The types of data considered in further detail include audio and video, electroencephalography, respiratory data, and electromagnetic articulography. This overview and its recommendations are based on answers to a questionnaire sent to members of the Horizon 2020 research network ‘Conversational Brains’ and to several researchers in the field, and on interviews with three additional experts. Funding: H2020 Marie Skłodowska‐Curie Actions (http://dx.doi.org/10.13039/100010665). Peer reviewed.

    Individual differences in speech production and maximum speech performance


    Can Late EFL Learners Attain Nativelike Pronunciation? Evidence from Catalan Speakers’ Production of English Low Vowels

    Catalan learners of English have great difficulty producing the three English low vowels /æ ʌ ɑ/ accurately. This is because Catalan, like Spanish, has only one low vowel, /a/, so learners tend to perceive the three target vowels as instances of the same Catalan category (Rallo Fabra, 2005). Flege (1995) predicts that nonnative speakers can produce L2 sounds authentically if they perceive the differences between the native and the target sounds. This paper investigates the production accuracy of the three English low vowels by three groups of Catalan learners of English differing in degree of foreign accent (FA) and by a group of native English speakers. The data were obtained in an elicitation task in which participants were asked to pronounce a series of monosyllabic words containing one of the three target vowels. Production accuracy was measured quantitatively from first- and second-formant frequencies. Results indicate that learners can produce nativelike instances of /æ/ and /ʌ/ but not of /ɑ/. These findings do not fully follow Flege’s predictions, since learners were able to produce nativelike instances of two English vowels even though these vowels were heard as “similar” to Catalan /a/.

    Are affective speakers effective speakers? – Exploring the link between the vocal expression of positive emotions and communicative effectiveness

    This thesis explores the effect of vocal affect expression on communicative effectiveness. Two studies examined whether positive speaker affect facilitates the encoding and decoding of the message, combining methods from Phonetics and Psychology. This research has been funded through a Faculty Studentship from the University of Stirling and a Fellowship from the German Academic Exchange Service (DAAD).