3,316 research outputs found
Applying rhythm metrics to non-native spontaneous speech
This study investigates a variety of rhythm metrics on two corpora of non-native spontaneous speech and compares the non-native distributions to values from a corpus of native speech. Several of the metrics are shown to differentiate well between native and non-native speakers and to have moderate correlations with English proficiency scores that were assigned to the non-native speech. The metric that had the highest correlation with English proficiency scores (apart from speaking rate) was rPVIsyl (the raw Pairwise Variability Index for syllables), with r = −0.43. Index Terms: Rhythm metrics, non-native speech, fluency
Speech rhythm: a metaphor?
Is speech rhythmic? In the absence of evidence for a traditional view that languages strive to coordinate either syllables or stress-feet with regular time intervals, we consider the alternative that languages exhibit contrastive rhythm subsisting merely in the alternation of stronger and weaker elements. This is initially plausible, particularly for languages with a steep ‘prominence gradient’, i.e. a large disparity between stronger and weaker elements; but we point out that alternation is poorly achieved even by a ‘stress-timed’ language such as English, and, historically, languages have conspicuously failed to adopt simple phonological remedies that would ensure alternation. Languages seem more concerned to allow ‘syntagmatic contrast’ between successive units and to use durational effects to support linguistic functions than to facilitate rhythm. Furthermore, some languages (e.g. Tamil, Korean) lack the lexical prominence which would most straightforwardly underpin prominence alternation. We conclude that speech is not incontestably rhythmic, and may even be antirhythmic. However, its linguistic structure and patterning allow the metaphorical extension of rhythm in varying degrees and in different ways depending on the language, and it is this analogical process which allows speech to be matched to external rhythms.
Rhythm in the speech of a person with right hemisphere damage: Applying the pairwise variability index
Although several aspects of prosody have been studied in speakers with right hemisphere damage (RHD), rhythm remains largely uninvestigated. This study compares the rhythm of an Australian English speaker with right hemisphere damage (due to a stroke, but with no concomitant dysarthria) to that of a neurologically unimpaired individual. The speakers' rhythm is compared using the pairwise variability index (PVI) which allows for an acoustic characterization of rhythm by comparing the duration of successive vocalic and intervocalic intervals. A sample of speech from a structured interview between a speech and language therapist and each participant was analysed. Previous research has shown that speakers with RHD may have difficulties with intonation production, and therefore it was hypothesized that there may also be rhythmic disturbance. Results show that the neurologically normal control uses a similar rhythm to that reported for British English (there are no previous studies available for Australian English), whilst the speaker with RHD produces speech with a less strongly stress-timed rhythm. This finding was statistically significant for the intervocalic intervals measured (t(8) = 4.7, p < .01), and suggests that some aspects of prosody may be right lateralized for this speaker. The findings are discussed in relation to previous findings of dysprosody in RHD populations, and in relation to syllable-timed speech of people with other neurological conditions
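The pairwise variability index mentioned above can be illustrated with a minimal sketch. It assumes the usual raw and normalised formulations (mean absolute difference between successive interval durations, with the normalised variant dividing each difference by the mean of the pair and scaling by 100). The abstract does not say which variant was applied to which interval type, and the durations in the usage note are hypothetical, not data from the study.

```python
def rpvi(durations):
    """Raw PVI: mean absolute difference between successive durations."""
    if len(durations) < 2:
        raise ValueError("need at least two intervals")
    diffs = [abs(a - b) for a, b in zip(durations, durations[1:])]
    return sum(diffs) / len(diffs)


def npvi(durations):
    """Normalised PVI: each difference is divided by the mean of the
    pair, which reduces the influence of speaking rate; scaled by 100."""
    if len(durations) < 2:
        raise ValueError("need at least two intervals")
    terms = [abs(a - b) / ((a + b) / 2) for a, b in zip(durations, durations[1:])]
    return 100 * sum(terms) / len(terms)


# Hypothetical interval durations in milliseconds (illustrative only).
example_intervals = [100, 200, 100]
raw_score = rpvi(example_intervals)
normalised_score = npvi(example_intervals)
```

A perfectly even sequence of intervals gives a score of zero on both measures; greater alternation between long and short intervals gives higher scores.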
Deep Learning for Automatic Assessment and Feedback of Spoken English
Growing global demand for learning a second language (L2), particularly English, has led to
considerable interest in automatic spoken language assessment, whether for use in computer-assisted language learning (CALL) tools or for grading candidates for formal qualifications.
This thesis presents research conducted into the automatic assessment of spontaneous non-native English speech, with a view to providing meaningful feedback to learners. One
of the challenges in automatic spoken language assessment is giving candidates feedback on
particular aspects, or views, of their spoken language proficiency, in addition to the overall
holistic score normally provided. Another is detecting pronunciation and other types of errors
at the word or utterance level and feeding them back to the learner in a useful way.
It is usually difficult to obtain accurate training data with separate scores for different
views and, as examiners are often trained to give holistic grades, single-view scores can
suffer issues of consistency. Conversely, holistic scores are available for various standard
assessment tasks such as Linguaskill. An investigation is thus conducted into whether
assessment scores linked to particular views of the speaker’s ability can be obtained from
systems trained using only holistic scores.
End-to-end neural systems are designed with structures and forms of input tuned to single
views, specifically each of pronunciation, rhythm, intonation and text. By training each
system on large quantities of candidate data, individual-view information should be possible
to extract. The relationships between the predictions of each system are evaluated to examine
whether they are, in fact, extracting different information about the speaker. Three methods
of combining the systems to predict holistic score are investigated, namely averaging their
predictions and concatenating and attending over their intermediate representations. The
combined graders are compared to each other and to baseline approaches.
The tasks of error detection and error tendency diagnosis become particularly challenging
when the speech in question is spontaneous and particularly given the challenges posed by
the inconsistency of human annotation of pronunciation errors. An approach to these tasks is
presented by distinguishing between lexical errors, wherein the speaker does not know how a
particular word is pronounced, and accent errors, wherein the candidate’s speech exhibits
consistent patterns of phone substitution, deletion and insertion. Three annotated corpora
of non-native English speech by speakers of multiple L1s are analysed, the consistency of
human annotation investigated and a method presented for detecting individual accent and
lexical errors and diagnosing accent error tendencies at the speaker level
Speech rhythm: the language-specific integration of pitch and duration
Experimental phonetic research on speech rhythm seems to have reached an impasse. Recently, this research field has tended to investigate produced (rather than perceived) rhythm, focussing on timing, i.e. duration as an acoustic cue, and has not considered that rhythm perception might be influenced by native language. Yet evidence from other areas of phonetics, and other disciplines, suggests that an investigation of rhythm is needed which (i) focuses on listeners’ perception, (ii) acknowledges the role of several acoustic cues, and (iii) explores whether the relative significance of these cues differs between languages. This thesis, the originality of which derives from its adoption of these three perspectives combined, indicates new directions for progress. A series of perceptual experiments investigated the interaction of duration and f0 as perceptual cues to prosody in languages with different prosodic structures – Swiss German, Swiss French, and French (i.e. from France). The first experiment demonstrated that a dynamic f0 increases perceived syllable duration in contextually isolated pairs of monosyllables, for all three language groups. The second experiment found that dynamic f0 and increased duration interact as cues to rhythmic groups in series of monosyllabic digits and letters; the two cues were significantly more effective than one when heard simultaneously, but significantly less effective than one when heard in conflicting positions around the rhythmic-group boundary location, and native language influenced whether f0 or duration was the more effective cue.
These two experiments laid the basis for the third, which directly addressed rhythm. Listeners were asked to judge the rhythmicality of sentences with systematic duration and f0 manipulations; the results provide evidence that duration and f0 are interdependent cues in rhythm perception, and that the weighting of each cue varies in different languages. A fourth experiment applied the perceptual results to production data, to develop a rhythm metric which captures the multi-dimensional and language-specific nature of perceived rhythm in speech production. These findings have the important implication that if future phonetic research on rhythm follows these new perspectives, it may circumvent the impasse and advance our knowledge and model of speech rhythm. This work was funded by an AHRC doctoral award to the author.
A description of the rhythm of Barunga Kriol using rhythm metrics and an analysis of vowel reduction
Kriol is an English-lexifier creole language spoken by over 20,000 children and adults in the northern parts of Australia, yet much about the prosody of this language remains unknown. This thesis provides a preliminary description of the rhythm and patterns of vowel reduction of Barunga Kriol, a variety of Kriol local to Barunga Community, NT, and compares it to a relatively standard variety of Australian English. The thesis is divided into two studies. Study 1, the Rhythm Metric Study, describes the rhythm of Barunga Kriol and Australian English using rhythm metrics. Study 2, the Vowel Reduction Study, compares patterns of vowel reduction in Barunga Kriol and Australian English. This thesis contributes the first in-depth studies of vowel reduction patterns and rhythm using rhythm metrics in any variety of Kriol or Australian English. The research also sets an adult baseline for metric results and patterns of vowel reduction for Barunga Kriol and Australian English, useful for future studies of child speech in these varieties. As rhythm is a major contributor to intelligibility, the findings of this thesis have the potential to inform teaching practice in English as a Second Language.
Speech Rhythm in Spontaneous and Controlled L2 Speaking Modes: Exploring Differences and Distance Measures
Studies of speech rhythm have often used read speech rather than spontaneous speech in their comparisons. However, read speech has been shown to be perceptually different from spontaneous speech, which may be due to rhythmic differences between the two modes. To examine this, the effect of speaking mode (spontaneous or controlled) was assessed in a group of 82 Spanish-Catalan learners of English relative to a control group of 8 native English speakers. Results found strong rhythmic differences between the two modes, but minimal differences between the learners and native speakers. Additionally, Mahalanobis distance analyses revealed that non-native speakers differed significantly more from the native control group in the spontaneous condition than the controlled condition
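As a rough illustration of the Mahalanobis distance analysis mentioned above, the following sketch measures how far one speaker's rhythm scores lie from a reference group, taking the group's spread and correlation into account. It is restricted to two features for simplicity and uses the sample covariance; the abstract does not state which rhythm measures or how many dimensions were used, so the function name, the feature choice and the reference values are all hypothetical.

```python
import math


def mahalanobis_2d(point, group):
    """Mahalanobis distance of a 2-feature point from a reference group."""
    n = len(group)
    if n < 3:
        raise ValueError("need at least three reference points")
    mean_x = sum(p[0] for p in group) / n
    mean_y = sum(p[1] for p in group) / n
    # Entries of the sample covariance matrix (n - 1 denominator).
    sxx = sum((p[0] - mean_x) ** 2 for p in group) / (n - 1)
    syy = sum((p[1] - mean_y) ** 2 for p in group) / (n - 1)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in group) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = point[0] - mean_x, point[1] - mean_y
    # Quadratic form d^T * inverse(cov) * d, expanded for the 2x2 case.
    squared = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(squared)


# Hypothetical reference group: (feature_1, feature_2) per native speaker.
reference_group = [(0, 0), (2, 0), (0, 2), (2, 2)]
distance = mahalanobis_2d((3, 1), reference_group)
```

Unlike a plain Euclidean distance, this measure scales each direction by the reference group's variability, so a speaker counts as distant only relative to how much the reference speakers themselves vary.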
“When do we get into the cultural rhythm?” A study on the effects of music-cultural perceptual narrowing
Rhythmic abilities are a fundamental aspect of daily life. Rhythm offers a predictable sequence of time intervals and accents that individuals can synchronize their actions to, enabling one to learn a language, communicate with others, move from one place to another, and synchronize movements to music. Syncing body movements with music, whether through dance or merely an individual response to music, is a common human behavior (Patel et al., 2005). But synchronization, though seemingly effortless, requires the complex integration of perceptual and sensorimotor skills. In moving to music, a beat must first be extracted and then a rhythmic motor response is integrated into that metrical framework (Ilari, 2014). But all over the world, metrical and rhythmic structures differ (Kalender et al., 2013). Hence, an individual’s perception and processing of rhythm are shaped by the unique rhythmic characteristics of the musical culture in which they are deeply ingrained.
Studies have shown that individuals of various ages and cultural backgrounds experience a phenomenon known as music-cultural perceptual narrowing (e.g., Lynch et al., 1990; Lynch & Eilers, 1992; Hannon & Trehub, 2005a,b; Hannon & Trainor, 2007). Individuals initially exhibit sensitivity to a diverse range of perceptual structures that narrow down through exposure to the specific characteristics of their musical culture, thus leading to reduced sensitivity to less conventional structures. This study explores the effect of this phenomenon on movement-to-music synchronization, putting to question whether (i) culture-specific perceptual narrowing influences how infants spontaneously move in response to music samples with meters that are either present in their day-to-day experiences with music or absent from it, and whether these responses are (ii) modulated by daily exposure to, i.e. training with, a specific rhythmic pattern, which was either native to the infants’ culture or non-native.
Italian infants aged 6 to 24 months and their parents, who were mainly exposed to music with isochronous simple meters, were presented with songs in both simple (4/4) and complex (7/8) meters, and their motor behavior in response to these songs was analyzed. Subsequently, they were invited to participate in a month-long musical training with a song in either 4/4 or 7/8 meter. They were then asked to return to the same experimental setting and repeat the task from the first experimental session.
Preliminary analysis of infants’ motor behavior during auditory stimuli exposure suggests individual differences in motor responses, potential changes in correlations between arm and leg movements, and consistent high levels of synchronization.
This thesis will first review existing literature on musicality, music processing, music-cultural perceptual narrowing, and sensorimotor synchronization (Chapter 1), then detail the research methods and materials (Chapter 2). Preliminary results will be presented (Chapter 3), and the theoretical and educational implications of these findings for our understanding of music-motor synchrony and future research directions will be detailed (Chapter 4).
An exploration of the rhythm of Malay
In recent years there has been a surge of interest in speech rhythm. However, we still lack a clear understanding of the nature of rhythm and rhythmic differences across languages. Various metrics have been proposed as means for measuring rhythm on the phonetic level and making typological comparisons between languages (Ramus et al., 1999; Grabe & Low, 2002; Dellwo, 2006) but the debate is ongoing on the extent to which these metrics capture the rhythmic basis of speech (Arvaniti, 2009; Fletcher, in press). Furthermore, cross-linguistic studies of rhythm have covered a relatively small number of languages and research on previously unclassified languages is necessary to fully develop the typology of rhythm. This study examines the rhythmic features of Malay, for which, to date, relatively little work has been carried out on aspects of rhythm and timing.
The material for the analysis comprised 10 sentences produced by 20 speakers of standard Malay (10 males and 10 females). The recordings were first analysed using rhythm metrics proposed by Ramus et al. (1999) and Grabe & Low (2002). These metrics (∆C, %V, rPVI, nPVI) are based on durational measurements of vocalic and consonantal intervals. The results indicated that Malay clustered with other so-called syllable-timed languages like French and Spanish on the basis of all metrics. However, underlying the overall findings for these metrics there was a large degree of variability in values across speakers and sentences, with some speakers having values in the range typical of stress-timed languages like English.
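The two interval-based measures named above, %V and ∆C, can be sketched as follows. The sketch assumes their usual definitions: %V is the percentage of total utterance duration that is vocalic, and ∆C is the standard deviation of consonantal interval durations. The choice of the population standard deviation is an assumption made here, and the durations are hypothetical values rather than measurements from the Malay recordings.

```python
import statistics


def percent_v(vocalic, consonantal):
    """%V: vocalic duration as a percentage of total utterance duration."""
    total = sum(vocalic) + sum(consonantal)
    if total <= 0:
        raise ValueError("total duration must be positive")
    return 100 * sum(vocalic) / total


def delta_c(consonantal):
    """Delta-C: standard deviation of consonantal interval durations.
    The population form is used here, which is an assumption."""
    if len(consonantal) < 2:
        raise ValueError("need at least two consonantal intervals")
    return statistics.pstdev(consonantal)


# Hypothetical interval durations in milliseconds for one sentence.
vocalic_intervals = [100, 100]
consonantal_intervals = [50, 150]
vowel_share = percent_v(vocalic_intervals, consonantal_intervals)
consonant_spread = delta_c(consonantal_intervals)
```

Both measures are computed per sentence and per speaker, which is why values can vary widely across speakers and sentences, as the abstract reports.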
Further analysis has been carried out in light of Fletcher’s (in press) argument that measurements based on duration do not wholly reflect speech rhythm as there are many other factors that can influence values of consonantal and vocalic intervals, and Arvaniti’s (2009) suggestion that other features of speech should also be considered in description of rhythm to discover what contributes to listeners’ perception of regularity. Spectrographic analysis of the Malay recordings brought to light two parameters that displayed consistency and regularity for all speakers and sentences: the duration of individual vowels and the duration of intervals between intensity minima.
This poster presents the results of these investigations and points to connections between the features which seem to be consistently regulated in the timing of Malay connected speech and aspects of Malay phonology. The results are discussed in light of current debate on the descriptions of rhythm