    Language identification with suprasegmental cues: A study based on speech resynthesis

    This paper proposes a new experimental paradigm for exploring the discriminability of languages, a question that is crucial for a child born into a bilingual environment. The paradigm employs the speech resynthesis technique, enabling the experimenter to preserve or degrade acoustic cues such as phonotactics, syllabic rhythm or intonation in natural utterances. English and Japanese sentences were resynthesized, preserving broad phonotactics, rhythm and intonation (Condition 1); rhythm and intonation (Condition 2); intonation only (Condition 3); or rhythm only (Condition 4). The findings support the notion that syllabic rhythm is a necessary and sufficient cue for French adult subjects to discriminate English from Japanese sentences. The results are consistent with previous research using low-pass filtered speech, as well as with phonological theories predicting rhythmic differences between languages. The new methodology therefore appears well suited to the study of language discrimination. Applications to other domains of psycholinguistic research and to automatic language identification are considered.

    Rhythmic unit extraction and modelling for automatic language identification

    This paper deals with an approach to Automatic Language Identification based on rhythmic modelling. Besides phonetics and phonotactics, rhythm is one of the most promising features to consider for language identification, even if its extraction and modelling are not a straightforward issue. One of the main problems to address is what to model. In this paper, a rhythm extraction algorithm is described: using a vowel detection algorithm, rhythmic units related to syllables are segmented. Several parameters are extracted (consonantal and vowel duration, cluster complexity) and modelled with a Gaussian mixture. Experiments are performed on read speech for 7 languages (English, French, German, Italian, Japanese, Mandarin and Spanish), and results reach up to 86 ± 6% correct discrimination between stress-timed, mora-timed and syllable-timed classes of languages, and 67 ± 8% correct language identification on average for the 7 languages with utterances of 21 seconds. These results are discussed and compared with those obtained with a standard acoustic Gaussian mixture modelling approach (88 ± 5% correct identification for the 7-language identification task).
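
The classification stage described in this abstract can be sketched in miniature. The snippet below is an illustrative simplification, not the authors' system: it fits one diagonal-covariance Gaussian per class to toy rhythm-unit features (consonantal duration, vowel duration, cluster complexity) and classifies a new feature vector by maximum likelihood. A full implementation would train multi-component Gaussian mixtures on features extracted from real segmented speech.

```python
import numpy as np

def fit_gaussian(features):
    """Fit a single diagonal-covariance Gaussian to rhythm-unit feature rows."""
    mu = features.mean(axis=0)
    var = features.var(axis=0) + 1e-6  # variance floor to avoid degeneracy
    return mu, var

def log_likelihood(x, mu, var):
    """Log density of feature vector x under a diagonal Gaussian."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

def classify(x, models):
    """Return the class whose Gaussian assigns x the highest likelihood."""
    return max(models, key=lambda label: log_likelihood(x, *models[label]))

# Toy rhythm-unit features: [consonantal duration, vowel duration, cluster complexity]
rng = np.random.default_rng(0)
stress_timed = rng.normal([0.12, 0.08, 2.0], 0.01, size=(50, 3))
syllable_timed = rng.normal([0.08, 0.10, 1.2], 0.01, size=(50, 3))

models = {
    "stress-timed": fit_gaussian(stress_timed),
    "syllable-timed": fit_gaussian(syllable_timed),
}
print(classify(np.array([0.12, 0.08, 2.0]), models))  # → stress-timed
```

In the real task each utterance contributes many rhythmic units, so per-unit log-likelihoods would be summed over the utterance before comparing classes.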

    Using the beat histogram for speech rhythm description and language identification

    In this paper we present a novel approach to the description of speech rhythm and the extraction of rhythm-related features for automatic language identification (LID). Previous methods have extracted speech rhythm by calculating features based on salient elements of speech such as consonants, vowels and syllables. We show how an automatic rhythm extraction method borrowed from music information retrieval, the beat histogram, can be adapted for the analysis of speech rhythm by defining the most relevant novelty functions in the speech signal and extracting features describing their periodicities. We evaluated those features in a rhythm-based LID task on two multilingual speech corpora using support vector machines, including feature selection methods to identify the most informative descriptors. Results suggest that the method is successful in describing speech rhythm and provides LID classification accuracy comparable to or better than that of other approaches, without the need for a preceding segmentation or annotation of the speech signal. Concerning rhythm typology, the rhythm class hypothesis in its original form seems to be only partly confirmed by our results.
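
The core of a beat histogram is a periodicity analysis of a novelty function. The following sketch (the function name and the toy 4 Hz impulse train are illustrative assumptions, not taken from the paper) uses the autocorrelation of a precomputed novelty/envelope signal: the dominant rhythmic period shows up as the strongest autocorrelation peak, and the profile over all lags can serve as a rhythm descriptor.

```python
import numpy as np

def beat_histogram(novelty, sr, max_lag_s=1.0):
    """Autocorrelation-based periodicity profile of a novelty signal.

    Returns lags in seconds and autocorrelation strengths normalised so
    that lag 0 equals 1 -- a simplified stand-in for a beat histogram.
    """
    x = novelty - novelty.mean()
    max_lag = int(max_lag_s * sr)
    zero = len(x) - 1  # index of zero lag in the full correlation
    ac = np.correlate(x, x, mode="full")[zero:zero + max_lag]
    ac = ac / ac[0]
    lags = np.arange(max_lag) / sr
    return lags, ac

# Toy novelty function: one impulse every 0.25 s (a 4 Hz "syllable rate")
sr = 100  # novelty-function sample rate in Hz
t = np.arange(0, 5, 1 / sr)
novelty = (np.sin(2 * np.pi * 4 * t) > 0.99).astype(float)

lags, strengths = beat_histogram(novelty, sr)
idx = 10 + np.argmax(strengths[10:])  # skip lags below 0.1 s
peak_lag = lags[idx]
print(f"dominant period = {peak_lag:.2f} s")  # prints 0.25 s
```

In the paper's setting, features describing such periodicity profiles (peak positions, strengths, spread) would then be fed to the SVM classifier.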

    Rhythm and Vowel Quality in Accents of English

    In a sample of 27 speakers of Scottish Standard English, two notoriously variable consonantal features are investigated: the contrast of /ʍ/ and /w/, and non-prevocalic /r/, the latter both in terms of its presence or absence and the phonetic form it takes, if present. The pattern of realisation of non-prevocalic /r/ largely confirms previously reported findings. But there are a number of surprising results regarding the merger of /ʍ/ and /w/ and the loss of non-prevocalic /r/: while the former is more likely in younger speakers and females, the latter seems more likely in older speakers and males. This is suggestive of a change in progress leading to a loss of the /ʍ/ - /w/ contrast, while the variation found in non-prevocalic /r/ follows an almost inverse sociolinguistic pattern that does not suggest any such change and is additionally largely explicable in language-internal terms. One phenomenon requiring further investigation is the curious effect that direct contact with Southern English accents seems to have on non-prevocalic /r/: innovation on the structural level (i.e. loss) and conservatism on the realisational level (i.e. increased incidence of [r] and [ɾ]) appear to be conditioned by the same sociolinguistic factors.

    Speech and music discrimination: Human detection of differences between music and speech based on rhythm

    Rhythm forms one of the basic acoustic components of speech and singing. It is therefore interesting to investigate subjects' ability to distinguish between speech and singing when only rhythm remains as an acoustic cue. For this study we developed a method to eliminate all linguistic components except rhythm from the speech and singing signals. The study was conducted online, and participants could listen to the stimuli via loudspeakers or headphones. The analysis of the survey shows that people can discriminate significantly between speech and singing after the signals have been altered. Furthermore, our results reveal specific features that supported participants in their decision, such as differences in regularity and tempo between singing and speech samples. The hypothesis that musically trained people perform more successfully on the task was not supported. The results of the study are important for understanding the structure of, and differences between, speech and singing, for use in further studies, and for future application in the field of speech recognition.

    Rhythm Class Perception by Expert Phoneticians

    This paper contributes to the recent debate in linguistic-phonetic rhythm research dominated by the idea of a perceptual dichotomy between “syllable-timed” and “stress-timed” rhythm classes. Some previous studies have shown that it is difficult both to find reliable acoustic correlates of these classes and to obtain reliable perceptual data in their support. In an experiment, we asked 12 British English phoneticians to classify the rhythm class of 36 samples spoken by 24 talkers in six dialects of British English. Expert listeners’ perception was shown to be guided by two factors: (1) the assumed rhythm class affiliation of a particular dialect and (2) one acoustic cue related to the prosodic hierarchy, namely the degree of accentual lengthening. We argue that the rhythm class hypothesis has reached its limits in informing empirical enquiry into linguistic rhythm, and new research avenues are needed to understand this multi-layered phenomenon.

    Acoustic correlates of linguistic rhythm: Perspectives

    The empirical grounding of a typology of languages' rhythm is again a hot issue. The currently popular approach is based on the durations of vocalic and intervocalic intervals and their variability. Despite some successes, many questions remain. The main findings still need to be generalised to much larger corpora including many more languages, but a straightforward continuation of the current work faces many difficulties. Perspectives are outlined for future work, including proposals for the cross-linguistic control of speech rate, improvements to the statistical analyses, and prospects raised by automatic speech processing.

    Speech rhythm: a metaphor?

    Is speech rhythmic? In the absence of evidence for the traditional view that languages strive to coordinate either syllables or stress-feet with regular time intervals, we consider the alternative that languages exhibit contrastive rhythm subsisting merely in the alternation of stronger and weaker elements. This is initially plausible, particularly for languages with a steep ‘prominence gradient’, i.e. a large disparity between stronger and weaker elements; but we point out that alternation is poorly achieved even by a ‘stress-timed’ language such as English, and, historically, languages have conspicuously failed to adopt simple phonological remedies that would ensure alternation. Languages seem more concerned to allow ‘syntagmatic contrast’ between successive units, and to use durational effects to support linguistic functions, than to facilitate rhythm. Furthermore, some languages (e.g. Tamil, Korean) lack the lexical prominence which would most straightforwardly underpin prominence alternation. We conclude that speech is not incontestably rhythmic, and may even be antirhythmic. However, its linguistic structure and patterning allow the metaphorical extension of rhythm in varying degrees and in different ways depending on the language, and it is this analogical process which allows speech to be matched to external rhythms.

    Some acoustic and articulatory correlates of phrasal stress in Spanish

    All spoken languages show rhythmic patterns. Recent work on a number of different languages (English, Japanese, Mandarin Chinese, and French) suggests that metrically (hierarchically) assigned stress levels of the utterance show strong correlations with the amount of jaw displacement and with the corresponding F1 values. This paper examines some articulatory and acoustic correlates of Spanish rhythm; specifically, we ask whether there is a correlation between the phrasal stress values metrically assigned to each syllable and acoustic/articulatory values. We used video recordings of three Salvadoran Spanish speakers to measure maximum jaw displacement, mean F0, mean intensity, mean duration, and mid-vowel F1 for each vowel in two Spanish sentences. The results show strong correlations between stress and duration, and between stress and F1, but weak correlations between stress and both mean vowel intensity and maximum jaw displacement. We also found weak correlations between jaw displacement and both mean vowel intensity and F1.
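
The correlation analysis reported here can be illustrated with a small sketch. The numbers below are hypothetical, not the paper's data: Pearson's r is computed between metrically assigned stress levels and a per-syllable acoustic measure such as vowel duration.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two measurement series."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2)))

# Hypothetical per-syllable values: metrical stress level vs. vowel duration (ms)
stress = [1, 3, 2, 4, 1, 2, 3, 4]
duration_ms = [62, 95, 78, 110, 60, 75, 92, 108]
print(f"r = {pearson_r(stress, duration_ms):.3f}")  # strong positive correlation
```

A value near 1 would correspond to the strong stress-duration correlation the paper reports; the weak correlates (e.g. stress vs. mean intensity) would yield values closer to 0.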