9,098 research outputs found

    The new accent technologies: recognition, measurement and manipulation of accented speech

    The Role of Speaker Identification in Taiwanese Attitudes Towards Varieties of English

    No abstract available

    Analyzing Prosody with Legendre Polynomial Coefficients

    This investigation demonstrates the effectiveness of Legendre polynomial coefficients for representing prosodic contours in two different tasks: nativeness classification and sarcasm detection. By using accurate representations of prosodic contours to answer fundamental linguistic questions, we contribute significantly to the body of research on analyzing prosody in linguistics and on modeling prosody for machine learning tasks. Using Legendre polynomial coefficient representations of prosodic contours, we answer questions about how prosody differs between native English speakers and non-native English speakers whose first language is Mandarin. We also learn more about the prosodic qualities of sarcastic speech. We additionally perform machine learning classification for both tasks, achieving an accuracy of 72.3% for nativeness classification and 81.57% for sarcasm detection. We recommend that linguists looking to analyze prosodic contours make use of Legendre polynomial coefficient modeling; the accuracy and quality of the resulting prosodic contour representations make them highly interpretable for linguistic analysis.
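As a rough illustration of the representation described in this abstract, the sketch below projects an evenly sampled contour (for example, an F0 track) onto the first few Legendre polynomials. It is a minimal sketch, not the authors' method: the function names, the rescaling of time to [-1, 1], and the trapezoidal projection are all assumptions made here for illustration, and the abstract does not say how the coefficients were actually fitted.

```python
def legendre_values(n_max, x):
    """Return [P_0(x), ..., P_n_max(x)] using Bonnet's recurrence."""
    values = [1.0, x]
    for n in range(1, n_max):
        # (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x)
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values[: n_max + 1]


def legendre_coefficients(contour, degree):
    """Approximate Legendre coefficients of an evenly sampled contour.

    Assumption: the samples are mapped onto evenly spaced points in [-1, 1],
    and each coefficient is c_n = (2n + 1) / 2 * integral(f * P_n),
    with the integral approximated by the trapezoidal rule.
    """
    m = len(contour)
    if m < 2:
        raise ValueError("need at least two samples")
    h = 2.0 / (m - 1)  # spacing of the rescaled time axis
    sums = [0.0] * (degree + 1)
    for i, y in enumerate(contour):
        x = -1.0 + h * i
        weight = 0.5 if i in (0, m - 1) else 1.0  # trapezoid end-point weights
        basis = legendre_values(degree, x)
        for n in range(degree + 1):
            sums[n] += weight * y * basis[n]
    return [(2 * n + 1) / 2.0 * h * s for n, s in enumerate(sums)]
```

Under these assumptions, the low-order coefficients have a direct reading: the degree-0 coefficient tracks the mean level of the contour, degree 1 its overall slope, and degree 2 its curvature. That is one plausible sense in which such coefficients are interpretable.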

    Exploring the influence of suprasegmental features of speech on rater judgements of intelligibility

    A thesis submitted to the University of Bedfordshire in partial fulfilment of the requirements for the degree of Doctor of Philosophy. The importance of suprasegmental features of speech to pronunciation proficiency is well known, yet limited research has been undertaken to identify how raters attend to suprasegmental features in the English-language speaking test encounter. Currently, such features appear to be underrepresented in language learning frameworks and are not always satisfactorily incorporated into the analytical rating scales that are used by major language testing organisations. This thesis explores the influence of lexical stress, rhythm and intonation on rater decision making in order to provide insight into their proper place in rating scales and frameworks. Data were collected from 30 raters, half of whom were experienced professional raters and half of whom lacked rater training and a background in language learning or teaching. The raters were initially asked to score 12 test taker performances using a 9-point intelligibility scale. The performances were taken from the long turn of Cambridge English Main Suite exams and were selected on the basis of the inclusion of a range of notable suprasegmental features. Following scoring, the raters took part in a stimulated recall procedure to report the features that influenced their decisions. The resulting scores were quantitatively analysed using many-facet Rasch measurement analysis. Transcriptions of the verbal reports were analysed using qualitative methods. Finally, an integrated analysis of the quantitative and qualitative data was undertaken to develop a series of suprasegmental rating scale descriptors. The results showed that experienced raters do appear to attend to specific suprasegmental features in a reliable way, and that their decisions have a great deal in common with the way non-experienced raters regard such features. This indicates that stress, rhythm, and intonation may be somewhat underrepresented on current speaking proficiency scales and frameworks. The study concludes with the presentation of a series of suprasegmental rating scale descriptors.

    Proposing a hybrid approach for emotion classification using audio and video data

    Emotion recognition has been a research topic in the field of Human-Computer Interaction (HCI) in recent years. Computers have become an inseparable part of human life, and users need human-like interaction to communicate with them more effectively. Many researchers have become interested in emotion recognition and classification using different sources, and a hybrid approach combining audio and text has recently been introduced. All such approaches aim to raise the accuracy and appropriateness of emotion classification. In this study, a hybrid approach combining audio and video has been applied to emotion recognition. The innovation of this approach is selecting the characteristics of audio and video and their features as a unique specification for classification. In this research, the SVM method has been used to classify the data in the SAVEE database. The experimental results show that the maximum classification accuracy for audio data is 91.63%, while the accuracy achieved by applying the hybrid approach is 99.26%.

    An exploration of the rhythm of Malay

    In recent years there has been a surge of interest in speech rhythm. However, we still lack a clear understanding of the nature of rhythm and rhythmic differences across languages. Various metrics have been proposed as means of measuring rhythm on the phonetic level and making typological comparisons between languages (Ramus et al., 1999; Grabe & Low, 2002; Dellwo, 2006), but the debate is ongoing on the extent to which these metrics capture the rhythmic basis of speech (Arvaniti, 2009; Fletcher, in press). Furthermore, cross-linguistic studies of rhythm have covered a relatively small number of languages, and research on previously unclassified languages is necessary to fully develop the typology of rhythm. This study examines the rhythmic features of Malay, for which, to date, relatively little work has been carried out on aspects of rhythm and timing. The material for the analysis comprised 10 sentences produced by 20 speakers of standard Malay (10 males and 10 females). The recordings were first analysed using rhythm metrics proposed by Ramus et al. (1999) and Grabe & Low (2002). These metrics (∆C, %V, rPVI, nPVI) are based on durational measurements of vocalic and consonantal intervals. The results indicated that Malay clustered with other so-called syllable-timed languages like French and Spanish on the basis of all metrics. However, underlying the overall findings for these metrics there was a large degree of variability in values across speakers and sentences, with some speakers having values in the range typical of stress-timed languages like English.
    Further analysis has been carried out in light of Fletcher’s (in press) argument that measurements based on duration do not wholly reflect speech rhythm, as there are many other factors that can influence values of consonantal and vocalic intervals, and Arvaniti’s (2009) suggestion that other features of speech should also be considered in descriptions of rhythm to discover what contributes to listeners’ perception of regularity. Spectrographic analysis of the Malay recordings brought to light two parameters that displayed consistency and regularity for all speakers and sentences: the duration of individual vowels and the duration of intervals between intensity minima. This poster presents the results of these investigations and points to connections between the features which seem to be consistently regulated in the timing of Malay connected speech and aspects of Malay phonology. The results are discussed in light of the current debate on the descriptions of rhythm.
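For readers unfamiliar with the duration-based metrics named in this abstract, the following minimal sketch computes %V, ∆C, rPVI and nPVI from lists of interval durations. The formulas follow the commonly cited definitions from Ramus et al. (1999) and Grabe & Low (2002) as understood here. The function names, the use of the population standard deviation for ∆C, and the input format (durations in seconds, in utterance order) are assumptions for illustration, not details taken from the study.

```python
from statistics import pstdev


def percent_v(vocalic, consonantal):
    """%V: share of total utterance duration taken up by vocalic intervals."""
    total = sum(vocalic) + sum(consonantal)
    return 100.0 * sum(vocalic) / total


def delta_c(consonantal):
    """Delta-C: standard deviation of consonantal interval durations.

    Assumption: population standard deviation; some implementations
    use the sample standard deviation instead.
    """
    return pstdev(consonantal)


def rpvi(durations):
    """Raw PVI: mean absolute difference between successive intervals."""
    diffs = [abs(a - b) for a, b in zip(durations, durations[1:])]
    return sum(diffs) / len(diffs)


def npvi(durations):
    """Normalised PVI: each difference is divided by the pair's mean duration."""
    terms = [abs(a - b) / ((a + b) / 2.0) for a, b in zip(durations, durations[1:])]
    return 100.0 * sum(terms) / len(terms)
```

In Grabe & Low's usage, rPVI is typically applied to consonantal intervals and nPVI to vocalic intervals. The normalisation is intended to reduce the effect of speech rate on vowel durations.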