412 research outputs found

    Assessing the Prosody of Non-Native Speakers of English: Measures and Feature Sets

    In this paper, we describe a new database with audio recordings of non-native (L2) speakers of English, and the perceptual evaluation experiment conducted with native English speakers for assessing the prosody of each recording. These annotations are then used to compute the gold standard using different methods, and a series of regression experiments is conducted to evaluate their impact on the performance of a regression model predicting the degree of naturalness of L2 speech. Further, we compare the relevance of different feature groups modelling prosody in general (without speech tempo), speech rate and pauses modelling speech tempo (fluency), voice quality, and a variety of spectral features. We also discuss the impact of various fusion strategies on performance. Overall, our results demonstrate that the prosody of non-native speakers of English as L2 can be reliably assessed using suprasegmental audio features; prosodic features seem to be the most important ones.

    Phonetic and prosodic analysis of speech

    In order to cope with the problems of spontaneous speech (including, for example, hesitations and non-words) it is necessary to extract from the speech signal all information it contains. Modeling of words by segmental units should be supported by suprasegmental units since valuable information is represented in the prosody of an utterance. We present an approach to flexible and efficient modeling of speech by segmental units and describe extraction and use of suprasegmental information

    The Prosodic Marking of Phrase Boundaries: Expectations and Results

    Using sentence templates and a stochastic context-free grammar, a large corpus (10,000 sentences) has been created, where prosodic phrase boundaries are labeled in the sentences automatically during sentence generation. With perception experiments on a subset of 500 utterances we verified that 92% of the automatically marked boundaries were perceived as prosodically marked. In initial automatic classification experiments for three levels of boundaries, recognition rates up to 81% could be achieved. Introduction and Material: A successful automatic detection of phrase boundaries can be of great help for parsing a word hypotheses graph in an automatic speech understanding (ASU) system. Our recognition paradigm lies within the statistical approach; we therefore need a large training database, i.e. a corpus with reference labels for prosodically marked phrase boundaries. In this paper we wil

    Automatic classification of prosodically marked phrase boundaries in German

    A large corpus has been created automatically and read by speakers. Phrase boundaries were labeled in the sentences automatically during sentence generation. Perception experiments on a subset of 500 utterances showed a high agreement between the automatically generated boundary markers and the ones perceived by listeners. Gaussian distribution and polynomial classifiers were trained on a set of prosodic features computed from the speech signal using the automatically generated boundary markers. Comparing the classification results with the judgments of the listeners yielded a recognition rate of 87%. A combination with stochastic language models improved the recognition rate to 90%. We found that the pause and the durational features are most important for the classification, but that the influence of F0 is not negligible.

    Pitch determination considering laryngealization effects in spoken dialogs

    A frequent phenomenon in spoken dialogs of the information-seeking type is the short elliptic utterance whose mood (declarative or interrogative) can only be distinguished by intonation. The main acoustic evidence is conveyed by the fundamental frequency (F0) contour. Many algorithms for F0 determination have been reported in the literature. A common problem is posed by irregularities of speech known as "laryngealizations". This article describes an approach based on neural network techniques for the improved determination of fundamental frequency. First, an improved version of our neural network algorithm for reconstruction of the voice source signal (glottis signal) is presented. Second, the reconstructed voice source signal is used as input to another neural network distinguishing the three classes "voiceless", "voiced non-laryngealized", and "voiced laryngealized". Third, the results are used to improve an existing F0 algorithm. Results of this approach are presented and discussed in the context of the application in a spoken dialog system.

    Going back to the source: inverse filtering of the speech signal with ANNs

    In this paper we present a new method for transforming speech signals to voice source signals (VSS) using artificial neural networks (ANN). We will point out that the ANN mapping of speech signals into source signals is quite accurate, and that most of the irregularities in the speech signal lead to an irregularity in the source signal produced by the ANN (ANN-VSS). We will show that the mapping of the ANN is robust with respect to untrained speakers, different recording conditions and facilities, and different vocabularies. We will also present preliminary results which show that pitch periods can be determined accurately from the ANN source signal.

    "Roger", "Sorry", "I'm still listening": dialog guiding signals in information retrieval dialogs

    During any kind of information retrieval dialog, the repetition of parts of information just given by the dialog partner can often be observed. As these repetitions are usually elliptic, the intonation is very important for determining the speaker's intention. In this paper, the times of day repeated by the customer in train timetable inquiry dialogs are investigated as a prototypical case. A scheme is developed for the officer's reactions depending on the intonation of these repetitions; it has been integrated into our speech understanding and dialog system EVAR (cf. [6]). Gaussian classifiers were trained for distinguishing the dialog guiding signals confirmation, question and feedback; recognition rates of up to 87.5% were obtained.

    Characterisation of voice quality of Parkinson’s disease using differential phonological posterior features

    Change in voice quality (VQ) is one of the first precursors of Parkinson’s disease (PD). Specifically, impacted phonation and articulation cause the patient to have a breathy, husky-semiwhisper and hoarse voice. A goal of this paper is to characterise a VQ spectrum – the composition of non-modal phonations – of voice in PD. The paper relates non-modal healthy phonations (breathy, creaky, tense, falsetto and harsh) to disordered phonation in PD. First, statistics are learned to differentiate the modal and non-modal phonations. Statistics are computed using phonological posteriors, the probabilities of phonological features inferred from the speech signal using a deep learning approach. Second, statistics of disordered speech are learned from PD speech data comprising 50 patients and 50 healthy controls. Third, Euclidean distance is used to calculate the similarity of non-modal and disordered statistics, and the inverse of the distances is used to obtain the composition of non-modal phonation in PD. Thus, pathological voice quality is characterised using a healthy non-modal voice quality “base/eigenspace”. The obtained results are interpreted as the voice of an average patient with PD, which can be characterised by a voice quality spectrum composed of 30% breathy voice, 23% creaky voice, 20% tense voice, 15% falsetto voice and 12% harsh voice. In addition, the proposed features were applied to the prediction of the dysarthria level according to the Frenchay assessment score related to the larynx, and a significant improvement is obtained for the reading speech task. The proposed characterisation of VQ might also be applied to other kinds of pathological speech.
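
The inverse-distance step described in this abstract can be illustrated with a minimal sketch. It assumes that each phonation type is summarised by one fixed-length statistics vector and that the composition is the inverse Euclidean distance normalised to sum to 1; the abstract does not state the exact normalisation, and the function name and the toy vectors in the usage note are hypothetical, not taken from the paper.

```python
import math


def inverse_distance_composition(target, references):
    """Share of each reference in `target`, as normalised inverse Euclidean distance.

    `target` is a sequence of floats; `references` maps a label to a sequence
    of the same length. Both are illustrative stand-ins for the statistics
    vectors mentioned in the abstract (assumption).
    """
    inverse = {}
    for label, vector in references.items():
        distance = math.dist(target, vector)
        if distance == 0.0:
            # Exact match: avoid division by zero and give this label the full share.
            return {name: (1.0 if name == label else 0.0) for name in references}
        inverse[label] = 1.0 / distance
    total = sum(inverse.values())
    # Normalise so the shares sum to 1 (assumed normalisation).
    return {label: value / total for label, value in inverse.items()}
```

For example, `inverse_distance_composition([0.0, 0.0], {"breathy": [1.0, 0.0], "creaky": [0.0, 2.0]})` gives the closer reference ("breathy") the larger share. These toy vectors are made up for illustration and are unrelated to the percentages reported in the abstract.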