
    Teaching Pronunciation from the Top Down

    In this paper, a theoretical and pedagogical foundation for research efforts is provided. Pronunciation is examined from a contextual, "top-down" perspective from which segmental articulation assumes less importance than more general properties of speech such as rhythm and voice quality. Pronunciation is described as conveying many different types of messages to a hearer related to the information structure of a discourse, the speaker's attitude and mood, and other social and psychological features of the speaker or of the relationship between the speaker and hearer. Moreover, various aspects of pronunciation are shown to relate to specific gestures. The aim is to present a more descriptively enlightening and pedagogically useful characterization of second language phonology than traditional treatments, in which phonology was identified with discrete articulations and in which suprasegmental features were relegated to the periphery of language per se, i.e., to the paralinguistic and in some cases the extralinguistic domains of communication. Suggestions for teaching pronunciation are set in a context of research and theory, and a focus on the non-segmental characteristics of speech is advocated. This discussion makes reference to the use of video and computer media in pronunciation training (see Pennington forthcoming for further discussion), as well as to the use of more traditional types of audiovisual aids. The paper concludes with a set of research questions on pronunciation instruction derived from this investigation.

    Characterizing intonation deficit in motor speech disorders: an autosegmental-metrical analysis of spontaneous speech in hypokinetic dysarthria, ataxic dysarthria and foreign accent syndrome

    The autosegmental-metrical (AM) framework represents an established methodology for intonational analysis in unimpaired speaker populations but has found little application in describing intonation in motor speech disorders (MSDs). This study compared the intonation patterns of unimpaired participants (CON) and those with Parkinson's disease (PD), ataxic dysarthria (AT), and foreign accent syndrome (FAS) to evaluate the approach's potential for distinguishing types of MSDs from each other and from unimpaired speech. Spontaneous speech from 8 PD, 8 AT, 4 FAS, and 10 CON speakers was analyzed in relation to inventory and prevalence of pitch patterns, accentuation, and phrasing. Acoustic-phonetic baseline measures (maximum phonation duration, speech rate, and F0 variability) were also obtained. The analyses yielded differences between the MSD and CON groups and between the clinical groups in regard to prevalence, accentuation, and phrasing. AT and FAS speakers used more rising and high pitch accents than PD and CON speakers. The AT group used the highest number of pitch accents per phrase, and all 3 MSD groups produced significantly shorter phrases than the CON group. The study succeeded in differentiating MSDs on the basis of intonational performance by using the AM approach, thus demonstrating its potential for charting intonational profiles in clinical populations.

    Investigation of Frame Alignments for GMM-based Digit-prompted Speaker Verification

    Frame alignments can be computed by different methods in GMM-based speaker verification. By incorporating a phonetic Gaussian mixture model (PGMM), we are able to compare the performance of alignments extracted from deep neural networks (DNNs) and from the conventional hidden Markov model (HMM) in digit-prompted speaker verification. Based on the different characteristics of these two alignments, we present a novel content verification method that improves system security without much computational overhead. Our experiments on the RSR2015 Part-3 digit-prompted task show that the DNN-based alignment performs on par with the HMM alignment. The results also demonstrate the effectiveness of the proposed Kullback-Leibler (KL) divergence-based scoring in rejecting speech with incorrect pass-phrases. Comment: accepted by APSIPA ASC 201

    Finding the Most Uniform Changes in Vowel Polygon Caused by Psychological Stress

    The parameters of vowel polygons are chosen as the criterion for distinguishing a speaker's normal state from speech produced under real psychological stress. All results were obtained experimentally with software created for vowel polygon analysis and applied to the ExamStress database. Six selected methods based on cross-correlation of different features were classified by the coefficient of variation, and for each individual vowel polygon an efficiency coefficient marking the most significant and uniform differences between stressed and normal speech was calculated. The best method for observing these differences proved to be the one that takes the mean of the cross-correlation values obtained for the difference area value together with the vector length and angle parameter couples. In general, the best results for stress detection are achieved by the /i/-/o/-/u/ and /a/-/i/-/o/ vowel triangles in formant planes that combine the fifth formant F5 with other formants.
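The abstract does not give its formulas, so the following is only a minimal sketch of two quantities it names: the area of a vowel polygon in a formant plane and the coefficient of variation. The shoelace formula, the use of the population standard deviation, the function names, and the formant values are all assumptions made for illustration, not the authors' implementation.

```python
from statistics import mean, pstdev


def polygon_area(points):
    """Area of a simple polygon given as (x, y) vertices in order (shoelace formula)."""
    n = len(points)
    twice_area = 0.0
    for k in range(n):
        x1, y1 = points[k]
        x2, y2 = points[(k + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (assumed definition)."""
    return pstdev(values) / mean(values)


# Hypothetical (F1, F2) values in Hz for the /i/-/o/-/u/ triangle; not data from the paper.
normal_triangle = [(300.0, 2300.0), (500.0, 900.0), (350.0, 800.0)]
stressed_triangle = [(340.0, 2200.0), (520.0, 950.0), (380.0, 870.0)]

# One possible "difference area" feature: change in triangle area between the two states.
area_difference = polygon_area(stressed_triangle) - polygon_area(normal_triangle)
```

A feature such as `area_difference` could then be collected across speakers and passed to `coefficient_of_variation` to judge how uniform the change is, which is one reading of how the abstract ranks its methods.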

    Time-Contrastive Learning Based Deep Bottleneck Features for Text-Dependent Speaker Verification

    A number of studies have extracted bottleneck (BN) features from deep neural networks (DNNs) trained to discriminate speakers, pass-phrases and triphone states in order to improve the performance of text-dependent speaker verification (TD-SV). However, only moderate success has been achieved. A recent study [1] presented a time-contrastive learning (TCL) concept to explore the non-stationarity of brain signals for classification of brain states. Speech signals have a similar non-stationarity property, and TCL has the further advantage of not needing labeled data. We therefore present a TCL-based BN feature extraction method. The method uniformly partitions each speech utterance in a training dataset into a predefined number of multi-frame segments. Each segment in an utterance corresponds to one class, and class labels are shared across utterances. DNNs are then trained to discriminate all speech frames among the classes to exploit the temporal structure of speech. In addition, we propose a segment-based unsupervised clustering algorithm to re-assign class labels to the segments. TD-SV experiments were conducted on the RedDots challenge database. The TCL-DNNs were trained using speech data of fixed pass-phrases that were excluded from the TD-SV evaluation set, so the learned features can be considered phrase-independent. We compare the performance of the proposed TCL-BN feature with those of short-time cepstral features and BN features extracted from DNNs discriminating speakers, pass-phrases, speaker+pass-phrase, as well as monophones whose labels and boundaries are generated by three different automatic speech recognition (ASR) systems. Experimental results show that the proposed TCL-BN outperforms cepstral features and speaker+pass-phrase discriminant BN features, and its performance is on par with those of ASR-derived BN features. Moreover, .... Comment: Copyright (c) 2019 IEEE.
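The labeling step described in this abstract (uniform partition of each utterance into a fixed number of segments, with segment index used as the class label and shared across utterances) can be sketched as follows. This is an illustration under assumptions: the function name and the way leftover frames are distributed when the length is not divisible by the segment count are hypothetical and not taken from the paper.

```python
def tcl_segment_labels(num_frames, num_segments):
    """Assign each frame of one utterance a class label in 0..num_segments-1.

    The utterance is split into num_segments contiguous, near-equal segments;
    the segment index serves as the class label, so the k-th segment of every
    utterance shares label k regardless of utterance length.
    """
    if num_segments <= 0:
        raise ValueError("num_segments must be positive")
    if num_frames < num_segments:
        raise ValueError("utterance has fewer frames than segments")
    # Integer scaling maps frame index i to its segment index
    # (assumed handling of lengths that do not divide evenly).
    return [i * num_segments // num_frames for i in range(num_frames)]


# Two utterances of different length share the same label set {0, 1, 2}.
labels_short = tcl_segment_labels(6, 3)
labels_long = tcl_segment_labels(10, 3)
```

These per-frame labels would serve as the training targets for the frame-level DNN; the clustering-based label re-assignment mentioned in the abstract is not covered by this sketch.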

    Determination: a universal dimension for inter-language comparison (preliminary version)

    The basic idea I want to develop and to substantiate in this paper consists in replacing – where necessary – the traditional concept of linguistic category or linguistic relation understood as 'things', as reified hypostases, by the more dynamic concept of dimension. A dimension of language structure is not coterminous with one single category or relation but, instead, accommodates several of them. It corresponds to certain well-circumscribed purposive functions of linguistic activity as well as to certain definite principles and techniques for satisfying these functions. The true universals of language are represented by these dimensions, principles, and techniques, which constitute the true basis for non-historical inter-language comparison. The categories and relations used in grammar are condensations – hypostases as it were – of such dimensions, principles, and techniques. Elsewhere I have outlined the theory which I want to test here in a case study.

    A Linguistic Specification of Aesthetic Judgments

    This paper aims to delineate the class of aesthetic judgments linguistically. The main idea is that aesthetic judgments can be specified by a certain set of assertibility conditions, i.e., by norms that govern appropriate speech acts. This idea is spelled out in detail and defended against various objections. The suggestion leads to an interesting and theoretically fruitful account of aesthetic judgments: it provides the basis for a non-circular and satisfying characterization of the whole domain of aesthetic research, and it marks an important linguistic difference between aesthetic judgments and judgments of personal taste.