
    Language identification with suprasegmental cues: A study based on speech resynthesis

    This paper proposes a new experimental paradigm for exploring the discriminability of languages, a question that is crucial for the child born into a bilingual environment. The paradigm employs the speech resynthesis technique, which enables the experimenter to preserve or degrade acoustic cues such as phonotactics, syllabic rhythm, or intonation in natural utterances. English and Japanese sentences were resynthesized to preserve broad phonotactics, rhythm, and intonation (Condition 1); rhythm and intonation (Condition 2); intonation only (Condition 3); or rhythm only (Condition 4). The findings support the notion that syllabic rhythm is a necessary and sufficient cue for French adult subjects to discriminate English from Japanese sentences. The results are consistent with previous research using low-pass filtered speech, as well as with phonological theories predicting rhythmic differences between languages. The new methodology therefore appears well suited to the study of language discrimination. Applications to other domains of psycholinguistic research and to automatic language identification are considered.
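The low-pass filtering approach cited here as prior work can be sketched in a few lines: removing energy above a few hundred hertz strips segmental detail while leaving intonation and rhythm audible. The cutoff value and function below are illustrative assumptions, not parameters from the paper.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def lowpass_prosody(signal, sr, cutoff_hz=400.0, order=4):
    """Low-pass filter a speech waveform so that segmental detail is
    removed while prosody (intonation, rhythm) remains audible.
    400 Hz is a common choice in delexicalisation studies; it is an
    illustrative value, not taken from this paper."""
    sos = butter(order, cutoff_hz, btype="low", fs=sr, output="sos")
    return sosfiltfilt(sos, signal)

# Demonstration on a synthetic signal: a 150 Hz "voicing" component
# survives the filter, a 3 kHz "segmental" component is removed.
sr = 16000
t = np.arange(sr) / sr
speech_like = np.sin(2 * np.pi * 150 * t) + np.sin(2 * np.pi * 3000 * t)
filtered = lowpass_prosody(speech_like, sr)
```

Zero-phase filtering (`sosfiltfilt`) is used so the filter does not shift the temporal cues that rhythm perception depends on.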

    Gradient Acceptability in Mandarin Nonword Judgment

    Syllable well-formedness judgment experiments reveal that speakers exhibit gradient judgments of novel words, and this gradience has been attributed both to grammatical factors and to lexical statistics (e.g., Coetzee, 2008). This study investigates gradient phonotactics stemming from violations of four types of grammatical constraint in Mandarin Chinese: 1) principled phonotactic constraints, 2) accidental phonotactic constraints, 3) allophonic restrictions, and 4) segmental-tonal co-occurrence restrictions. A syllable well-formedness judgment experiment was conducted with native Mandarin speakers to examine how grammatical factors and lexical statistics contribute to variation in phonotactic acceptability judgments.
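One common way to operationalise the lexical-statistics factor in such studies is a segment-bigram phonotactic probability model trained on the lexicon. The toy lexicon and add-one smoothing below are illustrative assumptions, not the study's actual materials or method.

```python
import math
from collections import Counter

def bigram_scorer(lexicon):
    """Train segment-bigram log-probabilities (add-one smoothing) from a
    lexicon; '#' marks word boundaries. Returns a scoring function that
    assigns higher log-probability to nonwords built from attested
    transitions. Toy lexicon only, not Mandarin data."""
    counts, context = Counter(), Counter()
    for word in lexicon:
        segs = ["#"] + list(word) + ["#"]
        for a, b in zip(segs, segs[1:]):
            counts[(a, b)] += 1
            context[a] += 1
    vocab = len({s for w in lexicon for s in w} | {"#"})

    def score(word):
        segs = ["#"] + list(word) + ["#"]
        return sum(math.log((counts[(a, b)] + 1) / (context[a] + vocab))
                   for a, b in zip(segs, segs[1:]))
    return score

score = bigram_scorer(["ma", "man", "mi", "na", "ni"])
# A nonword built from attested transitions ("man") outscores one built
# from unattested transitions ("nm").
print(score("man") > score("nm"))
```

Gradient acceptability studies typically regress judgment ratings against scores like these alongside categorical constraint-violation predictors.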

    Segmental and prosodic improvements to speech generation


    Sperry Univac speech communications technology

    Technology and systems for effective verbal communication with computers were developed. A continuous speech recognition system for verbal input, a word-spotting system to locate keywords in conversational speech, prosodic tools to aid speech analysis, and a prerecorded voice response system for speech output are described.

    The recognition of New Zealand English closing diphthongs using time-delay neural networks

    As a step towards the development of a modular time-delay neural network (TDNN) for recognizing phonemes realized with a New Zealand English accent, this thesis focuses on the development of an expert module for closing diphthong recognition. The performances of traditional and squad-based expert modules are compared speaker-dependently for two New Zealand English speakers (one male and one female). Examples of each kind of expert module are formed from one of three types of TDNN, referred to as basic-token TDNN, extended-token TDNN and sequence-token TDNN. Of the traditional expert modules tested, those comprising extended-token TDNNs are found to afford the best performance compromises and are, therefore, preferable if a traditional expert module is to be used. Comparing the traditional and squad-based expert modules tested, the latter afford significantly better recognition and/or false-positive error performances than the former, irrespective of the type of TDNN used. Consequently, it is concluded that squad-based expert modules are preferable to their traditional counterparts for closing diphthong recognition. Of the squad-based expert modules tested, those comprising sequence-token TDNNs are found to afford consistently better false-positive error performances than those comprising basic- or extended-token TDNNs, while similar recognition performances are afforded by all. Consequently, squad-based expert modules comprising sequence-token TDNNs are recommended as the preferred method of recognizing closing diphthongs realized with a New Zealand accent. This thesis also presents results demonstrating that squad-based expert modules comprising sequence-token TDNNs may be trained to accommodate multiple speakers and in a manner capable of handling both uncorrupted and highly corrupted speech utterances.
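The time-delay layer a TDNN is built from can be sketched as a shared-weight sum over delayed input frames, which is equivalent to a 1-D convolution over time. The dimensions and delays below are illustrative, not the architecture used in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

def tdnn_layer(frames, weights, delays):
    """One time-delay layer: output frame t is computed from input
    frames t + d for each delay d, with the same weights applied at
    every time step (weight sharing across time).
    frames: (T, F) feature matrix; weights: (len(delays), F, H)."""
    T = frames.shape[0] - max(delays)
    out = np.zeros((T, weights.shape[2]))
    for i, d in enumerate(delays):
        out += frames[d:d + T] @ weights[i]
    return np.tanh(out)

# Toy example: 20 frames of 12 spectral features -> 8 hidden units,
# using delays 0, 1, 2 frames (a 3-frame receptive field).
frames = rng.normal(size=(20, 12))
w = rng.normal(scale=0.1, size=(3, 12, 8))
hidden = tdnn_layer(frames, w, delays=[0, 1, 2])
print(hidden.shape)  # (18, 8): receptive field shortens the output by 2
```

Stacking such layers widens the effective temporal context, which is what lets a TDNN capture the formant trajectories that characterise closing diphthongs.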

    Phonetically transparent technique for the automatic transcription of speech


    Variable Glide Formation in Hexagonal French

    This thesis examines phonetic and phonological aspects of gliding in Hexagonal French. In particular, we ask: Are glide phenomena as predictable as modern descriptions portray them? Do all three glides /j, w, ɥ/, or the corresponding high vowels /i, u, y/, behave alike in all potential glide contexts? Given the duality of French glides (vowel and consonant), we use the term vocoid and the archiphoneme convention /I, U, Y/ in our discussion of glide contexts and glide phenomena. Our historical survey shows that the glides of French (/j, ɥ, w/) evolved separately: the high front vocoid /I/ appears early and is involved in a greater variety of contexts, showing considerable variability, while the other two glides emerge later, primarily through diphthongisation, and show less variability. In a study of glide contexts in the spontaneous speech of native speakers from three regions of France (data from the Phonologie du Français Contemporain project), we examine the distribution of all three high vocoids and their surface realisations. For the 3415 tokens identified, we determine whether HVV (high vocoid plus vowel) tokens are realised with dieresis, with syneresis, or with the high vocoid deleted. Our findings show that glide contexts are consistently distributed at a rate of about 85% lexicalised and 15% derived. The limited variability in lexicalised contexts involves mainly the non-round vocoid /I/ realised with dieresis. Distribution across the three-glide inventory of French shows that lexicalised glide contexts follow a general markedness hierarchy, I ≫ U ≫ Y: tokens involving the front non-round vocoid /I/ are most prevalent, followed by the back rounded vocoid /U/ and finally the front rounded /Y/.
Derived contexts include word-medial tautomorphemic high vowel + vowel /HV+V/ sequences resulting from suffixation or inflection, as well as cross-word-boundary /HV+V/ sequences, which have very rarely been studied before; we show that the cross-word-boundary data largely follow the same phonological constraints as the derivational data. In each of these contexts the general markedness hierarchy observed above changes, giving preference to the front rounded /Y/ over the back rounded /U/, while /I/ remains most prevalent.

    Investigating the build-up of precedence effect using reflection masking

    The auditory processing level involved in the buildup of precedence [Freyman et al., J. Acoust. Soc. Am. 90, 874–884 (1991)] was investigated here using reflection masked threshold (RMT) techniques. Because RMT techniques are generally assumed to probe lower levels of auditory signal processing, this represents a bottom-up approach to the buildup of precedence. Three conditioner configurations measuring a possible buildup of reflection suppression were compared to the baseline RMT for four reflection delays ranging from 2.5 to 15 ms. No buildup of reflection suppression was observed for any of the conditioner configurations. Buildup of template (a decrease in RMT for two of the conditioners), on the other hand, was found to be delay dependent: for five of six listeners, RMT decreased relative to the baseline at reflection delays of 2.5 and 15 ms, while no change in threshold was observed at 5- and 10-ms delays. It is concluded that the low-level auditory processing involved in RMT is not sufficient to produce a buildup of reflection suppression, confirming suggestions that higher-level processing is involved in precedence-effect buildup. The observed enhancement of reflection detection (RMT) may contribute to active suppression at higher processing levels.
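The direct-sound-plus-reflection stimulus underlying such measurements can be sketched as a delayed, attenuated copy added to the direct signal. The delay values mirror the 2.5-15 ms range in the abstract; the gain is an illustrative assumption, not a value from the study.

```python
import numpy as np

def add_reflection(direct, sr, delay_ms, gain_db):
    """Add a single delayed, attenuated copy of the direct sound,
    the basic stimulus configuration of reflection-masking
    experiments. Gain value is illustrative only."""
    delay = int(round(sr * delay_ms / 1000.0))
    gain = 10 ** (gain_db / 20.0)
    out = np.zeros(len(direct) + delay)
    out[:len(direct)] += direct                   # direct sound
    out[delay:delay + len(direct)] += gain * direct  # reflection
    return out

sr = 44100
click = np.zeros(256)
click[0] = 1.0
stimulus = add_reflection(click, sr, delay_ms=10.0, gain_db=-6.0)
```

An RMT procedure then varies the reflection gain adaptively to find the level at which the reflection is just detectable against the direct sound.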

    Speech Intelligibility from Image Processing

    Hearing loss research has traditionally been based on perceptual criteria such as speech intelligibility and threshold levels. The development of computational models of the auditory periphery has made it possible to experiment via simulation, providing quantitative, repeatable results at a finer granularity than would be practical in clinical research on human subjects.
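A standard building block of such auditory-periphery models is the gammatone filter, which approximates the frequency selectivity of one place on the basilar membrane. This sketch uses the common Glasberg & Moore ERB bandwidth formula and omits the many further stages (hair-cell transduction, adaptation) that published models include.

```python
import numpy as np

def gammatone_ir(fc, sr, n=4, dur=0.05):
    """Impulse response of an n-th order gammatone filter centred at
    fc Hz: t^(n-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t), with bandwidth b
    from the Glasberg & Moore ERB formula. Illustrative sketch only."""
    t = np.arange(int(dur * sr)) / sr
    erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)  # equivalent rectangular bandwidth
    b = 1.019 * erb
    ir = t ** (n - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return ir / np.max(np.abs(ir))

sr = 16000
ir = gammatone_ir(1000.0, sr)  # one channel centred at 1 kHz
# Filtering a signal through this channel is a convolution:
click = np.zeros(512)
click[0] = 1.0
response = np.convolve(click, ir)[:512]
```

A bank of such filters at ERB-spaced centre frequencies yields the per-channel representation on which simulated intelligibility measures can be computed.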