    Phoneme Recognition Using Acoustic Events

    This paper presents a new approach to phoneme recognition using nonsequential sub-phoneme units. These units, called acoustic events, are phonologically meaningful as well as recognizable from speech signals. Acoustic events form a phonologically incomplete representation compared to distinctive features; this problem may partly be overcome by incorporating phonological constraints. Currently, 24 binary events describing manner and place of articulation, vowel quality and voicing are used to recognize all German phonemes. Phoneme recognition in this paradigm consists of two steps: after the acoustic events have been determined from the speech signal, a phonological parser generates syllable and phoneme hypotheses from the event lattice. Results obtained on a speaker-dependent corpus are presented.
    Comment: 4 pages, to appear at ICSLP'94, PostScript version (compressed and uuencoded)
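
    To make the two-step paradigm concrete, here is a minimal Python sketch; the event names and the event-to-phoneme table are hypothetical illustrations, not the paper's actual 24-event inventory or parser.

        # Step 1 (assumed done upstream): an acoustic front end has labelled
        # stretches of the signal with bundles of binary events.
        event_lattice = [
            {"voiced", "nasal", "labial"},   # e.g. frames for an /m/-like region
            {"voiced", "vocalic", "front"},  # e.g. frames for an /i/-like region
        ]

        # Step 2: a toy phonological table maps event bundles to phoneme hypotheses.
        phoneme_table = {
            frozenset({"voiced", "nasal", "labial"}): "m",
            frozenset({"voiced", "vocalic", "front"}): "i",
        }

        def parse(lattice, table):
            """Generate one phoneme hypothesis per event bundle in the lattice."""
            return [table.get(frozenset(events), "?") for events in lattice]

        print(parse(event_lattice, phoneme_table))  # ['m', 'i']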

    Pauses and the temporal structure of speech

    Natural-sounding speech synthesis requires close control over the temporal structure of the speech flow. This includes a full predictive scheme for the durational structure, in particular the prolongation of final syllables of lexemes, as well as for the pausal structure of the utterance. In this chapter, a description of the temporal structure and a summary of the numerous factors that modify it are presented. In the second part, predictive schemes for the temporal structure of speech ("performance structures") are introduced, and their potential for characterising the overall prosodic structure of speech is demonstrated.

    The role of gesture delay in coda /r/ weakening: an articulatory, auditory and acoustic study

    The cross-linguistic tendency of coda consonants to weaken, vocalize, or be deleted is shown to have a phonetic basis, resulting from gesture reduction, or variation in gesture timing. This study investigates the effects of the timing of the anterior tongue gesture for coda /r/ on acoustics and perceived strength of rhoticity, making use of two sociolects of Central Scotland (working- and middle-class) where coda /r/ is weakening and strengthening, respectively. Previous articulatory analysis revealed a strong tendency for these sociolects to use different coda /r/ tongue configurations: working- and middle-class speakers tend to use tip/front raised and bunched variants, respectively; however, this finding does not explain working-class /r/ weakening. A correlational analysis in the current study showed a robust relationship between anterior lingual gesture timing, F3, and percept of rhoticity. A linear mixed effects regression analysis showed that both speaker social class and linguistic factors (word structure and the checked/unchecked status of the prerhotic vowel) had significant effects on tongue gesture timing and formant values. This study provides further evidence that gesture delay can be a phonetic mechanism for coda rhotic weakening and apparent loss, but social class emerges as the dominant factor driving lingual gesture timing variation.
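
    The kind of mixed-effects model reported here can be sketched as follows; the column names, synthetic data, and model formula are illustrative assumptions, not the study's actual dataset or specification.

        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf

        rng = np.random.default_rng(0)
        n = 200
        data = pd.DataFrame({
            "F3": rng.normal(2500, 150, n),  # third-formant values in Hz (synthetic)
            "social_class": rng.choice(["working", "middle"], n),
            "word_structure": rng.choice(["mono", "poly"], n),
            "speaker": rng.choice([f"s{i}" for i in range(10)], n),
        })

        # Fixed effects for class and word structure; random intercept per speaker.
        model = smf.mixedlm("F3 ~ social_class + word_structure",
                            data, groups=data["speaker"])
        print(model.fit().summary())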

    On segments and syllables in the sound structure of language: Curve-based approaches to phonology and the auditory representation of speech.

    http://msh.revues.org/document7813.html
    Recent approaches to the syllable reintroduce continuous, mathematically describable representations of sound objects designated as "curves". Psycholinguistic research on spoken language perception usually relies on symbolic and highly hierarchized approaches to the syllable which strongly differentiate segments (phones) from syllables. Recent work on the auditory bases of speech perception demonstrates the ability of listeners to extract phonetic information even when severe degradations of the speech signal have been produced in the spectro-temporal domain. Implications of these observations for the modelling of syllables in the fields of speech perception and phonology are discussed.

    Word recognition from tiered phonological models

    Phonologically constrained morphological analysis (PCMA) is the decomposition of words into their component morphemes, conditioned by both orthography and pronunciation. This article describes PCMA and its application in large-vocabulary continuous speech recognition, where it enhances recognition performance in some tasks. Our experiments, based on the British National Corpus and the LOB Corpus for training data and WSJCAM0 for test data, show clearly that PCMA leads to a smaller lexicon, smaller language models, superior word lattices and a decrease in word error rates. PCMA seems to show most benefit in open-vocabulary tasks, where the productivity of a morph-unit lexicon yields a substantial reduction in out-of-vocabulary rates.
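
    As a rough illustration of the idea (not the article's actual algorithm or lexicon), a word is split into morphs only when the spelling and the pronunciation decompose consistently at the same time:

        # Toy morph lexicon: spelling -> pronunciation (space-separated phones).
        morphs = {
            "work": "w er k",
            "er":   "ax r",
            "s":    "z",
        }

        def pcma(spelling, pron, lexicon):
            """Return one orthography+pronunciation-consistent split, or None."""
            if not spelling and not pron:
                return []
            for morph, morph_pron in lexicon.items():
                p = morph_pron.split()
                if spelling.startswith(morph) and pron[:len(p)] == p:
                    rest = pcma(spelling[len(morph):], pron[len(p):], lexicon)
                    if rest is not None:
                        return [morph] + rest
            return None

        print(pcma("workers", "w er k ax r z".split(), morphs))
        # ['work', 'er', 's']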

    Modeling the development of pronunciation in infant speech acquisition.

    Pronunciation is an important part of speech acquisition, but little attention has been given to the mechanism or mechanisms by which it develops. Speech sound qualities, for example, have simply been assumed to develop by imitation, and in most accounts by acoustic matching, with the infant comparing his output to that of his caregiver. There are theoretical and empirical problems with both of these assumptions, and we present a computational model, Elija, that does not learn to pronounce speech sounds this way. Elija starts by exploring the sound-making capabilities of his vocal apparatus. He then uses the natural responses he gets from a caregiver to learn equivalence relations between his vocal actions and his caregiver's speech. We show that Elija progresses from a babbling stage to learning the names of objects. This demonstrates the viability of a non-imitative mechanism for learning to pronounce.
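
    The logic of the non-imitative loop can be caricatured in a few lines of Python; the mapping and all names below are hypothetical stand-ins, not the Elija model itself.

        def caregiver_reformulation(infant_sound):
            # Stand-in for the caregiver, who responds to babble with an
            # adult L1 word rather than echoing the infant's acoustics.
            responses = {"ba": "ball", "da": "daddy", "ma": "mummy"}
            return responses.get(infant_sound)

        # Exploration phase: the infant tries vocal actions and stores the
        # equivalence established by the caregiver's response. No acoustic
        # matching between infant and adult tokens is ever performed.
        action_to_word = {}
        for motor_action, sound in [("gesture1", "ba"), ("gesture2", "da")]:
            reply = caregiver_reformulation(sound)
            if reply is not None:
                action_to_word[motor_action] = reply   # perceptuo-motor link

        print(action_to_word)  # {'gesture1': 'ball', 'gesture2': 'daddy'}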

    The Self-Organization of Speech Sounds

    The speech code is a vehicle of language: it defines a set of forms used by a community to carry information. Such a code is necessary to support the linguistic interactions that allow humans to communicate. How, then, may a speech code be formed prior to the existence of linguistic interactions? Moreover, the human speech code is discrete and compositional, shared by all the individuals of a community but different across communities, and phoneme inventories are characterized by statistical regularities. How can a speech code with these properties form? We approach these questions in this paper using the "methodology of the artificial": we build a society of artificial agents and detail a mechanism that shows the formation of a discrete speech code without presupposing the existence of linguistic capacities or of coordinated interactions. The mechanism is based on a low-level model of sensory-motor interactions. We show that the integration of certain very simple and non-language-specific neural devices leads to the formation of a speech code with properties similar to those of the human speech code. This result relies on the self-organizing properties of a generic coupling between perception and production within agents, and on the interactions between agents. The artificial system helps us develop better intuitions about how speech might have appeared, by showing how self-organization might have helped natural selection to find speech.
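
    The flavour of such a simulation can be conveyed by a drastically simplified sketch: a 1-D acoustic space and ad hoc parameters stand in for the paper's neural model, and prototypes drift toward whatever the agents hear from one another.

        import random

        random.seed(0)
        N_AGENTS, N_PROTOS, ROUNDS, RATE = 10, 3, 5000, 0.1
        # Each agent starts with random vocal prototypes in [0, 1].
        agents = [[random.random() for _ in range(N_PROTOS)]
                  for _ in range(N_AGENTS)]

        for _ in range(ROUNDS):
            speaker, listener = random.sample(range(N_AGENTS), 2)
            # The speaker produces one of its prototypes, with noise.
            sound = random.choice(agents[speaker]) + random.gauss(0, 0.02)
            # The listener nudges its nearest prototype toward the percept:
            # perception shapes production, with no coordination imposed.
            protos = agents[listener]
            i = min(range(N_PROTOS), key=lambda k: abs(protos[k] - sound))
            protos[i] += RATE * (sound - protos[i])

        for a in agents[:3]:
            print([round(p, 2) for p in sorted(a)])  # categories align across agents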

    Creating the cognitive form of phonological units: The speech sound correspondence problem in infancy could be solved by mirrored vocal interactions rather than by imitation

    Theories about the cognitive nature of phonological units have been constrained by the assumption that young children solve the correspondence problem for speech sounds by imitation, whether by an auditory- or gesture-based matching-to-target process. Imitation on the part of the child implies that he makes a comparison within one of these domains, which is presumed to be the modality of the underlying representation of speech sounds. However, there is no evidence that the correspondence problem is solved in this way. Instead we argue that the child can solve it through the mirroring behaviour of his caregivers within imitative interactions, and that this mechanism is more consistent with the developmental data. The underlying representation formed by mirroring is intrinsically perceptuo-motor. It is created by the association of a vocal action performed by the child and the reformulation of this into an L1 speech token that he hears in return. Our account of how production and perception develop, incorporating this mechanism, explains some longstanding problems in speech and reconciles data from psychology and neuroscience.

    The left inferior frontal gyrus under focus: an fMRI study of the production of deixis via syntactic extraction and prosodic focus

    The left inferior frontal gyrus (LIFG; BA 44, 45, 47) has been associated with linguistic processing (from sentence parsing to syllable parsing) as well as action analysis. We hypothesize that the function of the LIFG may be the monitoring of action, a function well adapted to agent deixis (verbal pointing at the agent of an action). The aim of this fMRI study was therefore to test the hypothesis that the LIFG is involved in the production of agent deixis. We performed an experiment in which three kinds of deictic sentences were pronounced, involving prosodic focus, syntactic extraction, and prosodic focus with syntactic extraction. A common pattern of activation was found for the three deixis conditions in the LIFG (BA 45 and/or 47), the left insula and the bilateral premotor cortex (BA 6). Prosodic deixis additionally activated the left anterior cingulate gyrus (BA 24, 32), the left supramarginal gyrus (LSMG, BA 40) and Wernicke's area (BA 22). Our results suggest that the LIFG is involved in agent deixis, through either prosody or syntax, and that the LSMG and Wernicke's area are additionally required in prosody-driven deixis. Once grammaticalized, deixis would be handled solely by the LIFG, without the LSMG and Wernicke's area.