92 research outputs found

    Simulating vocal learning of spoken language: Beyond imitation

    Computational approaches have an important role to play in understanding the complex process of speech acquisition in general, and have recently become popular in studies of vocal learning in particular. In this article we suggest that two significant problems associated with imitative vocal learning of spoken language, the speaker normalisation and phonological correspondence problems, can be addressed by linguistically grounded auditory perception. In particular, we show how the articulation of consonant-vowel syllables may be learnt from auditory percepts that can represent either individual utterances by speakers with different vocal tract characteristics or ideal phonetic realisations. The result is an optimisation-based implementation of vocal exploration – incorporating semantic, auditory, and articulatory signals – that can serve as a basis for simulating vocal learning beyond imitation.
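
    The optimisation-based vocal exploration described above can be pictured as a perturb-and-accept loop over articulatory parameters. The sketch below is only an illustration under assumed components: the synthesiser, the auditory feature extractor, the target percept and the acceptance rule are hypothetical placeholders, and the semantic and articulatory terms of the full objective are omitted; it is not the article's actual implementation.

    import numpy as np

    # Hypothetical stand-ins for an articulatory synthesiser and an auditory
    # feature extractor; toy functions only, used to make the loop runnable.
    def synthesise(articulation):
        return np.tanh(articulation)

    def auditory_percept(signal):
        return signal ** 2

    def exploration_step(articulation, target_percept, step=0.05, rng=None):
        """One trial of vocal exploration: perturb the articulation and keep
        the perturbation if the resulting percept moves closer to the target."""
        rng = rng or np.random.default_rng()
        candidate = articulation + rng.normal(scale=step, size=articulation.shape)
        old_err = np.linalg.norm(auditory_percept(synthesise(articulation)) - target_percept)
        new_err = np.linalg.norm(auditory_percept(synthesise(candidate)) - target_percept)
        return (candidate, new_err) if new_err < old_err else (articulation, old_err)

    # Toy usage: recover an articulation whose percept matches a target percept.
    rng = np.random.default_rng(0)
    target = auditory_percept(synthesise(np.array([0.4, -0.2, 0.7])))
    articulation = rng.normal(size=3)
    for _ in range(2000):
        articulation, error = exploration_step(articulation, target, rng=rng)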

    COSMO (“Communicating about Objects using Sensory–Motor Operations”): A Bayesian modeling framework for studying speech communication and the emergence of phonological systems

    While the origin of language remains a somewhat mysterious process, understanding how human language takes specific forms appears to be accessible to the experimental method. Languages, despite their wide variety, display obvious regularities. In this paper, we attempt to derive some properties of phonological systems (the sound systems for human languages) from speech communication principles. We introduce a model of the cognitive architecture of a communicating agent, called COSMO (for “Communicating about Objects using Sensory–Motor Operations”), that allows a probabilistic expression of the main theoretical trends found in the speech production and perception literature. This enables a computational comparison of these theoretical trends, which helps us to identify the conditions that favor the emergence of linguistic codes. We present realistic simulations of phonological system emergence showing that COSMO is able to predict the main regularities in vowel, stop consonant and syllable systems in human languages.
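
    In the spirit of the Bayesian agent described above, a joint distribution over objects, motor gestures and sensory percepts can support both production and perception within a single model. The sketch below is a deliberately simplified illustration with invented variable sizes and random conditional distributions; the actual COSMO decomposition is considerably richer than this.

    import numpy as np

    # O: communicated object, M: motor gesture, S: sensory percept.
    n_objects, n_motor, n_sensory = 3, 4, 4
    rng = np.random.default_rng(1)

    P_O = np.full(n_objects, 1.0 / n_objects)                       # prior over objects
    P_M_given_O = rng.dirichlet(np.ones(n_motor), size=n_objects)   # production model P(M | O)
    P_S_given_M = rng.dirichlet(np.ones(n_sensory), size=n_motor)   # sensory model P(S | M)

    def produce(o):
        """Production: sample a motor gesture for object o."""
        return rng.choice(n_motor, p=P_M_given_O[o])

    def transmit(m):
        """Environment: sample the sensory percept produced by gesture m."""
        return rng.choice(n_sensory, p=P_S_given_M[m])

    def perceive(s):
        """Perception: infer P(O | S = s) by marginalising over motor gestures."""
        joint = P_O[:, None] * P_M_given_O * P_S_given_M[:, s][None, :]   # P(O, M, S = s)
        posterior = joint.sum(axis=1)
        return posterior / posterior.sum()

    # One communication round: produce a gesture for object 0, hear it, infer the object.
    print(perceive(transmit(produce(0))))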

    The evolution of language: Proceedings of the Joint Conference on Language Evolution (JCoLE)


    Variability in Singing and in Song in the Zebra Finch

    Variability is a defining feature of the oscine song learning process, reflected in song and in the neural pathways involved in song learning. For the zebra finch, juveniles learning to sing typically exhibit a high degree of vocal variability, and this variability appears to be driven by a key brain nucleus. It has been suggested that this variability is a necessary part of a trial-and-error learning process in which the bird must search for possible improvements to its song. Our work examines the role this variability plays in learning in two ways: through behavioral experiments with juvenile zebra finches, and through a computational model of parts of the oscine brain. Previous studies have shown that some finches exhibit less variability during the learning process than others by producing repetitive vocalizations. A constantly changing song model was played to juvenile zebra finches to determine whether auditory stimuli can affect this behavior. This stimulus was shown to cause an overall increase in repetitiveness; furthermore, there was a correlation between repetitiveness at an early stage in the learning process and the length of time a bird was repetitive overall, and birds that were repetitive tended to repeat the same thing over an extended period of time. The role of a key brain nucleus involved in song learning was examined through computational modeling. Previous studies have shown that this nucleus produces variability in song, but can also bias the song of a bird in such a way as to reduce errors while singing. Activity within this nucleus during singing is predominantly uncorrelated with the timing of the song; however, a portion of this activity is correlated in such a manner. The modeling experiments consider the possibility that this persistent signal is part of a trial-and-error search and contrast this with the possibility that the persistent signal is the product of some mechanism to directly improve song. Simulation results show that a mixture of timing-dependent and timing-independent activity in this nucleus produces optimal learning results for the case where the persistent signal is a key component of a trial-and-error search, but not in the case where this signal will directly improve song. Although a mixture of timing-locked and timing-independent activity produces optimal results, the ratio found to be optimal within the model differs from what has been observed in vivo. Finally, novel methods for the analysis of birdsong, motivated by the high variability of juvenile song, are presented. These methods are designed to work with sets of song samples rather than through pairwise comparison. The utility of these methods is demonstrated, as well as results illustrating how such methods can be used as the basis for aggregate measures of song such as repertoire complexity.
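
    The contrast between timing-dependent and timing-independent variability in a trial-and-error search can be illustrated with a minimal reinforcement-style simulation. Everything below is an assumption made purely for illustration (the target pattern, the noise mixture, and the keep-if-better rule); it is not the thesis's actual model.

    import numpy as np

    rng = np.random.default_rng(2)
    T = 50                                              # time steps in one rendition
    target = np.sin(np.linspace(0, 2 * np.pi, T))       # assumed target motor pattern
    motor = np.zeros(T)                                  # current motor programme

    def rendition(motor, mix=0.5, scale=0.1):
        """Sing once with a mixture of timing-dependent (per-time-step) and
        timing-independent (whole-song) perturbations."""
        timing_dependent = rng.normal(scale=scale, size=T)
        timing_independent = rng.normal(scale=scale)
        return motor + mix * timing_dependent + (1 - mix) * timing_independent

    def learn(motor, n_trials=5000, mix=0.5):
        """Trial-and-error search: keep a rendition whenever it lowers song error."""
        best_err = np.mean((motor - target) ** 2)
        for _ in range(n_trials):
            song = rendition(motor, mix)
            err = np.mean((song - target) ** 2)
            if err < best_err:
                motor, best_err = song, err
        return motor, best_err

    # Sweeping `mix` shows how the balance of the two noise types affects learning.
    _, final_err = learn(motor, mix=0.5)
    print(final_err)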

    Spatial and temporal lingual coarticulation and motor control in preadolescents

    Purpose: Coarticulation and lingual kinematics were compared in preadolescents and adults, in order to establish whether preadolescents had a greater degree of random variability in tongue posture and whether their patterns of lingual coarticulation differed from those of adults. Method: High-speed ultrasound tongue contour data synchronised with the acoustic signal were recorded from 15 children aged between 10 and 12 years and 15 adults. Tongue shape contours were analysed at nine normalised time-points during the fricative phase of schwa-fricative-/a/ and schwa-fricative-/i/ sequences with the consonants /s/ and /ʃ/. Results: There was no significant age-related difference in random variability. Where a significant vowel effect occurred, the amount of coarticulation was similar in the two groups. However, the onset of the coarticulatory effect on preadolescent /ʃ/ was significantly later than on preadolescent /s/, and also later than on adult /s/ and /ʃ/. Conclusions: Preadolescents have adult-like precision of tongue control and adult-like anticipatory lingual coarticulation with respect to spatial characteristics of tongue posture. However, there remains some immaturity in the motor programming of certain complex tongue movements.
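
    Analysing contours at nine normalised time-points implies resampling each fricative token onto a common relative time axis. A minimal sketch of one way to do this is given below; the frame rate, contour size and linear interpolation are assumptions for illustration, not the study's actual processing pipeline.

    import numpy as np

    def normalised_timepoints(frame_times, frames, n_points=9):
        """Resample a sequence of tongue-contour frames at n_points equally
        spaced positions of the token's normalised duration (0 to 1)."""
        frame_times = np.asarray(frame_times, dtype=float)
        frames = np.asarray(frames, dtype=float)     # shape: (n_frames, n_contour_points)
        rel = (frame_times - frame_times[0]) / (frame_times[-1] - frame_times[0])
        targets = np.linspace(0.0, 1.0, n_points)
        return np.stack(
            [np.interp(targets, rel, frames[:, i]) for i in range(frames.shape[1])],
            axis=1,
        )

    # Toy usage: 14 ultrasound frames of a 42-point contour, reduced to 9 time-points.
    times = np.linspace(0.0, 0.13, 14)               # seconds within the fricative
    contours = np.random.default_rng(3).normal(size=(14, 42))
    nine_point_series = normalised_timepoints(times, contours)   # shape (9, 42)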

    'What does the cow say?' An exploratory analysis of onomatopoeia in early phonological development

    This thesis presents an in-depth analysis of infants’ acquisition of onomatopoeia – an area of phonological development that until now has been largely overlooked. Infants produce many onomatopoeia in their earliest words, which are often disregarded in phonological analyses owing to their marginal status in adult languages. It is often suggested that onomatopoeia may be easier for infants to learn because of the iconicity that is present in these forms; this corresponds to Imai and Kita’s (2014) ‘sound symbolism bootstrapping hypothesis’, as well as Werner and Kaplan’s theoretical work Symbol Formation (1963). However, neither of these accounts considers the role of phonological development in infants’ acquisition of onomatopoeia. This thesis presents a series of six studies with a range of perspectives on our central research question: is there a role for onomatopoeia in phonological development? Two analyses of longitudinal diary data address the nature of onomatopoeia in early production, while two eye-tracking studies consider the nature of iconicity in onomatopoeia and whether or not this has a perceptual advantage in early development. The role of the caregiver is then considered, with a prosodic analysis of onomatopoeia in infant-directed speech and a longitudinal perspective on the role of onomatopoeia in infant-caregiver interactions. The contributions from this thesis are threefold. First, we offer empirical evidence towards an understanding of how onomatopoeia fit within an infant’s wider phonological development, by showing how onomatopoeia facilitate early perception, production and interactions. Second, our results illustrate how these forms are an important aspect of phonological development and should not be overlooked in infant language research, as has often been the case in the developmental literature. Finally, these findings expand the iconicity research by showing that onomatopoeia do not present an iconic advantage in language learning, as has so often been assumed by theorists in the field.

    Directional adposition use in English, Swedish and Finnish

    Directional adpositions such as to the left of describe where a Figure is in relation to a Ground. English and Swedish directional adpositions refer to the location of a Figure in relation to a Ground, whether both are static or in motion. In contrast, the Finnish directional adpositions edellä (in front of) and jäljessä (behind) solely describe the location of a moving Figure in relation to a moving Ground (Nikanne, 2003). When using directional adpositions, a frame of reference must be assumed for interpreting their meaning. For example, the meaning of to the left of in English can be based on a relative (speaker or listener based) reference frame or an intrinsic (object based) reference frame (Levinson, 1996). When a Figure and a Ground are both in motion, it is possible for a Figure to be described as being behind or in front of the Ground, even if neither has intrinsic features. As shown by Walker (in preparation), there are good reasons to assume that in the latter case a motion-based reference frame is involved. This means that if Finnish speakers used edellä (in front of) and jäljessä (behind) more frequently in situations where both the Figure and Ground are in motion, a difference in reference frame use between Finnish on the one hand and English and Swedish on the other could be expected. We asked native English, Swedish and Finnish speakers to select adpositions from a language-specific list to describe the location of a Figure relative to a Ground when both were shown to be moving on a computer screen. We were interested in any differences between Finnish, English and Swedish speakers. All languages showed a predominant use of directional spatial adpositions referring to the lexical concepts TO THE LEFT OF, TO THE RIGHT OF, ABOVE and BELOW. There were no differences between the languages in directional adposition use or reference frame use, including reference frame use based on motion. We conclude that despite differences in the grammars of the languages involved, and potential differences in reference frame system use, the three languages investigated encode Figure location in relation to Ground location in a similar way when both are in motion. Levinson, S. C. (1996). Frames of reference and Molyneux’s question: Crosslinguistic evidence. In P. Bloom, M. A. Peterson, L. Nadel & M. F. Garrett (Eds.), Language and Space (pp. 109-170). Cambridge, MA: MIT Press. Nikanne, U. (2003). How Finnish postpositions see the axis system. In E. van der Zee & J. Slack (Eds.), Representing direction in language and space. Oxford, UK: Oxford University Press. Walker, C. (in preparation). Motion encoding in language: the use of spatial locatives in a motion context. Unpublished doctoral dissertation, University of Lincoln, Lincoln, United Kingdom.

    Artificial cognitive architecture with self-learning and self-optimization capabilities. Case studies in micromachining processes

    Unpublished doctoral thesis defended at the Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Ingeniería Informática. Date of defence: 22-09-201

    Towards an Integrative Information Society: Studies on Individuality in Speech and Sign

    The flow of information within modern information society has increased rapidly over the last decade. The major part of this information flow relies on the individual’s abilities to handle text or speech input. For the majority of us it presents no problems, but there are some individuals who would benefit from other means of conveying information, e.g. signed information flow. During the last decades, new results from various disciplines have all pointed towards a common background and processing for sign and speech, and this was one of the key issues that I wanted to investigate further in this thesis. The basis of this thesis is firmly within speech research, and that is why I wanted to design, for signers, test batteries analogous to widely used speech perception tests – to find out whether the results for signers would be the same as in speakers’ perception tests. One of the key findings within biology – and more precisely its effects on speech and communication research – is the mirror neuron system. That finding has enabled us to form new theories about the evolution of communication, and it all seems to converge on the hypothesis that all communication has a common core within humans. In this thesis speech and sign are discussed as equal and analogous counterparts of communication, and all research methods used in speech are modified for sign. Both speech and sign are thus investigated using similar test batteries. Furthermore, both production and perception of speech and sign are studied separately. An additional framework for studying production is given by gesture research using cry sounds. Results of cry sound research are then compared to results from children acquiring sign language. These results show that individuality manifests itself from very early on in human development. Articulation in adults, both in speech and sign, is studied from two perspectives: normal production and re-learning production when the apparatus has been changed. Normal production is studied both in speech and sign, and the effects of changed articulation are studied with regard to speech. Both these studies are done using carrier sentences. Furthermore, sign production is studied by giving the informants the possibility of spontaneous signing. The production data from the signing informants is also used as the basis for the sign synthesis stimuli used in the sign perception test battery. Speech and sign perception were studied using the informants’ forced-choice answers in identification and discrimination tasks. These answers were then compared across language modalities. Three different informant groups participated in the sign perception tests: native signers, sign language interpreters and Finnish adults with no knowledge of any signed language. This gave a chance to investigate which of the characteristics found in the results were due to the language per se and which were due to the change in modality itself. As the analogous test batteries yielded similar results across different informant groups, some common threads could be observed. Starting from very early on in acquiring speech and sign, the results were highly individual. However, the results were the same within one individual when the same test was repeated. This individuality of results followed the same patterns across different language modalities and, on some occasions, across language groups.
As both modalities yield similar answers to analogous study questions, this has led us to provide methods for basic input for sign language applications, i.e. signing avatars. This has also given us answers to questions on the precision of the animation and its intelligibility for the users: what parameters govern the intelligibility of synthesised speech or sign, and how precise must the animation or synthetic speech be in order to be intelligible? The results also give additional support to the well-known fact that intelligibility is not the same as naturalness. In some cases, as shown within the sign perception test battery design, naturalness decreases intelligibility. This also has to be taken into consideration when designing applications. All in all, results from each of the test batteries, be they for signers or speakers, yield strikingly similar patterns, which lends yet further support to a common core for all human communication. Thus, we can modify and deepen the phonetic framework models for human communication based on the knowledge obtained from the results of the test batteries within this thesis.
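
    The forced-choice identification and discrimination data described above can be compared across informant groups by simple per-group scoring. The following is only a minimal sketch; the group labels, response format and accuracy measure are illustrative assumptions rather than the thesis’s actual analysis.

    from collections import defaultdict

    # Each response: (informant group, stimulus label, label chosen in the forced-choice task).
    responses = [
        ("native signer", "A", "A"),
        ("interpreter",   "A", "B"),
        ("non-signer",    "A", "B"),
        ("native signer", "B", "B"),
    ]

    def accuracy_by_group(responses):
        """Proportion of correct identifications per informant group."""
        correct, total = defaultdict(int), defaultdict(int)
        for group, stimulus, choice in responses:
            total[group] += 1
            correct[group] += int(stimulus == choice)
        return {group: correct[group] / total[group] for group in total}

    print(accuracy_by_group(responses))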

    Models and analysis of vocal emissions for biomedical applications

    This book of Proceedings collects the papers presented at the 3rd International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, MAVEBA 2003, held on 10-12 December 2003 in Firenze, Italy. The workshop is organised every two years and aims to stimulate contacts between specialists active in research and industrial development in the area of voice analysis for biomedical applications. The scope of the Workshop includes all aspects of voice modelling and analysis, ranging from fundamental research to all kinds of biomedical applications and related established and advanced technologies.