11 research outputs found

    Variable Food Begging Calls Are Harbingers of Vocal Learning

    Vocal learning has evolved in only a few groups of mammals and birds, and its developmental and evolutionary origins remain unclear. The imitation of a memorized sound is a clear example of vocal learning, but is that when vocal learning starts? Here we use an ontogenetic approach to examine how vocal learning emerges in a songbird, the chipping sparrow. The first vocalizations of songbirds, food begging calls, were thought to be innate, with vocal learning emerging later during subsong, a behavior reminiscent of infant babbling. Here we report that the food begging calls of male sparrows show several characteristics associated with learned song: male begging calls are highly variable between individuals and are altered by deafening, and the production of food begging calls induces c-fos expression in RA, a forebrain motor nucleus involved in the production of learned song. Electrolytic lesions of RA significantly reduce the variability of male calls. The male begging calls are subsequently incorporated into subsong, which in turn transitions into recognizable attempts at vocal imitation. Females do not sing, and their begging calls are not affected by deafening or RA lesions. Our results suggest that, in chipping sparrows, intact hearing can influence the quality of male begging calls, and that auditory-sensitive vocal variability during food begging is the first step in a modification of vocal output that eventually culminates in vocal imitation.

    Prosody in Speech Produced by Deaf Persons

    The aim of this paper was to examine prosody in speech produced by deaf persons, and to determine whether there is a connection between average hearing loss and the control of certain prosodic features (speech fundamental frequency, intonation, accent, pauses and speech rate). Twelve prelingually deaf adolescents read test material constructed to examine their control of the selected prosodic features. In order to determine the possible influence of pure tone average (PTA) on the observed prosodic features, the subjects were divided into two groups: subjects with PTA between 90 and 110 dB (N=8), and subjects with PTA above 110 dB (N=4). The differences between these groups were tested using robust discriminant analysis. The results showed that the control of some prosodic features is related to PTA score – significant differences between the groups were found in the ability to control intonation, the use of pauses and speech rate – which indicates that prosody is not uniformly affected in the category of deafness, but that some prosodic features remain preserved at lower degrees of deafness. On the other hand, the descriptive analysis showed substantial differences within the groups, which indicates that some deaf persons with extremely high PTA scores have the potential to develop the same degree of control of some prosodic features as deaf persons with lower PTA scores. Results of both the descriptive and the robust discriminant analysis indicate that PTA is not sensitive enough as a measure to express the remaining ability of deaf persons to control prosody.
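The group-comparison step described above can be sketched in code. This is a minimal illustration only: it substitutes scikit-learn's ordinary linear discriminant analysis for the robust discriminant analysis used in the study, and the prosody scores below are synthetic, not the study's data.

```python
# Sketch: separating two PTA groups by prosodic-control scores with
# ordinary LDA (a stand-in for the study's robust discriminant analysis).
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# Hypothetical control scores (intonation, pauses, speech rate) for the
# two groups: PTA 90-110 dB (N=8) and PTA above 110 dB (N=4).
group_a = rng.normal(loc=[0.7, 0.6, 0.65], scale=0.1, size=(8, 3))
group_b = rng.normal(loc=[0.4, 0.3, 0.35], scale=0.1, size=(4, 3))
X = np.vstack([group_a, group_b])
y = np.array([0] * 8 + [1] * 4)

lda = LinearDiscriminantAnalysis()
lda.fit(X, y)
print("training accuracy:", lda.score(X, y))
```

With well-separated synthetic means the discriminant function separates the groups easily; the substantive question in the study is which individual features carry that separation.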

    Towards an Integrative Information Society: Studies on Individuality in Speech and Sign

    The flow of information within modern information society has increased rapidly over the last decade. The major part of this information flow relies on the individual's ability to handle text or speech input. For the majority of us this presents no problems, but some individuals would benefit from other means of conveying information, e.g. signed information flow. During the last decades, new results from various disciplines have pointed towards a common background and processing for sign and speech, and this was one of the key issues that I wanted to investigate further in this thesis. The basis of this thesis is firmly within speech research, and that is why I wanted to design test batteries for signers analogous to widely used speech perception tests – to find out whether the results for signers would be the same as in speakers' perception tests. One of the key findings within biology – and more precisely its effects on speech and communication research – is the mirror neuron system. That finding has enabled us to form new theories about the evolution of communication, and it all seems to converge on the hypothesis that all communication has a common core within humans. In this thesis speech and sign are discussed as equal and analogous counterparts of communication, and all research methods used in speech are modified for sign. Both speech and sign are thus investigated using similar test batteries. Furthermore, both production and perception of speech and sign are studied separately. An additional framework for studying production is given by gesture research using cry sounds. Results of cry sound research are then compared to results from children acquiring sign language. These results show that individuality manifests itself from very early on in human development. Articulation in adults, both in speech and sign, is studied from two perspectives: normal production and re-learning production when the apparatus has been changed.
Normal production is studied in both speech and sign, and the effects of changed articulation are studied with regard to speech. Both these studies use carrier sentences. Furthermore, sign production is studied by giving the informants the possibility of spontaneous signing. The production data from the signing informants are also used as the basis for input in the sign synthesis stimuli used in the sign perception test battery. Speech and sign perception were studied using the informants' answers to forced-choice identification and discrimination tasks. These answers were then compared across language modalities. Three different informant groups participated in the sign perception tests: native signers, sign language interpreters, and Finnish adults with no knowledge of any signed language. This gave a chance to investigate which of the characteristics found in the results were due to the language per se and which were due to the change in modality itself. As the analogous test batteries yielded similar results over different informant groups, some common threads could be observed. Starting from very early on in acquiring speech and sign, the results were highly individual. However, the results were the same within one individual when the same test was repeated. This individuality manifested along the same patterns across different language modalities and, on some occasions, across language groups. As both modalities yield similar answers to analogous study questions, this has led us to provide methods for basic input for sign language applications, i.e. signing avatars. It has also given us answers to questions on the precision of the animation and its intelligibility for users: what parameters govern the intelligibility of synthesised speech or sign, and how precise must the animation or synthetic speech be in order to be intelligible.
The results also lend additional support to the well-known fact that intelligibility is not the same as naturalness. In some cases, as shown within the sign perception test battery design, naturalness decreases intelligibility. This also has to be taken into consideration when designing applications. All in all, results from each of the test batteries, be they for signers or speakers, yield strikingly similar patterns, which provides yet further support for a common core for all human communication. Thus, we can modify and deepen the phonetic framework models for human communication based on the knowledge obtained from the results of the test batteries within this thesis. Transferred from Doria.

    Paralinguistic vocal control of interactive media: how untapped elements of voice might enhance the role of non-speech voice input in the user's experience of multimedia.

    Much interactive media development, especially commercial development, implies the dominance of the visual modality, with sound as a limited supporting channel. The development of multimedia technologies such as augmented reality and virtual reality has further revealed a distinct partiality to visual media. Sound, however, and particularly voice, have many aspects which have yet to be adequately investigated. Exploration of these aspects may show that sound can, in some respects, be superior to graphics in creating immersive and expressive interactive experiences. With this in mind, this thesis investigates the use of non-speech voice characteristics as a complementary input mechanism in controlling multimedia applications. It presents a number of projects that employ the paralinguistic elements of voice as input to interactive media including both screen-based and physical systems. These projects are used as a means of exploring the factors that seem likely to affect users’ preferences and interaction patterns during non-speech voice control. This exploration forms the basis for an examination of potential roles for paralinguistic voice input. The research includes the conceptual and practical development of the projects and a set of evaluative studies. The work submitted for Ph.D. comprises practical projects (50 percent) and a written dissertation (50 percent). The thesis aims to advance understanding of how voice can be used both on its own and in combination with other input mechanisms in controlling multimedia applications. It offers a step forward in the attempts to integrate the paralinguistic components of voice as a complementary input mode to speech input applications in order to create a synergistic combination that might let the strengths of each mode overcome the weaknesses of the other.
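As an illustration of the kind of paralinguistic channel such systems can use, the sketch below maps frame-wise vocal loudness (RMS energy) onto a clamped 0-1 controller value. All names, thresholds, and the synthetic signal are hypothetical, not taken from the thesis projects.

```python
# Sketch: one non-speech voice control channel, loudness -> 0..1 value.
import numpy as np

def loudness_control(signal, frame_len=512, floor=1e-4, ceil=0.5):
    """Map per-frame RMS energy onto a clamped 0-1 control value."""
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt((frames ** 2).mean(axis=1))
    return np.clip((rms - floor) / (ceil - floor), 0.0, 1.0)

# Synthetic input: a quiet hum followed by a louder vowel-like segment.
t = np.linspace(0, 1, 16000)
voice = np.concatenate([0.05 * np.sin(2 * np.pi * 120 * t),
                        0.4 * np.sin(2 * np.pi * 120 * t)])
ctrl = loudness_control(voice)  # rises as vocal effort increases
```

A real system would combine several such channels (pitch, loudness, duration) and smooth them over time before driving the interface.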

    Automatic vocal recognition of a child's perceived emotional state within the Speechome corpus

    Thesis (S.M.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2010. Cataloged from PDF version of thesis. Includes bibliographical references (p. 137-149). With over 230,000 hours of audio/video recordings of a child growing up in the home setting from birth to the age of three, the Human Speechome Project has pioneered a comprehensive, ecologically valid observational dataset that introduces far-reaching new possibilities for the study of child development. By offering in vivo observation of a child's daily life experience at ultra-dense, longitudinal time scales, the Speechome corpus holds great potential for discovering developmental insights that have thus far eluded observation. The work of this thesis aspires to enable the use of the Speechome corpus for the empirical study of emotional factors in early child development. To fully harness the benefits of Speechome for this purpose, an automated mechanism must be created to perceive the child's emotional state within this medium. Due to the latent nature of emotion, we sought objective, directly measurable correlates of the child's perceived emotional state within the Speechome corpus, focusing exclusively on acoustic features of the child's vocalizations and surrounding caretaker speech. Using Partial Least Squares regression, we applied these features to build a model that simulates human perceptual heuristics for determining a child's emotional state. We evaluated the perceptual accuracy of models built across child-only, adult-only, and combined feature sets within the overall sampled dataset, as well as controlling for social situations, vocalization behaviors (e.g. crying, laughing, babble), individual caretakers, and developmental age between 9 and 24 months.
Child and combined models consistently demonstrated high perceptual accuracy, with overall adjusted R-squared values of 0.54 and 0.58, respectively, and an average of 0.59 and 0.67 per month. Comparative analysis across longitudinal and socio-behavioral contexts yielded several notable developmental and dyadic insights. In the process, we have developed a data mining and analysis methodology for modeling perceived child emotion and quantifying caretaker intersubjectivity that we hope to extend to future datasets across multiple children, as new deployments of the Speechome recording technology are established. Such large-scale comparative studies promise an unprecedented view into the nature of emotional processes in early childhood and potentially enlightening discoveries about autism and other developmental disorders. by Sophia Yuditskaya. S.M.

    Paralinguistic vocal control of interactive media: how untapped elements of voice might enhance the role of non-speech voice input in the user's experience of multimedia

    Much interactive media development, especially commercial development, implies the dominance of the visual modality, with sound as a limited supporting channel. The development of multimedia technologies such as augmented reality and virtual reality has further revealed a distinct partiality to visual media. Sound, however, and particularly voice, have many aspects which have yet to be adequately investigated. Exploration of these aspects may show that sound can, in some respects, be superior to graphics in creating immersive and expressive interactive experiences. With this in mind, this thesis investigates the use of non-speech voice characteristics as a complementary input mechanism in controlling multimedia applications. It presents a number of projects that employ the paralinguistic elements of voice as input to interactive media including both screen-based and physical systems. These projects are used as a means of exploring the factors that seem likely to affect users' preferences and interaction patterns during non-speech voice control. This exploration forms the basis for an examination of potential roles for paralinguistic voice input. The research includes the conceptual and practical development of the projects and a set of evaluative studies. The work submitted for Ph.D. comprises practical projects (50 percent) and a written dissertation (50 percent). The thesis aims to advance understanding of how voice can be used both on its own and in combination with other input mechanisms in controlling multimedia applications. It offers a step forward in the attempts to integrate the paralinguistic components of voice as a complementary input mode to speech input applications in order to create a synergistic combination that might let the strengths of each mode overcome the weaknesses of the other. EThOS - Electronic Theses Online Service. United Kingdom.

    Proizvodnja i percepcija govora (Speech Production and Perception)

    This volume of proceedings brings together 57 Croatian and international authors who, across 33 papers and from different research angles, address recent topics in speech production and perception and their interdependence in the speech process. The book is dedicated to Professor Damir Horga on the occasion of his seventieth birthday. Each paper is accompanied by an abstract in Croatian and English. The volume was co-published by the Department of Phonetics of the Faculty of Humanities and Social Sciences, University of Zagreb, the Phonetics Section of the Croatian Philological Society, and FF-press.