
    Sensorimotor processing in speech examined in automatic imitation tasks

    The origin of humans’ imitative capacity to quickly map observed actions onto their motor repertoire has been the source of much debate in cognitive psychology. Past research has provided a comprehensive account of how sensorimotor associative experience forges and modulates the imitative capacity underlying familiar, visually transparent manual gestures. Yet, little is known about whether the same associative mechanism is also involved in imitation of visually opaque orofacial movements or novel actions that were not part of the observers’ motor repertoire. This thesis aims to establish the role of sensorimotor experience in modulating the imitative capacity underlying communicative orofacial movements, namely speech actions, that are either familiar or novel to perceivers. Chapter 3 first establishes that automatic imitation of speech occurs due to perception-induced motor activation and thus can be used as a behavioural measure to index the imitative capacity underlying speech. Chapter 4 demonstrates that the flexibility observed for the imitative capacity underlying manual gestures extends to the imitative capacity underlying visually perceived speech actions, suggesting that the associative mechanism is also involved in imitation of visually opaque orofacial movements. Chapter 5 further shows that sensorimotor experience with novel speech actions modulates the imitative capacity underlying both novel and familiar speech actions produced using the same articulators. Thus, findings from Chapter 5 suggest that the associative mechanism is also involved in imitation of novel actions and that experience-induced modification probably occurs at the feature level in the perception-production link presumably underlying the imitative capacity. Results are discussed with respect to previous imitation research and more general action-perception research in cognitive and experimental psychology, sensorimotor interaction studies in speech science, and native versus non-native processing in second language research. Overall, it is concluded that the development of speech imitation follows the same basic associative learning rules as the development of imitation in other effector systems.

    Effects of aging and cognitive abilities on multimodal language production and comprehension in context


    The shrink point: audiovisual integration of speech-gesture synchrony

    Kirchhof C. The shrink point: audiovisual integration of speech-gesture synchrony. Bielefeld: Universität Bielefeld; 2017. Up to now, gesture research has focused largely on the production of speech-accompanying gestures and on how speech-gesture utterances contribute to communication. An issue that has mostly been neglected is to what extent listeners even perceive the gesture part of a multimodal utterance. For instance, there has been a major focus on the lexico-semiotic connection between spontaneously coproduced gestures and speech in gesture research (e.g., de Ruiter, 2007; Kita & Özyürek, 2003; Krauss, Chen & Gottesman, 2000). Because of the rather precise timing of the prosodic peak in speech with the most prominent stroke of the gesture phrase in production, Schegloff (1984) and Krauss, Morrel-Samuels and Colasante (1991; also Rauscher, Krauss & Chen, 1996), among others, coined the term lexical affiliation for this phenomenon. Following Krauss et al. (1991), the first empirical study of this dissertation investigates the nature of the semiotic relation between speech and gestures, focusing on its applicability to temporal perception and comprehension. When speech and lip movements diverge too far from the original production synchrony, this can be highly irritating to the viewer, even when audio and video stem from the same original recording (e.g., Vatakis, Navarra, Soto-Faraco & Spence, 2008; Feyereisen, 2007): there is only a small temporal window of audiovisual integration (AVI) within which viewer-listeners can internally align discrepancies between lip movements and the speech supposedly produced by them (e.g., McGurk & MacDonald, 1976). Several studies in the area of psychophysics (e.g., Nishida, 2006; Fujisaki & Nishida, 2005) found that there is also a time window for the perceptual alignment of nonspeech visual and auditory signals. These and further studies on the AVI of speech-lip asynchronies have inspired research on the perception of speech-gesture utterances. McNeill, Cassell, and McCullough (1994; Cassell, McNeill & McCullough, 1999), for instance, discovered that listeners take up information even from artificially combined speech and gestures. More recent studies researching the AVI of speech and gestures have employed event-related potential (ERP) monitoring as a methodological means to investigate the perception of multimodal utterances (e.g., Gullberg & Holmqvist, 1999; 2006; Özyürek, Willems, Kita & Hagoort, 2007; Habets, Kita, Shao, Özyürek & Hagoort, 2011). While the aforementioned studies from the fields of psychophysics and of speech-only and speech-gesture research have contributed greatly to theories of how listeners perceive multimodal signals, there has been a lack of explorations of natural data and of dyadic situations. This dissertation investigates the perception of naturally produced speech-gesture utterances by having participants rate the naturalness of synchronous and asynchronous versions of speech-gesture utterances, using different qualitative and quantitative methodologies such as an online rating study and a preference task.
Drawing, for example, from speech-gesture production models based on Levelt's (1989) model of speech production (e.g., de Ruiter, 1998; 2007; Krauss et al., 2000; Kita & Özyürek, 2003) and building on the results and analyses of the studies conducted for this dissertation, I finally propose a draft model of a possible transmission cycle between the Growth Point (e.g., McNeill, 1985; 1992) and the Shrink Point, the perceptual counterpart to the Growth Point. This model includes the temporal and semantic alignment of speech and different gesture types as well as their audiovisual and conceptual integration during perception. The perceptual studies conducted within the scope of this dissertation have revealed varying temporal ranges within which an asynchrony in speech-gesture utterances can be integrated by the listener, particularly for iconic gestures.
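The asynchronous stimulus versions described above were created by shifting the audio relative to the video of the same recording. The snippet below is a minimal sketch of one way such offset versions could be produced with ffmpeg; the file names and offset values are assumptions for illustration, not the materials or tooling used in the dissertation.

```python
# Minimal sketch (hypothetical file names and offsets): create asynchronous versions
# of a speech-gesture clip by shifting the audio track relative to the video using
# ffmpeg's -itsoffset option.
import subprocess

OFFSETS_S = [-0.6, -0.2, 0.0, 0.2, 0.6]  # hypothetical audio offsets in seconds

def make_asynchronous_version(clip: str, offset: float, out: str) -> None:
    """Remux `clip` so its audio starts `offset` seconds later (or earlier) than the video."""
    subprocess.run(
        ["ffmpeg", "-y",
         "-i", clip,                 # input 0: source of the video stream
         "-itsoffset", str(offset),  # shift timestamps of the next input
         "-i", clip,                 # input 1: source of the (shifted) audio stream
         "-map", "0:v:0", "-map", "1:a:0",
         "-c:v", "copy", "-c:a", "aac",
         "-shortest", out],
        check=True,
    )

for off in OFFSETS_S:
    make_asynchronous_version("utterance.mp4", off, f"utterance_{off:+.1f}s.mp4")
```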

    Face processing: the role of dynamic information

    This thesis explores the effects of movement on various face processing tasks. In Experiments One to Four, unfamiliar face recognition was investigated using identical numbers of frames in the learning phase; these were viewed as a series of static images, or in moving sequences (using computer animation). There was no additional benefit from studying the moving sequences, but signal detection measurements showed an advantage for using dynamic sequences at test. In Experiments Five and Six, moving and static images of unfamiliar faces were matched for expression or identity. Without prior study, movement only helped in matching the expression. It was proposed that motion provided more effective access to a stored representation of an emotional expression. Brief familiarisation with the faces led to an advantage for dynamic presentations in referring to a stored representation of identity as well as expression. Experiments Seven to Nine explored the suggestion that motion is beneficial when accessing a pre-existing description. Significantly more famous faces were recognised in inverted and negated formats when shown in dynamic clips, compared with recognition using static images. This benefit may be through detecting idiosyncratic gesture patterns at test, or extracting spatial and temporal relationships which overlapped the stored kinematic details. Finally, unfamiliar faces were studied as moving or static images; recognition was tested under dynamic or fixed conditions using inverted or negated formats. As there was no difference between moving and static study phases, it was unlikely that idiosyncratic gesture patterns were being detected, so the significant advantage for motion at test seemed due to an overlap with the stored description. However, complex interactions were found, and participants demonstrated bias when viewing motion at test. Future work utilising dynamic image-manipulated displays needs to be undertaken before we fully understand the processing of facial movement.
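As background to the "signal detection measurements" mentioned above: recognition sensitivity in such old/new designs is commonly summarised as d' computed from hit and false-alarm rates. The sketch below shows that standard computation with made-up counts; it is illustrative only and not the analysis or data reported in the thesis.

```python
# Illustrative only: computing a d' (d-prime) sensitivity score from hit and
# false-alarm counts. The counts below are hypothetical, not data from the thesis.
from scipy.stats import norm

def d_prime(hits, misses, false_alarms, correct_rejections):
    """d' = z(hit rate) - z(false-alarm rate), with a log-linear correction
    so that rates of exactly 0 or 1 do not yield infinite z-scores."""
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

# Example: 48 studied faces and 48 distractors (hypothetical numbers).
print(d_prime(hits=38, misses=10, false_alarms=12, correct_rejections=36))
```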

    Development and evaluation of an interactive virtual audience for a public speaking training application

    Introduction: Fear of public speaking is the most common social fear. Virtual reality (VR) training applications are a promising tool to improve public speaking skills. To be successful, applications should feature a high scenario fidelity. One way to improve it is to implement realistic speaker-audience interactive behavior. Objective: The study aimed to develop and evaluate a realistic and interactive audience for a VR public speaking training application. First, an observation study on real speaker-audience interactive behavior patterns was conducted. Second, identified patterns were implemented in the VR application. Finally, an evaluation study identified users’ perceptions of the training application. Observation Study (1): Because of the lack of data on real speaker-audience interactive behavior, the first research question to be answered was “What speaker-audience interaction patterns can be identified in real life?”. A structured, non-participant, overt observation study was conducted. A real audience was video recorded, and the content was analyzed. The sample resulted in N = 6,484 observed interaction patterns. It was found that speakers initiate dialogues more often than audience members do, and the analysis documented how audience members react to speakers’ facial expressions and gestures. Implementation Study (2): To find efficient ways of implementing the results of the observation study in the training application, the second research question was formulated as: “How can speaker-audience interaction patterns be implemented into the virtual public speaking application?”. The hardware setup comprised a CAVE, Infitec glasses, and ART head tracking. The software was realized with 3D-Excite RTT DeltaGen 12.2. To answer the second research question, several possible technical solutions were explored systematically until efficient solutions were found. As a result, self-created audio recognition, Kinect motion recognition, Affectiva facial recognition, and manual question generation were implemented to provide interactive audience behavior in the public speaking training application. Evaluation Study (3): To find out whether implementing interactive behavior patterns met users’ expectations, the third research question was formulated as “How does interactivity of a virtual public speaking application affect user experience?”. An experimental, cross-sectional user study was conducted with N = 57 participants (65% men, 35% women; Mage = 25.98, SD = 4.68) who used either an interactive or a non-interactive VR application condition. Results revealed a significant difference in users’ perception of the two conditions.
General Conclusions: Speaker-audience interaction patterns that can be observed in real life were incorporated into a VR application that helps people to overcome the fear of public speaking and train their public speaking skills. The findings showed a high relevance of interactivity for VR public speaking applications. Although questions from the audience were still regulated manually, the newly designed audience could interact with the speakers. Thus, the presented VR application is of potential value in helping people to train their public speaking skills. As limitations, the questions from the audience were regulated manually by an operator, and the study was conducted with participants not suffering from high degrees of public speaking fear. Future work may use more advanced technology, such as speech recognition, 3D recordings, or live 3D streams of an actual person, and include participants with high degrees of public speaking fear.
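The implementation study above pairs detected speaker cues (audio, Kinect motion, Affectiva facial expressions) with audience reactions. Purely as an illustration of that kind of event-to-reaction mapping, the sketch below uses a toy dispatcher; the event names, reactions, and the AudienceAvatar class are hypothetical assumptions, not the thesis's actual CAVE / RTT DeltaGen implementation.

```python
# Hypothetical sketch of an event-to-reaction mapping: detected speaker cues trigger
# scripted audience reactions. Names and rules are illustrative assumptions only.
import random
from dataclasses import dataclass

@dataclass
class AudienceAvatar:
    name: str
    attention: float = 0.5  # 0 = distracted, 1 = fully attentive

    def react(self, reaction: str) -> None:
        # In a real application this would drive an animation; here we just log it.
        print(f"{self.name}: {reaction} (attention={self.attention:.2f})")

# Rules in the spirit of the observed interaction patterns:
# which audience reactions may follow which detected speaker cues.
REACTION_RULES = {
    "speech_onset": ["look_at_speaker", "stop_fidgeting"],
    "speaker_gesture": ["shift_gaze_to_gesture", "nod"],
    "speaker_smile": ["smile_back"],
    "long_pause": ["raise_hand_question"],  # questions were operator-triggered in the thesis
}

def dispatch(event: str, audience: list) -> None:
    """Send a plausible reaction to each sufficiently attentive avatar for a detected cue."""
    for avatar in audience:
        if random.random() < avatar.attention:
            avatar.react(random.choice(REACTION_RULES.get(event, ["idle"])))

audience = [AudienceAvatar(f"avatar_{i}", attention=random.uniform(0.3, 0.9)) for i in range(5)]
for detected_event in ["speech_onset", "speaker_gesture", "long_pause"]:
    dispatch(detected_event, audience)
```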

    Universal and language-specific processing: the case of prosody

    A key question in the science of language is how speech processing can be influenced by both language-universal and language-specific mechanisms (Cutler, Klein, & Levinson, 2005). My graduate research aimed to address this question by adopting a cross-language approach to compare languages with different phonological systems. Of all components of linguistic structure, prosody is often considered to be one of the most language-specific dimensions of speech. This can have significant implications for our understanding of language use, because much of speech processing is specifically tailored to the structure and requirements of the native language. However, it is still unclear whether prosody may also play a universal role across languages, and very few comparative attempts have been made to explore this possibility. In this thesis, I examined both the production and perception of prosodic cues to prominence and phrasing in native speakers of English and Mandarin Chinese. In focus production, our research revealed that English and Mandarin speakers were alike in how they used prosody to encode prominence, but there were also systematic language-specific differences in the exact degree to which they enhanced the different prosodic cues (Chapter 2). This, however, was not the case in focus perception, where English and Mandarin listeners were alike in the degree to which they used prosody to predict upcoming prominence, even though the precise cues in the preceding prosody could differ (Chapter 3). Further experiments examining prosodic focus prediction in the speech of different talkers have demonstrated functional cue equivalence in prosodic focus detection (Chapter 4). Likewise, our experiments have also revealed both cross-language similarities and differences in the production and perception of juncture cues (Chapter 5). Overall, prosodic processing is the result of a complex but subtle interplay of universal and language-specific structure.
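Prominence studies of this kind typically compare acoustic cues such as fundamental frequency, intensity, and duration across focus conditions. As a purely illustrative sketch (the abstract does not specify its measurement pipeline, and the file name and analysis settings below are assumptions), such cues could be extracted along these lines:

```python
# Hypothetical extraction of common prosodic prominence cues (f0, intensity, duration)
# from a single utterance; not the measurement procedure used in the thesis.
import librosa
import numpy as np

y, sr = librosa.load("focus_utterance.wav", sr=None)

# Fundamental frequency (f0) track via probabilistic YIN.
f0, voiced, _ = librosa.pyin(y, fmin=75, fmax=400, sr=sr)

# Intensity proxy: frame-wise RMS energy.
rms = librosa.feature.rms(y=y)[0]

# Simple summary measures of the sort compared across focus conditions.
print("mean f0 (Hz):", np.nanmean(f0[voiced]))
print("f0 range (Hz):", np.nanmax(f0[voiced]) - np.nanmin(f0[voiced]))
print("mean RMS energy:", rms.mean())
print("utterance duration (s):", len(y) / sr)
```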

    The production and perception of peripheral geminate/singleton coronal stop contrasts in Arabic

    Gemination is typologically common word-medially but is rare at the periphery of the word (word-initially and -finally). In line with this observation, prior research on production and perception of gemination has focused primarily on medial gemination. Much less is known about the production and perception of peripheral gemination. This PhD thesis reports on comprehensive articulatory, acoustic and perceptual investigations of geminate-singleton contrasts according to the position of the contrast in the word and in the utterance. The production component of the project investigated the articulatory and acoustic features of medial and peripheral gemination of voiced and voiceless coronal stops in Modern Standard Arabic and regional Arabic vernacular dialects, as produced by speakers from two disparate and geographically distant countries, Morocco and Lebanon. The perceptual experiment investigated how standard and dialectal Arabic gemination contrasts in each word position were categorised and discriminated by three groups of non-native listeners, each differing in their native language experience with gemination at different word positions. The first experiment used ultrasound and acoustic recordings to address the extent to which word-initial gemination in Moroccan and Lebanese dialectal Arabic is maintained, as well as the articulatory and acoustic variability of the contrast according to the position of the gemination contrast in the utterance (initial vs. medial) and between the two dialects. The second experiment compared the production of word-medial and -final gemination in Modern Standard Arabic as produced by Moroccan and Lebanese speakers. The aim of the perceptual experiment was to disentangle the contribution of phonological and phonetic effects of the listeners’ native languages on the categorisation and discrimination of non-lexical Moroccan gemination by three groups of non-native listeners varying in their phonological (native Lebanese group and heritage Lebanese group, for whom Moroccan is unintelligible, i.e., a non-native language) and phonetic-only (native English group) experience with gemination across the three word positions. The findings in this thesis constitute important contributions concerning positional and dialectal effects on the production and perception of gemination contrasts, going beyond medial gemination (which was mainly included as a control) and illuminating in particular the typologically rare peripheral gemination.