3,143 research outputs found

    Shifting embodied participation in multiparty university student meetings

    PhD Thesis. Student group work has been used in higher education as an effective means to cultivate students’ work-related skills and cooperative learning. These encounters of small groups are the sites where, through talk and other resources, university students get their educational tasks done as well as acquire essential workplace skills such as problem-solving, team working, decision-making and leadership. However, settings of educational talk-as-work, such as student group meetings, remain under-researched (Stokoe, Benwell, & Attenborough, 2013). The present study therefore attempts to bridge this gap by investigating the professional and academic abilities of university students to participate in multiparty group meetings, drawing upon a dataset of video- and audio-recorded meetings from the Newcastle University Corpus of Academic English (NUCASE). The dataset consists of ten hours of meetings in which a group of naval architecture undergraduate students work cooperatively on their final year project – to design and build a wind turbine. The study applies the methodological approach of conversation analysis (CA) with a multimodal perspective. It presents a fine-detailed, sequential multimodal analysis of a collection of cases of speaker transitions, and reveals how meeting participants display speakership and recipiency with their verbal/vocal and bodily-visual coordination. In this respect, the present study is the first to offer a systematic collection, as well as a thorough investigation, of speaker transition and turn-taking practices from a multimodal perspective, especially with the scope of analysis beyond pre-turn and turn-beginning positions. It shows how speaker transitions through ‘current speaker selects next’ and ‘next speaker self-selects’ are joint-undertakings not only between the self-selecting/current speaker, and the target recipient/addressed next speaker, but also among other co-present participants. 
In particular, by mobilising the whole set of multimodal resources, participants are able to display their multiple orientations toward their co-participants, project, pursue and accomplish multiple courses of action concurrently, and intricately coordinate their mutual orientation toward the shifting and emerging participation framework during the transition, establishment and maintenance of speakership and recipiency. By presenting the data and analysis, this study extends the boundaries of existing understandings of the temporality, sequentiality and systematicity of multimodal resources in talk-and-bodies-in-interaction. The thesis also contributes to interaction research in the particular context of student group work in higher education, by providing a ‘screenshot’ of students’ academic lives as they unfold ‘in flight’. Specifically, it reveals how students competently participate in multiparty group meetings (e.g., taking and allocating turns), co-construct the unfolding meeting procedures (e.g., roundtable update discussion), and jointly achieve the local interactional goals (e.g., sharing work progress, reaching an agreement). Acquiring such skills is, as argued above, not only crucial for accomplishing the educational tasks, but also necessary for preparing university students to fulfill their future workplace expectations. The study therefore further informs the practices of university students and professional practitioners in multiparty meetings, and also draws out methodological implications for multimodal CA research.

    Sensing, interpreting, and anticipating human social behaviour in the real world

    Low-level nonverbal social signals like glances, utterances, facial expressions and body language are central to human communicative situations and have been shown to be connected to important high-level constructs, such as emotions, turn-taking, rapport, or leadership. A prerequisite for the creation of social machines that are able to support humans in e.g. education, psychotherapy, or human resources is the ability to automatically sense, interpret, and anticipate human nonverbal behaviour. While promising results have been shown in controlled settings, automatically analysing unconstrained situations, e.g. in daily-life settings, remains challenging. Furthermore, anticipation of nonverbal behaviour in social situations is still largely unexplored. The goal of this thesis is to move closer to the vision of social machines in the real world. It makes fundamental contributions along the three dimensions of sensing, interpreting and anticipating nonverbal behaviour in social interactions. First, robust recognition of low-level nonverbal behaviour lays the groundwork for all further analysis steps. Advancing human visual behaviour sensing is especially relevant as the current state of the art is still not satisfactory in many daily-life situations. While many social interactions take place in groups, current methods for unsupervised eye contact detection can only handle dyadic interactions. We propose a novel unsupervised method for multi-person eye contact detection by exploiting the connection between gaze and speaking turns. Furthermore, we make use of mobile device engagement to address the problem of calibration drift that occurs in daily-life usage of mobile eye trackers. Second, we improve the interpretation of social signals in terms of higher level social behaviours. In particular, we propose the first dataset and method for emotion recognition from bodily expressions of freely moving, unaugmented dyads. 
Furthermore, we are the first to study low rapport detection in group interactions, and we investigate a cross-dataset evaluation setting for the emergent leadership detection task. Third, human visual behaviour is special because it functions as a social signal and also determines what a person is seeing at a given moment in time. Being able to anticipate human gaze opens up the possibility for machines to more seamlessly share attention with humans, or to intervene in a timely manner if humans are about to overlook important aspects of the environment. We are the first to propose methods for the anticipation of eye contact in dyadic conversations, as well as in the context of mobile device interactions during daily life, thereby paving the way for interfaces that are able to proactively intervene and support interacting humans.
Gaze, facial expressions, body language, and prosody play a central role in human communication as nonverbal signals. Numerous studies have linked them to important concepts such as emotions, turn-taking, leadership, and the quality of the relationship between two people. For machines to support people effectively in their daily social lives, automatic methods for sensing, interpreting, and anticipating nonverbal behaviour are necessary. Although previous research has produced encouraging results in controlled studies, the automatic analysis of nonverbal behaviour in less controlled situations remains a challenge. Moreover, there are hardly any studies on the anticipation of nonverbal behaviour in social situations. The goal of this thesis is to bring the vision of automatically understanding social situations a step closer to reality. The thesis makes important contributions to the automatic sensing of human gaze behaviour in everyday situations. Although many social interactions take place in groups, unsupervised methods for eye contact detection have so far existed only for dyadic interactions. We present a new approach to eye contact detection in groups that requires no manual annotations, by exploiting the statistical relationship between gaze and speaking behaviour. Daily activities are a challenge for mobile eye-tracking devices, because shifts of these devices can degrade their calibration. In this thesis we use user behaviour on mobile devices to correct the effect of such shifts. Beyond sensing, this thesis also improves the interpretation of social signals. We release the first dataset and the first method for emotion recognition in dyadic interactions without the use of specialised equipment. In addition, we present the first study on the automatic detection of low rapport in group interactions, and carry out the first cross-dataset evaluation for the detection of emergent leadership. The thesis concludes with the first approaches to anticipating gaze behaviour in social interactions. Gaze behaviour has the special property that it serves both as a social signal and to direct visual perception. The ability to anticipate gaze behaviour therefore allows machines both to integrate more seamlessly into social interactions and to warn people when they are at risk of overlooking important aspects of the environment. We present methods for anticipating gaze behaviour in the context of interaction with mobile devices during daily activities, as well as during dyadic interactions via video telephony.

    Turn-Taking in Human Communicative Interaction

    The core use of language is in face-to-face conversation. This is characterized by rapid turn-taking. This turn-taking poses a number of central puzzles for the psychology of language. Consider, for example, that in large corpora the gap between turns is on the order of 100 to 300 ms, but the latencies involved in language production require a minimum of 600 ms (for a single word) to 1500 ms (for a simple sentence). This implies that participants in conversation are predicting the ends of the incoming turn and preparing in advance. But how is this done? What aspects of this prediction are done when? What happens when the prediction is wrong? What stops participants coming in too early? If the system is running on prediction, why is there consistently a mode of 100 to 300 ms in response time? The timing puzzle raises further puzzles: it seems that comprehension must run parallel with the preparation for production, but it has been presumed that there are strict cognitive limitations on more than one central process running at a time. How is this bottleneck overcome? Far from being 'easy' as some psychologists have suggested, conversation may be one of the most demanding cognitive tasks in our everyday lives. Further questions naturally arise: how do children learn to master this demanding task, and what is the developmental trajectory in this domain? Research shows that aspects of turn-taking such as its timing are remarkably stable across languages and cultures, but the word order of languages varies enormously. How then does prediction of the incoming turn work when the verb (often the informational nugget in a clause) is at the end? Conversely, how can production work fast enough in languages that have the verb at the beginning, thereby requiring early planning of the whole clause? What happens when one changes modality, as in sign languages -- with the loss of channel constraints is turn-taking much freer? 
And what about face-to-face communication amongst hearing individuals -- do gestures, gaze, and other body behaviors facilitate turn-taking? One can also ask the phylogenetic question: how did such a system evolve? There seem to be parallels (analogies) in duetting bird species, and in a variety of monkey species, but there is little evidence of anything like this among the great apes. All this constitutes a neglected set of problems at the heart of the psychology of language and of the language sciences. This research topic welcomes contributions from right across the board, for example from psycholinguists, developmental psychologists, students of dialogue and conversation analysis, linguists interested in the use of language, phoneticians, corpus analysts and comparative ethologists or psychologists. We welcome contributions of all sorts, for example original research papers, opinion pieces, and reviews of work in subfields that may not be fully understood in other subfields.

    Meeting decision detection: multimodal information fusion for multi-party dialogue understanding

    Modern advances in multimedia and storage technologies have led to huge archives of human conversations in widely ranging areas. These archives offer a wealth of information in organizational contexts. However, retrieving and managing information in these archives is a time-consuming and labor-intensive task. Previous research applied keyword and computer vision-based methods to do this. However, spontaneous conversations, complex in the use of multimodal cues and intricate in the interactions between multiple speakers, have posed new challenges to these methods. We need new techniques that can leverage the information hidden in multiple communication modalities – including not just “what” the speakers say but also “how” they express themselves and interact with others. In response to this need, the thesis inquires into the multimodal nature of meeting dialogues and computational means to retrieve and manage the recorded meeting information. In particular, this thesis develops the Meeting Decision Detector (MDD) to detect and track decisions, one of the most important outcomes of the meetings. The MDD involves not only the generation of extractive summaries pertaining to the decisions (“decision detection”), but also the organization of a continuous stream of meeting speech into locally coherent segments (“discourse segmentation”). This inquiry starts with a corpus analysis which constitutes a comprehensive empirical study of the decision-indicative and segment-signalling cues in the meeting corpora. These cues are uncovered from a variety of communication modalities, including the words spoken, gesture and head movements, pitch and energy level, rate of speech, pauses, and use of subjective terms. While some of the cues match the previous findings of speech segmentation, some others have not been studied before. The analysis also provides empirical grounding for computing features and integrating them into a computational model. 
To handle the high-dimensional multimodal feature space in the meeting domain, this thesis empirically compares feature discriminability and feature pattern finding criteria. As the different knowledge sources are expected to capture different types of features, the thesis also experiments with methods that can harness synergy between the multiple knowledge sources. The problem formalization and the modeling algorithm so far correspond to an optimal setting: an off-line, post-meeting analysis scenario. However, ultimately the MDD is expected to be operated online – right after a meeting, or when a meeting is still in progress. Thus this thesis also explores techniques that help relax the optimal setting, especially those using only features that can be generated with a higher degree of automation. Empirically motivated experiments are designed to handle the corresponding performance degradation. Finally, with the users in mind, this thesis evaluates the use of query-focused summaries in a decision debriefing task, which is common in organizational contexts. The decision-focused extracts (which represent compressions of 1%) are compared against the general-purpose extractive summaries (which represent compressions of 10-40%). To examine the effect of model automation on the debriefing task, this evaluation experiments with three versions of decision-focused extracts, each relaxing one manual annotation constraint. Task performance is measured in actual task effectiveness, user-generated report quality, and user-perceived success. The users’ clicking behaviors are also recorded and analyzed to understand how the users leverage the different versions of extractive summaries to produce abstractive summaries. The analysis framework and computational means developed in this work are expected to be useful for the creation of other dialogue understanding applications, especially those that need to uncover the implicit semantics of meeting dialogues.

    Acomodación fonética durante las interacciones conversacionales: una visión general (Phonetic accommodation during conversational interactions: an overview)

    During conversational interactions such as tutoring, instruction-giving tasks, verbal negotiations, or just talking with friends, interlocutors’ behaviors undergo a series of changes due to the characteristics of their counterpart and to the interaction itself. These changes are pervasive in every social interaction, and most of them occur in the sounds and rhythms of our speech, a phenomenon known as acoustic-prosodic accommodation, or simply phonetic accommodation. The consequences, linguistic and social constraints, and underlying cognitive mechanisms of phonetic accommodation have been studied for at least 50 years, due to the importance of the phenomenon to several disciplines such as linguistics, psychology, and sociology. Based on the analysis and synthesis of the existing empirical research literature, in this paper we present a structured and comprehensive review of the qualities, functions, onto- and phylogenetic development, and modalities of phonetic accommodation.