75 research outputs found

    Leader Identification Using Multimodal Information in Multi-party Conversations

    Predicting a participant's role is an important task in the analysis of multi-party conversations. Many previous studies used only verbal or only non-verbal features to build models for role recognition. In this paper, we propose a model that combines verbal and non-verbal features for leader identification: the prediction model is built from utterance, pose, facial, and prosodic features. In our experiments, we compare our model with a baseline model based only on utterance features. The results show the effectiveness of our multimodal approach. In addition, we improve the performance of the baseline model by adding some new utterance features. International Conference on Asian Language Processing (IALP 2020), 4-6 December 2020, Kuala Lumpur, Malaysia (changed to an online event due to the spread of COVID-19)

    An investigation into interactional patterns for Alzheimer's Disease recognition in Natural dialogues

    Alzheimer's disease (AD) is a complex neurodegenerative disorder characterized by memory loss, together with cognitive deficits affecting language, emotional affect, and interactional communication. Diagnosis and assessment of AD is formally based on the judgment of clinicians, commonly using semi-structured interviews in a clinical setting. Manual diagnosis is therefore slow, resource-heavy, and hard to access, so many people are never diagnosed, and an automatic method would help. Using recent advances in deep learning, machine learning, and natural language processing, this thesis empirically explores how content-free interaction patterns help in developing models capable of identifying AD from natural conversations, with a focus on particular phenomena found useful in conversation analysis studies. The models presented in this thesis use lexical, disfluency, interactional, acoustic, and pause information to learn the symptoms of Alzheimer's disease from text and audio modalities. This thesis comprises two parts. In the first part, by studying a conversational corpus, we find that certain phenomena are strongly indicative of differences between AD and Non-AD. This analysis shows that interaction patterns differ between AD patients and Non-AD patients, including the types of questions asked of patients, their responses, delays in responses in the form of pauses, clarification questions, signals of non-understanding, and repetition of questions. Although this is a challenging problem because these dialogue acts are so rare, we show that it is possible to develop models that can automatically detect these classes. The second part then turns to AD diagnosis itself by looking into interactional features including pause information, disfluencies within patients' speech, communication breakdowns at speaker changes in certain situations, and n-gram dialogue act sequences. 
We found that there are longer pauses within the AD patients' utterances and more attributable silences in response to questions than for Non-AD patients. We also show that different fusion techniques over the speech and text modalities make the most of the combined feature sets, indicating that these features and techniques can give accurate and effective AD diagnosis. These interaction patterns may serve as an index of internal cognitive processes that helps in differentiating AD patients from Non-AD patients and may be used as an integral part of language assessment in clinical settings

    Interpersonal stance in police interviews: content analysis

    A serious game for learning the social skills required for effective police interviewing is a challenging idea. Building artificial conversational characters that play the role of a suspect in a police interrogation game requires computational models of police interviews as well as of the internal psychological mechanisms that determine the behaviour of suspects in this special type of dialogue. Leary's interactional circumplex is used in police interview training as a theoretical framework to understand how suspects take stance during an interview and how this is related to the stance and the strategy that the interviewer takes. Interactional stance is a fuzzy notion. The question that we consider here is whether different observers of police interviews agree on the type of stance that suspects and policemen take and express in a face-to-face interview. We analyzed police interviews and report on a stance annotation exercise. We conclude that although inter-annotator agreement on stance labeling at the level of speech segments is low, a majority-voting "meta-annotator" is able to reveal the important dynamics in stance taking in a police interview. We then explore the relation between the stance taken by the suspect and turn-taking behaviour, overlaps, interruptions, pauses and silences. Our findings contribute to building computational models of non-player characters that allow more natural turn-taking behaviour in serious games instead of the one-at-a-time regime in interview training games
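
The majority-voting "meta-annotator" mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' procedure: the label names, the three-annotator setup, and the tie handling (returning `None`) are all assumptions made for illustration only.

```python
from collections import Counter


def majority_label(labels, tie_label=None):
    """Return the most frequent label, or tie_label when the top count is shared."""
    counts = Counter(labels).most_common()
    if not counts:
        return tie_label
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return tie_label
    return counts[0][0]


def meta_annotate(annotations):
    """annotations: one list of labels per annotator, aligned by speech segment."""
    return [majority_label(segment) for segment in zip(*annotations)]


# Hypothetical stance labels for four speech segments from three annotators.
annotations = [
    ["dominant", "submissive", "dominant", "opposed"],
    ["dominant", "dominant", "submissive", "opposed"],
    ["together", "submissive", "dominant", "opposed"],
]
consensus = meta_annotate(annotations)
```

Even when no two annotators agree on every segment, the per-segment vote yields a single label sequence whose changes over time can then be inspected.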


    Infinite Hidden Conditional Random Fields for the Recognition of Human Behaviour

    While detecting and interpreting temporal patterns of nonverbal behavioral cues in a given context is a natural and often unconscious process for humans, it remains a rather difficult task for computer systems. In this thesis we are primarily motivated by the problem of recognizing expressions of high-level behavior, and specifically agreement and disagreement. We thoroughly dissect the problem by surveying the nonverbal behavioral cues that could be present during displays of agreement and disagreement; we discuss a number of methods that could be used or adapted to detect these suggested cues; we list some publicly available databases these tools could be trained on for the analysis of spontaneous, audiovisual instances of agreement and disagreement; we examine the few existing attempts at agreement and disagreement classification; and we discuss the challenges in automatically detecting agreement and disagreement. We present experiments that show that an existing discriminative graphical model, the Hidden Conditional Random Field (HCRF), is the best performing on this task. The HCRF is a discriminative latent variable model which has been previously shown to successfully learn the hidden structure of a given classification problem (provided an appropriate validation of the number of hidden states). We show here that HCRFs are also able to capture what makes each of these social attitudes unique. We present an efficient technique to analyze the concepts learned by the HCRF model and show that these coincide with the findings from social psychology regarding which cues are most prevalent in agreement and disagreement. Our experiments are performed on a spontaneous expressions dataset curated from real televised debates. The HCRF model outperforms conventional approaches such as Hidden Markov Models and Support Vector Machines. 
Subsequently, we examine existing graphical models that use Bayesian nonparametrics to have a countably infinite number of hidden states and adapt their complexity to the data at hand. We identify a gap in the literature, namely the lack of a discriminative graphical model of this kind, and we present our suggestion for the first such model: an HCRF with an infinite number of hidden states, the Infinite Hidden Conditional Random Field (IHCRF). In summary, the IHCRF is an undirected discriminative graphical model for sequence classification and uses a countably infinite number of hidden states. We present two variants of this model. The first is a fully nonparametric model that relies on Hierarchical Dirichlet Processes and a Markov Chain Monte Carlo inference approach. The second is a semi-parametric model that uses Dirichlet Process Mixtures and relies on a mean-field variational inference approach. We show that both models are able to converge to a correct number of represented hidden states, and perform as well as the best finite HCRFs (chosen via cross-validation) for the difficult tasks of recognizing instances of agreement, disagreement, and pain in audiovisual sequences.

    LAIX-score: a design framework for live audience interaction management systems

    This study focuses on computer-supported live audience interaction. In conventional lectures, the audience interacts explicitly with the performer, for example by waving a hand and asking a question directly, or by clapping hands. For decades, non-digital audience response systems have enabled simple multiple-option audience interaction patterns. Modern mobile personal computing devices, digital projectors, wireless networks and real-time software platforms enable the creation of new kinds of interaction patterns that can significantly increase the amount of audience interaction during events. Audience interaction can make events, for example, more engaging and productive. This research presents a design framework for computer-supported live audience interaction called the LAIX-score. LAIX stands for Live Audience Interac(X)tion and the “score” refers to the musical notation language. Musical notation has been an inspiration for the development of the framework and illustrates how the LAIX-score is intended as a generic and practical framework for coordinating live audience interaction, just as musical notation is a generic and practical framework for coordinating musical performances. However, while musical notation is an important inspiration, it is not the core reference for the LAIX-score. The core references of the LAIX-score are the live audio mixing and live light control frameworks, which are technology-enabled frameworks for supporting and producing live performances. The LAIX-score framework is composed of five core elements: interaction activities, interface channels, the state control matrix, temporal management of interactions, and participants' identity management. These five core elements compose a concrete and comprehensive framework that can be directly applied in the design of live audience interaction management systems and in the development of live audience interaction production practices. 
The research is constructive, practice-led, in-the-wild research (Chapter 2) that borrows aspects from design research, artistic research and human-computer interaction research. The LAIX-score framework is based on three core requirements identified during five years of practice-led domain exploration (Chapter 3). (Requirement 1) Live audience interaction must support different kinds of interaction patterns. Hence, the framework should acknowledge that live audience interaction is more than questions and answers (Q&A) and poll-type interaction patterns. (Requirement 2) Live audience interaction must support different roles. Hence, the role configuration in live audience interaction can include several different performer, audience and orchestrator roles. (Requirement 3) A live audience interaction framework must also support different kinds of live audience interaction functions running in parallel. Hence, in the same event production, live audience interaction may be used, for example, for audience activation, workshop facilitation, participatory decision making and catalyzing social networking, and these functions may take place concurrently. None of the existing live audience interaction systems satisfies all of the core requirements. This is explained in more detail in Section 4.2. The lack of adequate designs that meet the above-mentioned criteria justifies the development of a new design framework. The LAIX-score (Chapter 5) follows a two-dimensional matrix-type control framework, which is called the state control matrix. The core references, live audio mixing and live light control (Sections 4.3–4.5), have a similar control framework. Rows in the state control matrix are called interaction activities. Columns in the state control matrix are interface channels, which are the system equivalent for supporting different roles and user interfaces (Requirement 2). The matrix is used for visibility control of the interaction activities. 
The visibility of interaction activities can be manipulated independently in each interface channel. The matrix form satisfies the three core requirements. The first requirement is satisfied since the matrix format is agnostic to what kind of interactions are controlled in the system. The second requirement is satisfied since the matrix format allows the introduction of new roles and there is fundamentally no fixed number of rows. The third requirement is satisfied since multiple interaction activities can be active in any channel and each interaction activity state can be controlled independently. The core framework is implemented as a functional live audience interaction management system called Presemo (version 4) (Chapter 6). The evaluation of the design of Presemo reveals a more detailed five-tier structure for the control of interaction activities. The interaction activity control levels in the LAIX-score design framework are (1) creation and deletion, (2) state control matrix, (3) interaction-pattern-specific control, (4) content management and (5) presentation management. Presemo is a limited implementation of the framework since the basic version supports only four interface channels. Presemo is a commercial-level system that has been used in thousands of live audience interaction situations, and we have used it to produce more than 100 live audience interaction productions. The research investigates four case studies in more detail (Chapter 7). These four case studies were produced in different environments and in this way demonstrate the generic qualities of Presemo and the LAIX-score design framework. One case study focuses on professional event productions, another on the application of Presemo in a university context, a third on the use of live audience interaction in large-scale computer-supported workshops, and a fourth presents the use of live audience interaction techniques in a pervasive adventure designed for K-12 students. 
The case studies validate the three core requirements and identify 11 additional requirements for the LAIX-score matrix. The case studies also reveal a more detailed interface channel structure. The revised LAIX-score design framework divides interface channels into three groups: organizer channels, audience channels and screen channels. Organizer channels combine performer and orchestrator roles, since these are roles that have some kind of control over interaction activities. Audience interface channels can be divided into groups. Screen channels are public channels whereas organizer and audience channels are personal channels. The 11 new requirements are further elaborated as two new core elements of the LAIX-score framework (Chapter 8): temporal management and identity management. Temporal management is divided into three parts: the functional cue list realizes the management of future events, the state control matrix realizes real-time management, and the production log realizes the management of past events. The identity management core element can be visualized as a table that lists all identities on one axis and different identity parameters on the other axis. The study has identified six types of identity attribute categories: identifiers, group membership, access rights, privacy settings, other identity and profile parameters, and score attributes used for gamification. Identity attributes and privacy settings are used to manage identity parameters in order to achieve privacy and anonymity, which are important characteristics for most live audience interaction productions. The case studies have also shown that gamification is an important feature for live audience interaction. The core objective of the research is to create a framework for live audience interaction that is generic and practical. As such, the study is directly relevant as an extensive case reference for researchers of live audience interaction systems and for live audience interaction producers. 
The framework is described in enough detail that any developer can use it in their own live audience interaction system designs. Methodologically, the research has some areas for improvement, mainly due to challenges in organizing data collection in demanding production environments (Section 9.3). These problems are common in in-the-wild research. The strengths of this research are its extensive coverage of the live audience interaction domain and the concrete validation of the framework as a production-level software system. While developing the LAIX-score framework, we have also identified several other research topics for live audience interaction (explained in Section 10.3) that are beyond the scope of the LAIX-score framework. There are, for example, several issues related to human and organizational factors of live audience interaction that are not covered in the LAIX-score framework, which is designed for the development of the computer system and production practices. These other research topics demonstrate that live audience interaction is still an emerging domain with many interesting research possibilities. During the study, we have been involved in the commercial development of live audience interaction. Business and marketing development (Section 10.4) will most probably be the driving force for the development of new interaction patterns, live audience interaction production formats, professional practices and, more generally, new applications for live audience interaction. Further business and marketing development will define how organizations can adopt live audience interaction techniques and integrate them into their communication and participation processes. The study proposes that a standards organization start defining protocols for live audience interaction. Details of wider adoption will ultimately define what kind of further research is relevant and feasible in the live audience interaction domain. 
The five core elements of the LAIX-score are integrated with each other, and together they compose a comprehensive framework that can be used as a design guideline for a generic live audience interaction management system (LAIMS). A LAIMS that is based on the LAIX-score can modularly host different kinds of interaction patterns (Section 10.2). The modular approach can also be called an interaction-agnostic approach. It may have several implications: it makes the development of new interaction patterns easier, supports event productions that host different live audience interaction approaches, and supports sustainable system evolution and the establishment of management practices for live audience interaction productions
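
The state control matrix described in this abstract (rows as interaction activities, columns as interface channels, one independent visibility state per cell) might be sketched roughly as follows. This is only an illustration under stated assumptions, not Presemo's implementation: the class name, method names, and the activity names `poll` and `chat` are hypothetical, and the three channels stand in for the organizer, audience and screen channel groups named above.

```python
class StateControlMatrix:
    """Rows are interaction activities, columns are interface channels."""

    def __init__(self, channels):
        self.channels = list(channels)
        self.rows = {}  # activity name -> {channel name: visible?}

    def add_activity(self, activity):
        # Assumption: a new activity starts hidden in every channel.
        self.rows[activity] = {channel: False for channel in self.channels}

    def set_visible(self, activity, channel, visible):
        if channel not in self.channels:
            raise KeyError(channel)
        self.rows[activity][channel] = visible

    def visible_in(self, channel):
        """Activities currently shown in one channel, in insertion order."""
        return [name for name, cells in self.rows.items() if cells[channel]]


matrix = StateControlMatrix(["organizer", "audience", "screen"])
matrix.add_activity("poll")
matrix.add_activity("chat")
matrix.set_visible("poll", "audience", True)
matrix.set_visible("poll", "screen", True)
matrix.set_visible("chat", "audience", True)
```

Because each cell is set independently, several activities can be visible in one channel at once, which is the property the abstract ties to the third core requirement.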

    A practical guide to conversation research: how to study what people say to each other

    Conversation—a verbal interaction between two or more people—is a complex, pervasive, and consequential human behavior. Conversations have been studied across many academic disciplines. However, advances in recording and analysis techniques over the last decade have allowed researchers to more directly and precisely examine conversations in natural contexts and at a larger scale than ever before, and these advances open new paths to understand humanity and the social world. Existing reviews of text analysis and conversation research have focused on text generated by a single author (e.g., product reviews, news articles, and public speeches) and thus leave open questions about the unique challenges presented by interactive conversation data (i.e., dialogue). In this article, we suggest approaches to overcome common challenges in the workflow of conversation science, including recording and transcribing conversations, structuring data (to merge turn-level and speaker-level data sets), extracting and aggregating linguistic features, estimating effects, and sharing data. This practical guide is meant to shed light on current best practices and empower more researchers to study conversations more directly—to expand the community of conversation scholars and contribute to a greater cumulative scientific understanding of the social world
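
The data-structuring step mentioned in this abstract, merging turn-level and speaker-level data sets, could look roughly like the sketch below. The records, field names (`conversation`, `speaker`, `enjoyment`) and the chosen features (turn, word and question-mark counts) are hypothetical and are not taken from the article.

```python
from collections import defaultdict

# Hypothetical turn-level records: one row per speaking turn.
turns = [
    {"conversation": "c1", "speaker": "A", "text": "How was your trip?"},
    {"conversation": "c1", "speaker": "B", "text": "Great, thanks for asking."},
    {"conversation": "c1", "speaker": "A", "text": "Where did you go?"},
]

# Hypothetical speaker-level records: one row per speaker per conversation.
speakers = {("c1", "A"): {"enjoyment": 6}, ("c1", "B"): {"enjoyment": 7}}


def aggregate_turns(turns):
    """Roll turn-level text up to one feature row per (conversation, speaker)."""
    totals = defaultdict(lambda: {"n_turns": 0, "n_words": 0, "n_questions": 0})
    for turn in turns:
        row = totals[(turn["conversation"], turn["speaker"])]
        row["n_turns"] += 1
        row["n_words"] += len(turn["text"].split())
        row["n_questions"] += turn["text"].count("?")
    return totals


def merge(speakers, totals):
    """Attach aggregated turn features to each speaker-level row."""
    return {key: {**attrs, **totals.get(key, {})} for key, attrs in speakers.items()}


merged = merge(speakers, aggregate_turns(turns))
```

Keying both tables on the (conversation, speaker) pair keeps the same person in two different conversations as two separate rows.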

    Fostering awareness and collaboration in large-class lectures

    For decades, higher education has been shaped by large-class lectures, which are characterized by large anonymous audiences. Well-known issues of large-class lectures are a rather low degree of interactivity and a notable passivity of students, which are aggravated by the social environment created by large audiences. However, research indicates that active involvement is indispensable for learning to be successful. Active participation in lectures is thus often a goal of technology-supported lectures. An outstanding feature of social media is certainly their capability to facilitate interactions in large groups of participants. Social media thus seem to be a suitable basis for technology-enhanced learning in large-class lectures. However, existing general-purpose social media often come with several shortcomings that are assumed to hinder their proper use in lectures. This thesis therefore deals with the conception of a social medium, called Backstage, specially tailored for use in large-class lectures. Backstage provides both lecturer-initiated and student-initiated communication by means of an Audience Response System and a backchannel. Audience Response Systems allow running quizzes in lectures, e.g., to assess knowledge, and can thus be seen as technological support for question asking by the lecturer. These systems collect and aggregate the students' answers and report the results back to the audience in real time. Audience Response Systems have been shown to be a very effective means of sustaining lecture-relevant interactivity in lectures. Using a backchannel, students can initiate communication with peers or the lecturer. The backchannel is built upon microblogging, which has become a very popular communication medium in recent years. A key characteristic of microblogging is that messages are very concise, comprising only a few words. The brief form of communication makes microblogging quite appealing for a backchannel in lectures. 
A preliminary evaluation of a first prototype conducted at an early stage of the project, however, indicated that a conventional digital backchannel is prone to information overload. Even a relatively small group can quickly render the backchannel discourse incomprehensible. This incomprehensibility is rooted in a lack of interactional coherence, a rather low communication efficiency, a high information entropy, and a lack of connection between the backchannel and the frontchannel, i.e., the lecture’s discourse. This thesis investigates remedies to these issues. To this aim, lecture slides are integrated in the backchannel to structure and to provide context for the backchannel discourse. The backchannel communication is revised to realize a collaborative annotation of slides by typed backchannel posts. To reduce information entropy backchannel posts have to be assigned to predefined categories. To establish a connection with the frontchannel, backchannel posts have to be stuck on appropriate locations on slides. The lecture slides also improve communication efficiency by routing, which means that the backchannel can filter such that it only shows the posts belonging to the currently displayed slide. Further improvements and modifications, e.g., of the Audience Response System, are described in this thesis. This thesis also reports on an evaluation of Backstage in four courses. The outcomes are promising. Students welcomed the use of Backstage. Backstage not only succeeded in increasing interactivity but also contributed to social awareness, which is a prerequisite of active participation. Furthermore, the backchannel communication was highly lecture-relevant. 
As another important result, an additional study conducted in collaboration with educational scientists was able to show that students in Backstage-supported lectures used their mobile devices to a greater extent for lecture-relevant activities compared to students in conventional lectures, in which mobile devices were mostly used for lecture-unrelated activities. To establish social control of the backchannel, this thesis investigates rating and ranking of backchannel posts. Furthermore, this thesis proposes a reputation system that aims at incentivizing desirable behavior in the backchannel. The reputation system is based on an eigenvector centrality similar to Google's PageRank. It is highly customizable and also allows considering quiz performance in the computation of reputation. All these approaches (rating, ranking, and reputation systems) have proven to be very effective mechanisms of social control in general-purpose social media.
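
To illustrate the idea of a reputation score based on an eigenvector centrality similar to PageRank, here is a minimal power-iteration sketch. It is not the Backstage reputation system: the endorsement graph, the user names, the damping factor of 0.85 and the iteration count are assumptions for illustration, and quiz performance is not modeled.

```python
def reputation(endorsements, users, damping=0.85, iterations=100):
    """PageRank-style power iteration over a who-endorsed-whom graph."""
    n = len(users)
    score = {user: 1.0 / n for user in users}
    for _ in range(iterations):
        # Every user receives a small baseline share regardless of endorsements.
        updated = {user: (1.0 - damping) / n for user in users}
        for giver in users:
            targets = endorsements.get(giver, [])
            if targets:
                share = damping * score[giver] / len(targets)
                for target in targets:
                    updated[target] += share
            else:
                # Users who endorsed nobody spread their weight evenly.
                share = damping * score[giver] / n
                for user in users:
                    updated[user] += share
        score = updated
    return score


# Hypothetical graph: ann and cem rated bob's posts highly, bob rated ann's.
users = ["ann", "bob", "cem"]
endorsements = {"ann": ["bob"], "cem": ["bob"], "bob": ["ann"]}
scores = reputation(endorsements, users)
```

An endorsement from a highly ranked user counts for more than one from a low-ranked user, which is what separates an eigenvector centrality from a plain count of positive ratings.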