3,541 research outputs found

    A Virtual Conversational Agent for Teens with Autism: Experimental Results and Design Lessons

    Full text link
    We present the design of an online social skills development interface for teenagers with autism spectrum disorder (ASD). The interface is intended to enable private conversation practice anywhere, anytime, using a web browser. Users converse informally with a virtual agent and receive both real-time feedback on nonverbal cues and summary feedback. The prototype was developed in consultation with an expert UX designer, two psychologists, and a pediatrician. Using data from 47 individuals, feedback and dialogue generation were automated with a hidden Markov model and a schema-driven dialogue manager capable of handling multi-topic conversations. We conducted a study with nine high-functioning teenagers with ASD. Through a thematic analysis of post-experiment interviews, we identified several key design considerations, notably: 1) users should be fully briefed at the outset about the purpose and limitations of the system, to avoid unrealistic expectations; 2) the interface should incorporate positive acknowledgment of behavior change; 3) a realistic appearance and responsiveness of the virtual agent are important for engaging users; and 4) conversation personalization, for instance prompting laconic users for more input and for reciprocal questions, would help the teenagers engage for longer and increase the system's utility.
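
    To make the idea of a schema-driven, multi-topic dialogue manager concrete, the following is a minimal sketch in Python. The topics, prompts, and topic-switching rule are hypothetical illustrations, not the logic of the system described above.

```python
# Minimal sketch of a schema-driven, multi-topic dialogue manager.
# Topics, prompts, and the switching rule are hypothetical.
SCHEMAS = {
    "school": ["What classes are you taking?", "Which one do you like best?"],
    "hobbies": ["What do you like to do after school?", "How did you get into that?"],
}

class DialogueManager:
    def __init__(self, schemas):
        self.schemas = {topic: list(prompts) for topic, prompts in schemas.items()}
        self.topic = next(iter(self.schemas))

    def next_prompt(self, user_utterance: str) -> str:
        # Switch topics when the current schema is exhausted or the user asks to.
        wants_switch = "something else" in user_utterance.lower()
        if wants_switch or not self.schemas[self.topic]:
            candidates = ([t for t, p in self.schemas.items() if p and t != self.topic]
                          or [t for t, p in self.schemas.items() if p])
            if not candidates:
                return "Thanks for chatting with me today!"
            self.topic = candidates[0]
        return self.schemas[self.topic].pop(0)

dm = DialogueManager(SCHEMAS)
print(dm.next_prompt("Hi!"))                                # first "school" prompt
print(dm.next_prompt("Can we talk about something else?"))  # switches to "hobbies"
```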

    Social behavior modeling based on Incremental Discrete Hidden Markov Models

    No full text
    Modeling multimodal face-to-face interaction is a crucial step in the process of building social robots or user-aware Embodied Conversational Agents (ECAs). In this context, we present a novel approach to human behavior analysis and generation based on what we call an "Incremental Discrete Hidden Markov Model" (IDHMM). Joint multimodal activities of the interlocutors are first modeled by a set of DHMMs that are specific to supposed joint cognitive states of the interlocutors. Respecting a task-specific syntax, the IDHMM is then built from these DHMMs and split into i) a recognition model that determines the most likely sequence of cognitive states given the multimodal activity of the interlocutor, and ii) a generative model that computes the most likely activity of the speaker given this estimated sequence of cognitive states. Short-Term Viterbi (STV) decoding is used to incrementally recognize and generate behavior. The proposed model is applied to parallel speech and gaze data of interacting dyads.
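
    To make the recognition/generation split concrete, the following is a minimal sketch of discrete-HMM Viterbi decoding followed by a naive per-state emission step. The states, matrices, and observation symbols are toy values rather than the paper's, and the paper's Short-Term Viterbi decodes incrementally over a sliding window, which this offline sketch does not attempt.

```python
import numpy as np

# Toy discrete HMM over hypothetical joint cognitive states; all values are illustrative.
states = ["listen", "speak", "think"]
A = np.array([[0.7, 0.2, 0.1],            # state-transition probabilities
              [0.3, 0.6, 0.1],
              [0.2, 0.3, 0.5]])
B = np.array([[0.6, 0.3, 0.1],            # emission probabilities over discretized
              [0.1, 0.2, 0.7],            # multimodal observation symbols
              [0.3, 0.4, 0.3]])
pi = np.array([0.5, 0.3, 0.2])            # initial state distribution

def viterbi(obs, A, B, pi):
    """Most likely state sequence for a discrete observation sequence."""
    n_states, T = A.shape[0], len(obs)
    delta = np.zeros((T, n_states))            # best log-probabilities
    psi = np.zeros((T, n_states), dtype=int)   # backpointers
    delta[0] = np.log(pi) + np.log(B[:, obs[0]])
    for t in range(1, T):
        scores = delta[t - 1][:, None] + np.log(A)   # (from-state, to-state)
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + np.log(B[:, obs[t]])
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1]

# Recognition: decode cognitive states from the interlocutor's discretized activity.
obs = [0, 0, 2, 2, 1, 0]
decoded = viterbi(obs, A, B, pi)
# Naive generation: emit the most probable observation symbol for each decoded state.
generated = [int(B[s].argmax()) for s in decoded]
print([states[s] for s in decoded], generated)
```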

    Towards responsive Sensitive Artificial Listeners

    Get PDF
    This paper describes work in the recently started project SEMAINE, which aims to build a set of Sensitive Artificial Listeners: conversational agents designed to sustain an interaction with a human user despite limited verbal skills, through robust recognition and generation of non-verbal behaviour in real time, both when the agent is speaking and when it is listening. We report on data collection and on the design of a system architecture geared towards real-time responsiveness.

    The 'who' and 'what' of #diabetes on Twitter

    Get PDF
    Social media are being increasingly used for health promotion, yet the landscape of users, messages and interactions in such fora is poorly understood. Studies of social media and diabetes have focused mostly on patients, or on public agencies addressing it, but have not looked broadly at all the participants or the diversity of the content they contribute. We study Twitter conversations about diabetes through a systematic analysis of 2.5 million tweets collected over 8 months and of the interactions between their authors. We address three questions: (1) what themes arise in these tweets? (2) who are the most influential users? (3) which types of users contribute to which themes? We answer these questions using a mixed-methods approach, integrating techniques from anthropology, network science and information retrieval, such as thematic coding, temporal network analysis, and community and topic detection. Diabetes-related tweets fall within broad thematic groups: health information, news, social interaction, and commercial. At the same time, humorous messages and references to popular culture appear consistently, more than any other type of tweet. We classify authors according to their temporal 'hub' and 'authority' scores. Whereas the hub landscape is diffuse and fluid over time, top authorities are highly persistent across time and comprise bloggers, advocacy groups and NGOs related to diabetes, as well as for-profit entities without specific diabetes expertise. Top authorities fall into seven interest communities as derived from their Twitter follower network. Our findings have implications for public health professionals and policy makers who seek to use social media as an engagement tool and to inform policy design. Comment: 25 pages, 11 figures, 7 tables. Supplemental spreadsheet available from http://journals.sagepub.com/doi/suppl/10.1177/2055207616688841, Digital Health, Vol 3, 201
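
    The temporal 'hub' and 'authority' scores mentioned above are in the spirit of the HITS algorithm. The following is a minimal sketch of one plausible way to compute them per time window with networkx; the account names, edges, and windowing are invented for illustration.

```python
import networkx as nx

# Hypothetical interaction edges (e.g. retweets/mentions) grouped by time window.
windows = {
    "week_1": [("user_a", "ngo_1"), ("user_b", "ngo_1"), ("user_a", "blogger_1")],
    "week_2": [("user_c", "ngo_1"), ("user_b", "blogger_1"), ("user_c", "brand_1")],
}

# "Temporal" hub/authority scores: run HITS on each window's directed graph
# and track how each account's authority score evolves across windows.
authority_over_time = {}
for window, edges in windows.items():
    g = nx.DiGraph(edges)
    hubs, authorities = nx.hits(g, max_iter=1000, normalized=True)
    for node, score in authorities.items():
        authority_over_time.setdefault(node, {})[window] = round(score, 3)

print(authority_over_time)
```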

    A Neural Computation for Visual Acuity in the Presence of Eye Movements

    Get PDF
    Humans can distinguish visual stimuli that differ by features the size of only a few photoreceptors. This is possible despite the incessant image motion due to fixational eye movements, which can be many times larger than the features to be distinguished. To perform well, the brain must identify the retinal firing patterns induced by the stimulus while discounting similar patterns caused by spontaneous retinal activity. This is a challenge because the trajectory of the eye movements, and consequently the stimulus position, are unknown. We derive a decision rule for using retinal spike trains to discriminate between two stimuli, given that their retinal image moves with an unknown random-walk trajectory. The algorithm dynamically estimates the probability of the stimulus being at different retinal locations and uses this estimate to modulate the influence of retinal spikes acquired later. Applied to a simple orientation-discrimination task, the algorithm's performance is consistent with human acuity, whereas naive strategies that neglect eye movements perform much worse. We then show how a simple, biologically plausible neural network could implement this algorithm using a local, activity-dependent gain and lateral interactions approximately matched to the statistics of eye movements. Finally, we discuss evidence that such a network could be operating in the primary visual cortex.
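
    The following is a minimal sketch of the general idea of the decision rule (dynamically estimating the stimulus position while accumulating spike evidence), not the paper's implementation. It assumes a 1-D grid of retinal positions, a Gaussian random-walk transition kernel, Poisson spike counts, and illustrative rate profiles for the two stimuli.

```python
import numpy as np

rng = np.random.default_rng(0)

n_pos = 21                                   # 1-D grid of candidate retinal positions
positions = np.arange(n_pos)
# Illustrative expected spike counts per time bin if stimulus A or B sits at each position.
rate_A = 0.5 + 2.0 * np.exp(-0.5 * ((positions - 8) / 2.0) ** 2)
rate_B = 0.5 + 2.0 * np.exp(-0.5 * ((positions - 12) / 2.0) ** 2)

# Gaussian random-walk transition kernel for the unknown eye-induced image motion.
K = np.exp(-0.5 * (positions[:, None] - positions[None, :]) ** 2 / 1.5)
K /= K.sum(axis=1, keepdims=True)

def log_evidence(spike_counts, rate):
    """Accumulate log P(spikes | stimulus), marginalizing over the unknown position."""
    belief = np.full(n_pos, 1.0 / n_pos)     # posterior over position
    total = 0.0
    for count in spike_counts:
        belief = K.T @ belief                # predict: diffuse the position belief
        lik = np.exp(-rate) * rate ** count  # Poisson likelihood per position
        total += np.log(belief @ lik)        # (count! omitted; it cancels between stimuli)
        belief = belief * lik / (belief @ lik)  # update: reweight by the new spikes
    return total

# Simulate spike counts from stimulus A drifting on a random walk.
true_pos, spike_counts = 8, []
for _ in range(200):
    true_pos = int(np.clip(true_pos + rng.integers(-1, 2), 0, n_pos - 1))
    spike_counts.append(rng.poisson(rate_A[true_pos]))

decision = "A" if log_evidence(spike_counts, rate_A) > log_evidence(spike_counts, rate_B) else "B"
print("decoded stimulus:", decision)
```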

    A dynamic texture based approach to recognition of facial actions and their temporal models

    Get PDF
    In this work, we propose a dynamic texture-based approach to the recognition of facial Action Units (AUs, atomic facial gestures) and their temporal models (i.e., sequences of temporal segments: neutral, onset, apex, and offset) in near-frontal-view face videos. Two approaches to modeling the dynamics and the appearance of the face region in an input video are compared: an extended version of Motion History Images (MHI) and a novel method based on Nonrigid Registration using Free-Form Deformations (FFDs). The extracted motion representation is used to derive motion orientation histogram descriptors in both the spatial and temporal domains. Per AU, a combination of discriminative, frame-based GentleBoost ensemble learners and dynamic, generative Hidden Markov Models detects the presence of the AU in question and its temporal segments in an input image sequence. When tested on all 27 lower and upper face AUs, occurring alone or in combination in 264 sequences from the MMI facial expression database, the approach achieved an average event recognition accuracy of 89.2 percent with the MHI representation and 94.3 percent with the FFD representation. The generalization performance of the FFD method was tested using the Cohn-Kanade database. Finally, we also explored performance on spontaneous expressions in the Sensitive Artificial Listener data set.
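
    As a point of reference for the first of the two motion representations, the following is a minimal sketch of a standard Motion History Image computed with NumPy. The threshold and decay duration are illustrative, and the paper's extended MHI variant is not reproduced here.

```python
import numpy as np

def motion_history(frames, threshold=25, duration=20):
    """Standard Motion History Image: recent motion is bright, older motion fades.

    `frames` is an iterable of equally sized grayscale images; `threshold` and
    `duration` are illustrative settings, not those of the paper.
    """
    frames = iter(frames)
    prev = next(frames).astype(np.int16)
    mhi = np.zeros_like(prev, dtype=np.float32)
    for frame in frames:
        frame = frame.astype(np.int16)
        moving = np.abs(frame - prev) > threshold        # pixels that changed
        mhi = np.where(moving, float(duration), np.maximum(mhi - 1.0, 0.0))
        prev = frame
    return mhi / duration                                # normalize to [0, 1]

# Orientation-histogram descriptors (as in the abstract) could then be built from
# the spatial gradients of the MHI, e.g. via np.gradient and np.histogram.
rng = np.random.default_rng(0)
clip = [rng.integers(0, 256, (64, 64), dtype=np.uint8) for _ in range(10)]
print(motion_history(clip).shape)
```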

    Interaction Recognition in a Paired Egocentric Video

    Get PDF
    Wearable devices and affective computing have gained popularity in recent years. Egocentric videos recorded using these devices can be used to understand the emotions of the camera wearer and of the person interacting with the wearer. Emotions affect facial expression, head movement and various other physiological factors. For this study we collected dyadic conversation data (dialogues between two people) from two different groups: one in which the two individuals agree on a given topic, and one in which they disagree. The data were collected using a wearable smart glass for video and a smart wristband for physiological signals. Building this unique dataset is one of the significant contributions of this study. From these data we extracted features that include Galvanic Skin Response (GSR), facial expressions, and the 3D motion of the camera within the environment, termed egomotion. We built two different machine learning models for these data. In the first approach, we apply a Bayesian Hidden Markov Model to classify the individual videos from the paired conversations. In the second approach, we use a Random Forest classifier on the Dynamic Time Warping distances between the paired videos and on the per-video averages of all features. The study found that, given the limited data used in this work, individual behaviors were slightly more indicative of the type of discussion (85.43% accuracy) than the coupled behaviors (83.33% accuracy).
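
    The second approach combines Dynamic Time Warping distances with a Random Forest classifier. The following is a minimal sketch of that pattern on hypothetical 1-D feature traces (e.g., GSR); the sequences, labels, and single-feature setup are invented for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def dtw_distance(a, b):
    """Classic dynamic-time-warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

# Hypothetical paired feature traces (e.g. GSR) and agree/disagree labels.
rng = np.random.default_rng(0)
pairs = [(rng.normal(size=50), rng.normal(size=50)) for _ in range(40)]
labels = rng.integers(0, 2, size=40)              # 0 = agree, 1 = disagree

# One DTW feature per conversation pair; a real pipeline would add one per modality.
X = np.array([[dtw_distance(a, b)] for a, b in pairs])
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)
print("training accuracy:", clf.score(X, labels))
```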

    Self-Supervised Vision-Based Detection of the Active Speaker as Support for Socially-Aware Language Acquisition

    Full text link
    This paper presents a self-supervised method for visual detection of the active speaker in a multi-person spoken interaction scenario. Active speaker detection is a fundamental prerequisite for any artificial cognitive system attempting to acquire language in social settings. The proposed method is intended to complement acoustic detection of the active speaker, thus improving system robustness in noisy conditions. The method can detect an arbitrary number of possibly overlapping active speakers based exclusively on visual information about their faces. Furthermore, the method does not rely on external annotations, and is therefore consistent with a cognitive-developmental setting: it instead uses information from the auditory modality to support learning in the visual domain. This paper reports an extensive evaluation of the proposed method on a large multi-person face-to-face interaction dataset. The results show good performance in a speaker-dependent setting, whereas in a speaker-independent setting the method yields significantly lower performance. We believe that the proposed method represents an essential component of any artificial cognitive system or robotic platform engaging in social interactions. Comment: 10 pages, IEEE Transactions on Cognitive and Developmental Systems
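
    The following is a minimal sketch of the general self-supervision pattern described above: an acoustic voice-activity signal acts as the pseudo-label for a visual classifier over face crops. The tiny CNN, tensor shapes, and training loop are illustrative stand-ins, not the architecture or procedure from the paper.

```python
import torch
import torch.nn as nn

# Minimal sketch: acoustic voice activity (0/1 per frame) provides the pseudo-label
# for a small visual classifier on face crops. The CNN and shapes are illustrative.
class FaceSpeakingNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 1),
        )

    def forward(self, x):                 # x: (batch, 3, 64, 64) face crops
        return self.net(x).squeeze(1)     # logit for "this face is speaking"

model = FaceSpeakingNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for _ in range(3):                                    # a few toy training steps
    faces = torch.randn(8, 3, 64, 64)                 # stand-in face crops
    audio_vad = torch.randint(0, 2, (8,)).float()     # pseudo-labels from audio
    loss = loss_fn(model(faces), audio_vad)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
print("last loss:", float(loss))
```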

    An Eye Gaze Model for Controlling the Display of Social Status in Believable Virtual Humans

    Get PDF
    Designing highly believable characters remains a major concern within digital games. Matching a chosen personality and other dramatic qualities to displayed behavior is an important part of improving overall believability. Gaze is a critical component of social exchanges and serves to make characters engaging or aloof, as well as to establish a character's role in a conversation. In this paper, we investigate the communication of status-related social signals by means of a virtual human's eye gaze. We constructed a cross-domain verbal-conceptual computational model of gaze for virtual humans to facilitate the display of social status. We describe the validation of the model's parameters, including the length of eye contact and gazes, movement velocity, equilibrium response, and head and body posture. In a first set of studies, conducted on Amazon Mechanical Turk using prerecorded video clips of animated characters, we found statistically significant differences in how the characters' status was rated, depending on the status condition displayed. In a second step, based on these empirical findings, we designed an interactive system that incorporates dynamic eye tracking and spoken dialog, along with real-time control of a virtual character. We evaluated the model in an interactive, in-person scenario of a simulated hiring interview. Corroborating our previous finding, the interactive study also yielded significant differences in the perception of status (p = .046). We therefore believe status is an important aspect of dramatic believability; accordingly, this paper presents our social eye gaze model for realistic, procedurally animated characters and shows its efficacy. Index Terms: procedural animation, believable characters, virtual human, gaze, social interaction, nonverbal behaviour, video game
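
    As a rough illustration of how a status level might parameterize the gaze behaviours listed above, the following is a minimal sketch. The parameter names echo the abstract, but the numeric ranges and the linear interpolation are placeholders, not the validated values from the model.

```python
from dataclasses import dataclass

@dataclass
class GazeProfile:
    eye_contact_seconds: float   # how long mutual gaze is held
    gaze_shift_velocity: float   # degrees per second for gaze shifts
    reestablish_delay: float     # pause before re-engaging ("equilibrium response")
    head_pitch_deg: float        # lowered (negative) vs. raised head posture

def profile_for_status(status: float) -> GazeProfile:
    """Interpolate gaze parameters between low (0.0) and high (1.0) status.

    The numeric ranges are placeholders, not the validated model values.
    """
    lerp = lambda lo, hi: lo + (hi - lo) * status
    return GazeProfile(
        eye_contact_seconds=lerp(1.0, 3.5),
        gaze_shift_velocity=lerp(120.0, 60.0),
        reestablish_delay=lerp(0.3, 1.2),
        head_pitch_deg=lerp(-10.0, 5.0),
    )

print(profile_for_status(0.9))   # a high-status gaze profile
```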