
    Sensing, interpreting, and anticipating human social behaviour in the real world

    Low-level nonverbal social signals such as glances, utterances, facial expressions, and body language are central to human communication and have been shown to be connected to important high-level constructs such as emotions, turn-taking, rapport, and leadership. A prerequisite for creating social machines that can support humans in, for example, education, psychotherapy, or human resources is the ability to automatically sense, interpret, and anticipate human nonverbal behaviour. While promising results have been achieved in controlled settings, automatically analysing unconstrained situations, e.g. in daily life, remains challenging. Furthermore, the anticipation of nonverbal behaviour in social situations is still largely unexplored. The goal of this thesis is to move closer to the vision of social machines in the real world. It makes fundamental contributions along three dimensions: sensing, interpreting, and anticipating nonverbal behaviour in social interactions.

    First, robust recognition of low-level nonverbal behaviour lays the groundwork for all further analysis steps. Advancing human visual behaviour sensing is especially relevant because the current state of the art is still unsatisfactory in many daily-life situations. While many social interactions take place in groups, current methods for unsupervised eye contact detection can only handle dyadic interactions. We propose a novel unsupervised method for multi-person eye contact detection that exploits the connection between gaze and speaking turns. Furthermore, we make use of mobile device engagement to address the calibration drift that occurs during daily-life use of mobile eye trackers.

    Second, we improve the interpretation of social signals in terms of higher-level social behaviours. In particular, we propose the first dataset and method for emotion recognition from the bodily expressions of freely moving, unaugmented dyads. Furthermore, we are the first to study low rapport detection in group interactions and to investigate a cross-dataset evaluation setting for the emergent leadership detection task.

    Third, human visual behaviour is special because it functions as a social signal and also determines what a person is seeing at a given moment in time. Being able to anticipate human gaze opens up the possibility for machines to share attention with humans more seamlessly, or to intervene in a timely manner if humans are about to overlook important aspects of the environment. We are the first to propose methods for the anticipation of eye contact in dyadic conversations, as well as in the context of mobile device interactions during daily life, thereby paving the way for interfaces that can proactively intervene and support interacting humans.
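    The unsupervised multi-person eye contact detection described above builds on the observation that listeners tend to look at the current speaker. The following is a minimal sketch of that weak-supervision idea, assuming per-frame gaze features and speaking indicators are already available; the clustering step and all names are illustrative and are not the thesis' actual pipeline.

```python
# Illustrative sketch: derive weak "who is being looked at" labels from speaking turns.
import numpy as np
from sklearn.cluster import KMeans

def pseudo_label_gaze(gaze_feats, speaking, n_partners):
    """gaze_feats: (T, D) per-frame gaze features of one listener.
    speaking: (T, n_partners) binary matrix; speaking[t, p] = 1 if partner p speaks at frame t.
    Returns per-frame pseudo-labels: index of the partner being looked at, or -1 if unknown."""
    gaze_feats = np.asarray(gaze_feats, dtype=float)
    speaking = np.asarray(speaking)
    # Cluster the listener's gaze directions into one cluster per interaction partner.
    clusters = KMeans(n_clusters=n_partners, n_init=10, random_state=0).fit_predict(gaze_feats)
    labels = np.full(len(gaze_feats), -1, dtype=int)
    for c in range(n_partners):
        in_cluster = clusters == c
        if not in_cluster.any():
            continue
        # Assign the cluster to the partner whose speaking turns overlap with it most.
        target = int(np.argmax(speaking[in_cluster].mean(axis=0)))
        # Keep only frames where that partner actually speaks as confident pseudo-labels.
        labels[in_cluster & (speaking[:, target] == 1)] = target
    return labels
```

    Pseudo-labels of this kind could then be used to train an eye contact classifier without manual annotation, which is the general weak-supervision pattern the abstract describes.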

    LifeLogging: personal big data

    We have recently observed a convergence of technologies that fosters the emergence of lifelogging as a mainstream activity. Computer storage has become significantly cheaper, and advances in sensing technology allow personal activities, locations, and the environment to be sensed efficiently. This is best seen in the growing popularity of the quantified-self movement, in which life activities are tracked with wearable sensors in the hope of better understanding human performance in a variety of tasks. This review aims to provide a comprehensive summary of lifelogging, covering its research history, current technologies, and applications. Thus far, most lifelogging research has focused predominantly on visual lifelogging to capture details of life activities, and we maintain this focus in this review. However, we also reflect on the challenges lifelogging poses to an information retrieval scientist. This review is a suitable reference for those seeking an information retrieval scientist's perspective on lifelogging and the quantified self.

    Computer-aided investigation of interaction mediated by an AR-enabled wearable interface

    Dierker A. Computer-aided investigation of interaction mediated by an AR-enabled wearable interface. Bielefeld: UniversitĂ€tsbibliothek Bielefeld; 2012.

    This thesis provides an approach to facilitating the analysis of nonverbal behaviour during human-human interaction, alleviating much of the work researchers do, from experiment control, data acquisition, and tagging through to the analysis of the data. To this end, software and hardware techniques such as sensor technology, machine learning, object tracking, data processing, visualisation, and Augmented Reality are combined into an Augmented-Reality-enabled Interception Interface (ARbInI), a modular wearable interface for two users. The interface mediates the users' interaction, thereby intercepting and influencing it. The ARbInI interface consists of two identical, mutually coupled setups of sensors and displays. By combining cameras and microphones with sensors, the system records rich multimodal interaction cues efficiently. The recorded data can be analysed online and offline for interaction features (e.g. head gestures in head movements, objects in joint attention, speech times) using integrated machine-learning approaches, and the classified features can be tagged in the data. For detailed analysis, the recorded multimodal data is transferred automatically into file bundles loadable in a standard annotation tool, where the data can be further tagged by hand. For statistical analyses of the complete multimodal corpus, a toolbox for a standard statistics program allows the corpus to be imported directly and the analysis of multimodal and complex relationships between arbitrary data types to be automated.

    When the optional multimodal Augmented Reality techniques integrated into ARbInI are used, the camera records exactly what the participant can see, and nothing more or less. This offers additional advantages during an experiment: (a) the experiment can be controlled via the auditory or visual displays, ensuring controlled experimental conditions; (b) the experiment can be disturbed, making it possible to investigate how problems in interaction are discovered and solved; and (c) the experiment can be enhanced by interactively incorporating the user's behaviour, making it possible to investigate how users cope with novel interaction channels. The thesis introduces criteria for the design of scenarios in which interaction analysis can benefit from the experimentation interface and presents a set of such scenarios. These scenarios are applied in several empirical studies, collecting multimodal corpora that particularly include head gestures.

    The capabilities of computer-aided interaction analysis for the investigation of speech, visual attention, and head movements are illustrated on this empirical data. The effects of the head-mounted display (HMD) are evaluated thoroughly in two studies. The results show that HMD users need more head movements to achieve the same shift of gaze direction, and perform fewer head gestures with slower velocity and fewer repetitions than non-HMD users. This suggests a reduced willingness to perform head movements when they are not necessary. Moreover, compensation strategies are established, such as leaning backwards to enlarge the field of view, and increasing the number of utterances or changing the reference to objects to compensate for the absence of mutual eye contact. Two further studies investigate the interaction while actively inducing misunderstandings; here, participants use compensation strategies such as multiple verification questions and arbitrary gaze movements. Additionally, an enhancement method that highlights the visual attention of the interaction partner is evaluated in a search task, resulting in significantly shorter reaction times and fewer errors.
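    As a concrete illustration of the kind of head-movement analysis reported above (fewer, slower head gestures for HMD users), the sketch below derives simple movement-event statistics from a head-yaw trace. The sampling rate, velocity threshold, and function names are assumptions made for illustration, not the thesis' actual analysis code.

```python
# Illustrative sketch: count head-movement events and their peak angular speed.
import numpy as np

def head_gesture_stats(yaw_deg, fs=60.0, vel_thresh=30.0):
    """yaw_deg: 1-D array of head yaw angles in degrees, sampled at fs Hz.
    Returns (number of movement events, mean peak angular speed in deg/s)."""
    yaw_deg = np.asarray(yaw_deg, dtype=float)
    vel = np.abs(np.gradient(yaw_deg)) * fs        # angular speed per frame, deg/s
    moving = vel > vel_thresh                      # frames above the speed threshold
    edges = np.diff(moving.astype(int))
    onsets = np.flatnonzero(edges == 1) + 1        # first frame of each event
    offsets = np.flatnonzero(edges == -1) + 1      # frame after the last event frame
    if moving[0]:
        onsets = np.r_[0, onsets]
    if moving[-1]:
        offsets = np.r_[offsets, len(moving)]
    peaks = [vel[a:b].max() for a, b in zip(onsets, offsets)]
    if not peaks:
        return 0, 0.0
    return len(peaks), float(np.mean(peaks))

# Hypothetical usage on a synthetic yaw trace sampled at 60 Hz.
rng = np.random.default_rng(0)
trace = np.cumsum(rng.normal(0.0, 1.0, 600))
print(head_gesture_stats(trace))
```

    Comparing such statistics between HMD and non-HMD recordings is one plausible way to quantify the group differences the abstract reports.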

    From head to toe: body movement for human-computer interaction

    Our bodies are the medium through which we experience the world around us, so human-computer interaction can benefit greatly from the richness of body movements and postures as an input modality. In recent years, the widespread availability of inertial measurement units and depth sensors has led to the development of a plethora of applications for the body in human-computer interaction. However, the main focus of these works has been on using the upper body for explicit input. This thesis investigates the research space of full-body human-computer interaction through three propositions.

    The first proposition is that more can be inferred from users' natural movements and postures, such as the quality of activities and psychological states. We develop this proposition in two domains. First, we explore how to support users in performing weight-lifting activities. We propose a system that classifies different ways of performing the same activity; an object-oriented, model-based framework for formally specifying activities; and a system that automatically extracts an activity model by demonstration. Second, we explore how to automatically capture nonverbal cues for affective computing. We developed a system that annotates motion and gaze data according to the Body Action and Posture coding system. We show that quality analysis can add another layer of information to activity recognition, and that systems that support the communication of quality information should strive to support how we implicitly communicate movement through nonverbal communication. Further, we argue that by working at a higher level of abstraction, affect recognition systems can more directly translate findings from other areas into their algorithms, and can also contribute new knowledge to these fields.

    The second proposition is that the lower limbs can provide an effective means of interacting with computers beyond assistive technology. To address the problem of the dispersed literature on the topic, we conducted a comprehensive survey of the lower body in HCI through the lenses of users, systems, and interactions. To address the lack of a fundamental understanding of foot-based interaction, we conducted a series of studies that quantitatively characterise several of its aspects, including Fitts's Law performance models, the effects of movement direction, foot dominance, and visual feedback, and the overhead incurred by using the feet together with the hand. To enable these studies, we developed a foot tracker based on a Kinect mounted under the desk. We show that the lower body can be used as a valuable complementary modality for computer input.

    Our third proposition is that by treating body movements as multiple modalities, rather than a single one, we can enable novel user experiences. We develop this proposition in the domain of 3D user interfaces, as it requires input with multiple degrees of freedom and offers a rich set of complex tasks. We propose an approach for tracking the whole body up close by splitting the sensing of different body parts across multiple sensors. Our setup allows tracking gaze, head, mid-air gestures, multi-touch gestures, and foot movements. We investigate specific applications of multimodal combinations in the domain of 3DUI: how gaze and mid-air gestures can be combined to improve selection and manipulation tasks; how the feet can support the canonical 3DUI tasks; and how a multimodal sensing platform can inspire new 3D game mechanics. We show that the combination of multiple modalities can lead to enhanced task performance; that offloading certain tasks to alternative modalities not only frees the hands but also allows simultaneous control of multiple degrees of freedom; and that by sensing different modalities separately we achieve more detailed and precise full-body tracking.
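    The Fitts's Law performance models mentioned above relate movement time to an index of difficulty, MT = a + b * log2(D/W + 1), where D is the distance to the target and W its width. The sketch below fits such a model to (distance, width, movement-time) measurements of the kind collected in foot-pointing studies; the data and function names are invented for illustration and do not come from the thesis.

```python
# Illustrative sketch: fit a Fitts's Law model MT = a + b * ID to pointing data.
import numpy as np

def fit_fitts(distances, widths, movement_times):
    """Fit MT = a + b * ID with ID = log2(D/W + 1); returns (a, b)."""
    ids = np.log2(np.asarray(distances, dtype=float) / np.asarray(widths) + 1.0)
    b, a = np.polyfit(ids, movement_times, deg=1)  # polyfit returns slope first
    return a, b

# Invented example: three target distances (mm), widths (mm), and mean movement times (s).
a, b = fit_fitts([100, 200, 400], [20, 20, 20], [0.45, 0.60, 0.78])
print(f"MT = {a:.2f} + {b:.2f} * ID   (throughput about {1.0 / b:.1f} bits/s)")
```

    The slope b (seconds per bit) and its reciprocal, the throughput, are the usual quantities used to compare input modalities such as foot and hand pointing.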
    • 
