1,459 research outputs found

    Sensing, interpreting, and anticipating human social behaviour in the real world

    Get PDF
    Low-level nonverbal social signals like glances, utterances, facial expressions and body language are central to human communicative situations and have been shown to be connected to important high-level constructs, such as emotions, turn-taking, rapport, or leadership. A prerequisite for the creation of social machines that are able to support humans in e.g. education, psychotherapy, or human resources is the ability to automatically sense, interpret, and anticipate human nonverbal behaviour. While promising results have been shown in controlled settings, automatically analysing unconstrained situations, e.g. in daily-life settings, remains challenging. Furthermore, anticipation of nonverbal behaviour in social situations is still largely unexplored. The goal of this thesis is to move closer to the vision of social machines in the real world. It makes fundamental contributions along the three dimensions of sensing, interpreting and anticipating nonverbal behaviour in social interactions. First, robust recognition of low-level nonverbal behaviour lays the groundwork for all further analysis steps. Advancing human visual behaviour sensing is especially relevant as the current state of the art is still not satisfactory in many daily-life situations. While many social interactions take place in groups, current methods for unsupervised eye contact detection can only handle dyadic interactions. We propose a novel unsupervised method for multi-person eye contact detection by exploiting the connection between gaze and speaking turns. Furthermore, we make use of mobile device engagement to address the problem of calibration drift that occurs in daily-life usage of mobile eye trackers. Second, we improve the interpretation of social signals in terms of higher level social behaviours. In particular, we propose the first dataset and method for emotion recognition from bodily expressions of freely moving, unaugmented dyads. Furthermore, we are the first to study low rapport detection in group interactions, as well as investigating a cross-dataset evaluation setting for the emergent leadership detection task. Third, human visual behaviour is special because it functions as a social signal and also determines what a person is seeing at a given moment in time. Being able to anticipate human gaze opens up the possibility for machines to more seamlessly share attention with humans, or to intervene in a timely manner if humans are about to overlook important aspects of the environment. We are the first to propose methods for the anticipation of eye contact in dyadic conversations, as well as in the context of mobile device interactions during daily life, thereby paving the way for interfaces that are able to proactively intervene and support interacting humans.Blick, Gesichtsausdrücke, Körpersprache, oder Prosodie spielen als nonverbale Signale eine zentrale Rolle in menschlicher Kommunikation. Sie wurden durch vielzählige Studien mit wichtigen Konzepten wie Emotionen, Sprecherwechsel, Führung, oder der Qualität des Verhältnisses zwischen zwei Personen in Verbindung gebracht. Damit Menschen effektiv während ihres täglichen sozialen Lebens von Maschinen unterstützt werden können, sind automatische Methoden zur Erkennung, Interpretation, und Antizipation von nonverbalem Verhalten notwendig. Obwohl die bisherige Forschung in kontrollierten Studien zu ermutigenden Ergebnissen gekommen ist, bleibt die automatische Analyse nonverbalen Verhaltens in weniger kontrollierten Situationen eine Herausforderung. Darüber hinaus existieren kaum Untersuchungen zur Antizipation von nonverbalem Verhalten in sozialen Situationen. Das Ziel dieser Arbeit ist, die Vision vom automatischen Verstehen sozialer Situationen ein Stück weit mehr Realität werden zu lassen. Diese Arbeit liefert wichtige Beiträge zur autmatischen Erkennung menschlichen Blickverhaltens in alltäglichen Situationen. Obwohl viele soziale Interaktionen in Gruppen stattfinden, existieren unüberwachte Methoden zur Augenkontakterkennung bisher lediglich für dyadische Interaktionen. Wir stellen einen neuen Ansatz zur Augenkontakterkennung in Gruppen vor, welcher ohne manuelle Annotationen auskommt, indem er sich den statistischen Zusammenhang zwischen Blick- und Sprechverhalten zu Nutze macht. Tägliche Aktivitäten sind eine Herausforderung für Geräte zur mobile Augenbewegungsmessung, da Verschiebungen dieser Geräte zur Verschlechterung ihrer Kalibrierung führen können. In dieser Arbeit verwenden wir Nutzerverhalten an mobilen Endgeräten, um den Effekt solcher Verschiebungen zu korrigieren. Neben der Erkennung verbessert diese Arbeit auch die Interpretation sozialer Signale. Wir veröffentlichen den ersten Datensatz sowie die erste Methode zur Emotionserkennung in dyadischen Interaktionen ohne den Einsatz spezialisierter Ausrüstung. Außerdem stellen wir die erste Studie zur automatischen Erkennung mangelnder Verbundenheit in Gruppeninteraktionen vor, und führen die erste datensatzübergreifende Evaluierung zur Detektion von sich entwickelndem Führungsverhalten durch. Zum Abschluss der Arbeit präsentieren wir die ersten Ansätze zur Antizipation von Blickverhalten in sozialen Interaktionen. Blickverhalten hat die besondere Eigenschaft, dass es sowohl als soziales Signal als auch der Ausrichtung der visuellen Wahrnehmung dient. Somit eröffnet die Fähigkeit zur Antizipation von Blickverhalten Maschinen die Möglichkeit, sich sowohl nahtloser in soziale Interaktionen einzufügen, als auch Menschen zu warnen, wenn diese Gefahr laufen wichtige Aspekte der Umgebung zu übersehen. Wir präsentieren Methoden zur Antizipation von Blickverhalten im Kontext der Interaktion mit mobilen Endgeräten während täglicher Aktivitäten, als auch während dyadischer Interaktionen mittels Videotelefonie

    A Deep Learning Approach to Understanding Real-World Scene Perception in Autism

    Get PDF
    Autism is a multifaceted neurodevelopmental condition. Around 90% of individuals with autism experience sensory sensitivities, particularly impacting visual perception. Despite this high percentage, previous studies investigating visual perception in autism impose severe limitations on our understanding. In many of these experiments, their stimuli and experimental methods are un-naturalistic and produce unreproducible and conflicting results. In this study, we investigate the nature of the real-world visual experience in autism with a cutting-edge experimental approach. First, we use virtual reality headsets with eye-trackers to measure gaze behavior while individuals freely explore real-world, everyday scenes. Then, we compare their gaze behavior to the representations within convolutional neural networks (CNNs), a class of computational models resemblant of the primate visual system. This allows us to model the stages of the visual processing hierarchy that could account for differences in visual processing between individuals with and without autism. To our knowledge, this is the first fully unbiased, data-driven approach to studying naturalistic visual behavior in autism. In brief, we found that convolutional neural networks, regardless of the task upon which they were trained, are better able to predict gaze behavior in typically developing controls than in individuals with autism. This suggests that differences in gaze behavior between the two groups are not principally driven by the semantically-meaningful features within a scene and emerge from differences earlier in visual processing

    Objects predict fixations better than early saliency

    Get PDF
    Humans move their eyes while looking at scenes and pictures. Eye movements correlate with shifts in attention and are thought to be a consequence of optimal resource allocation for high-level tasks such as visual recognition. Models of attention, such as “saliency maps,” are often built on the assumption that “early” features (color, contrast, orientation, motion, and so forth) drive attention directly. We explore an alternative hypothesis: Observers attend to “interesting” objects. To test this hypothesis, we measure the eye position of human observers while they inspect photographs of common natural scenes. Our observers perform different tasks: artistic evaluation, analysis of content, and search. Immediately after each presentation, our observers are asked to name objects they saw. Weighted with recall frequency, these objects predict fixations in individual images better than early saliency, irrespective of task. Also, saliency combined with object positions predicts which objects are frequently named. This suggests that early saliency has only an indirect effect on attention, acting through recognized objects. Consequently, rather than treating attention as mere preprocessing step for object recognition, models of both need to be integrated

    Predicting human behavior in smart environments: theory and application to gaze prediction

    Get PDF
    Predicting human behavior is desirable in many application scenarios in smart environments. The existing models for eye movements do not take contextual factors into account. This addressed in this thesis using a systematic machine-learning approach, where user profiles for eye movements behaviors are learned from data. In addition, a theoretical innovation is presented, which goes beyond pure data analysis. The thesis proposed the modeling of eye movements as a Markov Decision Processes. It uses Inverse Reinforcement Learning paradigm to infer the user eye movements behaviors

    A Conceptual Model of Exploration Wayfinding: An Integrated Theoretical Framework and Computational Methodology

    Get PDF
    This thesis is an attempt to integrate contending cognitive approaches to modeling wayfinding behavior. The primary goal is to create a plausible model for exploration tasks within indoor environments. This conceptual model can be extended for practical applications in the design, planning, and Social sciences. Using empirical evidence a cognitive schema is designed that accounts for perceptual and behavioral preferences in pedestrian navigation. Using this created schema, as a guiding framework, the use of network analysis and space syntax act as a computational methods to simulate human exploration wayfinding in unfamiliar indoor environments. The conceptual model provided is then implemented in two ways. First of which is by updating an existing agent-based modeling software directly. The second means of deploying the model is using a spatial interaction model that distributed visual attraction and movement permeability across a graph-representation of building floor plans

    Change blindness: eradication of gestalt strategies

    Get PDF
    Arrays of eight, texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task where there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation (Landman et al, 2003 Vision Research 43149–164]. Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this we changed the spatial position of the rectangles in the second presentation by shifting them along imaginary spokes (by ±1 degree) emanating from the central fixation point. There was no significant difference seen in performance between this and the standard task [F(1,4)=2.565, p=0.185]. This may suggest two things: (i) Gestalt grouping is not used as a strategy in these tasks, and (ii) it gives further weight to the argument that objects may be stored and retrieved from a pre-attentional store during this task
    • …
    corecore