
    GazeDPM: Early Integration of Gaze Information in Deformable Part Models

    An increasing number of works explore collaborative human-computer systems in which human gaze is used to enhance computer vision systems. For object detection, these efforts have so far been restricted to late-integration approaches, which have inherent limitations such as increased precision without an increase in recall. We propose an early-integration approach in a deformable part model, which constitutes a joint formulation over gaze and visual data. We show that our GazeDPM method improves over the state-of-the-art DPM baseline by 4% and a recent method for gaze-supported object detection by 3% on the public POET dataset. Our approach additionally provides introspection of the learnt models, can reveal salient image structures, and allows us to investigate the interplay between gaze-attracting and gaze-repelling areas, the importance of view-specific models, as well as viewers' personal biases in gaze patterns. We finally study important practical aspects of our approach, such as the impact of using saliency maps instead of real fixations, the impact of the number of fixations, and robustness to gaze estimation error.

    Gaze transitions when learning with multimedia

    Eye-tracking methodology is used to examine the influence of interactive multimedia on the allocation of visual attention and its dynamics during learning. We hypothesized that an interactive simulation promotes more organized switching of attention between different elements of multimedia learning material, e.g., textual description and pictorial visualization. Participants studied a description of an algorithm accompanied either by an interactive simulation, a self-paced animation, or a static illustration. Using a novel framework for entropy-based comparison of gaze transition matrices, we found that the interactive simulation elicited more careful visual investigation of the learning material, as well as reading of the problem description through to its completion.
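    The entropy-based comparison of gaze transition matrices can be sketched roughly as follows. This is a minimal illustration, not the paper's exact formulation: the integer AOI labels, the decision to ignore within-AOI dwells, and the use of joint-probability Shannon entropy are all assumptions made here for concreteness; lower entropy would indicate more organized switching between AOIs.

```python
import numpy as np

def transition_entropy(aoi_sequence, n_aois):
    """Shannon entropy (bits) of the AOI transition matrix of one recording.

    aoi_sequence: ordered list of AOI indices fixated by a viewer.
    n_aois: number of areas of interest defined on the stimulus.
    """
    counts = np.zeros((n_aois, n_aois))
    for src, dst in zip(aoi_sequence, aoi_sequence[1:]):
        if src != dst:                      # count transitions, not dwells
            counts[src, dst] += 1
    total = counts.sum()
    if total == 0:
        return 0.0
    p = counts / total                      # joint transition probabilities
    nz = p[p > 0]
    return float(-(nz * np.log2(nz)).sum())

# Example: a viewer alternating between text (0) and picture (1) AOIs,
# with one final transition to a third AOI (2)
print(transition_entropy([0, 1, 0, 1, 0, 2], 3))  # ~1.52 bits
```

A viewer who cycles systematically between a few AOI pairs produces a sparse transition matrix and hence low entropy; a viewer whose gaze jumps unpredictably spreads probability mass over many cells and scores higher.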

    Proficiency-aware systems

    In an increasingly digital world, technological developments such as data-driven algorithms and context-aware applications create opportunities for novel human-computer interaction (HCI). We argue that these systems have the latent potential to stimulate users and encourage personal growth. However, users increasingly rely on the intelligence of interactive systems. Thus, it remains a challenge to design for proficiency awareness, essentially demanding increased user attention whilst preserving user engagement. Designing and implementing systems that allow users to become aware of their own proficiency and encourage them to recognize learning benefits is the primary goal of this research. In this thesis, we introduce the concept of proficiency-aware systems as one solution. In our definition, proficiency-aware systems use estimates of the user's proficiency to tailor the interaction in a domain and facilitate a reflective understanding of this proficiency. We envision that proficiency-aware systems leverage collected data for learning benefit. Here, we see self-reflection as a key for users to become aware of the effort necessary to advance their proficiency. A key challenge for proficiency-aware systems is that users often have a self-perception of their proficiency that differs from the system's estimate. The benefits of personal growth and of advancing one's repertoire might not be immediately apparent to users, which can alienate them and lead them to abandon the system. To tackle this challenge, this work does not rely on learning strategies but rather focuses on the capabilities of interactive systems to provide users with the necessary means to reflect on their proficiency, such as showing calculated text difficulty to a newspaper editor or visualizing muscle activity to a passionate sportsperson. We first elaborate on how proficiency can be detected and quantified in the context of interactive systems using physiological sensing technologies.
    Through developing interaction scenarios, we demonstrate the feasibility of gaze- and electromyography-based proficiency-aware systems by utilizing machine learning algorithms that can estimate users' proficiency levels for stationary vision-dominant tasks (reading, information intake) and dynamic manual tasks (playing instruments, fitness exercises). Second, we show how to facilitate proficiency awareness for users, including the design challenges of when and how to communicate proficiency. We complement this second part by highlighting the necessity of toolkits for sensing modalities to enable the implementation of proficiency-aware systems for a wide audience. In this thesis, we contribute a definition of proficiency-aware systems, which we illustrate by designing and implementing interactive systems. We derive technical requirements for real-time, objective proficiency assessment and identify design qualities for communicating proficiency through user reflection. We summarize our findings in a set of design and engineering guidelines for proficiency awareness in interactive systems, highlighting that proficiency feedback makes performance interpretable for the user.

    SpeechMirror: A Multimodal Visual Analytics System for Personalized Reflection of Online Public Speaking Effectiveness

    As communications increasingly take place virtually, the ability to present well online is becoming an indispensable skill. Online speakers face unique challenges in engaging with remote audiences. However, there has been a lack of evidence-based analytical systems for people to comprehensively evaluate online speeches and discover possibilities for improvement. This paper introduces SpeechMirror, a visual analytics system facilitating reflection on a speech based on insights from a collection of online speeches. The system estimates the impact of different speech techniques on effectiveness and applies them to a speech to give users awareness of how their speech techniques perform. A similarity recommendation approach based on speech factors or script content supports guided exploration to expand knowledge of presentation evidence and accelerate the discovery of speech delivery possibilities. SpeechMirror provides intuitive visualizations and interactions for users to understand speech factors. Among them, SpeechTwin, a novel multimodal visual summary of speech, supports rapid understanding of critical speech factors and comparison of different speech samples, and SpeechPlayer augments the speech video by integrating visualization of the speaker's body language with interaction, for focused analysis. The system uses visualizations suited to the distinct nature of different speech factors to aid user comprehension. The proposed system and visualization techniques were evaluated with domain experts and amateurs, demonstrating usability for users with low visualization literacy and efficacy in helping users develop insights for potential improvement. Comment: Main paper (11 pages, 6 figures) and supplemental document (11 pages, 11 figures). Accepted by VIS 202

    iBall: Augmenting Basketball Videos with Gaze-moderated Embedded Visualizations

    We present iBall, a basketball video-watching system that leverages gaze-moderated embedded visualizations to facilitate game understanding and engagement for casual fans. Video broadcasting and online video platforms make watching basketball games increasingly accessible. Yet, for new or casual fans, watching basketball videos is often confusing due to their limited basketball knowledge and the lack of accessible, on-demand information to resolve their confusion. To assist casual fans in watching basketball videos, we compared the game-watching behaviors of casual and die-hard fans in a formative study and developed iBall based on the findings. iBall embeds visualizations into basketball videos using a computer vision pipeline, and automatically adapts the visualizations based on the game context and users' gaze, helping casual fans appreciate basketball games without being overwhelmed. We confirmed the usefulness, usability, and engagement of iBall in a study with 16 casual fans, and further collected feedback from 8 die-hard fans. Comment: ACM CHI2

    Comparing Experts and Novices on Scaffolded Data Visualizations using Eye-tracking

    Spatially-based scientific data visualizations are becoming widely available, yet they are often not optimized for novice audiences. This study follows an investigation of expert and novice meaning-making from scaffolded data visualizations using clinical interviews. Using eye tracking and concurrent interviewing, we examined quantitative fixation and AOI data and qualitative scan-path data for two expertise groups (N = 20) on five versions of scaffolded global ocean data visualizations. We found influences of expertise, scaffolding, and trial. In accordance with our clinical interview findings, experts use different meaning-making strategies from novices, but novice performance improves with scaffolding and guided practice, providing triangulation. Eye-tracking data also provide insight into meaning-making and the effectiveness of scaffolding that clinical interviews alone did not.

    Human Attention in Image Captioning: Dataset and Analysis

    In this work, we present a novel dataset consisting of eye movements and verbal descriptions recorded synchronously over images. Using this data, we study the differences in human attention during free-viewing and image captioning tasks. We look into the relationship between human attention and language constructs during perception and sentence articulation. We also analyse attention deployment mechanisms in the top-down soft attention approach that is argued to mimic human attention in captioning tasks, and investigate whether visual saliency can help image captioning. Our study reveals that (1) human attention behaviour differs between free-viewing and image description tasks: humans tend to fixate on a greater variety of regions in the latter task; (2) there is a strong relationship between described objects and attended objects (97% of the described objects are attended); (3) a convolutional neural network as feature encoder accounts for human-attended regions during image captioning to a great extent (around 78%); (4) the soft-attention mechanism differs from human attention, both spatially and temporally, and there is low correlation between caption scores and attention consistency scores, indicating a large gap between humans and machines with regard to top-down attention; and (5) by integrating the soft attention model with image saliency, we can significantly improve the model's performance on the Flickr30k and MSCOCO benchmarks. The dataset can be found at: https://github.com/SenHe/Human-Attention-in-Image-Captioning. Comment: To appear at ICCV 201
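    Finding (2), that 97% of described objects are attended, rests on checking whether each mentioned object received at least one fixation. A minimal sketch of such a check, assuming per-object boolean segmentation masks and pixel-coordinate fixations; these data formats, and the point-in-mask criterion, are illustrative assumptions rather than the paper's exact procedure:

```python
import numpy as np

def fraction_attended(object_masks, fixations):
    """Fraction of described objects that received at least one fixation.

    object_masks: list of boolean arrays (H x W), one mask per described object.
    fixations: list of (x, y) pixel coordinates of recorded fixations.
    """
    if not object_masks:
        return 0.0
    attended = 0
    for mask in object_masks:
        # An object counts as attended if any fixation lands inside its mask.
        if any(mask[y, x] for x, y in fixations):
            attended += 1
    return attended / len(object_masks)

# Two objects on an 8x8 stimulus; fixations land only on the first object
m1 = np.zeros((8, 8), dtype=bool); m1[0:4, 0:4] = True
m2 = np.zeros((8, 8), dtype=bool); m2[4:8, 4:8] = True
print(fraction_attended([m1, m2], [(1, 1), (2, 3)]))  # 0.5
```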

    Event-driven Similarity and Classification of Scanpaths

    Eye tracking experiments often involve recording the pattern of deployment of visual attention over the stimulus as viewers perform a given task (e.g., visual search). It is useful in training applications, for example, to make an expert's sequence of eye movements, or scanpath, available to novices for their inspection and subsequent learning. It may also be useful to assess the conformance of a novice's scanpath to that of the expert. A computational tool is proposed that provides a framework for performing such classification, based on a probabilistic machine learning algorithm. The approach was influenced by the need to compute the similarity of eye fixations at single points in time, such as would be required for video stimuli. The method is also useful for eye movement analysis over static images and some interactive tasks. The algorithm employs a common qualitative comparison method, the heatmap, in a quantitative way to measure deviation from group aggregate behavior. This quantitative comparison is performed at individual events defined by the stimulus, such as frame timestamps of video or mouse clicks of interactive tasks. The algorithm is evaluated and found to be more accurate and discriminative than existing comparison algorithms for the stimuli used in the examined experiments.
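    The heatmap-as-quantitative-measure idea can be sketched as follows: build a normalized fixation heatmap per viewer at a given event and score deviation as the distance from the group's mean map. The Gaussian bandwidth and the L1 distance are illustrative choices made here; the paper's probabilistic classifier and its exact similarity measure are not reproduced.

```python
import numpy as np

def fixation_heatmap(fixations, shape, sigma=10.0):
    """Normalized Gaussian-blob heatmap of fixations over a stimulus.

    fixations: list of (x, y) pixel coordinates.
    sigma: smoothing bandwidth in pixels (an illustrative choice).
    """
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    h = np.zeros(shape)
    for x, y in fixations:
        h += np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
    total = h.sum()
    return h / total if total > 0 else h

def deviation_from_group(viewer_map, group_maps):
    """L1 distance between one viewer's heatmap and the normalized group
    mean, computed per event (e.g. per video frame or mouse click)."""
    aggregate = np.mean(group_maps, axis=0)
    aggregate = aggregate / aggregate.sum()
    return float(np.abs(viewer_map - aggregate).sum())

# A viewer identical to the group deviates by ~0; a divergent viewer does not.
a = fixation_heatmap([(8, 8)], (32, 32))
b = fixation_heatmap([(24, 24)], (32, 32))
print(deviation_from_group(a, [a, a]))  # ~0.0
print(deviation_from_group(b, [a, a]))  # clearly > 0
```

Such per-event deviation scores could then serve as features for a downstream classifier distinguishing, say, expert from novice scanpaths.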