    Assisted Viewpoint Interaction for 3D Visualization

    Many three-dimensional visualizations are characterized by the use of a mobile viewpoint that offers multiple perspectives on a set of visual information. To effectively control the viewpoint, the viewer must simultaneously manage the cognitive tasks of understanding the layout of the environment, and knowing where to look to find relevant information, along with mastering the physical interaction required to position the viewpoint in meaningful locations. Numerous systems attempt to address these problems by catering to two extremes: simplified controls or direct presentation. This research attempts to promote hybrid interfaces that offer a supportive, yet unscripted exploration of a virtual environment. Attentive navigation is a specific technique designed to actively redirect viewers' attention while accommodating their independence. User evaluation shows that this technique effectively facilitates several visualization tasks including landmark recognition, survey knowledge acquisition, and search sensitivity. Unfortunately, it also proves to be excessively intrusive, leading viewers to occasionally struggle for control of the viewpoint. Additional design iterations suggest that formalized coordination protocols between the viewer and the automation can mute the shortcomings and enhance the effectiveness of the initial attentive navigation design. The implications of this research generalize to inform the broader requirements for human-automation interaction through the visual channel. Potential applications span a number of fields, including visual representations of abstract information, 3D modeling, virtual environments, and teleoperation experiences.

    Workload-aware systems and interfaces for cognitive augmentation

    In today's society, our cognition is constantly influenced by information intake, attention switching, and task interruptions. This increases the difficulty of a given task, adding to the existing workload and leading to compromised cognitive performance. The human body expresses the use of cognitive resources through physiological responses when confronted with high cognitive workload. This temporarily mobilizes additional resources to deal with the workload at the cost of accelerated mental exhaustion. We predict that recent developments in physiological sensing will increasingly enable user interfaces that are aware of the user's cognitive capacities and hence able to intervene when high or low states of cognitive workload are detected. In this thesis, we initially focus on determining opportune moments for cognitive assistance. Subsequently, we investigate, in a user-centric design process, which feedback modalities are desirable for cognitive assistance. We present design requirements for how cognitive augmentation can be achieved using interfaces that sense cognitive workload. We then investigate different physiological sensing modalities to enable suitable real-time assessments of cognitive workload. We provide empirical evidence that the human brain is sensitive to fluctuations in cognitive resting states, hence making cognitive effort measurable. Firstly, we show that electroencephalography is a reliable modality to assess the mental workload generated during user interface operation. Secondly, we use eye tracking to evaluate changes in eye movements and pupil dilation to quantify different workload states. The combination of machine learning and physiological sensing resulted in suitable real-time assessments of cognitive workload. The use of physiological sensing enables us to derive when cognitive augmentation is suitable. Based on our inquiries, we present applications that regulate cognitive workload in home and work settings.
We deployed an assistive system in a field study to investigate the validity of our derived design requirements. Finding that workload is mitigated, we investigated how cognitive workload can be visualized to the user. We present an implementation of a biofeedback visualization that helps to improve the understanding of brain activity. A final study shows how cognitive workload measurements can be used to predict the efficiency of information intake through reading interfaces. Here, we conclude with use cases and applications which benefit from cognitive augmentation. This thesis investigates how assistive systems can be designed to implicitly sense and utilize cognitive workload for input and output. To do so, we measure cognitive workload in real time by collecting behavioral and physiological data from users and analyze this data to support users through assistive systems that adapt their interface according to the currently measured workload. Our overall goal is to extend new and existing context-aware applications with cognitive workload as an additional factor. We envision Workload-Aware Systems and Workload-Aware Interfaces as an extension of the context-aware paradigm. To this end, we conducted eight research inquiries during this thesis to investigate how to design and create workload-aware systems. Finally, we present our vision of future workload-aware systems and workload-aware interfaces. Due to the scarce availability of open physiological data sets, reference implementations, and methods, previous context-aware systems were limited in their ability to utilize cognitive workload for user interaction. Together with the collected data sets, we expect this thesis to pave the way for methodical and technical tools that integrate workload-awareness as a factor for context-aware systems. Every day, our cognitive abilities are taxed by the processing of countless pieces of information.
This can affect the difficulty of a task through more or less workload. The human body expresses the use of cognitive resources through physiological responses when it is confronted with, or overwhelmed by, cognitive workload. This mobilizes additional resources to cope with the workload temporarily. We predict that the current development of physiological sensing will make cognitive performance measurements continuously possible, so that the user's cognitive workload can be measured at any time. These are able to intervene when too high or too low a cognitive load is detected. We first focus on identifying opportune moments for cognitive support that is aware of the current cognitive workload. Subsequently, in a user-centered design process, we investigate suitable feedback mechanisms that contribute to cognitive assistance. We present design requirements that show how interfaces can achieve cognitive augmentation by measuring cognitive workload. We then investigate different physiological sensing modalities that enable real-time assessments of cognitive workload. First, we empirically validate that the human brain responds to cognitive workload. We show that deriving cognitive workload via electroencephalography is a suitable method for evaluating the cognitive demand of novel assistive systems. We then use eye tracking to assess changes in eye movements and pupil diameter under different intensities of cognitive workload. Applying machine learning yields reliable real-time assessments of cognitive workload.
Based on this prior research, we present applications that support cognition in home and work settings. The physiological measurements determine when cognitive augmentation proves beneficial. In a field study, we deploy an assistive system to validate the derived design requirements for reducing cognitive workload. Our results show that workload is reduced through the use of assistive systems. Next, we investigate how cognitive workload can be visualized. We present an implementation of a biofeedback visualization that supports users' understanding of the course and emergence of cognitive workload. A final study shows how measurements of cognitive workload can be used to predict current reading efficiency. We conclude with a set of applications that make use of cognitive workload as input. This thesis addresses the design of assistive systems that implicitly sense users' cognitive workload and support them in carrying out everyday tasks. To this end, physiological data are collected to allow real-time inferences about the current cognitive workload. These data are then analyzed to assist the user strategically. The goal of this thesis is to extend new and existing context-aware user interfaces with cognitive workload as a factor. Accordingly, this thesis presents workload-aware systems and workload-aware user interfaces as an additional dimension within the paradigm of context-aware systems. We present eight research studies to investigate the design requirements and implementation of cognitive workload-aware systems.
Finally, we present our vision of future cognitive workload-aware systems and user interfaces. Owing to the scarce availability of publicly accessible data sets, reference implementations, and methods, context-aware systems have been limited in evaluating cognitive workload with respect to user interaction. Complemented by the data sets collected in this work, we expect this thesis to pave the way for methodical and technical tools that integrate cognitive workload as a factor into the context awareness of computer systems.

    Predicting Visual Attention and Distraction During Visual Search Using Convolutional Neural Networks

    Most studies in computational modeling of visual attention encompass task-free observation of images. Free-viewing saliency considers limited scenarios of daily life. Most visual activities are goal-oriented and demand a great amount of top-down attention control. A visual search task demands more top-down control of attention than free viewing does. In this paper, we present two approaches to model visual attention and distraction of observers during visual search. Our first approach adapts a light-weight free-viewing saliency model to predict eye fixation density maps of human observers over pixels of search images, using a two-stream convolutional encoder-decoder network, trained and evaluated on the COCO-Search18 dataset. This method predicts which locations are more distracting when searching for a particular target. Our network achieves good results on standard saliency metrics (AUC-Judd=0.95, AUC-Borji=0.85, sAUC=0.84, NSS=4.64, KLD=0.93, CC=0.72, SIM=0.54, and IG=2.59). Our second approach is object-based and predicts the distractor and target objects during visual search. Distractors are all objects except the target that observers fixate on during search. This method uses a Mask-RCNN segmentation network pre-trained on MS-COCO and fine-tuned on the COCO-Search18 dataset. We release our segmentation annotations of targets and distractors in COCO-Search18 for three target categories: bottle, bowl, and car. The average scores over the three categories are: F1-score=0.64, MAP(iou:0.5)=0.57, MAR(iou:0.5)=0.73. Our implementation code in TensorFlow is publicly available at https://github.com/ManooshSamiei/Distraction-Visual-Search . Comment: 33 pages, 24 figures, 12 tables; this is a pre-print manuscript currently under review in Journal of Vision.
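
    Three of the saliency metrics listed above (NSS, CC, and SIM) have compact standard definitions, sketched below in plain Python. This is an illustrative sketch only and is not taken from the authors' repository: the function names `nss`, `cc`, and `sim` and the tiny 2x2 maps are assumptions made for the example, and the maps are represented as nested lists rather than image arrays.

```python
import math


def _flatten(grid):
    """Turn a 2D nested list into a flat list of floats."""
    return [float(v) for row in grid for v in row]


def nss(saliency, fixations):
    """Normalized Scanpath Saliency: mean z-scored saliency at fixated pixels."""
    s = _flatten(saliency)
    f = _flatten(fixations)
    mean = sum(s) / len(s)
    std = math.sqrt(sum((v - mean) ** 2 for v in s) / len(s))
    if std == 0:
        return 0.0
    hits = [(v - mean) / std for v, fix in zip(s, f) if fix > 0]
    return sum(hits) / len(hits) if hits else 0.0


def cc(saliency, density):
    """Pearson linear correlation between a predicted map and a fixation density map."""
    s = _flatten(saliency)
    d = _flatten(density)
    mean_s = sum(s) / len(s)
    mean_d = sum(d) / len(d)
    cov = sum((a - mean_s) * (b - mean_d) for a, b in zip(s, d))
    var_s = sum((a - mean_s) ** 2 for a in s)
    var_d = sum((b - mean_d) ** 2 for b in d)
    denom = math.sqrt(var_s * var_d)
    return cov / denom if denom > 0 else 0.0


def sim(saliency, density):
    """Similarity: histogram intersection of the two maps, each normalized to sum to 1."""
    s = _flatten(saliency)
    d = _flatten(density)
    total_s, total_d = sum(s), sum(d)
    if total_s == 0 or total_d == 0:
        return 0.0
    return sum(min(a / total_s, b / total_d) for a, b in zip(s, d))


if __name__ == "__main__":
    # Hypothetical 2x2 prediction with a single fixation on the most salient pixel.
    predicted = [[0.1, 0.2], [0.3, 0.4]]
    fixation_map = [[0, 0], [0, 1]]
    print("NSS:", nss(predicted, fixation_map))
    print("CC :", cc(predicted, predicted))
    print("SIM:", sim(predicted, predicted))
```

    NSS is evaluated against discrete fixation locations, whereas CC and SIM compare the prediction with a continuous fixation density map, which is why the two kinds of ground truth are passed separately.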

    Attention Mechanism for Recognition in Computer Vision

    It has been proven that humans do not focus their attention on an entire scene at once when they perform a recognition task. Instead, they pay attention to the most important parts of the scene to extract the most discriminative information. Inspired by this observation, in this dissertation, the importance of the attention mechanism in recognition tasks in computer vision is studied by designing novel attention-based models. Specifically, four scenarios are investigated that represent the most important aspects of the attention mechanism. First, an attention-based model is designed to reduce the dimensionality of the visual features by selectively processing only a small subset of the data. We study this aspect of the attention mechanism in a framework based on object recognition in distributed camera networks. Second, an attention-based image retrieval system (i.e., person re-identification) is proposed which learns to focus on the most discriminative regions of the person's image and process those regions with higher computation power using a deep convolutional neural network. Furthermore, we show how visualizing the attention maps can make deep neural networks more interpretable. In other words, by visualizing the attention maps we can observe the regions of the input image on which the neural network relies in order to make a decision. Third, a model for estimating the importance of the objects in a scene based on a given task is proposed. More specifically, the proposed model estimates the importance of the road users that a driver (or an autonomous vehicle) should pay attention to in a driving scenario in order to have safe navigation. In this scenario, the attention estimation is the final output of the model.
Fourth, an attention-based module and a new loss function are proposed in a meta-learning-based few-shot learning system in order to incorporate the context of the task into the feature representations of the samples and increase the few-shot recognition accuracy. In this dissertation, we showed that attention can be multifaceted and studied the attention mechanism from the perspectives of feature selection, reducing the computational cost, interpretable deep learning models, task-driven importance estimation, and context incorporation. Through the study of four scenarios, we further advanced the field in which "attention is all you need."
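
    The idea of focusing on the most discriminative regions can be illustrated with a generic soft-attention pooling step, sketched below in plain Python. This is not the dissertation's model: the scaled dot-product scoring, the function name `attention_pool`, and the toy region features are all assumptions chosen for illustration. The returned weights are also the quantity one would visualize as an attention map.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(region_features, query):
    """Weight each region by its scaled dot product with the query, then average.

    region_features: list of equal-length feature vectors, one per image region.
    query: a vector of the same length describing what the task is looking for.
    Returns (weights, pooled): the attention weights and the weighted feature sum.
    """
    scale = math.sqrt(len(query))
    scores = [
        sum(f * q for f, q in zip(features, query)) / scale
        for features in region_features
    ]
    weights = softmax(scores)
    pooled = [
        sum(w * features[i] for w, features in zip(weights, region_features))
        for i in range(len(query))
    ]
    return weights, pooled


if __name__ == "__main__":
    # Three hypothetical regions; the query is most similar to the first one.
    regions = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
    weights, pooled = attention_pool(regions, query=[4.0, 0.0])
    print("attention weights:", weights)
    print("pooled feature:", pooled)
```

    Because the weights sum to one, regions that match the query dominate the pooled feature while the remaining regions contribute little, which is one simple way to spend computation selectively.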

    Landmark Visualization on Mobile Maps – Effects on Visual Attention, Spatial Learning, and Cognitive Load during Map-Aided Real-World Navigation of Pedestrians

    Even though they are day-to-day activities, humans find navigation and wayfinding to be cognitively challenging. To facilitate their everyday mobility, humans increasingly rely on ubiquitous mobile maps as navigation aids. However, the over-reliance on and habitual use of omnipresent navigation aids deteriorates humans' short-term ability to learn new information about their surroundings and induces a long-term decline in spatial skills. This deterioration in spatial learning is attributed to the fact that these aids capture users' attention and cause them to enter a passive navigation mode. Another factor that limits spatial learning during map-aided navigation is the lack of salient landmark information on mobile maps. Prior research has already demonstrated that wayfinders rely on landmarks (geographic features that stand out from their surroundings) to facilitate navigation and build a spatial representation of the environments they traverse. Landmarks serve as anchor points and help wayfinders to visually match the spatial information depicted on the mobile map with the information collected during the active exploration of the environment. Considering the acknowledged significance of landmarks for human wayfinding due to their visibility and saliency, this thesis investigates an open research question: how to graphically communicate landmarks on mobile map aids to cue wayfinders' allocation of attentional resources to these task-relevant environmental features. From a cartographic design perspective, landmarks can be depicted on mobile map aids on a graphical continuum ranging from abstract 2D text labels to realistic 3D buildings with high visual fidelity.
Based on the importance of landmarks for human wayfinding and the rich cartographic body of research concerning their depiction on mobile maps, this thesis investigated how various landmark visualization styles affect the navigation process of two user groups (expert and general wayfinders) in different navigation use contexts (emergency and general navigation tasks). Specifically, I conducted two real-world map-aided navigation studies to assess the influence of various landmark visualization styles on wayfinders' navigation performance, spatial learning, allocation of visual attention, and cognitive load. In Study I, I investigated how depicting landmarks as abstract 2D building footprints or realistic 3D buildings on the mobile map affected expert wayfinders' navigation performance, visual attention, spatial learning, and cognitive load during an emergency navigation task. I asked expert navigators recruited from the Swiss Armed Forces to follow a predefined route using a mobile map depicting landmarks as either abstract 2D building footprints or realistic 3D buildings and to identify the depicted task-relevant landmarks in the environment. I recorded the experts' gaze behavior with a mobile eye tracker and their cognitive load with EEG during the navigation task, and I captured their incidental spatial learning at the end of the task. The wayfinding experts exhibited high navigation performance and low cognitive load during the map-aided navigation task regardless of the landmark visualization style. Their gaze behavior revealed that wayfinding experts navigating with realistic 3D landmarks focused more on the visualizations of landmarks on the mobile map than those who navigated with abstract 2D landmarks, while the latter focused more on the depicted route. Furthermore, when the experts focused for longer on the environment and the landmarks, their spatial learning improved regardless of the landmark visualization style.
I also found that the spatial learning of experts with self-reported low spatial abilities improved when they navigated with landmarks depicted as realistic 3D buildings. In Study II, I investigated the influence of abstract and realistic 3D landmark visualization styles on wayfinders sampled from the general population. As in Study I, I investigated wayfinders' navigation performance, visual attention, spatial learning, and cognitive load. In contrast to Study I, the participants in Study II were exposed to both landmark visualization styles in a navigation context that mimics everyday navigation. Furthermore, the participants were informed that their spatial knowledge of the environment would be tested after navigation. As in Study I, the wayfinders in Study II exhibited high navigation performance and low cognitive load regardless of the landmark visualization style. Their visual attention revealed that wayfinders with low spatial abilities and wayfinders familiar with the study area fixated on the environment longer when they navigated with realistic 3D landmarks on the mobile map. Spatial learning improved when wayfinders with low spatial abilities were assisted by realistic 3D landmarks. Also, when wayfinders were assisted by realistic 3D landmarks and paid less attention to the map aid, their spatial learning improved. Taken together, the present real-world navigation studies provide ecologically valid results on the influence of various landmark visualization styles on wayfinders. In particular, the studies demonstrate how visualization style modulates wayfinders' visual attention and facilitates spatial learning across various user groups and navigation use contexts. Furthermore, the results of both studies highlight the importance of individual differences in spatial abilities as predictors of spatial learning during map-assisted navigation. 
Based on these findings, the present work provides design recommendations for future mobile maps that go beyond the traditional concept of "one fits all." Indeed, the studies support the case for landmark depiction that directs individual wayfinders' visual attention to task-relevant landmarks to further enhance spatial learning. This would be especially helpful for users with low spatial skills. To this end, future mobile maps could dynamically adapt the visualization style of landmarks according to wayfinders' spatial abilities for cued visual attention, thus meeting individuals' spatial learning needs.

    VS-TransGRU: A Novel Transformer-GRU-based Framework Enhanced by Visual-Semantic Fusion for Egocentric Action Anticipation

    Egocentric action anticipation is a challenging task that aims to make advance predictions of future actions from current and historical observations in the first-person view. Most existing methods focus on improving the model architecture and loss function based on the visual input and recurrent neural network to boost the anticipation performance. However, these methods, which merely consider visual information and rely on a single network architecture, gradually reach a performance plateau. In order to fully understand what has been observed and capture the dependencies between current observations and future actions well enough, we propose a novel visual-semantic fusion enhanced and Transformer-GRU-based action anticipation framework in this paper. Firstly, high-level semantic information is introduced to improve the performance of action anticipation for the first time. We propose to use the semantic features generated based on the class labels or directly from the visual observations to augment the original visual features. Secondly, an effective visual-semantic fusion module is proposed to make up for the semantic gap and fully utilize the complementarity of different modalities. Thirdly, to take advantage of both the parallel and autoregressive models, we design a Transformer-based encoder for long-term sequential modeling and a GRU-based decoder for flexible iterative decoding. Extensive experiments on two large-scale first-person view datasets, i.e., EPIC-Kitchens and EGTEA Gaze+, validate the effectiveness of our proposed method, which achieves new state-of-the-art performance, outperforming previous approaches by a large margin. Comment: 12 pages, 7 figures.

    Augmented reality and scene examination

    The research presented in this thesis explores the impact of Augmented Reality on human performance, and compares this technology with Virtual Reality and with a head-mounted video-feed for a variety of tasks that relate to scene examination. The motivation for the work was the question of whether Augmented Reality could provide a vehicle for training in crime scene investigation. The Augmented Reality application was developed using fiducial markers in the Windows Presentation Foundation, running on a wearable computer platform; Virtual Reality was developed using the Crytek game engine to present a photo-realistic 3D environment; and a video-feed was provided through a head-mounted webcam. All media were presented through head-mounted displays of similar resolution to provide the sole source of visual information to participants in the experiments. The experiments were designed to increase the amount of mobility required to conduct the search task, i.e., from rotation in the horizontal or vertical plane through to movement around a room. In each experiment, participants were required to find objects and subsequently recall their location. It is concluded that human performance is affected not merely by the medium through which the world is perceived but also by the constraints governing how movement in the world is controlled.

    Multimodality with Eye tracking and Haptics: A New Horizon for Serious Games?

    The goal of this review is to illustrate the emerging use of multimodal virtual reality that can benefit learning-based games. The review begins with an introduction to multimodal virtual reality in serious games, and we provide a brief discussion of why cognitive processes involved in learning and training are enhanced under immersive virtual environments. We initially outline studies that have used eye tracking and haptic feedback independently in serious games, and then review some innovative applications that have already combined eye tracking and haptic devices in order to provide applicable multimodal frameworks for learning-based games. Finally, some general conclusions are identified and clarified in order to advance current understanding in multimodal serious game production and to explore possible areas for new applications.
