
    Presentation adaptation for multimodal interface systems: Three essays on the effectiveness of user-centric content and modality adaptation

    The use of devices is becoming increasingly ubiquitous and the contexts of their users more and more dynamic. This often leads to situations where one communication channel is impractical. Text-based communication is particularly inconvenient when the hands are already occupied with another task. Audio messages pose privacy risks and may disturb other people when used in public spaces. Multimodal interfaces thus offer users the flexibility to choose between multiple interaction modalities. While the choice of a suitable input modality lies in the hands of the users, they may also require output in a different modality depending on their situation. To adapt the output of a system to a particular context, rules are needed that specify how information should be presented given the users' situation and state. Therefore, this thesis tests three adaptation rules that, based on observations from cognitive science, have the potential to improve the interaction with an application by adapting the presented content or its modality. Following the modality alignment rule, the output (audio versus visual) of a smart home display is matched with the user's input (spoken versus manual) to the system. Experimental evaluations reveal that preferences for an input modality are initially too unstable to infer a clear preference for either interaction modality: the data shows no clear relation between the users' modality choice for the first interaction and their attitude towards output in different modalities. To apply multimodal redundancy, information is displayed in multiple modalities. An application of the rule in a video conference reveals that captions can significantly reduce confusion. However, the effect is limited to confusion resulting from language barriers, whereas contradictory auditory reports leave the participants confused regardless of whether captions are available. We therefore suggest activating captions only when the facial expression of a user, captured by action units, expressions of positive or negative affect, and a reduced blink rate, implies that the captions effectively improve comprehension. Content filtering in movies puts into the spotlight the character that the users prefer, according to the distribution of their gaze across elements in the previous scene. When preferences are predicted with machine learning classifiers, this has the potential to significantly improve the users' involvement compared to scenes featuring elements that the user does not prefer. Focused attention is additionally higher compared to scenes in which multiple characters take a lead role.
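    As a rough illustration of the caption-activation idea described in this abstract, the sketch below combines action-unit intensity, negative affect, and a reduced blink rate into a simple trigger decision. The feature names and thresholds are hypothetical placeholders, not values from the thesis.

```python
from dataclasses import dataclass


@dataclass
class FacialState:
    """Per-user facial features aggregated over a short time window (hypothetical)."""
    confusion_au_intensity: float  # e.g. mean intensity of brow-lowering action units
    negative_affect: float         # 0..1 score from an affect classifier
    blink_rate_hz: float           # blinks per second in the window


def should_show_captions(state: FacialState,
                         baseline_blink_rate_hz: float = 0.3) -> bool:
    """Activate captions only when facial cues suggest the user is confused.

    All thresholds are illustrative assumptions, not values from the thesis.
    """
    au_signal = state.confusion_au_intensity > 0.5
    affect_signal = state.negative_affect > 0.6
    blink_signal = state.blink_rate_hz < 0.7 * baseline_blink_rate_hz  # reduced blinking
    # Require at least two of the three cues to avoid toggling captions on noise.
    return sum([au_signal, affect_signal, blink_signal]) >= 2


# Example: strong brow activity and suppressed blinking, but only mild negative affect.
print(should_show_captions(FacialState(0.8, 0.4, 0.15)))  # True
```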

    Saliency prediction in 360° architectural scenes: Performance and impact of daylight variations

    Saliency models are image-based prediction models that estimate human visual attention. Such models, when applied to architectural spaces, could pave the way for design decisions in which visual attention is taken into account. In this study, we tested the performance of eleven commonly used saliency models, combining traditional and deep learning methods, on 126 rendered interior scenes with associated head tracking data. The data was extracted from three experiments conducted in virtual reality between 2016 and 2018. Two of these datasets pertain to the perceptual effects of daylight and include variations of daylighting conditions for a limited set of interior spaces, thereby allowing us to test the influence of lighting conditions on human head movement. Ground truth maps were extracted from the collected head tracking logs, and the prediction accuracy of the models was tested via the correlation coefficient between ground truth and prediction maps. To address the possible inflation of results due to the equator bias, we conducted complementary analyses by restricting the area of investigation to the equatorial image regions. Although limited to immersive virtual environments, the promising performance of some traditional models such as GBVS360eq and BMS360eq for colored and textured architectural rendered spaces offers the prospect of their possible integration into design tools. We also observed a strong correlation in head movements for the same space lit by different types of sky, a finding whose generalization requires further investigation based on datasets developed specifically to address this question.
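    A minimal sketch of the evaluation step described above: the Pearson correlation coefficient (CC) between a predicted saliency map and a head-tracking ground-truth map in equirectangular projection, optionally restricted to a central horizontal band to counter the equator bias. The band-width parameter and the random test maps are assumptions for illustration, not taken from the paper.

```python
import numpy as np


def saliency_cc(pred: np.ndarray, gt: np.ndarray,
                equator_band: float | None = None) -> float:
    """Pearson CC between prediction and ground-truth saliency maps.

    Both maps are 2D arrays of identical shape in equirectangular projection.
    If `equator_band` is given (fraction of image height, e.g. 0.5), only the
    central horizontal band is scored, mitigating the equator bias.
    """
    if equator_band is not None:
        h = pred.shape[0]
        half = int(h * equator_band / 2)
        rows = slice(h // 2 - half, h // 2 + half)
        pred, gt = pred[rows, :], gt[rows, :]
    p = (pred - pred.mean()) / (pred.std() + 1e-12)
    g = (gt - gt.mean()) / (gt.std() + 1e-12)
    return float((p * g).mean())


# Example with random maps; real maps come from a saliency model and head-tracking logs.
rng = np.random.default_rng(0)
pred = rng.random((256, 512))
gt = rng.random((256, 512))
print(saliency_cc(pred, gt), saliency_cc(pred, gt, equator_band=0.5))
```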

    Developing an Affect-Aware Rear-Projected Robotic Agent

    Social (or sociable) robots are designed to interact with people in a natural and interpersonal manner. They are becoming an integrated part of our daily lives and have achieved positive outcomes in several applications such as education, health care, quality of life, and entertainment. Despite significant progress towards the development of realistic social robotic agents, a number of problems remain to be solved. First, current social robots either lack the ability to engage in deep social interaction with humans, or they are very expensive to build and maintain. Second, current social robots have yet to reach the full emotional and social capabilities necessary for rich and robust interaction with human beings. To address these problems, this dissertation presents the development of a low-cost, flexible, affect-aware rear-projected robotic agent (called ExpressionBot) that is designed to support verbal and non-verbal communication between the robot and humans, with the goal of closely modeling the dynamics of natural face-to-face communication. The developed robotic platform uses state-of-the-art character animation technologies to create an animated human face (aka avatar) that is capable of showing facial expressions, realistic eye movement, and accurate visual speech, and then projects this avatar onto a face-shaped translucent mask. The mask and the projector are then rigged onto a neck mechanism that can move like a human head. Since an animation is projected onto a mask, the robotic face is a highly flexible research tool, mechanically simple, and low-cost to design, build, and maintain compared with mechatronic and android faces. The results of our comprehensive Human-Robot Interaction (HRI) studies illustrate the benefits and value of the proposed rear-projected robotic platform over a virtual agent with the same animation displayed on a 2D computer screen. The results indicate that ExpressionBot is well accepted by users, with some advantages in expressing facial expressions more accurately and perceiving mutual eye gaze contact. To improve the social capabilities of the robot and create an expressive and empathic (affect-aware) social agent capable of interpreting users' emotional facial expressions, we developed a new Deep Neural Network (DNN) architecture for Facial Expression Recognition (FER). The proposed DNN was initially trained on seven well-known publicly available databases and obtained results significantly better than, or comparable to, those of traditional convolutional neural networks and other state-of-the-art methods in both accuracy and learning time. Since the performance of the automated FER system highly depends on its training data, and the eventual goal of the proposed robotic platform is to interact with users in an uncontrolled environment, a database of facial expressions in the wild (called AffectNet) was created by querying emotion-related keywords from different search engines. AffectNet contains more than 1M images with faces and 440,000 manually annotated images with facial expressions, valence, and arousal. Two DNNs were trained on AffectNet to classify the facial expression images and to predict the values of valence and arousal. Various evaluation metrics show that our deep neural network approaches trained on AffectNet perform better than conventional machine learning methods and available off-the-shelf FER systems.
We then integrated this automated FER system into the spoken dialog of our robotic platform to extend and enrich the capabilities of ExpressionBot beyond spoken dialog and create an affect-aware robotic agent that can measure and infer users' affect and cognition. Three social/interaction aspects (task engagement, empathy, and likability of the robot) were measured in an experiment with the affect-aware robotic agent. The results indicate that users rated our affect-aware agent as being as empathic and likable as a robot in which the user's affect is recognized by a human (WoZ). In summary, this dissertation presents the development and HRI studies of a perceptive, expressive, conversational, rear-projected, life-like robotic agent (aka ExpressionBot or Ryan) that models natural face-to-face communication between a human and an empathic agent. The results of our in-depth human-robot interaction studies show that this robotic agent can serve as a model for creating the next generation of empathic social robots.
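    A schematic sketch of the kind of two-headed network described above: one head classifies discrete facial expressions while another regresses valence and arousal. The PyTorch framework, layer sizes, and eight-class output are assumptions for illustration; this is not the architecture from the dissertation.

```python
import torch
import torch.nn as nn


class FerNet(nn.Module):
    """Toy CNN with a categorical-expression head and a valence/arousal head."""

    def __init__(self, n_expressions: int = 8):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.expression_head = nn.Linear(64, n_expressions)  # class logits
        self.va_head = nn.Linear(64, 2)                      # valence, arousal in [-1, 1]

    def forward(self, x: torch.Tensor):
        features = self.backbone(x)
        return self.expression_head(features), torch.tanh(self.va_head(features))


# Example forward pass on a batch of 4 face crops (96x96 RGB).
logits, va = FerNet()(torch.randn(4, 3, 96, 96))
print(logits.shape, va.shape)  # torch.Size([4, 8]) torch.Size([4, 2])
```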

    Predicting Driver Takeover Performance and Designing Alert Systems in Conditionally Automated Driving

    With Society of Automotive Engineers (SAE) Level 3 automation, drivers are no longer required to actively monitor the driving environment and can potentially engage in non-driving related tasks. Nevertheless, when the automation reaches its operational limits, drivers have to take over control of the vehicle at a moment's notice. Drivers have difficulty with takeover transitions, as they become increasingly decoupled from the operational level of driving. In response to this takeover difficulty, existing literature has investigated various factors affecting takeover performance. However, not all the factors have been studied comprehensively, and the results for some factors are mixed. Meanwhile, there is a lack of research on computational models that predict drivers' takeover performance from their physiological and driving environment data. Furthermore, current research on the design of in-vehicle alert systems suffers from methodological shortcomings and presents identical takeover warnings regardless of event criticality. To address these shortcomings, the goals of this dissertation were to (1) examine the effects of drivers' cognitive load, emotions, traffic density, and takeover request lead time on their driving behavioral responses (takeover timeliness and quality) and psychophysiological responses (eye movements, galvanic skin responses, and heart rate activity) to takeover requests; (2) develop computational models to predict drivers' takeover performance from their physiological and driving environment data via machine learning algorithms; and (3) design in-vehicle alert systems with different display modalities and information types and evaluate the displays under different event criticality conditions via human-subject experiments. The results of three human-subject experiments showed that positive emotional valence led to smoother takeover behaviors. Only when drivers had low cognitive load did they have shorter takeover reaction times in high oncoming traffic conditions. High oncoming traffic led to higher collision risk. High speed led to higher collision risk and harsher takeover behaviors in lane changing scenarios, but engendered longer takeover reaction times and smoother takeover behaviors in lane keeping scenarios. Meanwhile, we developed a random forest model to predict drivers' takeover performance with an accuracy of 84.3% and an F1-score of 64.0%. Our model had finer granularity than, and outperformed, other machine learning models used in prior studies. The findings of the alert system design studies showed that drivers had more anxiety with the 'why only' information than with the 'why + what will' information when the information was presented in the speech modality. In high event criticality situations, drivers felt more prepared to take over control of the vehicle with the combination of augmented reality and speech and preferred it over the other conditions. This dissertation can add to the knowledge base on takeover response investigation, takeover performance prediction, and in-vehicle alert system design. The results will enhance the understanding of how drivers' emotions, cognitive load, traffic density, and scenario type influence their takeover responses. The computational models for takeover performance prediction can serve as the underlying algorithms of in-vehicle monitoring systems in real-world applications. The findings will provide design recommendations to automated vehicle manufacturers on in-vehicle alert systems.
    This will ultimately enhance the interaction between drivers and automated vehicles and improve driving safety in intelligent transportation systems.
    PhD, Industrial & Operations Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/169727/1/nadu_1.pd
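    As a rough sketch of the prediction step described above, the snippet below trains a scikit-learn random forest on synthetic physiological and driving-environment features. The feature set, labels, and data are placeholders rather than the dissertation's dataset, and the 84.3% accuracy / 64.0% F1 reported in the abstract come from the actual study, not from this toy example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 1000

# Placeholder features: pupil diameter, fixation duration, galvanic skin response,
# heart rate, traffic density, and takeover-request lead time (all synthetic).
X = rng.normal(size=(n, 6))
# Placeholder binary label: 1 = good takeover quality, 0 = poor (synthetic rule + noise).
y = ((X[:, 0] + 0.5 * X[:, 4] - 0.8 * X[:, 5]
      + rng.normal(scale=0.5, size=n)) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
pred = model.predict(X_test)
print("accuracy:", accuracy_score(y_test, pred), "F1:", f1_score(y_test, pred))
```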

    The Bee's Knees or Spines of a Spider: What Makes an 'Insect' Interesting?

    Insects and their kin (bugs) are among the most detested and despised creatures on earth. Irrational fears of these mostly harmless organisms often restrict and prevent opportunities for outdoor recreation and leisure. Alternatively, Shipley and Bixler (2016) theorize that direct and positive experiences with bugs during middle childhood may result in fascination with insects, leading to comfort in wildland settings. The objective of this research was to examine and identify the novel and unfamiliar bug types that people are more likely to find interesting and visually attend to when spontaneously presented with their images. This research examined these questions through four integrated exploratory studies. The first study (n = 216) found that a majority of adults are unfamiliar with a majority of bugs, despite the abundance of many common but 'unfamiliar' bugs. The second (n = 15) and third (n = 308) studies examined participants' first impressions of unfamiliar bugs. The second study consisted of in-depth interviews, while the third study had participants report their perceptions of bugs across multiple emotional dimensions. Together, both studies suggest there are many unfamiliar bugs that are perceptually novel and perceived as interesting when encountered. The fourth study (n = 48) collected metrics of visual attention using eye-tracking, measuring visual fixations while participants viewed different bugs identified in the previous studies as either interesting or uninteresting. The findings of the fourth study suggest that interesting bugs can capture more visual attention than uninteresting bugs. Results from all four studies provide a heuristic for interpretive naturalists, magazine editors, marketers, public relations advisors, filmmakers, and other visual communication professionals that can be used when choosing images of unfamiliar insects and other small invertebrates to evoke situational interest and motivate subsequent behavior.
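    A minimal sketch of the kind of fixation comparison run in the fourth study, assuming per-participant mean fixation durations for interesting versus uninteresting bug images and an independent-samples t-test; the numbers below are synthetic, not the study's data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Synthetic mean fixation durations (seconds) per participant and condition.
interesting = rng.normal(loc=0.42, scale=0.08, size=24)
uninteresting = rng.normal(loc=0.33, scale=0.08, size=24)

# Independent-samples t-test comparing the two conditions.
t, p = stats.ttest_ind(interesting, uninteresting)
print(f"mean interesting={interesting.mean():.3f}s, "
      f"uninteresting={uninteresting.mean():.3f}s, t={t:.2f}, p={p:.4f}")
```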