
    Deep Thermal Imaging: Proximate Material Type Recognition in the Wild through Deep Learning of Spatial Surface Temperature Patterns

    We introduce Deep Thermal Imaging, a new approach for close-range automatic recognition of materials that enhances how people and ubiquitous technologies understand their proximal environment. Our approach uses a low-cost mobile thermal camera integrated into a smartphone to capture thermal textures. A deep neural network classifies these textures into material types. The approach works effectively without ambient light sources or direct contact with materials. Furthermore, the use of a deep learning network removes the need to handcraft a set of features for different materials. We evaluated the performance of the system by training it to recognise 32 material types in both indoor and outdoor environments. Our approach produced recognition accuracies above 98% on 14,860 images of 15 indoor materials and above 89% on 26,584 images of 17 outdoor materials. We conclude by discussing its potential for real-time use in HCI applications and future directions. Comment: Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems

    Personalizing Human-Robot Dialogue Interactions using Face and Name Recognition

    Task-oriented dialogue systems are computer systems that aim to provide an interaction indistinguishable from ordinary human conversation, with the goal of completing user-defined tasks. They achieve this by analyzing the intents of users and choosing appropriate responses. Recent studies show that personalizing conversations with these systems can positively affect how they are perceived and their long-term acceptance. Personalized social robots have been widely applied in different fields to provide assistance. In this thesis we develop a scientific conference assistant. The goal of this assistant is to provide conference participants with conference information and to inform them about activities for their spare time during the conference. Moreover, to increase engagement with the robot, our team worked on personalizing the human-robot interaction by means of face and name recognition. To achieve this personalization, the name recognition ability of the available physical robot was first improved; next, with the consent of the participants, their pictures were taken and used to remember returning users. Because acquiring consent for personal data storage is not an optimal solution, an alternative method for recognizing participants using QR codes on their badges was developed and compared to a pre-trained model in terms of speed. Lastly, the personal details of each participant, such as university and country of origin, were acquired prior to the conference or during the conversation and used in dialogues. The developed robot, called DAGFINN, was displayed at two conferences held this year in Stavanger; the first installment did not involve the personalization feature. Hence, we conclude this thesis by discussing the influence of personalization on dialogues with the robot and on participants' satisfaction with the developed social robot.

    STPrivacy: Spatio-Temporal Privacy-Preserving Action Recognition

    Existing methods of privacy-preserving action recognition (PPAR) mainly focus on frame-level (spatial) privacy removal through 2D CNNs. Unfortunately, they have two major drawbacks. First, they may compromise temporal dynamics in input videos, which are critical for accurate action recognition. Second, they are vulnerable to practical attack scenarios in which attackers probe for private information from an entire video rather than from individual frames. To address these issues, we propose a novel framework, STPrivacy, to perform video-level PPAR. For the first time, we introduce vision Transformers into PPAR by treating a video as a tubelet sequence, and accordingly design two complementary mechanisms, i.e., sparsification and anonymization, to remove privacy from a spatio-temporal perspective. Specifically, our privacy sparsification mechanism applies adaptive token selection to discard action-irrelevant tubelets. Then, our anonymization mechanism implicitly manipulates the remaining action tubelets to erase privacy in the embedding space through adversarial learning. These mechanisms provide significant advantages in terms of privacy preservation for human eyes and adjustment of the action-privacy trade-off during deployment. We additionally contribute the first two large-scale PPAR benchmarks, VP-HMDB51 and VP-UCF101, to the community. Extensive evaluations on them, as well as on two other tasks, validate the effectiveness and generalization capability of our framework.
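    The sparsification step in the abstract above can be illustrated with a minimal sketch. It assumes each tubelet already has a scalar action-relevance score and keeps a fixed fraction of the highest-scoring ones. This fixed top-k rule stands in for the adaptive token selection the abstract mentions and is not the authors' method; the function name, the scores, and `keep_ratio` are all hypothetical.

```python
def select_tubelets(tokens, scores, keep_ratio=0.5):
    """Keep the highest-scoring tubelets, preserving their original order.

    Assumption: `scores` holds one action-relevance value per tubelet.
    A fixed top-k rule is used here purely for illustration.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    # Always keep at least one tubelet so the sequence is never empty.
    k = max(1, int(len(tokens) * keep_ratio))
    # Rank tubelet indices from most to least relevant.
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    # Restore spatio-temporal order among the kept tubelets.
    kept = sorted(ranked[:k])
    return [tokens[i] for i in kept], kept


# Toy example: six placeholder tubelets with made-up relevance scores.
tubelets = ["t0", "t1", "t2", "t3", "t4", "t5"]
relevance = [0.9, 0.1, 0.4, 0.8, 0.05, 0.3]
kept_tubelets, kept_indices = select_tubelets(tubelets, relevance)
```

    Sorting the kept indices keeps the remaining tubelets in their original sequence order, which matters if a downstream model relies on positional information.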

    SKATE: A Natural Language Interface for Encoding Structured Knowledge

    In Natural Language (NL) applications, there is often a mismatch between what the NL interface is capable of interpreting and what a lay user knows how to express. This work describes a novel natural language interface that reduces this mismatch by refining natural language input through successive, automatically generated semi-structured templates. In this paper we describe how our approach, called SKATE, uses a neural semantic parser to parse NL input and suggest semi-structured templates, which are recursively filled to produce fully structured interpretations. We also show how SKATE integrates with a neural rule-generation model to interactively suggest and acquire commonsense knowledge. We provide a preliminary coverage analysis of SKATE for the task of story understanding, and then describe a current business use-case of the tool in a specific domain: COVID-19 policy design. Comment: Accepted at IAAI-2

    Multimodal Emotion Recognition via Convolutional Neural Networks: Comparison of different strategies on two multimodal datasets

    The aim of this paper is to investigate emotion recognition using a multimodal approach that exploits convolutional neural networks (CNNs) with multiple inputs. Multimodal approaches allow different modalities to cooperate and generally achieve better performance, because different features are extracted from different pieces of information. In this work, the facial frames, the optical flow computed from consecutive facial frames, and the Mel spectrograms (from the word melody) are extracted from videos and combined in different ways to understand which combination of modalities works best. Several experiments are first run with one modality at a time so that good accuracy is obtained on each modality. Afterward, the models are concatenated to create a final model that accepts multiple inputs. The datasets used for the experiments are BAUM-1 (Bahçeşehir University Multimodal Affective Database - 1) and RAVDESS (Ryerson Audio–Visual Database of Emotional Speech and Song), each of which contains two distinct sets of videos based on the intensity of the expression, that is, acted/strong or spontaneous/normal. The following emotional states are taken into consideration: angry, disgust, fearful, happy and sad. The performance of the proposed models is shown through accuracy results and confusion matrices, demonstrating better accuracy than the compared proposals in the literature. The best accuracy achieved on the BAUM-1 dataset is about 95%, while on RAVDESS it is about 95.5%.
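    The idea of concatenating per-modality models into one multi-input model can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's architecture: the feature vectors stand in for the outputs of three hypothetical single-modality CNN branches, and the feature sizes, random weights, and single linear layer are invented for the example.

```python
import math
import random

# The five emotional states named in the abstract.
EMOTIONS = ["angry", "disgust", "fearful", "happy", "sad"]


def concat_features(*features):
    """Fuse modalities by concatenating their feature vectors end to end."""
    fused = []
    for vector in features:
        fused.extend(vector)
    return fused


def linear(x, weights, bias):
    """One dense layer: one weight row and one bias term per output class."""
    return [sum(xi * wi for xi, wi in zip(x, row)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


random.seed(0)
# Hypothetical branch outputs (sizes are arbitrary assumptions).
face = [random.gauss(0, 1) for _ in range(8)]    # facial-frame branch
flow = [random.gauss(0, 1) for _ in range(8)]    # optical-flow branch
audio = [random.gauss(0, 1) for _ in range(4)]   # Mel-spectrogram branch

fused = concat_features(face, flow, audio)
# Untrained random weights, used only to make the sketch executable.
weights = [[random.gauss(0, 0.1) for _ in fused] for _ in EMOTIONS]
bias = [0.0] * len(EMOTIONS)

probs = softmax(linear(fused, weights, bias))
predicted = EMOTIONS[probs.index(max(probs))]
```

    Because the weights are random, the predicted label is meaningless here; the sketch only shows how the fused vector's length is the sum of the branch sizes and how a shared classifier head sits on top of it.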