Deep Thermal Imaging: Proximate Material Type Recognition in the Wild through Deep Learning of Spatial Surface Temperature Patterns
We introduce Deep Thermal Imaging, a new approach for close-range automatic
recognition of materials that enhances the understanding people and ubiquitous
technologies have of their proximal environment. Our approach uses a low-cost mobile
thermal camera integrated into a smartphone to capture thermal textures. A deep
neural network classifies these textures into material types. This approach
works effectively without the need for ambient light sources or direct contact
with materials. Furthermore, the use of a deep learning network removes the
need to handcraft the set of features for different materials. We evaluated the
performance of the system by training it to recognise 32 material types in both
indoor and outdoor environments. Our approach produced recognition accuracies
above 98% in 14,860 images of 15 indoor materials and above 89% in 26,584
images of 17 outdoor materials. We conclude by discussing its potential for
real-time use in HCI applications and future directions.
Comment: Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems
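To make the pipeline concrete, here is a minimal sketch of a thermal-texture material classifier in PyTorch; the patch size, layer sizes, and architecture are illustrative assumptions, not the paper's exact network.

```python
# Minimal sketch of a thermal-texture material classifier (illustrative,
# not the paper's exact architecture). Assumes single-channel thermal
# patches of size 64x64 and the 32 material classes from the evaluation.
import torch
import torch.nn as nn

class ThermalTextureNet(nn.Module):
    def __init__(self, num_materials: int = 32):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 64x64 -> 32x32
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 32x32 -> 16x16
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),              # global pooling
        )
        self.classifier = nn.Linear(128, num_materials)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, 64, 64) normalized spatial surface temperatures
        h = self.features(x).flatten(1)
        return self.classifier(h)                 # material-type logits

model = ThermalTextureNet()
patch = torch.randn(1, 1, 64, 64)                 # one thermal texture patch
material_id = model(patch).argmax(dim=1)          # predicted material index
```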
Personalizing Human-Robot Dialogue Interactions using Face and Name Recognition
Task-oriented dialogue systems are computer systems that aim to provide an interaction
indistinguishable from ordinary human conversation with the goal of completing user-
defined tasks. They achieve this by analyzing the intents of users and choosing
appropriate responses. Recent studies show that by personalizing the conversations with
these systems one can positively affect their perception and long-term acceptance.
Personalised social robots have been widely applied in different fields to provide assistance.
In this thesis we work on the development of a scientific conference assistant. The goal
of this assistant is to provide the conference participants with conference information and
to inform them about activities for their spare time during the conference. Moreover, to
increase engagement with the robot, our team has worked on personalizing the human-robot
interaction by means of face and name recognition.
To achieve this personalisation, first the name recognition ability of the available physical
robot was improved; next, with the consent of the participants, their pictures were taken
and used to memorize returning users. As acquiring consent for personal data
storage is not an optimal solution, an alternative method for participant recognition
using QR codes on their badges was developed and compared to the pre-trained model in
terms of speed. Lastly, personal details of each participant, such as university and country of
origin, were acquired prior to the conference or during the conversation and used in dialogues.
The developed robot, called DAGFINN, was displayed at two conferences held this
year in Stavanger, where the first installment did not include the personalization feature.
Hence, we conclude this thesis by discussing the influence of personalisation on dialogues
with the robot and participants' satisfaction with the developed social robot.
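As a rough illustration of the QR-badge alternative, the following Python sketch uses OpenCV's QRCodeDetector; the participant records and helper names are hypothetical stand-ins, not the thesis code.

```python
# Sketch of the QR-badge alternative to face recognition (hypothetical
# records and helper names; not the thesis' actual pipeline). Assumes
# each badge encodes a participant ID that maps to stored details.
import cv2

PARTICIPANTS = {  # illustrative records, e.g. acquired at registration
    "P-042": {"name": "Kari", "university": "UiS", "country": "Norway"},
}

detector = cv2.QRCodeDetector()

def recognize_badge(frame):
    """Decode a badge QR code from a camera frame and look up the
    participant, returning None when no known code is visible."""
    data, points, _ = detector.detectAndDecode(frame)
    return PARTICIPANTS.get(data) if data else None

cap = cv2.VideoCapture(0)                 # the robot's camera
ok, frame = cap.read()
if ok:
    person = recognize_badge(frame)
    if person:
        print(f"Welcome back, {person['name']} from {person['university']}!")
cap.release()
```

Decoding a QR string is a cheap lookup, which is why it is natural to compare this path against the face-recognition model purely in terms of speed, as the thesis does.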
STPrivacy: Spatio-Temporal Privacy-Preserving Action Recognition
Existing methods of privacy-preserving action recognition (PPAR) mainly focus
on frame-level (spatial) privacy removal through 2D CNNs. Unfortunately, they
have two major drawbacks. First, they may compromise temporal dynamics in input
videos, which are critical for accurate action recognition. Second, they are
vulnerable to practical attack scenarios where attackers probe for privacy
from an entire video rather than individual frames. To address these issues, we
propose a novel framework STPrivacy to perform video-level PPAR. For the first
time, we introduce vision Transformers into PPAR by treating a video as a
tubelet sequence, and accordingly design two complementary mechanisms, i.e.,
sparsification and anonymization, to remove privacy from a spatio-temporal
perspective. Specifically, our privacy sparsification mechanism applies adaptive
token selection to abandon action-irrelevant tubelets. Then, our anonymization
mechanism implicitly manipulates the remaining action-tubelets to erase privacy
in the embedding space through adversarial learning. These mechanisms provide
significant advantages in terms of privacy preservation for human eyes and
action-privacy trade-off adjustment during deployment. We additionally
contribute the first two large-scale PPAR benchmarks, VP-HMDB51 and VP-UCF101,
to the community. Extensive evaluations on them, as well as two other tasks,
validate the effectiveness and generalization capability of our framework.
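One way to picture the sparsification mechanism is as learned top-k token selection over tubelets; the sketch below makes this concrete under assumed shapes and a stand-in scoring head, and is not the authors' released implementation.

```python
# Sketch of privacy sparsification as adaptive tubelet-token selection
# (hard top-k selection here is a stand-in for the paper's mechanism;
# shapes and the scoring head are assumptions, not the released code).
import torch
import torch.nn as nn

class TubeletSparsifier(nn.Module):
    def __init__(self, dim: int = 768, keep_ratio: float = 0.5):
        super().__init__()
        self.scorer = nn.Linear(dim, 1)   # per-token action-relevance score
        self.keep_ratio = keep_ratio

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, num_tubelets, dim) from a video tokenizer
        scores = self.scorer(tokens).squeeze(-1)          # (B, N)
        k = max(1, int(tokens.shape[1] * self.keep_ratio))
        idx = scores.topk(k, dim=1).indices               # keep top-k tubelets
        idx = idx.unsqueeze(-1).expand(-1, -1, tokens.shape[-1])
        return tokens.gather(1, idx)      # action-relevant tubelets only

sparsifier = TubeletSparsifier()
video_tokens = torch.randn(2, 196, 768)   # e.g. 196 tubelets per clip
kept = sparsifier(video_tokens)           # (2, 98, 768) after sparsification
```

The kept tokens would then pass to the anonymization stage, which in the paper is trained adversarially to erase privacy in the embedding space.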
SKATE: A Natural Language Interface for Encoding Structured Knowledge
In Natural Language (NL) applications, there is often a mismatch between what
the NL interface is capable of interpreting and what a lay user knows how to
express. This work describes a novel natural language interface that reduces
this mismatch by refining natural language input through successive,
automatically generated semi-structured templates. In this paper we describe
how our approach, called SKATE, uses a neural semantic parser to parse NL input
and suggest semi-structured templates, which are recursively filled to produce
fully structured interpretations. We also show how SKATE integrates with a
neural rule-generation model to interactively suggest and acquire commonsense
knowledge. We provide a preliminary coverage analysis of SKATE for the task of
story understanding, and then describe a current business use-case of the tool
in a specific domain: COVID-19 policy design.
Comment: Accepted at IAAI-2
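A toy sketch of the recursive template-filling idea follows; the Template structure, the slot names, and the suggest() stub standing in for the neural semantic parser are all hypothetical, not SKATE's actual interface.

```python
# Toy sketch of recursive semi-structured template filling (the slot
# grammar and the suggest() stub are hypothetical, not SKATE's API).
from dataclasses import dataclass, field

@dataclass
class Template:
    predicate: str
    slots: dict = field(default_factory=dict)  # slot -> str | Template | None

    def is_complete(self) -> bool:
        return all(
            v.is_complete() if isinstance(v, Template) else v is not None
            for v in self.slots.values()
        )

def suggest(slot_name: str, nl_input: str):
    """Stand-in for the neural semantic parser: propose a filler
    (a value or a nested sub-template) for one open slot."""
    proposals = {
        "agent": "the nurse",
        "action": Template("wear", {"object": None}),
        "object": "a mask",
    }
    return proposals.get(slot_name)

def fill(template: Template, nl_input: str) -> Template:
    # Recursively fill open slots until the interpretation is structured.
    for name, value in template.slots.items():
        if value is None:
            template.slots[name] = suggest(name, nl_input)
        if isinstance(template.slots[name], Template):
            fill(template.slots[name], nl_input)
    return template

policy = Template("obligation", {"agent": None, "action": None})
print(fill(policy, "Nurses must wear masks."))
```

In the real system a user would pick among parser-suggested templates at each step, so each recursive fill is interactive rather than automatic as in this stub.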
Multimodal Emotion Recognition via Convolutional Neural Networks: Comparison of different strategies on two multimodal datasets
The aim of this paper is to investigate emotion recognition using a multimodal approach that exploits convolutional neural networks (CNNs) with multiple inputs. Multimodal approaches allow different modalities to cooperate in order to achieve generally better performance, because different features are extracted from different pieces of information. In this work, the facial frames, the optical flow computed from consecutive facial frames, and the Mel spectrograms (from the word melody) are extracted from videos and combined together in different ways to understand which combination of modalities works best. Several experiments are run on the models, first considering one modality at a time, so that good accuracy results are found for each modality. Afterward, the models are concatenated to create a final model that allows multiple inputs. The experiments use the BAUM-1 (Bahçeşehir University Multimodal Affective Database - 1) and RAVDESS (Ryerson Audio–Visual Database of Emotional Speech and Song) datasets, both of which collect two distinct sets of videos based on the intensity of the expression, that is, acted/strong or spontaneous/normal, providing representations of the following emotional states taken into consideration: angry, disgust, fearful, happy, and sad. The performance of the proposed models is shown through accuracy results and confusion matrices, demonstrating better accuracy than comparable proposals in the literature. The best accuracy achieved on the BAUM-1 dataset is about 95%, while on RAVDESS it is about 95.5%.
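To illustrate the concatenation-based fusion the abstract describes, here is a minimal PyTorch sketch with three branches (facial frames, optical flow, Mel spectrograms); the branch designs and input shapes are illustrative assumptions, not the paper's exact models.

```python
# Minimal sketch of multimodal fusion by feature concatenation
# (the five emotion classes follow the text; branch designs and
# shapes are illustrative, not the paper's exact models).
import torch
import torch.nn as nn

def conv_branch(in_ch: int) -> nn.Module:
    # Shared 2D-CNN recipe reused for faces, optical flow, spectrograms.
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),      # -> (batch, 64)
    )

class MultimodalEmotionNet(nn.Module):
    def __init__(self, num_emotions: int = 5):  # angry, disgust, fearful, happy, sad
        super().__init__()
        self.face = conv_branch(3)      # RGB facial frame
        self.flow = conv_branch(2)      # optical flow (dx, dy)
        self.mel = conv_branch(1)       # Mel spectrogram
        self.head = nn.Linear(64 * 3, num_emotions)

    def forward(self, face, flow, mel):
        fused = torch.cat([self.face(face), self.flow(flow), self.mel(mel)], dim=1)
        return self.head(fused)         # emotion logits

model = MultimodalEmotionNet()
logits = model(torch.randn(1, 3, 64, 64),   # facial frame
               torch.randn(1, 2, 64, 64),   # optical flow
               torch.randn(1, 1, 64, 64))   # Mel spectrogram
```

Training each branch alone first, then concatenating, mirrors the paper's strategy of validating every modality before building the fused model.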