RGBD Datasets: Past, Present and Future
Since the launch of the Microsoft Kinect, scores of RGBD datasets have been
released. These have propelled advances in areas from reconstruction to gesture
recognition. In this paper we explore the field, reviewing datasets across
eight categories: semantics, object pose estimation, camera tracking, scene
reconstruction, object tracking, human actions, faces and identification. By
extracting relevant information in each category we help researchers to find
appropriate data for their needs, and we consider which datasets have succeeded
in driving computer vision forward and why.
Finally, we examine the future of RGBD datasets. We identify key areas which
are currently underexplored, and suggest that future directions may include
synthetic data and dense reconstructions of static and dynamic scenes.
Comment: 8 pages excluding references (CVPR style)
Continual Learning of Hand Gestures for Human-Robot Interaction
Human communication is multimodal. For years, natural language processing has been studied as a form of human-machine or human-robot interaction. In recent years, computer vision techniques have been applied to the recognition of static and dynamic gestures, and progress is being made in sign language recognition too. The typical way to train a machine learning algorithm to perform a classification task is to provide training examples for all the classes that need to be identified by the model. In a real-world scenario, such as in the use of assistive robots, it is useful to learn new concepts from interaction. However, unlike biological brains, artificial neural networks suffer from catastrophic forgetting, and as a result are not good at incrementally learning new classes. In this thesis, the HAnd Gesture Incremental Learning (HAGIL) framework is proposed as a method to incrementally learn to classify static hand gestures. We show that HAGIL is able to incrementally learn up to 36 new symbols using only 5 samples for each old symbol, achieving a final average accuracy of over 90%. In addition, the incremental training time is reduced to 10% of the time required when using all available data.
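The abstract above describes exemplar rehearsal: keeping a handful of samples per old class so that new classes can be learned without catastrophic forgetting. A minimal sketch of that idea, using a nearest-class-mean classifier, might look as follows (the class name, 5-exemplar budget interpretation, and classifier choice are illustrative assumptions, not the HAGIL implementation):

```python
import numpy as np

class ExemplarIncrementalClassifier:
    """Nearest-class-mean classifier that learns classes incrementally,
    keeping only a few exemplars per old class to limit forgetting."""

    def __init__(self, exemplars_per_class=5):
        self.exemplars_per_class = exemplars_per_class
        self.memory = {}   # class label -> stored exemplar array
        self.means = {}    # class label -> class mean feature vector

    def learn_class(self, label, samples):
        samples = np.asarray(samples, dtype=float)
        # Keep only a small exemplar memory for this class.
        self.memory[label] = samples[: self.exemplars_per_class]
        # Recompute each class mean from its stored exemplars (rehearsal).
        for lbl, exemplars in self.memory.items():
            self.means[lbl] = exemplars.mean(axis=0)

    def predict(self, x):
        x = np.asarray(x, dtype=float)
        # Classify by the nearest class mean.
        return min(self.means, key=lambda lbl: np.linalg.norm(x - self.means[lbl]))
```

New gesture classes can then be added one at a time via `learn_class`, with memory growing only by a few exemplars per class rather than retaining the full training set.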
Looking for a better fit? An Incremental Learning Multimodal Object Referencing Framework adapting to Individual Drivers
The rapid advancement of the automotive industry towards automated and
semi-automated vehicles has rendered traditional methods of vehicle
interaction, such as touch-based and voice command systems, inadequate for a
widening range of non-driving related tasks, such as referencing objects
outside of the vehicle. Consequently, research has shifted toward gestural
input (e.g., hand, gaze, and head pose gestures) as a more suitable mode of
interaction during driving. However, due to the dynamic nature of driving and
individual variation, there are significant differences in drivers' gestural
input performance. While, in theory, this inherent variability could be
moderated by substantial data-driven machine learning models, prevalent
methodologies lean towards constrained, single-instance trained models for
object referencing. These models show a limited capacity to continuously adapt
to the divergent behaviors of individual drivers and the variety of driving
scenarios. To address this, we propose \textit{IcRegress}, a novel
regression-based incremental learning approach that adapts to changing behavior
and the unique characteristics of drivers engaged in the dual task of driving
and referencing objects. We suggest a more personalized and adaptable solution
for multimodal gestural interfaces, employing continuous lifelong learning to
enhance driver experience, safety, and convenience. Our approach was evaluated
using an outside-the-vehicle object referencing use case, highlighting the
superiority of the adapted incremental learning models over a single trained
model across various driver traits such as handedness, driving experience, and
numerous driving conditions. Finally, to facilitate reproducibility, ease
deployment, and promote further research, we offer our approach as an
open-source framework at \url{https://github.com/amrgomaaelhady/IcRegress}.
Comment: Accepted for publication in the Proceedings of the 29th International
Conference on Intelligent User Interfaces (IUI'24), March 18--21, 2024, in
Greenville, SC, US
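The core idea above, a regression model that starts from a single-instance baseline and then keeps adapting to an individual driver's samples, can be sketched with online updates via scikit-learn's `SGDRegressor.partial_fit`. The feature meanings and weight vectors here are invented for illustration and are not the IcRegress model:

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)
model = SGDRegressor(learning_rate="constant", eta0=0.01, random_state=0)

# Baseline: single-instance training on a generic gesture dataset.
X_init = rng.normal(size=(200, 3))              # e.g. head pose, gaze, hand angle
y_init = X_init @ np.array([1.0, 0.5, -0.3])    # generic referencing target
model.partial_fit(X_init, y_init)

# Deployment: keep adapting to one driver's own samples as they arrive,
# without ever retraining from scratch.
for _ in range(50):
    X_new = rng.normal(size=(4, 3))
    y_new = X_new @ np.array([1.2, 0.4, -0.2])  # this driver's shifted mapping
    model.partial_fit(X_new, y_new)             # incremental update
```

Each `partial_fit` call performs a small gradient step on the new batch only, which is what makes the model cheap to personalize continuously per driver.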
Towards responsive Sensitive Artificial Listeners
This paper describes work in the recently started project SEMAINE, which aims to build a set of Sensitive Artificial Listeners: conversational agents designed to sustain an interaction with a human user despite limited verbal skills, through robust recognition and generation of non-verbal behaviour in real time, both when the agent is speaking and when it is listening. We report on data collection and on the design of a system architecture in view of real-time responsiveness.
Automatic analysis of facial actions: a survey
As one of the most comprehensive and objective ways to describe facial expressions, the Facial Action Coding System (FACS) has recently received significant attention. Over the past 30 years, extensive research has been conducted by psychologists and neuroscientists on various aspects of facial expression analysis using FACS. Automating FACS coding would make this research faster and more widely applicable, opening up new avenues to understanding how we communicate through facial expressions. Such an automated process can also potentially increase the reliability, precision and temporal resolution of coding. This paper provides a comprehensive survey of research into machine analysis of facial actions. We systematically review all components of such systems: pre-processing, feature extraction and machine coding of facial actions. In addition, the existing FACS-coded facial expression databases are summarised. Finally, challenges that have to be addressed to make automatic facial action analysis applicable in real-life situations are extensively discussed. There are two underlying motivations for us to write this survey paper: the first is to provide an up-to-date review of the existing literature, and the second is to offer some insights into the future of machine recognition of facial actions: what are the challenges and opportunities that researchers in the field face?
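The three pipeline stages the survey reviews (pre-processing, feature extraction, and machine coding of facial actions) can be sketched end to end. Everything below, including the geometric features, the synthetic labels, and the choice of a per-AU logistic detector, is an illustrative assumption, not a system from the surveyed literature:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def preprocess(landmarks):
    """Pre-processing: normalise 2-D facial landmarks for translation and scale."""
    pts = np.asarray(landmarks, dtype=float)
    centred = pts - pts.mean(axis=0)
    return centred / (np.linalg.norm(centred) + 1e-8)

def extract_features(landmarks):
    """Feature extraction: flatten normalised landmarks into a feature vector."""
    return preprocess(landmarks).ravel()

# Machine coding: one binary detector per Action Unit
# (e.g. AU12, lip corner puller). Synthetic data stands in for real faces here.
rng = np.random.default_rng(0)
X = np.stack([extract_features(rng.normal(size=(20, 2))) for _ in range(100)])
y = (X[:, 0] > 0).astype(int)        # synthetic "AU present" labels for the demo
au12_detector = LogisticRegression().fit(X, y)
```

In a real system each AU would get its own detector trained on FACS-coded databases such as those the survey summarises, and the per-frame detector outputs would then be smoothed over time.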