4,276 research outputs found
Multimodal Signal Processing and Learning Aspects of Human-Robot Interaction for an Assistive Bathing Robot
We explore new aspects of assistive living on smart human-robot interaction
(HRI) that involve automatic recognition and online validation of speech and
gestures in a natural interface, providing social features for HRI. We
introduce a whole framework and resources of a real-life scenario for elderly
subjects supported by an assistive bathing robot, addressing health and hygiene
care issues. We contribute a new dataset and a suite of tools used for data
acquisition and a state-of-the-art pipeline for multimodal learning within the
framework of the I-Support bathing robot, with emphasis on audio and RGB-D
visual streams. We consider privacy issues by evaluating the depth visual
stream along with the RGB, using Kinect sensors. The audio-gestural recognition
task on this new dataset yields up to 84.5%, while the online validation of the
I-Support system on elderly users accomplishes up to 84% when the two
modalities are fused together. The results are promising enough to support
further research in the area of multimodal recognition for assistive social
HRI, considering the difficulties of the specific task. Upon acceptance of the
paper part of the data will be publicly available
Prefrontal cortex activation upon a demanding virtual hand-controlled task: A new frontier for neuroergonomics
open9noFunctional near-infrared spectroscopy (fNIRS) is a non-invasive vascular-based functional neuroimaging technology that can assess, simultaneously from multiple cortical areas, concentration changes in oxygenated-deoxygenated hemoglobin at the level of the cortical microcirculation blood vessels. fNIRS, with its high degree of ecological validity and its very limited requirement of physical constraints to subjects, could represent a valid tool for monitoring cortical responses in the research field of neuroergonomics. In virtual reality (VR) real situations can be replicated with greater control than those obtainable in the real world. Therefore, VR is the ideal setting where studies about neuroergonomics applications can be performed. The aim of the present study was to investigate, by a 20-channel fNIRS system, the dorsolateral/ventrolateral prefrontal cortex (DLPFC/VLPFC) in subjects while performing a demanding VR hand-controlled task (HCT). Considering the complexity of the HCT, its execution should require the attentional resources allocation and the integration of different executive functions. The HCT simulates the interaction with a real, remotely-driven, system operating in a critical environment. The hand movements were captured by a high spatial and temporal resolution 3-dimensional (3D) hand-sensing device, the LEAP motion controller, a gesture-based control interface that could be used in VR for tele-operated applications. Fifteen University students were asked to guide, with their right hand/forearm, a virtual ball (VB) over a virtual route (VROU) reproducing a 42 m narrow road including some critical points. The subjects tried to travel as long as possible without making VB fall. The distance traveled by the guided VB was 70.2 ± 37.2 m. The less skilled subjects failed several times in guiding the VB over the VROU. Nevertheless, a bilateral VLPFC activation, in response to the HCT execution, was observed in all the subjects. No correlation was found between the distance traveled by the guided VB and the corresponding cortical activation. These results confirm the suitability of fNIRS technology to objectively evaluate cortical hemodynamic changes occurring in VR environments. Future studies could give a contribution to a better understanding of the cognitive mechanisms underlying human performance either in expert or non-expert operators during the simulation of different demanding/fatiguing activities.openCarrieri, Marika; Petracca, Andrea; Lancia, Stefania; Basso Moro, Sara; Brigadoi, Sabrina; Spezialetti, Matteo; Ferrari, Marco; Placidi, Giuseppe; Quaresima, ValentinaCarrieri, Marika; Petracca, Andrea; Lancia, Stefania; BASSO MORO, Sara; Brigadoi, Sabrina; Spezialetti, Matteo; Ferrari, Marco; Placidi, Giuseppe; Quaresima, Valentin
RGBD Datasets: Past, Present and Future
Since the launch of the Microsoft Kinect, scores of RGBD datasets have been
released. These have propelled advances in areas from reconstruction to gesture
recognition. In this paper we explore the field, reviewing datasets across
eight categories: semantics, object pose estimation, camera tracking, scene
reconstruction, object tracking, human actions, faces and identification. By
extracting relevant information in each category we help researchers to find
appropriate data for their needs, and we consider which datasets have succeeded
in driving computer vision forward and why.
Finally, we examine the future of RGBD datasets. We identify key areas which
are currently underexplored, and suggest that future directions may include
synthetic data and dense reconstructions of static and dynamic scenes.Comment: 8 pages excluding references (CVPR style
Temporal Relational Reasoning in Videos
Temporal relational reasoning, the ability to link meaningful transformations
of objects or entities over time, is a fundamental property of intelligent
species. In this paper, we introduce an effective and interpretable network
module, the Temporal Relation Network (TRN), designed to learn and reason about
temporal dependencies between video frames at multiple time scales. We evaluate
TRN-equipped networks on activity recognition tasks using three recent video
datasets - Something-Something, Jester, and Charades - which fundamentally
depend on temporal relational reasoning. Our results demonstrate that the
proposed TRN gives convolutional neural networks a remarkable capacity to
discover temporal relations in videos. Through only sparsely sampled video
frames, TRN-equipped networks can accurately predict human-object interactions
in the Something-Something dataset and identify various human gestures on the
Jester dataset with very competitive performance. TRN-equipped networks also
outperform two-stream networks and 3D convolution networks in recognizing daily
activities in the Charades dataset. Further analyses show that the models learn
intuitive and interpretable visual common sense knowledge in videos.Comment: camera-ready version for ECCV'1
Human robot interaction in a crowded environment
Human Robot Interaction (HRI) is the primary means of establishing natural and affective communication between humans and robots. HRI enables robots to act in a way similar to humans in order to assist in activities that are considered to be laborious, unsafe, or repetitive. Vision based human robot interaction is a major component of HRI, with which visual information is used to interpret how human interaction takes place. Common tasks of HRI include finding pre-trained static or dynamic gestures in an image, which involves localising different key parts of the human body such as the face and hands. This information is subsequently used to extract different gestures. After the initial detection process, the robot is required to comprehend the underlying meaning of these gestures [3].
Thus far, most gesture recognition systems can only detect gestures and identify a person in relatively static environments. This is not realistic for practical applications as difficulties may arise from people‟s movements and changing illumination conditions. Another issue to consider is that of identifying the commanding person in a crowded scene, which is important for interpreting the navigation commands. To this end, it is necessary to associate the gesture to the correct person and automatic reasoning is required to extract the most probable location of the person who has initiated the gesture. In this thesis, we have proposed a practical framework for addressing the above issues. It attempts to achieve a coarse level understanding about a given environment before engaging in active communication. This includes recognizing human robot interaction, where a person has the intention to communicate with the robot. In this regard, it is necessary to differentiate if people present are engaged with each other or their surrounding environment. The basic task is to detect and reason about the environmental context and different interactions so as to respond accordingly. For example, if individuals are engaged in conversation, the robot should realize it is best not to disturb or, if an individual is receptive to the robot‟s interaction, it may approach the person.
Finally, if the user is moving in the environment, it can analyse further to understand if any help can be offered in assisting this user. The method proposed in this thesis combines multiple visual cues in a Bayesian framework to identify people in a scene and determine potential intentions. For improving system performance, contextual feedback is used, which allows the Bayesian network to evolve and adjust itself according to the surrounding environment. The results achieved demonstrate the effectiveness of the technique in dealing with human-robot interaction in a relatively crowded environment [7]
- …