OCHID-Fi: Occlusion-Robust Hand Pose Estimation in 3D via RF-Vision
Hand Pose Estimation (HPE) is crucial to many applications, but conventional
camera-based HPE (CM-HPE) methods are strictly limited to Line-of-Sight (LoS) conditions, as
cameras cannot capture occluded objects. In this paper, we propose to exploit
Radio-Frequency-Vision (RF-vision) capable of bypassing obstacles for achieving
occluded HPE, and we introduce OCHID-Fi as the first RF-HPE method with 3D pose
estimation capability. OCHID-Fi employs wideband RF sensors widely available on
smart devices (e.g., iPhones) to probe the 3D human hand pose and extract its
skeleton behind obstacles. To overcome the challenge of labeling RF images,
which are not interpretable by humans, OCHID-Fi employs a cross-modality and
cross-domain training process. It uses a pre-trained CM-HPE network and a
synchronized CM/RF dataset to guide the training of its complex-valued RF-HPE
network under LoS conditions. It further transfers knowledge learned from
the labeled LoS domain to the unlabeled occluded domain via adversarial learning,
enabling OCHID-Fi to generalize to unseen occluded scenarios. Experimental
results demonstrate the superiority of OCHID-Fi: it achieves comparable
accuracy to CM-HPE under normal conditions while maintaining such accuracy even
in occluded scenarios, with empirical evidence for its generalizability to new
domains. Comment: Accepted to ICCV 2023.
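The cross-modality training step can be sketched in miniature: a fixed "teacher" (standing in for the pre-trained CM-HPE network) labels a synchronized CM/RF dataset, and a "student" map from RF features is fit to those pseudo-labels. The shapes, synthetic data, and linear models below are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic synchronized CM/RF dataset: paired camera features and RF
# features observing the same hand poses (all shapes are illustrative).
n, d_cm, d_rf, d_pose = 256, 8, 8, 6
cm = rng.normal(size=(n, d_cm))
rf = cm @ rng.normal(size=(d_cm, d_rf)) + 0.05 * rng.normal(size=(n, d_rf))

# A "pre-trained" CM-HPE teacher: a fixed map from camera features to pose.
W_teacher = rng.normal(size=(d_cm, d_pose))
pseudo_labels = cm @ W_teacher            # teacher predictions act as labels

# RF-HPE student: trained to match the teacher on the synchronized pairs.
W_student = np.zeros((d_rf, d_pose))
baseline = float(np.mean(pseudo_labels ** 2))   # loss at initialization
lr = 0.02
for _ in range(2000):
    pred = rf @ W_student
    W_student -= lr * rf.T @ (pred - pseudo_labels) / n   # MSE gradient step

mse = float(np.mean((rf @ W_student - pseudo_labels) ** 2))
print(f"loss: {baseline:.3f} -> {mse:.4f}")
```

The same supervision pattern, with the linear maps replaced by deep networks and the LoS-to-occluded transfer handled separately by adversarial learning, is what the abstract describes.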
Human gesture classification by brute-force machine learning for exergaming in physiotherapy
In this paper, a novel approach to human gesture classification on skeletal data is proposed for the application of exergaming in physiotherapy. Unlike existing methods, we propose to use a general classifier such as Random Forests to recognize dynamic gestures. The temporal dimension is handled afterwards by majority voting in a sliding window over the consecutive predictions of the classifier. The gestures can have partially similar postures, so the classifier decides on the dissimilar ones. This brute-force classification strategy is permitted because dynamic human gestures contain sufficiently dissimilar postures. Online continuous human gesture recognition can classify dynamic gestures at an early stage, which is a crucial advantage when controlling a game by automatic gesture recognition. Also, ground truth can be easily obtained, since all postures in a gesture get the same label, without any discretization into consecutive postures. This way, new gestures can easily be added, which is advantageous in adaptive game development. We evaluate our strategy by leave-one-subject-out cross-validation on a self-captured stealth game gesture dataset and the publicly available Microsoft Research Cambridge-12 Kinect (MSRC-12) dataset. On the first dataset we achieve an excellent accuracy of 96.72%. Furthermore, we show that Random Forests perform better than Support Vector Machines. On the second dataset we achieve an accuracy of 98.37%, which is on average 3.57% better than existing methods.
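The sliding-window majority vote over per-frame classifier outputs can be sketched as follows. The labels and window size are illustrative; in practice the per-frame predictions would come from the Random Forest applied to each skeletal posture.

```python
from collections import Counter, deque

def smooth_predictions(frame_preds, window=5):
    """Majority-vote over a sliding window of per-frame gesture labels.

    frame_preds: iterable of per-frame labels (e.g. from a classifier
    applied to each skeletal posture independently).
    Yields one smoothed label per frame once the window has filled.
    """
    buf = deque(maxlen=window)
    for label in frame_preds:
        buf.append(label)
        if len(buf) == window:
            # The most common label in the window wins.
            yield Counter(buf).most_common(1)[0][0]

# Per-frame predictions with spurious single-frame errors:
raw = ["wave", "wave", "punch", "wave", "wave",
       "punch", "punch", "punch", "punch"]
smoothed = list(smooth_predictions(raw, window=5))
print(smoothed)  # ['wave', 'wave', 'punch', 'punch', 'punch']
```

Note how isolated misclassifications are suppressed, while the transition from one gesture to the next is still detected within a few frames, which is what enables early, online recognition during gameplay.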
Real-time marker-less multi-person 3D pose estimation in RGB-Depth camera networks
This paper proposes a novel system to estimate and track the 3D poses of
multiple persons in calibrated RGB-Depth camera networks. The multi-view 3D
pose of each person is computed by a central node which receives the
single-view outcomes from each camera of the network. Each single-view outcome
is computed by using a CNN for 2D pose estimation and extending the resulting
skeletons to 3D by means of the sensor depth. The proposed system is
marker-less, multi-person, independent of the background, and makes no
assumptions about people's appearance or initial pose. The system provides real-time
outcomes, thus being perfectly suited for applications requiring user
interaction. Experimental results show the effectiveness of this work with
respect to a baseline multi-view approach in different scenarios. To foster
research and applications based on this work, we released the source code in
OpenPTrack, an open source project for RGB-D people tracking. Comment: Submitted to the 2018 IEEE International Conference on Robotics and
Automation.
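Extending a 2D skeleton to 3D "by means of the sensor depth" is, under the standard pinhole model, a per-joint back-projection. A minimal sketch follows; the intrinsics in the example are hypothetical, not those of any particular camera.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift a 2D keypoint (u, v) with sensor depth Z into camera-frame 3D
    coordinates using the pinhole model; (fx, fy) are focal lengths in
    pixels and (cx, cy) the principal point of the RGB-D camera."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

# Example: a joint detected at pixel (420, 260) at 2.0 m depth, with
# hypothetical intrinsics fx = fy = 525, cx = 320, cy = 240.
joint_3d = backproject(420, 260, 2.0, 525.0, 525.0, 320.0, 240.0)
print(joint_3d)
```

Each camera in the network performs this lift on its own 2D CNN detections; the central node then only has to fuse already-metric single-view skeletons, which keeps the per-camera computation cheap enough for real-time use.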
Making the Invisible Visible: Action Recognition Through Walls and Occlusions
Understanding people's actions and interactions typically depends on seeing
them. Automating the process of action recognition from visual data has been
the topic of much research in the computer vision community. But what if it is
too dark, or if the person is occluded or behind a wall? In this paper, we
introduce a neural network model that can detect human actions through walls
and occlusions, and in poor lighting conditions. Our model takes radio
frequency (RF) signals as input, generates 3D human skeletons as an
intermediate representation, and recognizes actions and interactions of
multiple people over time. By translating the input to an intermediate
skeleton-based representation, our model can learn from both vision-based and
RF-based datasets, and allow the two tasks to help each other. We show that our
model achieves comparable accuracy to vision-based action recognition systems
in visible scenarios, yet continues to work accurately when people are not
visible, hence addressing scenarios that are beyond the limit of today's
vision-based action recognition. Comment: ICCV 2019. The first two authors contributed equally to this paper.
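The payoff of the intermediate skeleton representation, namely that a recognizer trained on one modality's skeletons can be reused on the other's, can be illustrated with a toy nearest-centroid action classifier. The features, shapes, and classifier here are illustrative assumptions, not the paper's network.

```python
import numpy as np

rng = np.random.default_rng(1)

# A shared skeleton-based action classifier: nearest centroid over
# skeleton feature vectors, agnostic to which sensor produced them.
def fit_centroids(skeletons, labels):
    return {a: skeletons[labels == a].mean(axis=0) for a in np.unique(labels)}

def classify(centroids, skeleton):
    return min(centroids, key=lambda a: np.linalg.norm(skeleton - centroids[a]))

# Vision-derived skeletons train the classifier (two synthetic actions)...
actions = np.array([0] * 50 + [1] * 50)
vision_sk = np.concatenate([rng.normal(0.0, 0.3, (50, 10)),
                            rng.normal(2.0, 0.3, (50, 10))])
centroids = fit_centroids(vision_sk, actions)

# ...and an RF-derived skeleton (same representation, noisier) reuses it.
rf_sk = rng.normal(2.0, 0.5, size=10)   # an "action 1" skeleton seen via RF
print(classify(centroids, rf_sk))
```

Because both pipelines emit the same 3D-skeleton format, the labeled vision datasets and the RF data can supervise a single recognizer, which is the sense in which the two tasks "help each other" in the abstract.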