12 research outputs found
Real-Time Human Motion Capture with Multiple Depth Cameras
Commonly used human motion capture systems require intrusive attachment of
markers that are visually tracked with multiple cameras. In this work we
present an efficient and inexpensive solution to markerless motion capture
using only a few Kinect sensors. Unlike the previous work on 3d pose estimation
using a single depth camera, we relax constraints on the camera location and do
not assume a co-operative user. We apply recent image segmentation techniques
to depth images and use curriculum learning to train our system on purely
synthetic data. Our method accurately localizes body parts without requiring an
explicit shape model. The body joint locations are then recovered by combining
evidence from multiple views in real-time. We also introduce a dataset of ~6
million synthetic depth frames for pose estimation from multiple cameras and
exceed state-of-the-art results on the Berkeley MHAD dataset.Comment: Accepted to computer robot vision 201
Play and Learn: Using Video Games to Train Computer Vision Models
Video games are a compelling source of annotated data as they can readily
provide fine-grained groundtruth for diverse tasks. However, it is not clear
whether the synthetically generated data has enough resemblance to the
real-world images to improve the performance of computer vision models in
practice. We present experiments assessing the effectiveness on real-world data
of systems trained on synthetic RGB images that are extracted from a video
game. We collected over 60000 synthetic samples from a modern video game with
similar conditions to the real-world CamVid and Cityscapes datasets. We provide
several experiments to demonstrate that the synthetically generated RGB images
can be used to improve the performance of deep neural networks on both image
segmentation and depth estimation. These results show that a convolutional
network trained on synthetic data achieves a similar test error to a network
that is trained on real-world data for dense image classification. Furthermore,
the synthetically generated RGB images can provide similar or better results
compared to the real-world datasets if a simple domain adaptation technique is
applied. Our results suggest that collaboration with game developers for an
accessible interface to gather data is potentially a fruitful direction for
future work in computer vision.Comment: To appear in the British Machine Vision Conference (BMVC), September
2016. -v2: fixed a typo in the reference
3D human pose estimation from depth maps using a deep combination of poses
Many real-world applications require the estimation of human body joints for
higher-level tasks as, for example, human behaviour understanding. In recent
years, depth sensors have become a popular approach to obtain three-dimensional
information. The depth maps generated by these sensors provide information that
can be employed to disambiguate the poses observed in two-dimensional images.
This work addresses the problem of 3D human pose estimation from depth maps
employing a Deep Learning approach. We propose a model, named Deep Depth Pose
(DDP), which receives a depth map containing a person and a set of predefined
3D prototype poses and returns the 3D position of the body joints of the
person. In particular, DDP is defined as a ConvNet that computes the specific
weights needed to linearly combine the prototypes for the given input. We have
thoroughly evaluated DDP on the challenging 'ITOP' and 'UBC3V' datasets, which
respectively depict realistic and synthetic samples, defining a new
state-of-the-art on them.Comment: Accepted for publication at "Journal of Visual Communication and
Image Representation
Computer Vision Solutions for Range of Motion Assessment
Joint range of motion (ROM) is an important indicator of physical functionality and musculoskeletal health. In sports, athletes require adequate levels of joint mobility to minimize the risk of injuries and maximize performance, while in rehabilitation, restoring joint ROM is essential for faster recovery and improved physical function. Traditional methods for measuring ROM include goniometry, inclinometry and visual estimation; all of which are limited in accuracy due to the subjective nature of the assessment. With the rapid development of technology, new systems based on computer vision are continuously introduced as a possible solution for more objective and accurate measurements of the range of motion. Therefore, this article aimed to evaluate novel computer vision-based systems based on their accuracy and practical applicability for a range of motion assessment. The review covers a variety of systems, including motion-capture systems (2D and 3D cameras), RGB-Depth cameras, commercial software systems and smartphone apps. Furthermore, this article also highlights the potential limitations of these systems and explores their potential future applications in sports and rehabilitation
A Novel Two Stream Decision Level Fusion of Vision and Inertial Sensors Data for Automatic Multimodal Human Activity Recognition System
This paper presents a novel multimodal human activity recognition system. It
uses a two-stream decision level fusion of vision and inertial sensors. In the
first stream, raw RGB frames are passed to a part affinity field-based pose
estimation network to detect the keypoints of the user. These keypoints are
then pre-processed and inputted in a sliding window fashion to a specially
designed convolutional neural network for the spatial feature extraction
followed by regularized LSTMs to calculate the temporal features. The outputs
of LSTM networks are then inputted to fully connected layers for
classification. In the second stream, data obtained from inertial sensors are
pre-processed and inputted to regularized LSTMs for the feature extraction
followed by fully connected layers for the classification. At this stage, the
SoftMax scores of two streams are then fused using the decision level fusion
which gives the final prediction. Extensive experiments are conducted to
evaluate the performance. Four multimodal standard benchmark datasets (UP-Fall
detection, UTD-MHAD, Berkeley-MHAD, and C-MHAD) are used for experimentations.
The accuracies obtained by the proposed system are 96.9 %, 97.6 %, 98.7 %, and
95.9 % respectively on the UP-Fall Detection, UTDMHAD, Berkeley-MHAD, and
C-MHAD datasets. These results are far superior than the current
state-of-the-art methods
Efficient Body Motion Quantification and Similarity Evaluation Using 3-D Joints Skeleton Coordinates
Regress 3D human pose from 2D skeleton with kinematics knowledge
3D human pose estimation is a hot topic in the field of computer vision. It provides data support for tasks such as pose recognition, human tracking and action recognition. Therefore, it is widely applied in the fields of advanced human-computer interaction, intelligent monitoring and so on. Estimating 3D human pose from a single 2D image is an ill-posed problem and is likely to cause low prediction accuracy, due to the problems of self-occlusion and depth ambiguity. This paper developed two types of human kinematics to improve the estimation accuracy. First, taking the 2D human body skeleton sequence obtained by the 2D human body pose detector as input, a temporal convolutional network is proposed to develop the movement periodicity in temporal domain. Second, geometrical prior knowledge is introduced into the model to constrain the estimated pose to fit the general kinematics knowledge. The experiments are tested on Human3.6M and MPII (Max Planck Institut Informatik) Human Pose (MPI-INF-3DHP) datasets, and the proposed model shows better generalization ability compared with the baseline and the state-of-the-art models
Database for 3D human pose estimation from single depth images
This work is part of the project I-Â-DRESS (Assistive interactive robotic system for support in dressing). The specific objective is the detection of human body postures and the tracking of their movements. To this end, this work aims to create the image database needed for the training of the algorithms of pose estimation for the artificial vision of the robotic system, based on the depth images obtained by a sensor Time-Â-of-Â-Flight (ToF) depth camera type, such as the incorporated by the Kinect One (Kinect v2) device.Peer ReviewedPreprin