Real-Time Human Motion Capture with Multiple Depth Cameras
Commonly used human motion capture systems require intrusive attachment of
markers that are visually tracked with multiple cameras. In this work we
present an efficient and inexpensive solution to markerless motion capture
using only a few Kinect sensors. Unlike previous work on 3D pose estimation
using a single depth camera, we relax constraints on the camera location and do
not assume a co-operative user. We apply recent image segmentation techniques
to depth images and use curriculum learning to train our system on purely
synthetic data. Our method accurately localizes body parts without requiring an
explicit shape model. The body joint locations are then recovered by combining
evidence from multiple views in real-time. We also introduce a dataset of ~6
million synthetic depth frames for pose estimation from multiple cameras and
exceed state-of-the-art results on the Berkeley MHAD dataset.
Comment: Accepted to Computer and Robot Vision 201
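The abstract does not spell out how evidence from the individual views is combined; one common, minimal strategy, sketched below with hypothetical names, is a confidence-weighted average of the per-camera joint estimates after mapping them into a shared world frame:

```python
import numpy as np

def fuse_joint(estimates, confidences, extrinsics):
    """Fuse one joint's 3D position from several depth cameras.

    estimates  : list of (3,) points, one per camera, in each camera's frame
    confidences: list of scalar detector confidences, one per camera
    extrinsics : list of 4x4 camera-to-world transforms
    """
    world_pts, weights = [], []
    for p, w, T in zip(estimates, confidences, extrinsics):
        ph = T @ np.append(p, 1.0)        # homogeneous point -> world frame
        world_pts.append(ph[:3])
        weights.append(w)
    # Confidence-weighted mean over all contributing views
    return np.average(world_pts, axis=0, weights=np.asarray(weights))
```

The same pattern extends to outlier rejection (e.g. dropping views whose estimate disagrees with the weighted consensus) before re-averaging.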
Hybrid One-Shot 3D Hand Pose Estimation by Exploiting Uncertainties
Model-based approaches to 3D hand tracking have been shown to perform well in
a wide range of scenarios. However, they require initialisation and cannot
recover easily from tracking failures that occur due to fast hand motions.
Data-driven approaches, on the other hand, can quickly deliver a solution, but
the results often suffer from lower accuracy or missing anatomical validity
compared to those obtained from model-based approaches. In this work we propose
a hybrid approach for hand pose estimation from a single depth image. First, a
learned regressor is employed to deliver multiple initial hypotheses for the 3D
position of each hand joint. Subsequently, the kinematic parameters of a 3D
hand model are found by deliberately exploiting the inherent uncertainty of the
inferred joint proposals. This way, the method provides anatomically valid and
accurate solutions without requiring manual initialisation or suffering from
track losses. Quantitative results on several standard datasets demonstrate
that the proposed method outperforms state-of-the-art representatives of the
model-based, data-driven and hybrid paradigms.
Comment: BMVC 2015 (oral); see also http://lrs.icg.tugraz.at/research/hybridhape
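The paper fits the kinematic parameters of a hand model by exploiting the uncertainty of the inferred joint proposals; the exact optimisation is not given in the abstract. As a loose illustration of the idea only, the sketch below greedily walks one hypothetical kinematic chain and keeps, at each joint, the candidate whose distance to the previously chosen joint best matches an assumed model bone length:

```python
import numpy as np

# Hypothetical bone lengths (cm) for one chain: wrist -> MCP -> PIP -> tip.
# These are illustrative values, not the paper's hand model.
MODEL_LENGTHS = [8.0, 3.0, 2.5]

def fit_chain(hypotheses):
    """Greedy kinematic filtering of multiple joint hypotheses.

    hypotheses: list over joints; entry i is an (N_i, 3) array of candidate
    3D positions for joint i, ordered along the chain from the root.
    """
    chain = [hypotheses[0][0]]                      # assume the root is reliable
    for cands, L in zip(hypotheses[1:], MODEL_LENGTHS):
        d = np.linalg.norm(cands - chain[-1], axis=1)
        # Keep the candidate whose bone length deviates least from the model
        chain.append(cands[np.argmin(np.abs(d - L))])
    return np.array(chain)
```

A real model-based fit would optimise all joint angles jointly under anatomical limits; this greedy pass only conveys why a kinematic prior can discard anatomically invalid proposals.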
Collaborative voting of 3D features for robust gesture estimation
© 2017 IEEE. Human body analysis attracts special interest because it enables a wide range of interactive applications. In this paper we present a gesture estimator that discriminates body poses in depth images. A novel collaborative method is proposed to learn 3D features of the human body and, later, to estimate specific gestures. The collaborative estimation framework is inspired by decision forests, where each selected point (anchor point) contributes to the estimation by casting votes. The main idea is to detect a body part by accumulating the inference of other trained body parts: the collaborative voting encodes the global context of the human pose, while the 3D features represent local appearance. Experimental results for different 3D features prove the validity of the proposed algorithm.
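The vote accumulation described above can be sketched as a small Hough-style accumulator; the anchor coordinates, learned offsets, and grid size below are illustrative stand-ins, not the paper's actual parameters:

```python
import numpy as np

def vote_part_location(anchors, offsets, grid_shape=(64, 64)):
    """Accumulate Hough-style votes for a body part's (row, col) location.

    anchors: (N, 2) detected anchor-point coordinates
    offsets: (N, 2) learned displacement each anchor casts toward the part
    """
    acc = np.zeros(grid_shape)
    for a, o in zip(anchors, offsets):
        r, c = np.round(a + o).astype(int)
        if 0 <= r < grid_shape[0] and 0 <= c < grid_shape[1]:
            acc[r, c] += 1                      # each anchor casts one vote
    # The cell with the most accumulated votes is the detected location
    return np.unravel_index(np.argmax(acc), grid_shape)
```

In the paper's framework the offsets come from trained decision forests and votes can be weighted; this sketch only shows the accumulation step.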
A wearable and non-wearable approach for gesture recognition: initial results
Gestures are a natural way of communication between humans. Through this type of non-verbal communication, human interaction may change, since it is possible to send a particular message or capture the attention of the other peer. In human–computer interaction, the capture of such gestures has been a topic of interest, where the goal is to classify human gestures in different scenarios. Applying machine learning techniques, one may be able to track and recognize human gestures and use the gathered information to assess the medical condition of a person regarding, for example, motor impairments. Depending on the type of movement and on the target population, one may use different wearable or non-wearable sensors. In this work, we use a hybrid approach for automatically detecting the ball-throwing movement, applying a Microsoft Kinect (non-wearable) and the Pandlet (a set of wearable sensors such as an accelerometer and a gyroscope, among others). After creating a dataset of 10 participants, an SVM model with a DTW kernel is trained and used as a classification tool. The system performance was quantified in terms of the confusion matrix, accuracy, sensitivity, specificity, Area Under the Curve, and Matthews Correlation Coefficient. The obtained results indicate that the present system is able to recognize the selected throwing gestures and that the overall performance of the Kinect is better than that of the Pandlet.
This article is a result of the project Deus Ex Machina: NORTE-01-0145-FEDER-000026, supported by Norte Portugal Regional Operational Program (NORTE 2020), under the PORTUGAL 2020 Partnership Agreement, through the European Regional Development Fund (ERDF).
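As a rough sketch of the classification step, a DTW distance between two 1-D gesture signals can be turned into a similarity and used as a precomputed SVM kernel. Note that a Gaussian-of-DTW kernel is not strictly positive semi-definite, so this is a common heuristic, not necessarily the paper's exact formulation:

```python
import numpy as np

def dtw(a, b):
    """Dynamic-time-warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping moves
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def dtw_kernel(a, b, gamma=1.0):
    """Gaussian-of-DTW similarity, usable as a precomputed SVM kernel."""
    return np.exp(-dtw(a, b) / gamma)
```

With scikit-learn, a matrix of such pairwise similarities could be passed to `SVC(kernel='precomputed')`.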
Integrated Multi-view 3D Image Capture and Motion Parallax 3D Display System
We propose an integrated 3D image capture and display system using a transversely moving camera, a regular 2D display screen, and user tracking, which can facilitate the multi-view capture of a real scene or object and display the captured perspective views in 3D. The motion parallax 3D technique is used to capture the depth information of the object and display the corresponding views to the user using head tracking. The system is composed of two parts: the first consists of a horizontally moving camera interfaced with a customized camera control and capture application; the second consists of a regular LCD screen combined with a web camera and a user-tracking application. The 3D multi-view images captured through the imaging setup are relayed to the display based on the user location, and the corresponding view is dynamically displayed on the screen according to the viewing angle of the user with respect to the screen. The developed prototype system provides multi-view capture of 60 views with a step size of 1 cm and more than 40° of field-of-view overlap. The display system relays 60 views, providing viewing-angle coverage of ±35°, where the angular difference between two adjacent views is 1.2°.
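The view-selection logic implied by the numbers above (60 views, 1.2° apart, roughly ±35° coverage) can be sketched as follows; the head-tracking geometry and function names are assumptions, not the authors' implementation:

```python
import math

NUM_VIEWS = 60
STEP_DEG = 1.2                                   # spacing between adjacent views
HALF_RANGE = (NUM_VIEWS - 1) * STEP_DEG / 2.0    # ~35.4 deg to each side

def select_view(head_x, head_z):
    """Map a tracked head position to one of the pre-captured view indices.

    head_x: lateral offset of the head from the screen centre (metres)
    head_z: distance of the head from the screen plane (metres)
    """
    angle = math.degrees(math.atan2(head_x, head_z))   # viewing angle
    angle = max(-HALF_RANGE, min(HALF_RANGE, angle))   # clamp to coverage
    return round((angle + HALF_RANGE) / STEP_DEG)      # index 0 .. NUM_VIEWS-1
```

As the user moves laterally, the index changes by one view per 1.2° of viewing angle, which is what produces the motion-parallax effect on the 2D screen.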
Human Pose Detection for Robotic-Assisted and Rehabilitation Environments
Assistance and rehabilitation robotic platforms must have precise sensory systems for human–robot interaction. Therefore, human pose estimation is a current topic of research, especially for the safety of human–robot collaboration and the evaluation of human biomarkers. Within this field of research, the evaluation of the low-cost, marker-less human pose estimators OpenPose and Detectron 2 has received much attention for their diversity of applications, such as surveillance, sports, videogames, and assessment in human motor rehabilitation. This work aimed to evaluate and compare the angles in the elbow and shoulder joints estimated by OpenPose and Detectron 2 during four typical upper-limb rehabilitation exercises: elbow side flexion, elbow flexion, shoulder extension, and shoulder abduction. A setup of two Kinect 2 RGBD cameras was used to obtain the ground truth of the joint and skeleton estimations during the different exercises. Finally, we provided a numerical comparison (RMSE and MAE) among the angle measurements obtained with OpenPose, Detectron 2, and the ground truth. The results showed how OpenPose outperforms Detectron 2 in these types of applications.
Óscar G. Hernández holds a grant from the Spanish Fundación Carolina, the University of Alicante, and the National Autonomous University of Honduras.
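The compared quantities are standard: a joint angle computed from three estimated keypoints, plus RMSE and MAE against the ground truth. A minimal sketch (keypoint layout assumed for illustration, not taken from the paper):

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle at keypoint b (degrees) formed by segments b->a and b->c,
    e.g. shoulder-elbow-wrist keypoints for the elbow flexion angle."""
    u, v = np.asarray(a) - np.asarray(b), np.asarray(c) - np.asarray(b)
    cosang = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))

def rmse(est, ref):
    est, ref = np.asarray(est), np.asarray(ref)
    return float(np.sqrt(np.mean((est - ref) ** 2)))

def mae(est, ref):
    est, ref = np.asarray(est), np.asarray(ref)
    return float(np.mean(np.abs(est - ref)))
```

Evaluating `rmse`/`mae` over the per-frame angle series from each estimator against the Kinect-derived ground truth gives the kind of comparison the study reports.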
Phase messaging method for time-of-flight cameras
Ubiquitous light-emitting devices and low-cost commercial digital cameras facilitate optical wireless communication systems such as visual MIMO, where handheld cameras communicate with electronic displays. While intensity-based optical communications are more prevalent in camera-display messaging, we present a novel method that uses modulated light phase for messaging and time-of-flight (ToF) cameras as receivers. With intensity-based methods, light signals can be degraded by reflections and ambient illumination. By comparison, communication using ToF cameras is more robust against challenging lighting conditions. Additionally, the concept of phase messaging can be combined with intensity messaging for a significant data-rate advantage. In this work, we design and construct a phase messaging array (PMA), the first of its kind, to communicate with a ToF depth camera by manipulating the phase of the depth camera's infrared light signal. The array enables message variation spatially, using a plane of infrared light-emitting diodes, and temporally, by varying the induced phase shift. In this manner, the phase messaging array acts as the transmitter by electronically controlling the light-signal phase, and the ToF camera acts as the receiver by observing and recording a time-varying depth. We show a complete implementation of a 3×3 prototype array with custom hardware and demonstrate average bit accuracy as high as 97.8%. The prototype data rate with this approach is 1 kbps, which can be extended to approximately 10 Mbps.
National Science Foundation (U.S.) (grant CNS-106546#
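The standard continuous-wave ToF relation, d = c·φ/(4πf), links an induced phase shift φ at modulation frequency f to the depth offset the camera reports, which is why phase manipulation shows up as a controllable, time-varying depth. The bit-decoding helper below is a hypothetical illustration, not the authors' protocol:

```python
import math

C = 299_792_458.0   # speed of light (m/s)

def phase_to_depth(phase_shift_rad, mod_freq_hz):
    """Depth offset a ToF camera reports for an induced phase shift:
    d = c * phi / (4 * pi * f)."""
    return C * phase_shift_rad / (4.0 * math.pi * mod_freq_hz)

def decode_bit(measured_depth_m, base_depth_m, bit_offset_m):
    """Hypothetical decoder: threshold the observed depth against a known
    baseline to recover one transmitted bit."""
    return int(measured_depth_m - base_depth_m > bit_offset_m / 2.0)
```

For example, at a 20 MHz modulation frequency a pi phase shift maps to a depth offset of roughly 3.75 m, so even small induced phase shifts are comfortably above a ToF camera's depth noise floor.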