Real-Time Human Motion Capture with Multiple Depth Cameras
Commonly used human motion capture systems require intrusive attachment of
markers that are visually tracked with multiple cameras. In this work we
present an efficient and inexpensive solution to markerless motion capture
using only a few Kinect sensors. Unlike previous work on 3D pose estimation
using a single depth camera, we relax constraints on the camera location and do
not assume a cooperative user. We apply recent image segmentation techniques
to depth images and use curriculum learning to train our system on purely
synthetic data. Our method accurately localizes body parts without requiring an
explicit shape model. The body joint locations are then recovered by combining
evidence from multiple views in real-time. We also introduce a dataset of ~6
million synthetic depth frames for pose estimation from multiple cameras and
exceed state-of-the-art results on the Berkeley MHAD dataset.
Comment: Accepted to Computer Robot Vision 201
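The abstract does not spell out how evidence from multiple views is combined; below is a minimal sketch of one plausible scheme, a confidence-weighted fusion of per-view 3D joint estimates in a shared world frame. The function fuse_joint_views and all of its inputs are hypothetical illustrations, not the authors' actual pipeline.

import numpy as np

def fuse_joint_views(points_cam, extrinsics, confidences):
    """Fuse per-view 3D joint estimates into a single world-frame joint.

    points_cam  : list of (3,) joint positions, one per camera frame
    extrinsics  : list of (R, t) camera-to-world rotations/translations
    confidences : list of per-view detection confidences in [0, 1]
    """
    world_pts, weights = [], []
    for p, (R, t), w in zip(points_cam, extrinsics, confidences):
        world_pts.append(R @ p + t)   # transform into the shared world frame
        weights.append(w)
    world_pts = np.stack(world_pts)
    weights = np.asarray(weights)[:, None]
    # confidence-weighted mean across views
    return (weights * world_pts).sum(axis=0) / weights.sum()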
EPose: Energy-Efficient Edge-assisted Multi-camera System for Multi-human 3D Pose Estimation
Multi-human 3D pose estimation plays a key role in establishing a seamless
connection between the real world and the virtual world. Recent efforts adopted
a two-stage framework that first builds 2D pose estimations in multiple camera
views from different perspectives and then synthesizes them into 3D poses.
However, the focus has largely been on developing new computer vision
algorithms on offline video datasets without much consideration of the
energy constraints in real-world systems with flexibly deployed and
battery-powered cameras. In this paper, we propose an energy-efficient
edge-assisted multiple-camera system, dubbed EPose, for real-time
multi-human 3D pose estimation, based on the key idea of adaptive camera
selection. Instead of always employing all available cameras to perform 2D pose
estimations as in existing works, EPose adaptively selects only a subset of
cameras based on their view quality (in terms of occlusion) and their
energy states, thereby reducing energy consumption
(which translates to extended battery lifetime) and improving the estimation
accuracy. To achieve this goal, EPose incorporates an attention-based LSTM
that predicts the occlusion state of each camera view before the images of a
scene are processed, thereby guiding camera selection, and
runs a camera selection algorithm based on the Lyapunov optimization framework
to make long-term adaptive selection decisions. We build a prototype of
EPose on a 5-camera testbed, demonstrate its feasibility and evaluate its
performance. Our results show that a significant energy saving (up to 31.21%)
can be achieved while maintaining a high 3D pose estimation accuracy comparable
to state-of-the-art methods.
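The abstract names Lyapunov optimization without giving the formulation; the following is a rough sketch of a generic drift-plus-penalty camera selector under assumed inputs (per-view quality scores, energy costs, and virtual energy queues). It illustrates the general technique, not EPose's actual algorithm.

import numpy as np

def select_cameras(quality, energy_cost, queues, k, V=1.0):
    """Pick k cameras trading view quality against battery state.

    quality     : (N,) predicted occlusion-aware view quality (higher is better)
    energy_cost : (N,) per-frame energy cost of running 2D pose estimation
    queues      : (N,) virtual queues tracking accumulated energy deficits
    V           : drift-plus-penalty weight on quality vs. energy
    """
    # reward quality, penalize cameras whose virtual queue
    # (accumulated energy overuse) is already large
    score = V * quality - queues * energy_cost
    chosen = np.argsort(score)[-k:]
    # queue update: chosen cameras spend energy, every camera
    # regains budget at an assumed target average rate
    budget = energy_cost.mean() * k / len(quality)
    spent = np.zeros_like(queues)
    spent[chosen] = energy_cost[chosen]
    queues = np.maximum(queues + spent - budget, 0.0)
    return chosen, queues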
Continuous close-range 3D object pose estimation
In the context of future manufacturing lines, removing fixtures will be a
fundamental step to increase the flexibility of autonomous systems in assembly
and logistic operations. Vision-based 3D pose estimation is a necessity to
accurately handle objects that might not be placed at fixed positions during
the robot task execution. Industrial tasks bring multiple challenges for
robust object pose estimation, such as difficult object properties, tight
cycle times and constraints on camera views. In particular, when interacting
with objects, we have to work with close-range partial views of objects that
pose a new challenge for typical view-based pose estimation methods. In this
paper, we present a 3D pose estimation method based on a gradient-ascent
particle filter that integrates new observations on-the-fly to improve the pose
estimate. This allows the method to run online during task execution and
save valuable cycle time. In contrast to other view-based pose estimation
methods, we model potential views in full 6-dimensional space, which allows us
to cope with close-range partial object views. We demonstrate the approach on
a real assembly task, in which the algorithm usually converges to the correct
pose within 10-15 iterations with an average accuracy of less than 8 mm.
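As a rough illustration of the gradient-ascent particle filter idea (not the authors' implementation), the sketch below weights 6-DOF pose particles by an observation log-likelihood, nudges each particle uphill with a finite-difference gradient, and then resamples; loglik and all step-size parameters are assumptions.

import numpy as np

def pf_step(particles, weights, loglik, step=1e-2, noise=1e-3):
    """One gradient-assisted particle filter update over 6-DOF poses.

    particles : (N, 6) float array of pose hypotheses [x, y, z, roll, pitch, yaw]
    loglik    : callable mapping a (6,) pose to an observation log-likelihood
    """
    n, d = particles.shape
    # importance weighting by the new observation
    weights = weights * np.exp([loglik(p) for p in particles])
    weights /= weights.sum()
    # nudge each particle uphill using a finite-difference gradient
    for i in range(n):
        g = np.zeros(d)
        for j in range(d):
            e = np.zeros(d); e[j] = 1e-4
            g[j] = (loglik(particles[i] + e) - loglik(particles[i] - e)) / 2e-4
        particles[i] += step * g
    # resample and diffuse to keep the particle set diverse
    idx = np.random.choice(n, n, p=weights)
    particles = particles[idx] + noise * np.random.randn(n, d)
    return particles, np.full(n, 1.0 / n)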
CREPES: Cooperative RElative Pose Estimation System
Mutual localization plays a crucial role in multi-robot cooperation. This
paper proposes CREPES, a novel system focused on six-degree-of-freedom (DOF)
relative pose estimation for multi-robot systems. CREPES has a
compact hardware design using active infrared (IR) LEDs, an IR fish-eye camera,
an ultra-wideband (UWB) module and an inertial measurement unit (IMU). By
leveraging IR light communication, the system solves data association between
visual detection and UWB ranging. Ranging measurements from the UWB and
directional information from the camera offer relative 3-DOF position
estimation. Combining the mutual relative positions between neighbors with
the gravity constraints provided by IMUs, we can estimate the 6-DOF relative pose
from a single frame of sensor measurements. In addition, we design an estimator
based on the error-state Kalman filter (ESKF) to enhance system accuracy and
robustness. When multiple neighbors are available, a Pose Graph Optimization
(PGO) algorithm is applied to further improve system accuracy. We conduct
extensive experiments to demonstrate CREPES' accuracy between robot pairs and
within a team of robots, as well as its performance under challenging conditions.
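The abstract implies that a single UWB range plus a camera bearing fixes the 3-DOF relative position, and that gravity from the IMUs reduces the remaining rotation to a yaw angle recoverable from mutual observations. A minimal sketch of that geometry follows; the function names and sign conventions are assumptions, not CREPES' published formulation.

import numpy as np

def relative_position(bearing, uwb_range):
    """3-DOF relative position from one camera bearing and one UWB range.

    bearing   : (3,) direction toward the neighbor's IR LEDs (fish-eye detection)
    uwb_range : scalar distance from UWB ranging, in meters
    """
    b = np.asarray(bearing, dtype=float)
    return uwb_range * b / np.linalg.norm(b)

def relative_yaw(p_ij, p_ji):
    """Recover relative yaw from the pair of mutual position observations.

    With roll/pitch fixed by each robot's gravity direction, the only
    unknown rotation is yaw: the neighbor's observation of us (p_ji) must
    align with our negated observation of them (-p_ij) in the x-y plane.
    """
    a = np.arctan2(-p_ij[1], -p_ij[0])
    b = np.arctan2(p_ji[1], p_ji[0])
    return a - b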
Audiovisual head orientation estimation with particle filtering in multisensor scenarios
This article presents a multimodal approach to head pose estimation of individuals in environments equipped with multiple cameras and microphones, such as SmartRooms or automatic video conferencing. Determining an individual's head orientation is the basis for many forms of more sophisticated interaction between humans and technical devices, and can also be used for automatic sensor selection (camera, microphone) in communications or video surveillance systems. The use of particle filters as a unified framework for estimating head orientation in both monomodal and multimodal cases is proposed. In video, we estimate head orientation from color information by exploiting spatial redundancy among cameras. Audio information is processed to estimate the direction of the voice produced by a speaker, making use of the directivity characteristics of the head radiation pattern. Furthermore, two particle filter fusion schemes for combining the audio and video streams are analyzed in terms of accuracy and robustness. In the first, fusion is performed at the decision level by combining each monomodal head pose estimate, while the second uses a joint estimation system combining information at the data level. Experimental results conducted over the CLEAR 2006 evaluation database are reported, and the comparison of the proposed multimodal head pose estimation algorithms with the reference monomodal approaches proves the effectiveness of the proposed approach.
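As an illustration of the two fusion schemes contrasted above, the sketch below fuses two monomodal orientation estimates on the unit circle (decision level) and, alternatively, reweights one shared particle set by the product of modality likelihoods (data level). All names and inputs are hypothetical, not the article's implementation.

import numpy as np

def fuse_decision_level(theta_video, conf_video, theta_audio, conf_audio):
    """Decision-level fusion of monomodal head-orientation estimates.

    Each modality contributes an angle (radians) and a confidence; working
    on the unit circle avoids wrap-around problems at +/- pi.
    """
    v = conf_video * np.exp(1j * theta_video) + conf_audio * np.exp(1j * theta_audio)
    return np.angle(v)

def fuse_data_level(particles, w_video, w_audio):
    """Data-level fusion: one particle set weighted by both likelihoods.

    particles        : (N,) orientation hypotheses
    w_video, w_audio : per-particle likelihoods from each modality
    """
    w = w_video * w_audio            # joint observation likelihood
    return particles, w / w.sum()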
Image-based Flight Control of Unmanned Aerial Vehicles (UAVs) for Material Handling in Custom Manufacturing
This study introduces an approach for, and the challenges in, employing unmanned aerial vehicles (UAVs) for material handling in emerging industrial custom manufacturing environments. Compared with conventional industrial robotic systems, UAVs offer enhanced flexibility for the design and on-the-fly variation of pathways and workflows to optimally perform multiple tasks on demand, besides offering favorable cost and dimensional footprint factors. A fundamental challenge to the deployment of UAVs in manufacturing and other indoor industrial settings lies in ensuring the accuracy of a drone’s localization and flight path. Earlier approaches based on using multiple sensors (e.g., GPS, IMU) to improve the localization accuracy of UAVs are considered ineffective in indoor environments. Moreover, few investigations have tackled the issues arising from the limited space, complicated components, moving entities, and human presence in shop-floor environments. Towards addressing this challenge, a pose estimation method that employs a single camera onboard a UAV, together with multiple ArUco markers positioned strategically over the shop floor, is implemented to track the real-time location of the UAV. A Kalman filter is applied to mitigate noise effects in pose estimation. To assess the performance of this method, several experiments were carried out in Texas A&M University’s manufacturing labs. The results suggest that the Kalman filter can reduce the variance of pose estimation by 88.48% compared to a conventional camera and marker-based motion tracking method (~ 27 cm) and can localize (via averaging) the position to within 8 cm of the actual target location.
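The abstract does not give the filter design; below is a minimal sketch of a constant-velocity Kalman filter smoothing marker-derived UAV positions. The state layout, frame rate, and noise covariances are illustrative assumptions, not the study's tuned values.

import numpy as np

# State: [x, y, z, vx, vy, vz]; measurement: marker-derived [x, y, z].
dt = 1.0 / 30.0                             # camera frame rate (assumed 30 fps)
F = np.eye(6); F[:3, 3:] = dt * np.eye(3)   # constant-velocity motion model
H = np.hstack([np.eye(3), np.zeros((3, 3))])
Q = 1e-4 * np.eye(6)                        # process noise (tuning assumption)
R = 1e-2 * np.eye(3)                        # marker measurement noise (assumed)

def kf_step(x, P, z):
    """One predict/update cycle for a marker position measurement z."""
    x = F @ x                               # predict state forward one frame
    P = F @ P @ F.T + Q
    S = H @ P @ H.T + R                     # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)          # Kalman gain
    x = x + K @ (z - H @ x)                 # correct with the measurement
    P = (np.eye(6) - K @ H) @ P
    return x, P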
Real-time marker-less multi-person 3D pose estimation in RGB-Depth camera networks
This paper proposes a novel system to estimate and track the 3D poses of
multiple persons in calibrated RGB-Depth camera networks. The multi-view 3D
pose of each person is computed by a central node which receives the
single-view outcomes from each camera of the network. Each single-view outcome
is computed by using a CNN for 2D pose estimation and extending the resulting
skeletons to 3D by means of the sensor depth. The proposed system is
marker-less, multi-person, independent of the background, and makes no
assumptions about people's appearance or initial pose. The system provides real-time
outcomes, thus being perfectly suited for applications requiring user
interaction. Experimental results show the effectiveness of this work with
respect to a baseline multi-view approach in different scenarios. To foster
research and applications based on this work, we released the source code in
OpenPTrack, an open source project for RGB-D people tracking.
Comment: Submitted to the 2018 IEEE International Conference on Robotics and
Automation
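The per-camera lifting step described above (extending 2D skeletons to 3D via the sensor depth) amounts to back-projecting each detected joint through the pinhole model; a minimal sketch follows, with hypothetical intrinsics as inputs.

import numpy as np

def lift_joint_to_3d(u, v, depth_m, fx, fy, cx, cy):
    """Back-project a 2D joint detection to 3D using the sensor depth.

    (u, v)         : joint pixel coordinates from the 2D pose CNN
    depth_m        : depth value sampled at (u, v), in meters
    fx, fy, cx, cy : pinhole intrinsics of the RGB-D camera
    """
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return np.array([x, y, depth_m])   # 3D point in the camera frame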