
    Real-Time Human Motion Capture with Multiple Depth Cameras

    Commonly used human motion capture systems require intrusive attachment of markers that are visually tracked with multiple cameras. In this work we present an efficient and inexpensive solution to markerless motion capture using only a few Kinect sensors. Unlike previous work on 3D pose estimation using a single depth camera, we relax constraints on the camera location and do not assume a cooperative user. We apply recent image segmentation techniques to depth images and use curriculum learning to train our system on purely synthetic data. Our method accurately localizes body parts without requiring an explicit shape model. The body joint locations are then recovered by combining evidence from multiple views in real time. We also introduce a dataset of ~6 million synthetic depth frames for pose estimation from multiple cameras and exceed state-of-the-art results on the Berkeley MHAD dataset. (Comment: Accepted to Computer Robot Vision 201)
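
    The multi-view combination step can be illustrated with a short sketch: each camera back-projects a detected 2D joint into 3D using its depth value and calibration, and the per-view hypotheses are merged, here by a confidence-weighted average. This is a minimal illustration under assumed inputs (intrinsics K, camera-to-world transforms, per-view confidences), not the authors' actual pipeline.

    import numpy as np

    def backproject(u, v, z, K, T_world_cam):
        """Lift pixel (u, v) with depth z (metres) into world coordinates."""
        x = (u - K[0, 2]) * z / K[0, 0]
        y = (v - K[1, 2]) * z / K[1, 1]
        p_cam = np.array([x, y, z, 1.0])
        return (T_world_cam @ p_cam)[:3]

    def fuse_joint(views):
        """Confidence-weighted average of per-view 3D joint hypotheses.
        views: iterable of (u, v, depth, confidence, K, T_world_cam) tuples."""
        pts, weights = [], []
        for u, v, z, c, K, T in views:
            pts.append(backproject(u, v, z, K, T))
            weights.append(c)
        return np.average(np.stack(pts), axis=0, weights=np.asarray(weights))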

    E³Pose: Energy-Efficient Edge-assisted Multi-camera System for Multi-human 3D Pose Estimation

    Multi-human 3D pose estimation plays a key role in establishing a seamless connection between the real world and the virtual world. Recent efforts adopted a two-stage framework that first builds 2D pose estimations in multiple camera views from different perspectives and then synthesizes them into 3D poses. However, the focus has largely been on developing new computer vision algorithms on offline video datasets, without much consideration of the energy constraints in real-world systems with flexibly deployed, battery-powered cameras. In this paper, we propose an energy-efficient edge-assisted multi-camera system, dubbed E³Pose, for real-time multi-human 3D pose estimation, based on the key idea of adaptive camera selection. Instead of always employing all available cameras to perform 2D pose estimation as in existing works, E³Pose adaptively selects only a subset of cameras depending on their view quality in terms of occlusion and their energy states, thereby reducing energy consumption (which translates to extended battery lifetime) and improving estimation accuracy. To achieve this goal, E³Pose incorporates an attention-based LSTM that predicts the occlusion in each camera view to guide camera selection before the images of a scene are processed, and runs a camera selection algorithm based on the Lyapunov optimization framework to make long-term adaptive selection decisions. We build a prototype of E³Pose on a 5-camera testbed, demonstrate its feasibility and evaluate its performance. Our results show that a significant energy saving (up to 31.21%) can be achieved while maintaining a high 3D pose estimation accuracy comparable to state-of-the-art methods.
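
    The adaptive selection can be read as a Lyapunov drift-plus-penalty trade-off between estimation quality and energy. The sketch below is a simplified illustration, not the paper's exact formulation: predicted occlusion scores, per-camera energy costs, the per-slot energy budget and virtual energy-deficit queues are all assumed inputs, and the small camera count allows brute-force subset search.

    import numpy as np
    from itertools import combinations

    def select_cameras(occlusion, energy_cost, queues, k, V=1.0):
        """Choose k of the N cameras minimising V * (quality penalty) + queue-weighted energy."""
        occlusion, energy_cost, queues = map(np.asarray, (occlusion, energy_cost, queues))
        best, best_score = None, np.inf
        for subset in combinations(range(len(occlusion)), k):
            idx = list(subset)
            penalty = np.mean(occlusion[idx])               # proxy for 3D estimation error
            drift = np.sum(queues[idx] * energy_cost[idx])  # energy pressure from virtual queues
            score = V * penalty + drift
            if score < best_score:
                best, best_score = idx, score
        return best

    def update_queues(queues, energy_cost, selected, budget):
        """Virtual energy-deficit queues grow whenever spent energy exceeds the per-slot budget."""
        spent = np.zeros_like(queues, dtype=float)
        spent[selected] = energy_cost[selected]
        return np.maximum(queues + spent - budget, 0.0)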

    Continuous close-range 3D object pose estimation

    In the context of future manufacturing lines, removing fixtures will be a fundamental step to increase the flexibility of autonomous systems in assembly and logistic operations. Vision-based 3D pose estimation is a necessity to accurately handle objects that might not be placed at fixed positions during robot task execution. Industrial tasks bring multiple challenges for robust object pose estimation, such as difficult object properties, tight cycle times and constraints on camera views. In particular, when interacting with objects, we have to work with close-range partial views of objects, which pose a new challenge for typical view-based pose estimation methods. In this paper, we present a 3D pose estimation method based on a gradient-ascent particle filter that integrates new observations on the fly to improve the pose estimate. This allows us to apply the method online during task execution and save valuable cycle time. In contrast to other view-based pose estimation methods, we model potential views in the full 6-dimensional space, which allows us to cope with close-range partial object views. We demonstrate the approach on a real assembly task, in which the algorithm usually converges to the correct pose within 10-15 iterations with an average accuracy of less than 8 mm.
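
    A gradient-ascent particle filter can be sketched in a few lines: diffuse the particles, re-weight them against the new observation, resample, then locally climb the observation likelihood. This is a generic sketch under assumed inputs (a 6-DoF pose parameterization and user-supplied likelihood and gradient functions), not the authors' implementation.

    import numpy as np

    def pf_step(particles, observation, likelihood, grad,
                noise=0.01, ascent_steps=3, step_size=0.05):
        """One cycle of a gradient-ascent particle filter over 6-DoF poses.
        particles: (N, 6) array (e.g. translation + axis-angle rotation);
        likelihood(pose, obs) -> positive float; grad(pose, obs) -> (6,) array."""
        # Diffuse particles around the previous estimate.
        particles = particles + np.random.normal(0.0, noise, particles.shape)
        # Re-weight with the new observation and resample.
        weights = np.array([likelihood(p, observation) for p in particles])
        weights = weights / weights.sum()
        idx = np.random.choice(len(particles), size=len(particles), p=weights)
        particles = particles[idx]
        # Locally climb the observation likelihood to refine each particle.
        for _ in range(ascent_steps):
            particles = particles + step_size * np.array([grad(p, observation) for p in particles])
        return particles, particles.mean(axis=0)   # mean as a crude point estimate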

    CREPES: Cooperative RElative Pose Estimation System

    Mutual localization plays a crucial role in multi-robot cooperation. This paper proposes CREPES, a novel system that focuses on six degrees of freedom (DOF) relative pose estimation for multi-robot systems. CREPES has a compact hardware design using active infrared (IR) LEDs, an IR fish-eye camera, an ultra-wideband (UWB) module and an inertial measurement unit (IMU). By leveraging IR light communication, the system solves data association between visual detection and UWB ranging. Ranging measurements from the UWB and directional information from the camera offer relative 3-DOF position estimation. Combining the mutual relative positions between neighbors with the gravity constraints provided by the IMUs, we can estimate the 6-DOF relative pose from a single frame of sensor measurements. In addition, we design an estimator based on the error-state Kalman filter (ESKF) to enhance system accuracy and robustness. When multiple neighbors are available, a Pose Graph Optimization (PGO) algorithm is applied to further improve system accuracy. We conduct extensive experiments to demonstrate CREPES' accuracy between robot pairs and within a team of robots, as well as its performance under challenging conditions.
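
    The geometric core of such a single-frame estimate can be sketched as follows; this is a simplified illustration, not the authors' ESKF. It assumes a unit bearing vector from the fish-eye detection and mutual position measurements already rotated into gravity-aligned (leveled) frames using the IMU roll and pitch.

    import numpy as np

    def relative_position(bearing_cam, uwb_range):
        """3-DOF relative position: UWB range applied along the camera bearing direction."""
        b = np.asarray(bearing_cam, dtype=float)
        return uwb_range * b / np.linalg.norm(b)

    def relative_yaw(p_ij_leveled, p_ji_leveled):
        """Relative yaw of robot j's leveled frame expressed in robot i's leveled frame,
        from the mutual position measurements. The physical vector i->j is p_ij in
        i's frame and -p_ji in j's frame, so their bearings differ by the yaw."""
        v_i = np.asarray(p_ij_leveled, dtype=float)    # i -> j, in i's leveled frame
        v_j = -np.asarray(p_ji_leveled, dtype=float)   # i -> j, in j's leveled frame
        yaw = np.arctan2(v_i[1], v_i[0]) - np.arctan2(v_j[1], v_j[0])
        return np.arctan2(np.sin(yaw), np.cos(yaw))    # wrap to [-pi, pi]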

    Audiovisual head orientation estimation with particle filtering in multisensor scenarios

    This article presents a multimodal approach to head pose estimation of individuals in environments equipped with multiple cameras and microphones, such as SmartRooms or automatic video conferencing. Determining an individual's head orientation is the basis for many forms of more sophisticated interaction between humans and technical devices and can also be used for automatic sensor selection (camera, microphone) in communications or video surveillance systems. The use of particle filters as a unified framework for the estimation of head orientation in both monomodal and multimodal cases is proposed. In video, we estimate head orientation from color information by exploiting spatial redundancy among cameras. Audio information is processed to estimate the direction of the voice produced by a speaker, making use of the directivity characteristics of the head radiation pattern. Furthermore, two different particle filter multimodal information fusion schemes for combining the audio and video streams are analyzed in terms of accuracy and robustness. In the first one, fusion is performed at the decision level by combining each monomodal head pose estimate, while the second one uses a joint estimation system combining information at the data level. Experimental results conducted over the CLEAR 2006 evaluation database are reported, and the comparison of the proposed multimodal head pose estimation algorithms with the reference monomodal approaches proves the effectiveness of the proposed approach.
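
    Decision-level fusion can be illustrated with a minimal sketch: combine the per-modality orientation estimates with a confidence-weighted circular mean. This is not the paper's particle filter; the angle estimates and confidence weights are assumed inputs.

    import numpy as np

    def fuse_orientations(angles, confidences):
        """Decision-level fusion: confidence-weighted circular mean of per-modality
        head-orientation estimates (angles in radians, e.g. [video_est, audio_est])."""
        a = np.asarray(angles, dtype=float)
        w = np.asarray(confidences, dtype=float)
        return np.arctan2(np.sum(w * np.sin(a)), np.sum(w * np.cos(a)))

    # Example: video says 30 deg with high confidence, audio says 50 deg with lower confidence.
    fused = fuse_orientations(np.deg2rad([30.0, 50.0]), [0.8, 0.4])
    print(np.rad2deg(fused))   # ~36.6 deg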

    Image-based Flight Control of Unmanned Aerial Vehicles (UAVs) for Material Handling in Custom Manufacturing

    This study introduces an approach for, and the challenges in, employing unmanned aerial vehicles (UAVs) for material handling in emerging industrial custom manufacturing environments. Compared with conventional industrial robotic systems, UAVs offer enhanced flexibility for the design and on-the-fly variation of pathways and workflows to optimally perform multiple tasks on demand, besides offering favorable cost and dimensional footprint factors. A fundamental challenge to the deployment of UAVs in manufacturing and other indoor industrial settings lies in ensuring the accuracy of a drone's localization and flight path. Earlier approaches based on using multiple sensors (e.g., GPS, IMU) to improve the localization accuracy of UAVs are considered ineffective in indoor environments. In fact, few investigations have tackled the issues arising from the limited space, complicated components, moving entities and human presence in shop-floor environments. Towards addressing this challenge, a pose estimation method that employs just a single camera onboard a UAV, together with multiple ArUco markers positioned strategically over the shop floor, is implemented to track the real-time location of the UAV. A Kalman filter is applied to mitigate noise effects in the pose estimation. To assess the performance of this method, several experiments were carried out in Texas A&M University's manufacturing labs. The results suggest that the Kalman filter can reduce the variance of the pose estimate by 88.48% compared to a conventional camera- and marker-based motion tracking method (~27 cm) and can localize (via averaging) the position to within 8 cm of the actual target location.
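
    The filtering step can be sketched as a constant-velocity Kalman filter over the marker-derived 3D position. The marker-detection step (e.g. with OpenCV's ArUco module) is omitted; the frame rate, noise covariances and the measure_position_from_markers helper mentioned below are assumptions for illustration, not the study's actual parameters.

    import numpy as np

    dt = 1.0 / 30.0                               # assumed camera frame period
    F = np.eye(6); F[:3, 3:] = dt * np.eye(3)     # constant-velocity motion model
    H = np.hstack([np.eye(3), np.zeros((3, 3))])  # only position is measured
    Q = 1e-4 * np.eye(6)                          # process noise (tuning assumption)
    R = (0.05 ** 2) * np.eye(3)                   # ~5 cm measurement noise (assumption)

    def kf_step(x, P, z):
        """One predict/update cycle. x: (6,) [position, velocity]; P: (6, 6) covariance;
        z: (3,) marker-derived position, e.g. from a hypothetical
        measure_position_from_markers(frame) detection step."""
        x = F @ x                                 # predict state
        P = F @ P @ F.T + Q                       # predict covariance
        y = z - H @ x                             # innovation
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)            # Kalman gain
        x = x + K @ y                             # update state
        P = (np.eye(6) - K @ H) @ P               # update covariance
        return x, P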

    Real-time marker-less multi-person 3D pose estimation in RGB-Depth camera networks

    This paper proposes a novel system to estimate and track the 3D poses of multiple persons in calibrated RGB-Depth camera networks. The multi-view 3D pose of each person is computed by a central node which receives the single-view outcomes from each camera of the network. Each single-view outcome is computed by using a CNN for 2D pose estimation and extending the resulting skeletons to 3D by means of the sensor depth. The proposed system is marker-less, multi-person, independent of background and does not make any assumption on people's appearance and initial pose. The system provides real-time outcomes, thus being perfectly suited for applications requiring user interaction. Experimental results show the effectiveness of this work with respect to a baseline multi-view approach in different scenarios. To foster research and applications based on this work, we released the source code in OpenPTrack, an open-source project for RGB-D people tracking. (Comment: Submitted to the 2018 IEEE International Conference on Robotics and Automation)
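
    The central-node step can be illustrated with a short sketch that groups per-camera skeletons belonging to the same person and averages them per joint. This is a simplified stand-in, not OpenPTrack's actual fusion logic; the per-camera skeletons are assumed to have already been lifted to 3D in a common world frame.

    import numpy as np

    def merge_skeletons(skeletons_per_camera, dist_thresh=0.5):
        """Greedy central-node fusion: group per-camera 3D skeletons whose root joints
        lie within dist_thresh metres of each other, then average each group per joint.
        skeletons_per_camera: list (one per camera) of lists of (J, 3) arrays."""
        groups = []                                   # each group collects one person's views
        for camera_skeletons in skeletons_per_camera:
            for skel in camera_skeletons:
                for group in groups:
                    if np.linalg.norm(group[0][0] - skel[0]) < dist_thresh:
                        group.append(skel)
                        break
                else:
                    groups.append([skel])
        return [np.mean(np.stack(g), axis=0) for g in groups]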