A Multi-view Pixel-wise Voting Network for 6DoF Pose Estimation
6DoF pose estimation is an important task in computer vision, particularly
for robotic and automotive applications. Many recent approaches successfully
perform pose estimation on monocular images, which lack depth information.
This work explores the potential of extending such methods to a multi-view
setting, in order to recover depth information from geometric relations
between the views. In particular, two different multi-view adaptations of a
monocular pose estimator called PVNet are developed, either by combining the
monocular results on the individual views or by modifying the original method
to take the set of views directly as input. The new models are evaluated on
the TOD transparent object dataset and compared against the original PVNet
implementation, a depth-based pose estimator called DenseFusion, and the
method proposed by the authors of the dataset, called Keypose. Experimental
results show that integrating multi-view information significantly increases
test accuracy and that both models outperform DenseFusion, while still being
slightly surpassed by Keypose.
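
As a minimal illustration of how depth can follow from the geometric relation between two views, the sketch below assumes a rectified stereo pair with known focal length and baseline. This setup is an assumption made for illustration; the abstract does not state how the multi-view models exploit geometry. Under it, depth is Z = f * B / d, where d is the horizontal disparity of a matched point. The function name and the numbers are hypothetical.

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth in metres of a point matched across a rectified stereo pair.

    Assumes both cameras share the same focal length (in pixels) and are
    separated horizontally by `baseline_m` metres.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of both cameras")
    return focal_px * baseline_m / disparity_px


# Hypothetical numbers: 600 px focal length, 0.12 m baseline, 48 px disparity.
z = depth_from_disparity(600.0, 0.12, 48.0)
print(f"{z:.2f} m")  # prints "1.50 m"
```

A single monocular image provides no disparity, which is why depth has to come from a second view or a depth sensor.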
EPose: Energy-Efficient Edge-assisted Multi-camera System for Multi-human 3D Pose Estimation
Multi-human 3D pose estimation plays a key role in establishing a seamless
connection between the real world and the virtual world. Recent efforts adopted
a two-stage framework that first builds 2D pose estimations in multiple camera
views from different perspectives and then synthesizes them into 3D poses.
However, the focus has largely been on developing new computer vision
algorithms on offline video datasets, without much consideration of the
energy constraints in real-world systems with flexibly deployed,
battery-powered cameras. In this paper, we propose an energy-efficient
edge-assisted multiple-camera system, dubbed EPose, for real-time
multi-human 3D pose estimation, based on the key idea of adaptive camera
selection. Instead of always employing all available cameras to perform 2D pose
estimations as in the existing works, EPose selects only a subset of
cameras depending on their camera view qualities in terms of occlusion and
energy states in an adaptive manner, thereby reducing the energy consumption
(which translates to extended battery lifetime) and improving the estimation
accuracy. To achieve this goal, EPose incorporates an attention-based LSTM
to predict the occlusion information of each camera view and guide camera
selection before cameras are selected to process the images of a scene, and
runs a camera selection algorithm based on the Lyapunov optimization framework
to make long-term adaptive selection decisions. We build a prototype of
EPose on a 5-camera testbed, demonstrate its feasibility and evaluate its
performance. Our results show that a significant energy saving (up to 31.21%)
can be achieved while maintaining a high 3D pose estimation accuracy comparable
to state-of-the-art methods.
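
The adaptive selection idea can be sketched with a generic drift-plus-penalty rule from Lyapunov optimization. This is not the EPose algorithm, whose details are not given in the abstract; it is a minimal sketch that assumes each camera has a per-slot view-quality score (for example, from an occlusion predictor), a per-slot energy cost, and a per-slot energy budget. All names and the trade-off parameter `v` are hypothetical.

```python
from typing import List


def select_cameras(quality: List[float], energy: List[float],
                   queues: List[float], v: float, min_cameras: int = 2) -> List[int]:
    """Select cameras by the drift-plus-penalty score V * quality - Q * energy.

    Cameras with a positive score are selected. If fewer than `min_cameras`
    qualify, the best-scoring cameras are used instead so that 3D
    reconstruction from multiple views remains possible.
    """
    scores = [v * q - z * e for q, e, z in zip(quality, energy, queues)]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen = [i for i in ranked if scores[i] > 0]
    if len(chosen) < min_cameras:
        chosen = ranked[:min_cameras]
    return sorted(chosen)


def update_queues(queues: List[float], energy: List[float],
                  chosen: List[int], budget: List[float]) -> List[float]:
    """Virtual energy queue update: Q <- max(Q + energy spent - budget, 0)."""
    chosen_set = set(chosen)
    return [max(z + (e if i in chosen_set else 0.0) - b, 0.0)
            for i, (z, e, b) in enumerate(zip(queues, energy, budget))]


# Hypothetical run: three cameras, equal energy cost, budget of 0.5 per slot.
queues = [0.0, 0.0, 0.0]
for _ in range(3):
    chosen = select_cameras([0.9, 0.5, 0.2], [1.0, 1.0, 1.0], queues, v=1.0)
    queues = update_queues(queues, [1.0, 1.0, 1.0], chosen, [0.5, 0.5, 0.5])
```

A camera that overspends its budget accumulates a larger virtual queue, which lowers its score in later slots. A larger `v` favours view quality over energy savings.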
RotationNet: Joint Object Categorization and Pose Estimation Using Multiviews from Unsupervised Viewpoints
We propose a Convolutional Neural Network (CNN)-based model "RotationNet,"
which takes multi-view images of an object as input and jointly estimates its
pose and object category. Unlike previous approaches that use known viewpoint
labels for training, our method treats the viewpoint labels as latent
variables, which are learned in an unsupervised manner during the training
using an unaligned object dataset. RotationNet is designed to use only a
partial set of multi-view images for inference, and this property makes it
useful in practical scenarios where only partial views are available. Moreover,
our pose alignment strategy enables one to obtain view-specific feature
representations shared across classes, which is important to maintain high
accuracy in both object categorization and pose estimation. Effectiveness of
RotationNet is demonstrated by its superior performance to the state-of-the-art
methods of 3D object classification on 10- and 40-class ModelNet datasets. We
also show that RotationNet, even trained without known poses, achieves the
state-of-the-art performance on an object pose estimation dataset. The code is
available on https://github.com/kanezaki/rotationnet
Comment: 24 pages, 23 figures. Accepted to CVPR 201
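
Treating the viewpoint as a latent variable can be illustrated with a simplified joint search over class and alignment. This is a sketch under stated assumptions, not the authors' implementation: it assumes the network has already produced, for each input image, a table of scores indexed by candidate viewpoint and class, and that the images were taken at consecutive positions around the object, so the only unknown is a cyclic shift. The function name and data layout are hypothetical.

```python
import math
from typing import List, Tuple


def joint_class_and_pose(probs: List[List[List[float]]]) -> Tuple[int, int]:
    """Return the (class, shift) pair with the highest summed log-score.

    probs[i][m][c] is the score that image i, if taken from viewpoint m,
    shows class c. Image i is assumed to sit at viewpoint (shift + i) mod M
    for an unknown shift. Fewer images than viewpoints may be supplied.
    """
    num_views = len(probs[0])
    num_classes = len(probs[0][0])
    best = (-math.inf, 0, 0)
    for shift in range(num_views):
        for c in range(num_classes):
            # Clamp to avoid log(0) for scores that are exactly zero.
            score = sum(math.log(max(p[(shift + i) % num_views][c], 1e-12))
                        for i, p in enumerate(probs))
            if score > best[0]:
                best = (score, c, shift)
    return best[1], best[2]


# Hypothetical scores: 2 images, 3 candidate viewpoints, 2 classes.
example = [
    [[0.1, 0.1], [0.1, 0.1], [0.1, 0.9]],  # image 0 fits viewpoint 2, class 1
    [[0.1, 0.9], [0.1, 0.1], [0.1, 0.1]],  # image 1 fits viewpoint 0, class 1
]
label, shift = joint_class_and_pose(example)
```

Because the score is summed over whichever images are supplied, the same search works with only a partial set of views.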
A multi-viewpoint feature-based re-identification system driven by skeleton keypoints
Thanks to the increasing popularity of 3D sensors, robotic vision has seen major improvements across a wide range of applications and systems in recent years. Despite its many benefits, this shift has caused incompatibilities with systems that cannot rely on range sensors, such as intelligent video surveillance systems, since the two kinds of sensor data lead to different representations of people and objects. This work aims to bridge that gap and presents a novel re-identification system that uses multiple video streams to enhance the performance of a skeletal tracking algorithm, which in turn drives the re-identification. A new geometry-based method is introduced for merging the detections provided by the skeletal tracker across multiple video streams; it can handle many people in the scene and copes with the errors the skeletal tracker introduces in each view. The method is highly general and can be applied to any kind of body pose estimation algorithm. The system was tested on a public dataset for video surveillance applications, demonstrating the improvements that the multi-viewpoint approach achieves in the accuracy of both body pose estimation and re-identification. The proposed approach was also compared with a skeletal tracking system working on 3D data, and the comparison confirmed the good performance of the multi-viewpoint approach. This indicates that the lack of the rich information provided by 3D sensors can be compensated for by the availability of more than one viewpoint.
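
One common geometric way to merge detections of the same joint from two calibrated views is midpoint triangulation. The sketch below is not the method proposed in the paper, which the abstract does not detail. It assumes each detection has already been converted into a 3D viewing ray (camera centre plus direction) in a shared world frame, and all names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def triangulate_midpoint(o1: Sequence[float], d1: Sequence[float],
                         o2: Sequence[float], d2: Sequence[float]
                         ) -> Tuple[Tuple[float, ...], float]:
    """Midpoint of the shortest segment between rays o1 + s*d1 and o2 + t*d2.

    Returns the midpoint and the gap between the two rays. A large gap
    suggests the two detections do not belong to the same joint or person.
    """
    w = tuple(p - q for p, q in zip(o1, o2))
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w), _dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; no unique closest point")
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = tuple(o + s * k for o, k in zip(o1, d1))
    p2 = tuple(o + t * k for o, k in zip(o2, d2))
    mid = tuple((x + y) / 2 for x, y in zip(p1, p2))
    return mid, math.dist(p1, p2)


# Hypothetical cameras at x = -1 and x = +1, both looking at the point (0, 0, 2).
joint, gap = triangulate_midpoint((-1.0, 0.0, 0.0), (1.0, 0.0, 2.0),
                                  (1.0, 0.0, 0.0), (-1.0, 0.0, 2.0))
```

With noisy 2D detections the rays rarely intersect exactly, so the midpoint gives a compromise estimate and the gap can serve as a simple consistency check when several people are in the scene.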