Exploring Convolutional Networks for End-to-End Visual Servoing
Present image-based visual servoing approaches rely on extracting
hand-crafted visual features from an image. Choosing the right set of features is
important as it directly affects the performance of any approach. Motivated by
recent breakthroughs in the performance of data-driven methods on recognition and
localization tasks, we aim to learn visual feature representations suitable for
servoing tasks in unstructured and unknown environments. In this paper, we
present an end-to-end learning-based approach for visual servoing in diverse
scenes where the knowledge of camera parameters and scene geometry is not
available a priori. This is achieved by training a convolutional neural network
over color images with synchronised camera poses. Through experiments performed
in simulation and on a quadrotor, we demonstrate the efficacy and robustness of
our approach for a wide range of camera poses in both indoor and outdoor
environments.
Comment: IEEE ICRA 201
VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera
We present the first real-time method to capture the full global 3D skeletal
pose of a human in a stable, temporally consistent manner using a single RGB
camera. Our method combines a new convolutional neural network (CNN) based pose
regressor with kinematic skeleton fitting. Our novel fully-convolutional pose
formulation regresses 2D and 3D joint positions jointly in real time and does
not require tightly cropped input frames. A real-time kinematic skeleton
fitting method uses the CNN output to yield temporally stable 3D global pose
reconstructions on the basis of a coherent kinematic skeleton. This makes our
approach the first monocular RGB method usable in real-time applications such
as 3D character control; thus far, the only monocular methods for such
applications employed specialized RGB-D cameras. Our method's accuracy is
quantitatively on par with the best offline 3D monocular RGB pose estimation
methods. Our results are qualitatively comparable to, and sometimes better
than, results from monocular RGB-D approaches, such as the Kinect. However, we
show that our approach is more broadly applicable than RGB-D solutions, i.e. it
works for outdoor scenes, community videos, and low-quality commodity RGB
cameras.
Comment: Accepted to SIGGRAPH 201