Recovering 6D Object Pose and Predicting Next-Best-View in the Crowd
Object detection and 6D pose estimation in the crowd (scenes with multiple
object instances, severe foreground occlusions and background distractors) has
become an important problem in many rapidly evolving technological areas such
as robotics and augmented reality. Single shot-based 6D pose estimators with
manually designed features are still unable to tackle the above challenges,
motivating the research towards unsupervised feature learning and
next-best-view estimation. In this work, we present a complete framework for
both single shot-based 6D object pose estimation and next-best-view prediction
based on Hough Forests, the state of the art object pose estimator that
performs classification and regression jointly. Rather than using manually
designed features we a) propose an unsupervised feature learnt from
depth-invariant patches using a Sparse Autoencoder and b) offer an extensive
evaluation of various state of the art features. Furthermore, taking advantage
of the clustering performed in the leaf nodes of Hough Forests, we learn to
estimate the reduction of uncertainty in other views, formulating the problem
of selecting the next-best-view. To further improve pose estimation, we propose
an improved joint registration and hypotheses verification module as a final
refinement step to reject false detections. We provide two additional
challenging datasets inspired by realistic scenarios to extensively evaluate
the state of the art and our framework. One is related to domestic environments
and the other depicts a bin-picking scenario mostly found in industrial
settings. We show that our framework significantly outperforms the state of the
art on both public datasets and our own.
Comment: CVPR 2016 accepted paper, project page:
http://www.iis.ee.ic.ac.uk/rkouskou/6D_NBV.htm
Camera System Performance Derived from Natural Scenes
The Modulation Transfer Function (MTF) is a well-established measure of camera system performance, commonly employed to characterize optical and image capture systems. It is a measure based on Linear System Theory; thus, its use relies on the assumption that the system is linear and stationary. This is not the case with modern-day camera systems that incorporate non-linear image signal processes (ISP) to improve the output image. Non-linearities result in variations in camera system performance, which are dependent upon the specific input signals. This paper discusses the development of a novel framework, designed to acquire MTFs directly from images of natural complex scenes, thus making the use of traditional test charts with set patterns redundant. The framework is based on extraction, characterization and classification of edges found within images of natural scenes. Scene derived performance measures aim to characterize non-linear image processes incorporated in modern cameras more faithfully. Further, they can produce ‘live’ performance measures, acquired directly from camera feeds.
Joint Blind Motion Deblurring and Depth Estimation of Light Field
Removing camera motion blur from a single light field is a challenging task
since it is a highly ill-posed inverse problem. The problem becomes even worse
when the blur kernel varies spatially due to scene depth variation and
high-order camera motion. In this paper, we propose a novel algorithm to
estimate all blur model variables jointly, including the latent sub-aperture
image, camera motion, and scene depth, from the blurred 4D light field.
Exploiting the multi-view nature of a light field eases the ill-posedness of
the optimization by utilizing strong depth cues and multi-view blur
observations. The proposed joint estimation achieves high-quality light field
deblurring and depth estimation simultaneously under arbitrary 6-DOF camera
motion and unconstrained scene depth. Extensive experiments on real and
synthetic blurred light fields confirm that the proposed algorithm outperforms
state-of-the-art light field deblurring and depth estimation methods.
BodyNet: Volumetric Inference of 3D Human Body Shapes
Human shape estimation is an important task for the video editing, animation
and fashion industries. Predicting 3D human body shape from natural images,
however,
is highly challenging due to factors such as variation in human bodies,
clothing and viewpoint. Prior methods addressing this problem typically attempt
to fit parametric body models with certain priors on pose and shape. In this
work we argue for an alternative representation and propose BodyNet, a neural
network for direct inference of volumetric body shape from a single image.
BodyNet is an end-to-end trainable network that benefits from (i) a volumetric
3D loss, (ii) a multi-view re-projection loss, and (iii) intermediate
supervision of 2D pose, 2D body part segmentation, and 3D pose. Each of them
results in performance improvement as demonstrated by our experiments. To
evaluate the method, we fit the SMPL model to our network output and show
state-of-the-art results on the SURREAL and Unite the People datasets,
outperforming recent approaches. Besides achieving state-of-the-art
performance, our method also enables volumetric body-part segmentation.
Comment: Appears in: European Conference on Computer Vision 2018 (ECCV 2018).
27 page