iPose: Instance-Aware 6D Pose Estimation of Partly Occluded Objects
We address the task of 6D pose estimation of known rigid objects from single
input images in scenarios where the objects are partly occluded. Recent
RGB-D-based methods are robust to moderate degrees of occlusion. For RGB
inputs, no previous method works well for partly occluded objects. Our main
contribution is to present the first deep learning-based system that estimates
accurate poses for partly occluded objects from RGB-D and RGB input. We achieve
this with a new instance-aware pipeline that decomposes 6D object pose
estimation into a sequence of simpler steps, where each step removes specific
aspects of the problem. The first step localizes all known objects in the image
using an instance segmentation network, and hence eliminates surrounding
clutter and occluders. The second step densely maps pixels to 3D object surface
positions, so-called object coordinates, using an encoder-decoder network, and
hence eliminates object appearance. The third, and final, step predicts the 6D
pose using geometric optimization. We demonstrate that we significantly
outperform the state-of-the-art for pose estimation of partly occluded objects
for both RGB and RGB-D input.
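The abstract does not specify the geometric optimization used in the third step. As a minimal sketch only, assuming the RGB-D case where each pixel has both a predicted object coordinate and a camera-space 3D point from depth, a rigid pose can be fitted to such 3D-3D correspondences with the Kabsch algorithm. The function name `kabsch_pose` and the clean, outlier-free correspondences are assumptions for illustration, not the authors' method; a practical system would typically wrap such a solver in a robust scheme such as RANSAC.

```python
import numpy as np

def kabsch_pose(obj_pts, cam_pts):
    """Least-squares rigid transform (R, t) such that cam ≈ R @ obj + t.

    obj_pts: (N, 3) predicted object coordinates (hypothetical input).
    cam_pts: (N, 3) corresponding camera-space points from depth.
    """
    obj_pts = np.asarray(obj_pts, dtype=float)
    cam_pts = np.asarray(cam_pts, dtype=float)

    # Center both point sets on their centroids.
    obj_c = obj_pts.mean(axis=0)
    cam_c = cam_pts.mean(axis=0)

    # 3x3 cross-covariance between the centered sets.
    H = (obj_pts - obj_c).T @ (cam_pts - cam_c)
    U, _, Vt = np.linalg.svd(H)

    # Correct for a possible reflection so that det(R) = +1.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cam_c - R @ obj_c
    return R, t
```

The rotation and translation together give the six degrees of freedom of the pose. For RGB-only input there is no depth, so a 2D-3D (perspective-n-point) solver would be needed instead.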
Loss-resilient Coding of Texture and Depth for Free-viewpoint Video Conferencing
Free-viewpoint video conferencing allows a participant to observe the remote
3D scene from any freely chosen viewpoint. An intermediate virtual viewpoint
image is commonly synthesized using two pairs of transmitted texture and depth
maps from two neighboring captured viewpoints via depth-image-based rendering
(DIBR). To maintain high quality of synthesized images, it is imperative to
contain the adverse effects of network packet losses that may arise during
texture and depth video transmission. Towards this end, we develop an
integrated approach that exploits the representation redundancy inherent in the
multiple streamed videos: a voxel in the 3D scene visible to two captured views
is sampled and coded twice in the two views. In particular, at the receiver we
first develop an error concealment strategy that adaptively blends
corresponding pixels in the two captured views during DIBR, so that pixels from
the more reliable transmitted view are weighted more heavily. We then couple it
with a sender-side optimization of reference picture selection (RPS) during
real-time video coding, so that blocks containing samples of voxels that are
visible in both views are more error-resiliently coded in one view only, given
that adaptive blending will erase errors in the other view. Further, synthesized
view distortion sensitivities to texture versus depth errors are analyzed, so
that relative importance of texture and depth code blocks can be computed for
system-wide RPS optimization. Experimental results show that the proposed
scheme can outperform the use of a traditional feedback channel by up to 0.82
dB on average at 8% packet loss rate, and by as much as 3 dB for particular
frames.
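The receiver-side idea of weighting the more reliable view more heavily can be illustrated with a per-pixel weighted average. This is a sketch under stated assumptions, not the paper's concealment strategy: the function name `blend_views`, the per-pixel reliability maps `w_left` and `w_right`, and the equal-weight fallback are all hypothetical, and the inputs are assumed to be single-channel images already warped to the virtual viewpoint.

```python
import numpy as np

def blend_views(left, right, w_left, w_right, eps=1e-8):
    """Blend two warped views pixel by pixel using reliability weights.

    left, right:      arrays of the same shape (warped single-channel views).
    w_left, w_right:  non-negative reliability weights of the same shape
                      (hypothetical; e.g. lower where packets were lost).
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    w_left = np.asarray(w_left, dtype=float)
    w_right = np.asarray(w_right, dtype=float)

    total = w_left + w_right
    # Avoid division by zero where neither view is considered reliable.
    safe_total = np.where(total > eps, total, 1.0)
    # Share of the left view; fall back to an equal blend if both weights are ~0.
    alpha = np.where(total > eps, w_left / safe_total, 0.5)
    return alpha * left + (1.0 - alpha) * right
```

With equal weights this reduces to a plain average; setting one weight to zero at a pixel takes the value entirely from the other view.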
Subjectivity and complexity of facial attractiveness
The origin and meaning of facial beauty represent a longstanding puzzle.
Despite the profuse literature devoted to facial attractiveness, its very
nature, its determinants and the nature of inter-person differences remain
controversial issues. Here we tackle such questions proposing a novel
experimental approach in which human subjects, instead of rating natural faces,
are allowed to efficiently explore the face-space and 'sculpt' their favorite
variation of a reference facial image. The results reveal that different
subjects prefer distinguishable regions of the face-space, highlighting the
essential subjectivity of the phenomenon. The different sculpted facial vectors
exhibit strong correlations among pairs of facial distances, characterising the
underlying universality and complexity of the cognitive processes, and the
relative relevance and robustness of the different facial distances.
Comment: 15 pages, 5 figures. Supplementary information: 26 pages, 13 figures
Joint Material and Illumination Estimation from Photo Sets in the Wild
Faithful manipulation of shape, material, and illumination in 2D Internet
images would greatly benefit from a reliable factorization of appearance into
material (i.e., diffuse and specular) and illumination (i.e., environment
maps). On the one hand, current methods that produce very high-fidelity
results typically require controlled settings, expensive devices, or
significant manual effort. On the other hand, methods that are automatic and
work on 'in the wild' Internet images often extract only low-frequency
lighting or diffuse materials. In this work, we propose to make use of a set of
photographs in order to jointly estimate the non-diffuse materials and sharp
lighting in an uncontrolled setting. Our key observation is that seeing
multiple instances of the same material under different illumination (i.e.,
environment), and different materials under the same illumination provide
valuable constraints that can be exploited to yield a high-quality solution
(i.e., specular materials and environment illumination) for all the observed
materials and environments. Similar constraints also arise when observing
multiple materials in a single environment, or a single material across
multiple environments. The core of this approach is an optimization procedure
that uses two neural networks that are trained on synthetic images to predict
good gradients in parametric space given observation of reflected light. We
evaluate our method on a range of synthetic and real examples to generate
high-quality estimates, qualitatively compare our results against
state-of-the-art alternatives via a user study, and demonstrate
photo-consistent image manipulation that is otherwise very challenging to
achieve.