7,762 research outputs found
Where and Who? Automatic Semantic-Aware Person Composition
Image compositing is a method used to generate realistic yet fake imagery by
inserting contents from one image to another. Previous work in compositing has
focused on improving appearance compatibility of a user selected foreground
segment and a background image (i.e. color and illumination consistency). In
this work, we instead develop a fully automated compositing model that
additionally learns to select and transform compatible foreground segments from
a large collection given only an input image background. To simplify the task,
we restrict our problem by focusing on human instance composition, because
human segments exhibit strong correlations with their background and because of
the availability of large annotated data. We develop a novel branching
Convolutional Neural Network (CNN) that jointly predicts candidate person
locations given a background image. We then use pre-trained deep feature
representations to retrieve person instances from a large segment database.
Experimental results show that our model can generate composite images that
look visually convincing. We also develop a user interface to demonstrate the
potential application of our method.Comment: 10 pages, 9 figure
Matterport3D: Learning from RGB-D Data in Indoor Environments
Access to large, diverse RGB-D datasets is critical for training RGB-D scene
understanding algorithms. However, existing datasets still cover only a limited
number of views or a restricted scale of spaces. In this paper, we introduce
Matterport3D, a large-scale RGB-D dataset containing 10,800 panoramic views
from 194,400 RGB-D images of 90 building-scale scenes. Annotations are provided
with surface reconstructions, camera poses, and 2D and 3D semantic
segmentations. The precise global alignment and comprehensive, diverse
panoramic set of views over entire buildings enable a variety of supervised and
self-supervised computer vision tasks, including keypoint matching, view
overlap prediction, normal prediction from color, semantic segmentation, and
region classification
Recovering 6D Object Pose: A Review and Multi-modal Analysis
A large number of studies analyse object detection and pose estimation at
visual level in 2D, discussing the effects of challenges such as occlusion,
clutter, texture, etc., on the performances of the methods, which work in the
context of RGB modality. Interpreting the depth data, the study in this paper
presents thorough multi-modal analyses. It discusses the above-mentioned
challenges for full 6D object pose estimation in RGB-D images comparing the
performances of several 6D detectors in order to answer the following
questions: What is the current position of the computer vision community for
maintaining "automation" in robotic manipulation? What next steps should the
community take for improving "autonomy" in robotics while handling objects? Our
findings include: (i) reasonably accurate results are obtained on
textured-objects at varying viewpoints with cluttered backgrounds. (ii) Heavy
existence of occlusion and clutter severely affects the detectors, and
similar-looking distractors is the biggest challenge in recovering instances'
6D. (iii) Template-based methods and random forest-based learning algorithms
underlie object detection and 6D pose estimation. Recent paradigm is to learn
deep discriminative feature representations and to adopt CNNs taking RGB images
as input. (iv) Depending on the availability of large-scale 6D annotated depth
datasets, feature representations can be learnt on these datasets, and then the
learnt representations can be customized for the 6D problem
Matching Local Invariant Features with Contextual Information : an Experimental Evaluation
The main advantage of using local invariant features is their local character which yields robustness to occlusion and varying background. Therefore, local features have proved to be a powerful tool for finding correspondences between images, and have been employed in many applications. However, the local character limits the descriptive capability of features descriptors, and local features fail to resolve ambiguities that can occur when an image shows multiple similar regions. Considering some global information will clearly help to achieve better performances. The question is which information to use and how to use it. Context can be used to enrich the description of the features, or used in the matching step to filter out mismatches. In this paper, we compare different recent methods which use context for matching and show that better results are obtained if contextual information is used during the matching process. We evaluate the methods in two applications: wide baseline matching and object recognition, and it appears that a relaxation based approach gives the best results
- …