Best Viewpoint Tracking for Camera Mounted on Robotic Arm with Dynamic Obstacles
The problem of finding a next best viewpoint for 3D modeling or scene mapping
has been explored in computer vision over the last decade. This paper tackles a
similar problem, but with different characteristics. It proposes a method for
dynamic next best viewpoint recovery of a target point while avoiding possible
occlusions. Since the environment can change, the method has to iteratively
find the next best view using a global understanding of the free and
occupied parts of the scene.
We model the problem as a set of possible viewpoints that correspond to the
centers of the facets of a virtual tessellated hemisphere covering the
scene. Taking into account occlusions, the distance between the current and
future viewpoints, the quality of the viewpoint, and joint constraints
(robot arm joint distances or limits), we evaluate the next best viewpoint.
The proposal has been evaluated on eight scenarios with different occlusions
and on a short 3D video sequence to validate its dynamic performance.
Comment: 10 pages, 6 figures, poster in 3DV conference
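As a rough illustration of the selection step described above, the sketch
below scores candidate viewpoints on a tessellated hemisphere with a
weighted cost over occlusion, travel distance, view quality, and joint
constraints. The grid-based tessellation, the weights, and the random inputs
are illustrative assumptions, not the authors' exact formulation.

```python
# Minimal next-best-view sketch; weights and inputs are assumptions.
import numpy as np

def hemisphere_viewpoints(radius=1.0, n_az=16, n_el=4):
    """Candidate viewpoints over the scene. A real implementation would use
    facet centers of a subdivided icosahedron; a regular azimuth/elevation
    grid is used here for brevity."""
    pts = []
    for el in np.linspace(0.1, np.pi / 2, n_el):
        for az in np.linspace(0, 2 * np.pi, n_az, endpoint=False):
            pts.append(radius * np.array([np.cos(el) * np.cos(az),
                                          np.cos(el) * np.sin(az),
                                          np.sin(el)]))
    return np.array(pts)

def next_best_view(current, candidates, occluded, quality, joint_cost,
                   w=(10.0, 1.0, 2.0, 1.0)):
    """Pick the viewpoint minimizing a weighted cost; rerun every frame as
    the occupancy map changes (the 'dynamic' aspect of the method)."""
    costs = (w[0] * occluded.astype(float)                          # occlusion
             + w[1] * np.linalg.norm(candidates - current, axis=1)  # travel
             + w[2] * (1.0 - quality)                               # quality
             + w[3] * joint_cost)                                   # joints
    return candidates[np.argmin(costs)]

cands = hemisphere_viewpoints()
rng = np.random.default_rng(0)
best = next_best_view(current=np.array([1.0, 0.0, 0.0]),
                      candidates=cands,
                      occluded=rng.random(len(cands)) < 0.3,
                      quality=rng.random(len(cands)),
                      joint_cost=rng.random(len(cands)))
print("next viewpoint:", best)
```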
Matterport3D: Learning from RGB-D Data in Indoor Environments
Access to large, diverse RGB-D datasets is critical for training RGB-D scene
understanding algorithms. However, existing datasets still cover only a limited
number of views or a restricted scale of spaces. In this paper, we introduce
Matterport3D, a large-scale RGB-D dataset containing 10,800 panoramic views
from 194,400 RGB-D images of 90 building-scale scenes. Annotations are provided
with surface reconstructions, camera poses, and 2D and 3D semantic
segmentations. The precise global alignment and comprehensive, diverse
panoramic set of views over entire buildings enable a variety of supervised and
self-supervised computer vision tasks, including keypoint matching, view
overlap prediction, normal prediction from color, semantic segmentation, and
region classification.
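For orientation, the counts in the abstract imply 18 RGB-D images per
panoramic view (194,400 / 10,800) and 120 panoramas per building-scale scene
on average. The sketch below captures that hierarchy with hypothetical
container types; it is not the official Matterport3D data-loading API.

```python
# Hypothetical containers mirroring the dataset's structure as described.
from dataclasses import dataclass, field
from typing import List
import numpy as np

@dataclass
class RGBDImage:
    color: np.ndarray  # H x W x 3 color image
    depth: np.ndarray  # H x W depth map
    pose: np.ndarray   # 4 x 4 camera-to-world transform (globally aligned)

@dataclass
class Panorama:
    images: List[RGBDImage] = field(default_factory=list)  # 18 per panorama

@dataclass
class Scene:
    panoramas: List[Panorama] = field(default_factory=list)  # 90 scenes total

N_SCENES, N_PANORAMAS, N_IMAGES = 90, 10_800, 194_400
print(N_IMAGES // N_PANORAMAS, "RGB-D images per panorama")      # 18
print(N_PANORAMAS / N_SCENES, "panoramas per scene on average")  # 120.0
```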
Disparity map generation based on trapezoidal camera architecture for multiview video
Visual content acquisition is a strategic functional block of any visual
system. Despite its wide possibilities, arranging cameras to acquire
good-quality visual content for use in multi-view video remains a major
challenge. This paper presents a mathematical description of the trapezoidal
camera architecture, together with relationships that facilitate determining
camera positions for visual content acquisition in multi-view video and for
depth map generation. The strong point of the trapezoidal camera
architecture is that it allows an adaptive camera topology in which points
within the scene, especially occluded ones, can be optically and
geometrically viewed from several different viewpoints, either on the edges
of the trapezoid or inside it. The concept of a maximum independent set, the
characteristics of the trapezoid, and the fact that the camera positions
(with the exception of a few) differ in their vertical coordinates can be
used to address occlusion, which remains a major problem in computer vision
with regard to depth map generation.
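The sketch below illustrates the placement idea the abstract describes:
cameras on the two parallel edges of a trapezoid plus an interior one, each
at a distinct height so occluded points can be seen from geometrically
distinct viewpoints. The coordinates, counts, and height offsets are
illustrative assumptions, not the paper's derivation.

```python
# Illustrative trapezoidal camera placement; all dimensions are assumptions.
import numpy as np

def trapezoid_camera_positions(a=4.0, b=2.0, h=3.0, per_edge=3):
    """Cameras on the two parallel edges (lengths a and b, separated by h)
    plus one interior camera; each gets a distinct vertical offset so scene
    points, especially occluded ones, are covered from varied viewpoints."""
    bottom = [np.array([x, 0.0, 0.1 * i])
              for i, x in enumerate(np.linspace(0, a, per_edge))]
    top = [np.array([(a - b) / 2 + x, h, 0.1 * (i + per_edge)])
           for i, x in enumerate(np.linspace(0, b, per_edge))]
    interior = [np.array([a / 2, h / 2, 0.0])]
    return bottom + top + interior

for cam in trapezoid_camera_positions():
    print(cam)
```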
Scalable virtual viewpoint image synthesis for multiple camera environments
One of the main aims of emerging audio-visual (AV) applications is to provide interactive navigation within a captured event or scene. This paper presents a view synthesis algorithm that provides a scalable and flexible approach to virtual viewpoint synthesis in multiple camera environments. The multi-view synthesis (MVS) process consists of four phases that are described in detail: surface identification, surface selection, surface boundary blending, and surface reconstruction. MVS view synthesis identifies and selects only the best-quality surface areas from the set of available reference images, thereby reducing perceptual errors in virtual view reconstruction. The approach is camera-setup independent and scalable, as virtual views can be created given 1 to N of the available video inputs. Thus, MVS provides interactive AV applications with a means to handle scenarios where camera inputs increase or decrease over time.
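The skeleton below lays out the four named phases as a pipeline. The
function bodies are placeholders under stated assumptions; the paper's
actual surface tests, blending weights, and warping are not reproduced here.

```python
# Four-phase MVS pipeline skeleton; phase bodies are placeholders.
from typing import List
import numpy as np

def identify_surfaces(reference_views: List[np.ndarray]) -> list:
    """Phase 1: find candidate surface regions in each reference view."""
    return [{"view": i, "mask": np.ones(v.shape[:2], bool),
             "quality": 1.0 / (i + 1)}
            for i, v in enumerate(reference_views)]

def select_surfaces(surfaces: list) -> list:
    """Phase 2: keep only the best-quality surface per scene region, which
    is what suppresses perceptual errors in the reconstructed view."""
    return sorted(surfaces, key=lambda s: -s["quality"])[:1]

def blend_boundaries(surfaces: list) -> list:
    """Phase 3: feather the seams where selected surfaces meet."""
    return surfaces  # placeholder: real code blends along mask boundaries

def reconstruct_view(surfaces: list, reference_views) -> np.ndarray:
    """Phase 4: warp the selected surfaces into the virtual viewpoint."""
    return reference_views[surfaces[0]["view"]]  # placeholder warp

def synthesize(reference_views: List[np.ndarray]) -> np.ndarray:
    # Works with 1..N inputs, the scalability property claimed above.
    s = identify_surfaces(reference_views)
    return reconstruct_view(blend_boundaries(select_surfaces(s)),
                            reference_views)

virtual = synthesize([np.zeros((480, 640, 3)) for _ in range(3)])
print(virtual.shape)
```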
Interactive Camera Network Design using a Virtual Reality Interface
Traditional literature on camera network design focuses on constructing
automated algorithms. These require problem-specific input from experts in
order to produce their output. The nature of the required input is highly
unintuitive, leading to an impractical workflow for human operators. In this
work we focus on developing a virtual reality user interface that allows
human operators to manually design camera networks in an intuitive manner.
From real-world practical examples we conclude that the camera networks
designed using this interface are highly competitive with, or superior to,
those generated by automated algorithms, while the associated workflow is
much more intuitive and simple. The competitiveness of the human-generated
camera networks is remarkable because the underlying optimization problem is
a well-known combinatorial NP-hard problem. These results indicate that
human operators can be used in challenging geometric combinatorial
optimization problems given an intuitive visualization of the problem.
Comment: 11 pages, 8 figures
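To make the NP-hard baseline concrete: camera network design can be cast as
a coverage problem, which automated methods often attack with a greedy
set-cover heuristic like the one sketched below. The candidate cameras and
their coverage sets are toy assumptions, not the paper's benchmark setup.

```python
# Greedy set-cover heuristic for camera placement; inputs are toy data.
def greedy_camera_network(coverage, targets):
    """Pick cameras until all targets are covered, each step choosing the
    candidate that covers the most still-uncovered targets."""
    uncovered, chosen = set(targets), []
    while uncovered:
        best = max(coverage, key=lambda c: len(coverage[c] & uncovered))
        if not coverage[best] & uncovered:
            break  # remaining targets cannot be covered by any candidate
        chosen.append(best)
        uncovered -= coverage[best]
    return chosen

coverage = {"cam_A": {1, 2, 3}, "cam_B": {3, 4}, "cam_C": {4, 5, 6}}
print(greedy_camera_network(coverage, targets={1, 2, 3, 4, 5, 6}))
```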
Understanding deep features with computer-generated imagery
We introduce an approach for analyzing the variation of features generated by
convolutional neural networks (CNNs) with respect to scene factors that occur
in natural images. Such factors may include object style, 3D viewpoint, color,
and scene lighting configuration. Our approach analyzes CNN feature responses
corresponding to different scene factors by controlling for them via rendering
using a large database of 3D CAD models. The rendered images are presented to a
trained CNN and responses for different layers are studied with respect to the
input scene factors. We perform a decomposition of the responses based on
knowledge of the input scene factors and analyze the resulting components. In
particular, we quantify their relative importance in the CNN responses and
visualize them using principal component analysis. We show qualitative and
quantitative results of our study on three CNNs trained on large image
datasets: AlexNet, Places, and Oxford VGG. We observe important differences
across the networks and CNN layers for different scene factors and object
categories. Finally, we demonstrate that our analysis based on
computer-generated imagery translates to the network representation of natural
images.
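The sketch below mirrors the analysis pipeline in miniature: collect network
responses on a grid of controlled scene factors, decompose them into
per-factor marginal components, quantify each factor's share of the
response variance, and project one component with PCA. The stand-in feature
function and the two-factor grid are assumptions; the paper used rendered 3D
CAD models and features from AlexNet, Places, and Oxford VGG.

```python
# ANOVA-style decomposition of CNN responses by scene factor, plus PCA.
import numpy as np

n_styles, n_views, feat_dim = 5, 8, 256

def cnn_features(style, view):
    """Stand-in for a trained CNN's response to an image rendered with the
    given (object style, 3D viewpoint) factor setting."""
    rng = np.random.default_rng(style * 100 + view)
    return rng.standard_normal(feat_dim)

# Responses on the full factor grid: shape (styles, viewpoints, features).
F = np.stack([[cnn_features(s, v) for v in range(n_views)]
              for s in range(n_styles)])

mean = F.mean(axis=(0, 1))
style_comp = F.mean(axis=1) - mean   # marginal effect of object style
view_comp = F.mean(axis=0) - mean    # marginal effect of 3D viewpoint
residual = F - mean - style_comp[:, None, :] - view_comp[None, :, :]

# Relative importance of each factor (sums of squares partition exactly).
total = ((F - mean) ** 2).sum()
print("style    ", n_views * (style_comp ** 2).sum() / total)
print("viewpoint", n_styles * (view_comp ** 2).sum() / total)
print("residual ", (residual ** 2).sum() / total)

# Visualize the viewpoint component with its top two principal directions.
centered = view_comp - view_comp.mean(axis=0)
_, _, Vt = np.linalg.svd(centered, full_matrices=False)
print("2D PCA embedding of viewpoints:\n", centered @ Vt[:2].T)
```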