1,188 research outputs found
SEGCloud: Semantic Segmentation of 3D Point Clouds
3D semantic scene labeling is fundamental to agents operating in the real
world. In particular, labeling raw 3D point sets from sensors provides
fine-grained semantics. Recent works leverage the capabilities of Neural
Networks (NNs), but are limited to coarse voxel predictions and do not
explicitly enforce global consistency. We present SEGCloud, an end-to-end
framework to obtain 3D point-level segmentation that combines the advantages of
NNs, trilinear interpolation(TI) and fully connected Conditional Random Fields
(FC-CRF). Coarse voxel predictions from a 3D Fully Convolutional NN are
transferred back to the raw 3D points via trilinear interpolation. Then the
FC-CRF enforces global consistency and provides fine-grained semantics on the
points. We implement the latter as a differentiable Recurrent NN to allow joint
optimization. We evaluate the framework on two indoor and two outdoor 3D
datasets (NYU V2, S3DIS, KITTI, Semantic3D.net), and show performance
comparable or superior to the state-of-the-art on all datasets.Comment: Accepted as a spotlight at the International Conference of 3D Vision
(3DV 2017
Data-Driven Shape Analysis and Processing
Data-driven methods play an increasingly important role in discovering
geometric, structural, and semantic relationships between 3D shapes in
collections, and applying this analysis to support intelligent modeling,
editing, and visualization of geometric data. In contrast to traditional
approaches, a key feature of data-driven approaches is that they aggregate
information from a collection of shapes to improve the analysis and processing
of individual shapes. In addition, they are able to learn models that reason
about properties and relationships of shapes without relying on hard-coded
rules or explicitly programmed instructions. We provide an overview of the main
concepts and components of these techniques, and discuss their application to
shape classification, segmentation, matching, reconstruction, modeling and
exploration, as well as scene analysis and synthesis, through reviewing the
literature and relating the existing works with both qualitative and numerical
comparisons. We conclude our report with ideas that can inspire future research
in data-driven shape analysis and processing.Comment: 10 pages, 19 figure
3D Reconstruction of Indoor Corridor Models Using Single Imagery and Video Sequences
In recent years, 3D indoor modeling has gained more attention due to its role in decision-making process of maintaining the status and managing the security of building indoor spaces. In this thesis, the problem of continuous indoor corridor space modeling has been tackled through two approaches. The first approach develops a modeling method based on middle-level perceptual organization. The second approach develops a visual Simultaneous Localisation and Mapping (SLAM) system with model-based loop closure.
In the first approach, the image space was searched for a corridor layout that can be converted into a geometrically accurate 3D model. Manhattan rule assumption was adopted, and indoor corridor layout hypotheses were generated through a random rule-based intersection of image physical line segments and virtual rays of orthogonal vanishing points. Volumetric reasoning, correspondences to physical edges, orientation map and geometric context of an image are all considered for scoring layout hypotheses. This approach provides physically plausible solutions while facing objects or occlusions in a corridor scene.
In the second approach, Layout SLAM is introduced. Layout SLAM performs camera localization while maps layout corners and normal point features in 3D space. Here, a new feature matching cost function was proposed considering both local and global context information. In addition, a rotation compensation variable makes Layout SLAM robust against cameras orientation errors accumulations. Moreover, layout model matching of keyframes insures accurate loop closures that prevent miss-association of newly visited landmarks to previously visited scene parts.
The comparison of generated single image-based 3D models to ground truth models showed that average ratio differences in widths, heights and lengths were 1.8%, 3.7% and 19.2% respectively. Moreover, Layout SLAM performed with the maximum absolute trajectory error of 2.4m in position and 8.2 degree in orientation for approximately 318m path on RAWSEEDS data set. Loop closing was strongly performed for Layout SLAM and provided 3D indoor corridor layouts with less than 1.05m displacement errors in length and less than 20cm in width and height for approximately 315m path on York University data set. The proposed methods can successfully generate 3D indoor corridor models compared to their major counterpart
Semantic Mapping of Road Scenes
The problem of understanding road scenes has been on the fore-front in the computer vision community
for the last couple of years. This enables autonomous systems to navigate and understand
the surroundings in which it operates. It involves reconstructing the scene and estimating the objects
present in it, such as ‘vehicles’, ‘road’, ‘pavements’ and ‘buildings’. This thesis focusses on these
aspects and proposes solutions to address them.
First, we propose a solution to generate a dense semantic map from multiple street-level images.
This map can be imagined as the bird’s eye view of the region with associated semantic labels for
ten’s of kilometres of street level data. We generate the overhead semantic view from street level
images. This is in contrast to existing approaches using satellite/overhead imagery for classification
of urban region, allowing us to produce a detailed semantic map for a large scale urban area. Then
we describe a method to perform large scale dense 3D reconstruction of road scenes with associated
semantic labels. Our method fuses the depth-maps in an online fashion, generated from the
stereo pairs across time into a global 3D volume, in order to accommodate arbitrarily long image
sequences. The object class labels estimated from the street level stereo image sequence are used to
annotate the reconstructed volume. Then we exploit the scene structure in object class labelling by
performing inference over the meshed representation of the scene. By performing labelling over the
mesh we solve two issues: Firstly, images often have redundant information with multiple images
describing the same scene. Solving these images separately is slow, where our method is approximately
a magnitude faster in the inference stage compared to normal inference in the image domain.
Secondly, often multiple images, even though they describe the same scene result in inconsistent
labelling. By solving a single mesh, we remove the inconsistency of labelling across the images.
Also our mesh based labelling takes into account of the object layout in the scene, which is often
ambiguous in the image domain, thereby increasing the accuracy of object labelling. Finally, we perform
labelling and structure computation through a hierarchical robust PN Markov Random Field
defined on voxels and super-voxels given by an octree. This allows us to infer the 3D structure and
the object-class labels in a principled manner, through bounded approximate minimisation of a well
defined and studied energy functional. In this thesis, we also introduce two object labelled datasets
created from real world data. The 15 kilometre Yotta Labelled dataset consists of 8,000 images per
camera view of the roadways of the United Kingdom with a subset of them annotated with object
class labels and the second dataset is comprised of ground truth object labels for the publicly available
KITTI dataset. Both the datasets are available publicly and we hope will be helpful to the vision
research community
Recommended from our members
Real-time spatial modeling to detect and track resources on construction sites
For more than 10 years the U.S. construction industry has experienced over 1,000
fatalities annually. Many fatalities may have been prevented had the individuals and
equipment involved been more aware of and alert to the physical state of the environment
around them. Awareness may be improved by automatic 3D (three-dimensional) sensing
and modeling of the job site environment in real-time. Existing 3D modeling approaches
based on range scanning techniques are capable of modeling static objects only, and thus
cannot model in real-time dynamic objects in an environment comprised of moving
humans, equipment, and materials. Emerging prototype 3D video range cameras offer
another alternative by facilitating affordable, wide field of view, automated static and
dynamic object detection and tracking at frame rates better than 1Hz (real-time).
This dissertation presents an imperical work and methodology to rapidly create a
spatial model of construction sites and in particular to detect, model, and track the position, dimension, direction, and velocity of static and moving project resources in real-time, based on range data obtained from a three-dimensional video range camera in a
static or moving position. Existing construction site 3D modeling approaches based on
optical range sensing technologies (laser scanners, rangefinders, etc.) and 3D modeling
approaches (dense, sparse, etc.) that offered potential solutions for this research are
reviewed. The choice of an emerging sensing tool and preliminary experiments with this
prototype sensing technology are discussed. These findings led to the development of a
range data processing algorithm based on three-dimensional occupancy grids which is
demonstrated in detail. Testing and validation of the proposed algorithms have been
conducted to quantify the performance of sensor and algorithm through extensive
experimentation involving static and moving objects. Experiments in indoor laboratory
and outdoor construction environments have been conducted with construction resources
such as humans, equipment, materials, or structures to verify the accuracy of the
occupancy grid modeling approach. Results show that modeling objects and measuring
their position, dimension, direction, and speed had an accuracy level compatible to the
requirements of active safety features for construction. Results demonstrate that video
rate 3D data acquisition and analysis of construction environments can support effective
detection, tracking, and convex hull modeling of objects. Exploiting rapidly generated
three-dimensional models for improved visualization, communications, and process
control has inherent value, broad application, and potential impact, e.g. as-built vs. as-planned comparison, condition assessment, maintenance, operations, and construction
activities control. In combination with effective management practices, this sensing
approach has the potential to assist equipment operators to avoid incidents that result in
reduce human injury, death, or collateral damage on construction sites.Civil, Architectural, and Environmental Engineerin
Automatic Generation of Labeled 3D Point Clouds of Natural Environments with Gazebo.
https://conferences.ieeeauthorcenter.ieee.org/author-ethics/guidelines-and-policies/post-publication-policies/#preprintProgress in applying supervised learning for nat- ural scene classification is impeded by the lack of appropriate datasets for training. This paper describes the automatic generation of synthetic three-dimensional (3D) scans of natural environments with each point labelled individually with its element class. The developed software employs the robotic simulator Gazebo to obtain range and intensity measurements from a 3D laser rangefinder aboard a ground mobile robot. Precisely, the returned intensity values are used to annotate every 3D point within its corresponding class 100% error free. Several examples are provided to show the utility of the proposed approach
Automatic Reconstruction of Textured 3D Models
Three dimensional modeling and visualization of environments is an increasingly important problem. This work addresses the problem of automatic 3D reconstruction and we present a system for unsupervised reconstruction of textured 3D models in the context of modeling indoor environments. We present solutions to all aspects of the modeling process and an integrated system for the automatic creation of large scale 3D models
Automatic Extrinsic Calibration of Vision and Lidar by Maximizing Mutual Information
Peer Reviewedhttp://deepblue.lib.umich.edu/bitstream/2027.42/112212/1/rob21542.pd
- …