522 research outputs found
Semantic Mapping of Road Scenes
The problem of understanding road scenes has been at the forefront of the computer vision community
for the last couple of years. Solving it enables autonomous systems to navigate and understand
the surroundings in which they operate. It involves reconstructing the scene and estimating the objects
present in it, such as 'vehicles', 'road', 'pavements' and 'buildings'. This thesis focusses on these
aspects and proposes solutions to address them.
First, we propose a solution to generate a dense semantic map from multiple street-level images.
This map can be imagined as the bird's-eye view of the region with associated semantic labels for
tens of kilometres of street-level data. We generate the overhead semantic view from street-level
images. This is in contrast to existing approaches using satellite/overhead imagery for classification
of urban regions, allowing us to produce a detailed semantic map for a large-scale urban area. Then
we describe a method to perform large scale dense 3D reconstruction of road scenes with associated
semantic labels. Our method fuses the depth maps generated from the stereo pairs across time
into a global 3D volume in an online fashion, in order to accommodate arbitrarily long image
sequences. The object class labels estimated from the street-level stereo image sequence are used to
annotate the reconstructed volume. Then we exploit the scene structure in object class labelling by
performing inference over the meshed representation of the scene. By performing labelling over the
mesh we solve two issues. Firstly, images often contain redundant information, with multiple images
describing the same scene. Solving these images separately is slow; our method is approximately
an order of magnitude faster in the inference stage than normal inference in the image domain.
Secondly, multiple images, even though they describe the same scene, often result in inconsistent
labelling. By solving a single mesh, we remove the inconsistency of labelling across the images.
Our mesh-based labelling also takes into account the object layout in the scene, which is often
ambiguous in the image domain, thereby increasing the accuracy of object labelling. Finally, we perform
labelling and structure computation through a hierarchical robust P^N Markov Random Field
defined on voxels and super-voxels given by an octree. This allows us to infer the 3D structure and
the object-class labels in a principled manner, through bounded approximate minimisation of a well
defined and studied energy functional. In this thesis, we also introduce two object labelled datasets
created from real world data. The 15 kilometre Yotta Labelled dataset consists of 8,000 images per
camera view of the roadways of the United Kingdom with a subset of them annotated with object
class labels and the second dataset is comprised of ground truth object labels for the publicly available
KITTI dataset. Both datasets are publicly available, and we hope they will be helpful to the vision
research community.
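The online fusion step described in this abstract can be illustrated with a minimal sketch. The abstract does not name the fusion rule, so everything here is an assumption made for illustration: a sparse voxel volume keyed by integer coordinates, a running weighted average of truncated signed distances, a majority vote over per-frame class labels, and the names `Voxel`, `integrate`, `voxel_label` and `TRUNC`. Because each frame is folded into the running state and then discarded, the cost per frame does not grow with sequence length.

```python
from collections import Counter, defaultdict

TRUNC = 0.3  # truncation distance in metres (assumed value, not from the thesis)


class Voxel:
    """Running state of one voxel in the global volume (hypothetical layout)."""
    __slots__ = ("tsdf", "weight", "labels")

    def __init__(self):
        self.tsdf = 0.0          # running average of normalised truncated distances
        self.weight = 0.0        # accumulated observation weight
        self.labels = Counter()  # votes per object class


def integrate(volume, key, signed_dist, label, w=1.0):
    """Fold one depth observation and its class label into voxel `key`."""
    d = max(-TRUNC, min(TRUNC, signed_dist)) / TRUNC  # clamp, then scale to [-1, 1]
    v = volume[key]
    v.tsdf = (v.tsdf * v.weight + d * w) / (v.weight + w)
    v.weight += w
    v.labels[label] += 1


def voxel_label(volume, key):
    """Majority-vote class label of a voxel (assumed annotation rule)."""
    return volume[key].labels.most_common(1)[0][0]


# Three frames observe the same voxel; only the running state is kept.
volume = defaultdict(Voxel)
for dist, lab in [(0.06, "road"), (0.00, "road"), (-0.03, "pavement")]:
    integrate(volume, (10, 0, 42), dist, lab)
```

After the loop, `volume[(10, 0, 42)]` holds the averaged distance and the label votes from all three frames.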
Object-Aware Tracking and Mapping
Reasoning about geometric properties of digital cameras and optical physics enabled
researchers to build methods that localise cameras in 3D space from a video
stream, while (often simultaneously) constructing a model of the environment.
Related techniques have evolved substantially since the 1980s, leading to increasingly
accurate estimations. Traditionally, however, the quality of results is strongly
affected by the presence of moving objects, incomplete data, or difficult surfaces,
i.e. surfaces that are not Lambertian or lack texture. One insight of this work is
that these problems can be addressed by going beyond geometrical and optical constraints,
in favour of object level and semantic constraints. Incorporating specific
types of prior knowledge in the inference process, such as motion or shape priors,
leads to approaches with distinct advantages and disadvantages.
After introducing relevant concepts in Chapter 1 and Chapter 2, methods for building
object-centric maps in dynamic environments using motion priors are investigated
in Chapter 5. Chapter 6 addresses the same problem as Chapter 5, but presents
an approach which relies on semantic priors rather than motion cues. To fully exploit
semantic information, Chapter 7 discusses the conditioning of shape representations
on prior knowledge and the practical application to monocular, object-aware
reconstruction systems.
Voxel-Based Indoor Reconstruction From HoloLens Triangle Meshes
Current mobile augmented reality devices are often equipped with range
sensors. The Microsoft HoloLens for instance is equipped with a Time-Of-Flight
(ToF) range camera providing coarse triangle meshes that can be used in custom
applications. We suggest using the triangle meshes for the automatic
generation of indoor models that can serve as basis for augmenting their
physical counterpart with location-dependent information. In this paper, we
present a novel voxel-based approach for automated indoor reconstruction from
unstructured three-dimensional geometries like triangle meshes. After an
initial voxelization of the input data, rooms are detected in the resulting
voxel grid by segmenting connected voxel components of ceiling candidates and
extruding them downwards to find floor candidates. Semantic class labels like
'Wall', 'Wall Opening', 'Interior Object' and 'Empty Interior' are then
assigned to the room voxels in-between ceiling and floor by a rule-based voxel
sweep algorithm. Finally, the geometry of the detected walls and their openings
is refined in voxel representation. The proposed approach is not restricted to
Manhattan World scenarios and does not rely on room surfaces being planar. Comment: 8 pages, 4 figures
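The room-detection step in this abstract (segment connected ceiling candidates, then extrude downwards to find the floor) can be sketched on a toy voxel grid. This is a simplified illustration, not the paper's implementation. The following are assumptions: the occupied voxels are stored as a set of integer `(x, y, z)` tuples, a ceiling candidate is the topmost occupied voxel of a column with free space directly beneath it, connectivity is 4-neighbour in the horizontal plane, and the function names are hypothetical.

```python
from collections import deque


def ceiling_candidates(occupied):
    """Map (x, y) -> z of the topmost voxel in each column that has free space below it."""
    top = {}
    for x, y, z in occupied:
        if (x, y) not in top or z > top[(x, y)]:
            top[(x, y)] = z
    return {(x, y): z for (x, y), z in top.items() if (x, y, z - 1) not in occupied}


def segment_rooms(candidates):
    """Group ceiling-candidate columns into 4-connected components (one per room)."""
    remaining = set(candidates)
    rooms = []
    while remaining:
        seed = remaining.pop()
        room, queue = {seed}, deque([seed])
        while queue:
            x, y = queue.popleft()
            for n in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if n in remaining:
                    remaining.remove(n)
                    room.add(n)
                    queue.append(n)
        rooms.append(room)
    return rooms


def floor_below(occupied, x, y, z_ceiling, z_min=0):
    """Extrude downwards from a ceiling voxel; return z of the first occupied voxel, or None."""
    for z in range(z_ceiling - 1, z_min - 1, -1):
        if (x, y, z) in occupied:
            return z
    return None


# Toy scene: a 6 x 2 footprint with floor at z=0, ceiling at z=5,
# and a solid dividing wall at x=3 that splits it into two rooms.
occupied = set()
for x in range(6):
    for y in range(2):
        occupied.add((x, y, 0))
        occupied.add((x, y, 5))
        if x == 3:
            occupied.update((x, y, z) for z in range(6))

candidates = ceiling_candidates(occupied)
rooms = segment_rooms(candidates)
```

The wall columns are excluded from the candidates because the voxel below their top is occupied, which is what separates the two components in this sketch.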
Statistical Modelling and Inference in Image Analysis
The aim of the thesis is to investigate classes of model-based approaches to statistical image analysis. We explored the properties of models and examined the problem of parameter estimation from the original image data and, in particular, from noisy versions of the scene. We concentrated on Markov random field (MRF) models, Markov mesh random field (MMRF) models and multi-dimensional Markov chain (MDMC) models. In Chapter 2, for the one-dimensional version of Markov random fields, we developed a recursive technique which enables us to achieve maximum likelihood estimation for the underlying parameter and to carry out the EM algorithm for parameter estimation when only noisy data are available. This technique also enables us, in just a single pass, to generate a sample from a one-dimensional Markov random field. Although, unfortunately, this technique cannot be extended to two- or multi-dimensional models, it was applied to many cases in this thesis. Since, for two-dimensional Markov random fields, the density of each row (column), conditionally on all other rows (columns), is of the form of a one-dimensional Markov random field, and since the distribution of the original image, conditionally on the noisy version of the data, is still a Markov random field, the technique can be used on different forms of conditional density of one row (column). In Chapter 3, therefore, we developed the line-relaxation method for simulating MRFs and maximum line pseudo-likelihood estimation of parameter(s), and in Chapter 5, we developed a simultaneous procedure of parameter estimation and restoration, in which line pseudo-likelihood and a modified EM algorithm were used. The first part of Chapter 3 and Chapter 4 concentrate on inference for two-dimensional MRFs. We obtained a matrix expression for partition functions for general models, and a more explicit form for a multi-colour Ising model, and thus located the positions of critical points of this multi-colour model.
We examined the asymptotic properties of an asymmetric, two-colour Ising model. For general models, in Chapter 4, we explored asymptotic properties under an "independence" or a "near independence" condition, and then developed the approach of maximum approximate-likelihood estimation. For three-dimensional MMRF models, in Chapter 6, a generalization of Devijver's F-G-H algorithm is developed for restoration. In Chapter 7, the recursive technique was again used to introduce MDMC models, which form a natural extension of a Markov chain. By suitable choice of model parameters, textures can be generated that are similar to those simulated from MRFs, but the simulation procedure is computationally much more economical. The recursive technique also enables us to maximize the likelihood function of the model. For all three sorts of prior random field models considered in this thesis, we developed a simultaneous procedure for parameter estimation and image restoration, when only noisy data are available. The currently restored image was used, together with noisy data, in modified versions of the EM algorithm. In simulation studies, quite good results were obtained, in terms of estimation of parameters in both the original model and, particularly, in the noise model, and in terms of restoration.
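To illustrate why a one-dimensional Markov random field admits exact likelihood evaluation and direct sampling, here is a minimal sketch for a chain with Ising-type pair potentials. The abstract does not spell out the thesis's recursion, so this uses a standard backward recursion as a stand-in. The model form `p(x) ∝ exp(beta * number of equal neighbours)`, the parameter name `beta`, and all function names are assumptions for illustration.

```python
import math
import random
from itertools import product


def backward_messages(n, beta, states=(0, 1)):
    """m[i][a] = sum over x_{i+1}, ..., x_{n-1} of the unnormalised weight, given x_i = a."""
    m = [dict.fromkeys(states, 1.0) for _ in range(n)]
    for i in range(n - 2, -1, -1):
        for a in states:
            m[i][a] = sum(math.exp(beta * (a == b)) * m[i + 1][b] for b in states)
    return m


def partition_function(n, beta, states=(0, 1)):
    """Normalising constant of the chain, computed in O(n) instead of O(2^n)."""
    return sum(backward_messages(n, beta, states)[0].values())


def sample_chain(n, beta, rng, states=(0, 1)):
    """Draw one exact sample by sweeping left to right using the backward messages."""
    m = backward_messages(n, beta, states)
    x = []
    for i in range(n):
        weights = [
            (math.exp(beta * (x[-1] == b)) if x else 1.0) * m[i][b] for b in states
        ]
        x.append(rng.choices(states, weights=weights)[0])
    return x


def brute_force_partition(n, beta, states=(0, 1)):
    """Exhaustive check, feasible only for very short chains."""
    return sum(
        math.exp(beta * sum(a == b for a, b in zip(xs, xs[1:])))
        for xs in product(states, repeat=n)
    )
```

With the normalising constant available in closed recursive form, the log-likelihood of `beta` can be evaluated exactly, which is what makes maximum likelihood estimation tractable in the one-dimensional case; the same recursion does not carry over to two-dimensional fields.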
A review on deep learning techniques for 3D sensed data classification
Over the past decade deep learning has driven progress in 2D image
understanding. Despite these advancements, techniques for automatic understanding of 3D sensed
data, such as point clouds, are comparatively immature. However,
with a range of important applications from indoor robotics navigation to
national scale remote sensing there is a high demand for algorithms that can
learn to automatically understand and classify 3D sensed data. In this paper we
review the current state-of-the-art deep learning architectures for processing
unstructured Euclidean data. We begin by addressing the background concepts and
traditional methodologies. We review the current main approaches, including
RGB-D, multi-view, volumetric and fully end-to-end architecture designs.
Datasets for each category are documented and explained. Finally, we give a
detailed discussion about the future of deep learning for 3D sensed data, using
literature to justify the areas where future research would be most valuable. Comment: 25 pages, 9 figures. Review paper
Image based human body rendering via regression & MRF energy minimization
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. A machine learning method for synthesising human images is explored to create new images without relying on 3D modelling. Machine learning allows the creation of new images through prediction from existing data based on the use of training images. In the present study, image synthesis is performed at two levels: contour and pixel. A class of learning-based methods is formulated to create object contours from the training image for the synthetic image that allow pixel synthesis within the contours in the second level. The methods rely on applying robust object descriptions, dynamic learning models after appropriate motion segmentation, and machine learning-based frameworks.
Image-based human image synthesis using machine learning is a research focus that has recently gained considerable attention in the field of computer graphics. It makes use of techniques from image/motion analysis in computer vision. The problem lies in the estimation of methods for image-based object configuration (i.e. segmentation, contour outline). Using the results of these analysis methods as bases, the research adopts the machine learning approach, in which human images are synthesised by executing the synthesis of contour and pixels by learning from training images.
Firstly, the thesis shows how an accurate silhouette is distilled using developed background subtraction for accuracy and efficiency. The traditional vector machine approach is used to avoid ambiguities within the regression process. Images can be represented as a class of accurate and efficient vectors for single images as well as sequences. Secondly, the framework is explored using a unique view of machine learning methods, i.e., support vector regression (SVR), to obtain the convergence result of vectors for contour allocation. The changing relationship between the synthetic image and the training image is expressed as a vector and represented in functions. Finally, a pixel synthesis is performed based on belief propagation.
This thesis proposes a novel image-based rendering method for colour image synthesis using SVR and belief propagation for generalisation to enable the prediction of contour and colour information from input colour images. The methods rely on using appropriately defined and robust input colour images, optimising the input contour images within a sparse SVR framework. Firstly, the thesis shows how contour can effectively and efficiently be predicted from small numbers of input contour images. In addition, the thesis exploits the sparse properties of SVR for efficiency, and makes use of SVR to estimate the regression function. The image-based rendering method employed in this study enables contour synthesis for the prediction of small numbers of input source images. This procedure avoids the use of complex models and geometry information. Secondly, the method used for human body contour colouring is extended to define eight differently connected pixels, and construct a link distance field via the belief propagation method. The link distance, which acts as the message in propagation, is transformed by improving the low-envelope method in fast distance transform. Finally, the methodology is tested by considering human facial and human body clothing information. The accuracy of the test results for the human body model confirms the efficiency of the proposed method.
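The abstract mentions transforming belief-propagation messages with an improved lower-envelope method from the fast distance transform. The thesis's improvement is not described here, so the sketch below shows only the standard one-dimensional lower-envelope algorithm of Felzenszwalb and Huttenlocher, which computes `d[q] = min over p of (q - p)^2 + f[p]` in linear time. The function names are hypothetical, and the input costs are assumed to be finite numbers.

```python
import math


def distance_transform_1d(f):
    """Squared-distance transform of a sampled function via the lower envelope of parabolas."""
    n = len(f)
    v = [0] * n              # grid positions of the parabolas forming the lower envelope
    z = [0.0] * (n + 1)      # boundaries between consecutive envelope parabolas
    z[0], z[1] = -math.inf, math.inf
    k = 0                    # index of the rightmost parabola in the envelope

    for q in range(1, n):
        while True:
            p = v[k]
            # Intersection of the parabola rooted at q with the one rooted at p.
            s = ((f[q] + q * q) - (f[p] + p * p)) / (2 * q - 2 * p)
            if s > z[k]:
                break
            k -= 1           # parabola p is hidden by q, so drop it
        k += 1
        v[k] = q
        z[k] = s
        z[k + 1] = math.inf

    d = [0.0] * n
    k = 0
    for q in range(n):
        while z[k + 1] < q:  # advance to the parabola that is lowest at q
            k += 1
        d[q] = (q - v[k]) ** 2 + f[v[k]]
    return d


def distance_transform_brute(f):
    """Quadratic-time reference used only to check the fast version."""
    n = len(f)
    return [min((q - p) ** 2 + f[p] for p in range(n)) for q in range(n)]
```

In min-sum belief propagation with quadratic pairwise costs, a message update has exactly this min-over-labels form, which is why the linear-time transform can replace a quadratic loop over label pairs.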