Depth from Monocular Images using a Semi-Parallel Deep Neural Network (SPDNN) Hybrid Architecture
Deep neural networks have been applied to a wide range of problems in recent
years. In this work, a Convolutional Neural Network (CNN) is applied to the
problem of estimating depth from a single camera image (monocular depth). Eight
different networks are designed to perform depth estimation, each suited to a
particular feature level; networks with different pooling sizes capture
different feature levels. After designing this set of networks, the models are
combined into a single network topology using graph optimization techniques.
This "Semi Parallel Deep Neural Network (SPDNN)" eliminates duplicated common
network layers, and can be further optimized by retraining to achieve an
improved model compared to the individual topologies. In this study, four SPDNN
models are trained and evaluated in two stages on the KITTI dataset.
In the first stage, the ground-truth images are provided by the benchmark; in
the second, the ground truth consists of depth maps produced by a
state-of-the-art stereo matching method. The results of
this evaluation demonstrate that using post-processing techniques to refine the
target of the network increases the accuracy of depth estimation on individual
mono images. The second evaluation shows that using segmentation data alongside
the original data as the input can improve the depth estimation results to a
point where performance is comparable with stereo depth estimation. The
computational time is also discussed in this study.
Comment: 44 pages, 25 figures
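The layer-sharing idea behind the SPDNN can be sketched in miniature: given several network topologies that begin with the same layers, the duplicated common prefix is factored out into a single shared trunk, after which the branch-specific layers run in parallel. The sketch below works on layer names only and is a hypothetical simplification of the paper's graph-optimization step, not its actual algorithm.

```python
def merge_common_prefix(branches):
    """Merge layer sequences into a shared trunk plus parallel suffixes.

    Illustrative stand-in for SPDNN-style duplicated-layer elimination:
    only the longest common prefix of layer names is factored out.
    """
    shared = []
    for layers in zip(*branches):
        if all(name == layers[0] for name in layers):
            shared.append(layers[0])
        else:
            break
    # Branch-specific suffixes continue in parallel after the trunk.
    suffixes = [b[len(shared):] for b in branches]
    return shared, suffixes

# Two of the eight hypothetical networks: identical early conv layers,
# diverging where the pooling sizes (feature levels) differ.
net_a = ["conv3x3", "conv3x3", "pool2", "fc"]
net_b = ["conv3x3", "conv3x3", "pool4", "fc"]
trunk, parallel = merge_common_prefix([net_a, net_b])
print(trunk)     # ['conv3x3', 'conv3x3']
print(parallel)  # [['pool2', 'fc'], ['pool4', 'fc']]
```

After such a merge, the combined model is retrained end to end, which is where the paper reports the accuracy gain over the individual topologies.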
Learning single-image 3D reconstruction by generative modelling of shape, pose and shading
We present a unified framework tackling two problems: class-specific 3D
reconstruction from a single image, and generation of new 3D shape samples.
These tasks have received considerable attention recently; however, most
existing approaches rely on 3D supervision, annotation of 2D images with
keypoints or poses, and/or training with multiple views of each object
instance. Our framework is very general: it can be trained in similar settings
to existing approaches, while also supporting weaker supervision. Importantly,
it can be trained purely from 2D images, without pose annotations, and with
only a single view per instance. We employ meshes as an output representation,
instead of voxels used in most prior work. This allows us to reason over
lighting parameters and exploit shading information during training, which
previous 2D-supervised methods cannot. Thus, our method can learn to generate
and reconstruct concave object classes. We evaluate our approach in various
settings, showing that: (i) it learns to disentangle shape from pose and
lighting; (ii) using shading in the loss improves performance compared to just
silhouettes; (iii) when using a standard single white light, our model
outperforms state-of-the-art 2D-supervised methods, both with and without pose
supervision, thanks to exploiting shading cues; (iv) performance improves
further when using multiple coloured lights, even approaching that of
state-of-the-art 3D-supervised methods; (v) shapes produced by our model
capture smooth surfaces and fine details better than voxel-based approaches;
and (vi) our approach supports concave classes such as bathtubs and sofas,
which methods based on silhouettes cannot learn.Comment: Extension of arXiv:1807.09259, accepted to IJCV. Differentiable
renderer available at https://github.com/pmh47/dir
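The shading cue this framework exploits can be illustrated with a minimal Lambertian model: diffuse brightness is max(0, n·l) scaled by albedo, so two surface orientations that project to the same silhouette still receive different shading. This NumPy sketch is only an illustration of that cue; the paper's differentiable renderer is considerably more involved.

```python
import numpy as np

def lambertian_shading(normals, light_dir, albedo=0.8):
    """Per-vertex diffuse shading: albedo * max(0, n . l).

    Illustrative only; normals and light direction are normalized first.
    """
    l = np.asarray(light_dir, dtype=float)
    l = l / np.linalg.norm(l)
    n = np.asarray(normals, dtype=float)
    n = n / np.linalg.norm(n, axis=1, keepdims=True)
    return albedo * np.clip(n @ l, 0.0, None)

# A surface facing the light is bright; one facing away or sideways is
# dark -- information a silhouette-only loss cannot see.
normals = [[0, 0, 1], [0, 0, -1], [1, 0, 0]]
shades = lambertian_shading(normals, light_dir=[0, 0, 1])
print(shades)  # [0.8 0.  0. ]
```

Because concave regions (the inside of a bathtub, say) change shading but not the silhouette, a loss that includes this term can supervise geometry that silhouette-based methods cannot.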
Vehicle-Rear: A New Dataset to Explore Feature Fusion for Vehicle Identification Using Convolutional Neural Networks
This work addresses the problem of vehicle identification through
non-overlapping cameras. As our main contribution, we introduce a novel dataset
for vehicle identification, called Vehicle-Rear, that contains more than three
hours of high-resolution videos, with accurate information about the make,
model, color and year of nearly 3,000 vehicles, in addition to the position and
identification of their license plates. To explore our dataset, we design a
two-stream CNN that simultaneously uses two of the most distinctive and
persistent features available: the vehicle's appearance and its license plate.
This is an attempt to tackle a major problem: false alarms caused by vehicles
with similar designs or by very close license plate identifiers. In the first
network stream, shape similarities are identified by a Siamese CNN that uses a
pair of low-resolution vehicle patches recorded by two different cameras. In
the second stream, we use a CNN for OCR to extract textual information,
confidence scores, and string similarities from a pair of high-resolution
license plate patches. Features from both streams are then merged by a
sequence of fully connected layers for the final decision. In our experiments, we
compared the two-stream network against several well-known CNN architectures
using single or multiple vehicle features. The architectures, trained models,
and dataset are publicly available at https://github.com/icarofua/vehicle-rear
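The two-stream fusion can be sketched as a late combination of an appearance-similarity score and a plate-string similarity. The paper merges learned features with fully connected layers; the hand-weighted average below, using `difflib.SequenceMatcher` for the string comparison, is only an illustrative stand-in, and `fuse_scores` and its weight are hypothetical.

```python
from difflib import SequenceMatcher

def fuse_scores(appearance_sim, plate_a, plate_b, w_shape=0.5):
    """Fuse a (hypothetical) Siamese appearance score with OCR string
    similarity. Illustrative late fusion, not the paper's learned merge.
    """
    plate_sim = SequenceMatcher(None, plate_a, plate_b).ratio()
    return w_shape * appearance_sim + (1.0 - w_shape) * plate_sim

# Visually similar cars with identical plates: both cues agree.
same = fuse_scores(0.9, "ABC1234", "ABC1234")
# One plate character differs -- exactly the near-miss false-alarm case
# the two-stream design targets: the plate cue pulls the score down.
close = fuse_scores(0.9, "ABC1234", "ABC1284")
print(same > close)  # True
```

Combining both cues is what lets the model separate distinct vehicles that either cue alone would confuse: similar designs (appearance ties) or near-identical plate strings (OCR ties).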