Structural Restricted Boltzmann Machine for image denoising and classification
Restricted Boltzmann Machines are generative models that consist of a layer
of hidden variables connected to another layer of visible units, and they are
used to model the distribution over visible variables. To gain greater
representational power, many hidden units are commonly used, which, in
combination with a large number of visible units, leads to a high number of
trainable parameters. In this work we introduce the Structural Restricted
Boltzmann Machine model, which, taking advantage of the structure of the data at
hand, constrains connections of hidden units to subsets of visible units in
order to significantly reduce the number of trainable parameters without
compromising performance. As a possible area of application, we focus on image
modelling. Based on the nature of the images, the structure of the connections
is given in terms of spatial neighbourhoods over the pixels of the image that
constitute the visible variables of the model. We conduct extensive experiments
on various image domains. Image denoising is evaluated with corrupted images
from the MNIST dataset. The generative power and classification performance of
our models are compared with those of vanilla RBMs, with classification
assessed on five different image domains. Results show that our proposed model
trains faster and more stably, while also obtaining better results than an RBM
with unconstrained connections between its visible and hidden units.
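To make the parameter saving concrete, the following is a minimal sketch of how a spatial-neighbourhood connectivity mask could be built. It is not the authors' implementation: the function names, the square patch shape, and the patch and stride values are assumptions chosen for illustration, with one hidden unit attached to each patch of a 28x28 (MNIST-sized) image.

```python
def neighbourhood_mask(height, width, patch, stride):
    """Build a 0/1 connectivity mask with one row per hidden unit.

    Each row has height*width entries; a 1 marks a visible pixel that the
    hidden unit is allowed to connect to. Hypothetical layout: every hidden
    unit sees one square patch of side `patch`, placed every `stride` pixels.
    """
    mask = []
    for top in range(0, height - patch + 1, stride):
        for left in range(0, width - patch + 1, stride):
            row = [0] * (height * width)
            for r in range(top, top + patch):
                for c in range(left, left + patch):
                    row[r * width + c] = 1
            mask.append(row)
    return mask


def count_weights(mask):
    """Number of trainable weights that the mask leaves active."""
    return sum(sum(row) for row in mask)


# Assumed example: non-overlapping 7x7 patches on a 28x28 image.
mask = neighbourhood_mask(28, 28, patch=7, stride=7)
n_hidden = len(mask)
structured = count_weights(mask)   # weights kept by the mask
dense = n_hidden * 28 * 28         # weights of a fully connected RBM
print(n_hidden, structured, dense)
```

In training, such a mask would typically be multiplied element-wise with the weight matrix (and with its gradient) so that the disallowed connections stay at zero; overlapping patches can be obtained by choosing a stride smaller than the patch size.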
Vehicle-Rear: A New Dataset to Explore Feature Fusion for Vehicle Identification Using Convolutional Neural Networks
This work addresses the problem of vehicle identification across
non-overlapping cameras. As our main contribution, we introduce a novel dataset
for vehicle identification, called Vehicle-Rear, that contains more than three
hours of high-resolution videos, with accurate information about the make,
model, color and year of nearly 3,000 vehicles, in addition to the position and
identification of their license plates. To explore our dataset, we design a
two-stream CNN that simultaneously uses two of the most distinctive and
persistent features available: the vehicle's appearance and its license plate.
This is an attempt to tackle a major problem: false alarms caused by vehicles
with similar designs or by very close license plate identifiers. In the first
network stream, shape similarities are identified by a Siamese CNN that uses a
pair of low-resolution vehicle patches recorded by two different cameras. In
the second stream, we use a CNN for OCR to extract textual information,
confidence scores, and string similarities from a pair of high-resolution
license plate patches. Features from both streams are then merged by a
sequence of fully connected layers that make the final decision. In our experiments, we
compared the two-stream network against several well-known CNN architectures
using single or multiple vehicle features. The architectures, trained models,
and dataset are publicly available at https://github.com/icarofua/vehicle-rear
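The abstract does not say which string-similarity measure is applied to the OCR output, so the following is only a sketch under an assumed choice: a normalized Levenshtein (edit) distance between two recognized plate strings. The function names and the example plate strings are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def plate_similarity(a, b):
    """Similarity in [0, 1]; 1.0 means the two strings are identical."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


# Hypothetical OCR readings of the same plate from two cameras,
# differing in a single character.
score = plate_similarity("ABC1234", "ABC1284")
print(round(score, 3))
```

A score like this illustrates why plate text alone can cause false alarms: two different vehicles whose identifiers differ by one character receive almost the same similarity as one vehicle with a single OCR error, which is the case the appearance stream is meant to help disambiguate.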