8,651 research outputs found
Particular object retrieval with integral max-pooling of CNN activations
Recently, image representation built upon Convolutional Neural Network (CNN)
has been shown to provide effective descriptors for image search, outperforming
pre-CNN features as short-vector representations. Yet such models are not
compatible with geometry-aware re-ranking methods and still outperformed, on
some particular object retrieval benchmarks, by traditional image search
systems relying on precise descriptor matching, geometric re-ranking, or query
expansion. This work revisits both retrieval stages, namely initial search and
re-ranking, by employing the same primitive information derived from the CNN.
We build compact feature vectors that encode several image regions without the
need to feed multiple inputs to the network. Furthermore, we extend integral
images to handle max-pooling on convolutional layer activations, allowing us to
efficiently localize matching objects. The resulting bounding box is finally
used for image re-ranking. As a result, this paper significantly improves
existing CNN-based recognition pipeline: We report for the first time results
competing with traditional methods on the challenging Oxford5k and Paris6k
datasets
On Robust Face Recognition via Sparse Encoding: the Good, the Bad, and the Ugly
In the field of face recognition, Sparse Representation (SR) has received
considerable attention during the past few years. Most of the relevant
literature focuses on holistic descriptors in closed-set identification
applications. The underlying assumption in SR-based methods is that each class
in the gallery has sufficient samples and the query lies on the subspace
spanned by the gallery of the same class. Unfortunately, such assumption is
easily violated in the more challenging face verification scenario, where an
algorithm is required to determine if two faces (where one or both have not
been seen before) belong to the same person. In this paper, we first discuss
why previous attempts with SR might not be applicable to verification problems.
We then propose an alternative approach to face verification via SR.
Specifically, we propose to use explicit SR encoding on local image patches
rather than the entire face. The obtained sparse signals are pooled via
averaging to form multiple region descriptors, which are then concatenated to
form an overall face descriptor. Due to the deliberate loss spatial relations
within each region (caused by averaging), the resulting descriptor is robust to
misalignment & various image deformations. Within the proposed framework, we
evaluate several SR encoding techniques: l1-minimisation, Sparse Autoencoder
Neural Network (SANN), and an implicit probabilistic technique based on
Gaussian Mixture Models. Thorough experiments on AR, FERET, exYaleB, BANCA and
ChokePoint datasets show that the proposed local SR approach obtains
considerably better and more robust performance than several previous
state-of-the-art holistic SR methods, in both verification and closed-set
identification problems. The experiments also show that l1-minimisation based
encoding has a considerably higher computational than the other techniques, but
leads to higher recognition rates
Smart environment monitoring through micro unmanned aerial vehicles
In recent years, the improvements of small-scale Unmanned Aerial Vehicles (UAVs) in terms of flight time, automatic control, and remote transmission are promoting the development of a wide range of practical applications. In aerial video surveillance, the monitoring of broad areas still has many challenges due to the achievement of different tasks in real-time, including mosaicking, change detection, and object detection. In this thesis work, a small-scale UAV based vision system to maintain regular surveillance over target areas is proposed. The system works in two modes. The first mode allows to monitor an area of interest by performing several flights. During the first flight, it creates an incremental geo-referenced mosaic of an area of interest and classifies all the known elements (e.g., persons) found on the ground by an improved Faster R-CNN architecture previously trained. In subsequent reconnaissance flights, the system searches for any changes (e.g., disappearance of persons) that may occur in the mosaic by a histogram equalization and RGB-Local Binary Pattern (RGB-LBP) based algorithm. If present, the mosaic is updated. The second mode, allows to perform a real-time classification by using, again, our improved Faster R-CNN model, useful for time-critical operations. Thanks to different design features, the system works in real-time and performs mosaicking and change detection tasks at low-altitude, thus allowing the classification even of small objects. The proposed system was tested by using the whole set of challenging video sequences contained in the UAV Mosaicking and Change Detection (UMCD) dataset and other public datasets. The evaluation of the system by well-known performance metrics has shown remarkable results in terms of mosaic creation and updating, as well as in terms of change detection and object detection
- …