Visual Analysis Algorithms for Embedded Systems
Visual search systems are very popular applications, but on-line versions in 3G wireless environments suffer from network constraints, such as unstable or limited bandwidth, that add latency to query delivery and significantly degrade the user's experience. An alternative is to exploit the ability of the newest mobile devices to perform heterogeneous activities, such as processing images as well as capturing them. Visual feature extraction and compression can be performed on on-board Graphics Processing Units (GPUs), making smartphones capable of detecting a specific object exactly (matching) or of performing classification.
The latest trends in visual search have resulted in dedicated efforts in MPEG standardization, namely the MPEG CDVS (Compact Descriptors for Visual Search) standard. CDVS is an ISO/IEC standard used to extract a compressed descriptor.
As regards classification, neural networks have gained considerable importance in recent years and have been applied to several domains. This thesis focuses on the use of deep neural networks to classify images.
Implementing visual search algorithms and deep learning-based classification in embedded environments is not a mere code-porting activity. Recent embedded devices, such as development boards with GPGPUs, offer powerful but limited resources. GPU architectures fit particularly well because they allow many operations to be executed in parallel, following the SIMD (Single Instruction Multiple Data) paradigm. Nonetheless, good design choices are needed to make the best use of the available hardware and memory.
For visual search, following the MPEG CDVS standard, the contribution of this thesis is an efficient feature computation phase: a parallel CDVS detector implemented entirely on embedded devices supporting the OpenCL framework. Algorithmic choices and implementation details that target the intrinsic characteristics of the selected embedded platforms are presented and discussed. Experimental results on several GPUs show that the GPU-based solution is up to 7× faster than the CPU-based one. This speed-up opens new visual search scenarios that rely entirely on real-time on-board computation with no data transfer.
As regards the use of deep convolutional neural networks for off-line image classification, their computational and memory requirements are huge, which is an issue on embedded devices. Most of the complexity derives from the convolutional layers, and in particular from the matrix multiplications they entail. The contribution of this thesis is a self-contained implementation for image classification that provides the common layers used in neural networks. The approach relies on a heterogeneous CPU-GPU scheme that performs convolutions in the transform domain. Experimental results show that the heterogeneous scheme described in this thesis achieves a 50× speedup over the CPU-only reference and outperforms a GPU-based reference by 2×, while reducing power consumption by nearly 30%.
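The idea of performing a convolution in the transform domain can be illustrated with a minimal sketch. It assumes the Fourier transform (the abstract does not say which transform the thesis uses), works on 1-D sequences in pure Python, and runs on the CPU only. The function names are hypothetical and are not taken from the thesis.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalised); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def conv_transform_domain(signal, kernel):
    """Linear convolution via pointwise multiplication of the two spectra."""
    out_len = len(signal) + len(kernel) - 1
    size = 1
    while size < out_len:  # zero-pad to a power of two to avoid wrap-around
        size *= 2
    a = fft([complex(v) for v in signal] + [0j] * (size - len(signal)))
    b = fft([complex(v) for v in kernel] + [0j] * (size - len(kernel)))
    product = [p * q for p, q in zip(a, b)]
    # The inverse transform above is unnormalised, so divide by size.
    result = fft(product, inverse=True)
    return [v.real / size for v in result[:out_len]]


def conv_direct(signal, kernel):
    """Reference implementation: direct sliding-sum convolution."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out
```

Both functions should agree up to floating-point rounding; the transform-domain version replaces the nested multiply-accumulate loop with one pointwise product between spectra, which is where the savings come from for large kernels.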
A deep learning pipeline for product recognition on store shelves
Recognition of grocery products on store shelves poses peculiar challenges.
Firstly, the task mandates the recognition of an extremely high number of
different items, in the order of several thousands for medium-small shops, with
many of them featuring small inter- and intra-class variability. Then, available
product databases usually include just one or a few studio-quality images per
product (referred to herein as reference images), whilst at test time
recognition is performed on pictures displaying a portion of a shelf containing
several products and taken in the store by cheap cameras (referred to as query
images). Moreover, as the items on sale in a store as well as their appearance
change frequently over time, a practical recognition system should handle
seamlessly new products/packages. Inspired by recent advances in object
detection and image retrieval, we propose to leverage state-of-the-art
object detectors based on deep learning to obtain an initial product-agnostic
item detection. Then, we pursue product recognition through a similarity search
between global descriptors computed on reference and cropped query images. To
maximize performance, we learn an ad-hoc global descriptor by a CNN trained on
reference images based on an image embedding loss. Our system is
computationally expensive at training time but can perform recognition rapidly
and accurately at test time.
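The recognition step described above, a similarity search between global descriptors of reference images and cropped query images, might look like the following sketch. It assumes cosine similarity as the measure and uses tiny hand-written vectors in place of CNN descriptors; the abstract specifies neither, and the names are hypothetical.

```python
import math


def l2_normalize(vector):
    """Scale a descriptor to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def rank_references(query, references):
    """Return (similarity, label) pairs, most similar reference first.

    `references` maps a product label to its global descriptor.
    """
    q = l2_normalize(query)
    scores = []
    for label, descriptor in references.items():
        r = l2_normalize(descriptor)
        cosine = sum(a * b for a, b in zip(q, r))
        scores.append((cosine, label))
    return sorted(scores, reverse=True)


# Hypothetical descriptors for two reference products and one query crop.
references = {"cereal_a": [1.0, 0.0, 0.0], "cereal_b": [0.0, 1.0, 0.0]}
best_score, best_label = rank_references([0.9, 0.1, 0.0], references)[0]
```

Because new products only add entries to `references`, a scheme of this kind can accommodate catalogue changes without retraining the detector, which matches the motivation given in the abstract.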
Efficient On-the-fly Category Retrieval using ConvNets and GPUs
We investigate the gains in precision and speed that can be obtained by
using Convolutional Networks (ConvNets) for on-the-fly retrieval - where
classifiers are learnt at run time for a textual query from downloaded images,
and used to rank large image or video datasets.
We make three contributions: (i) we present an evaluation of state-of-the-art
image representations for object category retrieval over standard benchmark
datasets containing 1M+ images; (ii) we show that ConvNets can be used to
obtain features which are incredibly performant, and yet much lower dimensional
than previous state-of-the-art image representations, and that their
dimensionality can be reduced further without loss in performance by
compression using product quantization or binarization. Consequently, features
with the state-of-the-art performance on large-scale datasets of millions of
images can fit in the memory of even a commodity GPU card; (iii) we show that
an SVM classifier can be learnt within a ConvNet framework on a GPU in parallel
with downloading the new training images, allowing for a continuous refinement
of the model as more images become available, and simultaneous training and
ranking. The outcome is an on-the-fly system that significantly outperforms its
predecessors in terms of: precision of retrieval, memory requirements, and
speed, facilitating accurate on-the-fly learning and ranking in under a second
on a single GPU.
Comment: Published in proceedings of ACCV 201
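Binarization, one of the two compression options mentioned in contribution (ii), can be sketched as follows. This is an illustration under stated assumptions rather than the paper's method: each feature is thresholded at zero and the resulting bits are packed into a Python integer, so that two codes can be compared by Hamming distance. The function names are hypothetical.

```python
def binarize(features):
    """Pack one bit per dimension into an integer: 1 if the value is positive."""
    code = 0
    for value in features:
        code = (code << 1) | (1 if value > 0 else 0)
    return code


def hamming(code_a, code_b):
    """Number of bit positions in which two binary codes differ."""
    return bin(code_a ^ code_b).count("1")


a = binarize([0.5, -1.0, 2.0, -0.1])   # bits 1010
b = binarize([-0.3, 0.7, 1.5, -2.0])   # bits 0110
distance = hamming(a, b)               # the first two bits differ, so 2
```

A code of this kind stores one bit per dimension instead of a 32-bit float, which is the sort of reduction that lets descriptors for millions of images fit in GPU memory.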
Guiding Attention in Controlled Real-World Environments
The ability to direct a viewer's attention has important applications in computer graphics, data visualization, image analysis, and training. Existing computer-based gaze manipulation techniques, which direct a viewer's attention about a display, have been shown to be effective for spatial learning, search task completion, and medical training applications. This work extends the concept of gaze manipulation beyond digital imagery to include controlled, real-world environments. It addresses the main challenges in guiding attention to real-world objects: determining what object the viewer is currently paying attention to, and projecting a visual cue on a different part of the scene in order to draw the viewer's attention there. The developed system consists of a pair of eye-tracking glasses to determine the viewer's gaze location, and a projector to create the visual cue in the physical environment. The results of a user study show that the system is effective for directing a viewer's gaze in the real world. The successful implementation has applicability in a wide range of instructional environments, including pilot training and driving simulators.