Bamboo: A fast descriptor based on AsymMetric pairwise BOOsting
A robust hash, or content-based fingerprint, is a succinct representation of the perceptually most relevant parts of a multimedia object. A key requirement of fingerprinting is that elements with perceptually similar content should map to the same fingerprint, even if their bit-level representations differ. In this work we propose BAMBOO (Binary descriptor based on AsymMetric pairwise BOOsting), a binary local descriptor that exploits a combination of content-based fingerprinting techniques and computationally efficient filters (box filters, Haar-like features, etc.) applied to image patches. In particular, we define a possibly large set of filters and iteratively select the most discriminative ones by resorting to an asymmetric pairwise boosting technique. The output values of the filtering process are quantized to one bit, leading to a very compact binary descriptor. Results show that such a descriptor is compelling, significantly outperforming binary descriptors of comparable complexity (e.g., BRISK) and approaching the discriminative power of state-of-the-art descriptors that are significantly more complex (e.g., SIFT and BinBoost).
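The core mechanism described above can be sketched in a few lines: apply box filters to a patch and quantize pairwise comparisons of the responses to single bits. This is a minimal illustration with a hand-picked filter bank and pairing; the paper's actual contribution, selecting the filters via asymmetric pairwise boosting, is not reproduced here, and all names below are hypothetical.

```python
# Illustrative sketch of a BAMBOO-style binary descriptor.
# The filter bank and bit pairings here are toy assumptions; the paper
# learns them with asymmetric pairwise boosting.

def box_mean(patch, top, left, h, w):
    """Mean intensity over a rectangular region (a box filter response)."""
    total = sum(patch[r][c] for r in range(top, top + h)
                            for c in range(left, left + w))
    return total / (h * w)

def bamboo_descriptor(patch, boxes, pairs):
    """Quantize filter responses to one bit each by pairwise comparison."""
    responses = [box_mean(patch, *b) for b in boxes]
    return [1 if responses[i] > responses[j] else 0 for i, j in pairs]

# Toy 4x4 patch: dark left half, bright right half.
patch = [[10, 10, 200, 200],
         [10, 10, 200, 200],
         [10, 10, 200, 200],
         [10, 10, 200, 200]]
boxes = [(0, 0, 4, 2), (0, 2, 4, 2)]   # left-half and right-half box filters
pairs = [(0, 1), (1, 0)]               # which response pairs become bits
print(bamboo_descriptor(patch, boxes, pairs))  # -> [0, 1]
```

Because the descriptor is binary, matching reduces to a Hamming distance, which is what makes this family of descriptors fast at query time.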
A visual sensor network for object recognition: Testbed realization
This work describes the implementation of an object recognition service on top of energy- and resource-constrained hardware. A complete pipeline for object recognition based on BRISK visual features is implemented on Intel Imote2 sensor devices. The reference implementation is used to assess the performance of the object recognition pipeline in terms of processing time and recognition accuracy.
Coding binary local features extracted from video sequences
Local features are a powerful tool exploited in several applications such as visual search, object recognition, and tracking. In this context, binary descriptors provide an efficient alternative to real-valued descriptors, due to their low computational complexity, limited memory footprint, and fast matching algorithms. The descriptor consists of a binary vector in which each bit is the result of a pairwise comparison between smoothed pixel intensities. In several cases, visual features need to be transmitted over a bandwidth-limited network. To this end, it is useful to compress the descriptor to reduce the required rate while attaining a target accuracy for the task at hand. The past literature thoroughly addressed the problem of coding visual features extracted from still images and, only very recently, the problem of coding real-valued features (e.g., SIFT, SURF) extracted from video sequences. In this paper we propose a coding architecture specifically designed for binary local features extracted from video content. We exploit both spatial and temporal redundancy by means of intra-frame and inter-frame coding modes, showing that significant coding gains can be attained for a target level of accuracy of the visual analysis task.
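The inter-frame idea above can be illustrated with a toy sketch: XORing a descriptor against its reference from the previous frame yields a residual that is mostly zeros when content is temporally redundant, and a sparse residual is cheap to entropy-code. The helper names and the bit-count rate proxy below are assumptions for illustration; the paper's actual coder is not reproduced.

```python
# Sketch of intra- vs inter-frame mode selection for binary descriptors.
# Uses the number of set bits as a crude proxy for coded rate; the paper
# employs a proper entropy coder instead.

def xor_residual(current, reference):
    """Inter-frame prediction: XOR against the reference descriptor.
    Temporal redundancy makes the residual mostly zeros."""
    return [a ^ b for a, b in zip(current, reference)]

def cheaper_mode(current, reference):
    """Pick intra (code the bits directly) or inter (code the sparse
    XOR residual) by comparing set-bit counts."""
    residual = xor_residual(current, reference)
    return "inter" if sum(residual) < sum(current) else "intra"

prev_frame = [1, 0, 1, 1, 0, 0, 1, 0]
curr_frame = [1, 0, 1, 1, 0, 1, 1, 0]   # one bit flipped between frames
print(cheaper_mode(curr_frame, prev_frame))  # -> inter
```

For the first frame, or when a track is lost, there is no usable reference and the intra mode is the natural fallback.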
Energy consumption of visual sensor networks: impact of spatio-temporal coverage
Wireless visual sensor networks (VSNs) are expected to play a major role in future IEEE 802.15.4 personal area networks (PANs) under recently established collision-free medium access control (MAC) protocols, such as the IEEE 802.15.4e-2012 MAC. In such environments, the VSN energy consumption is affected by the number of camera sensors deployed (spatial coverage), as well as the number of captured video frames for which each node processes and transmits data (temporal coverage). In this paper we explore this aspect for uniformly formed VSNs, that is, networks comprising identical wireless visual sensor nodes connected to a collection node via a balanced cluster-tree topology, with each node producing independent identically distributed bitstream sizes after processing the video frames captured within each network activation interval. We derive analytic results for the energy-optimal spatio-temporal coverage parameters of such VSNs under a priori known bounds for the number of frames to process per sensor and the number of nodes to deploy within each tier of the VSN. Our results are parametric to the probability density function characterizing the bitstream size produced by each node and the energy consumption rates of the system of interest. Experimental results are derived from a deployment of TelosB motes and reveal that our analytic results are always within 7% of the energy consumption measurements for a wide range of settings. In addition, results obtained via motion JPEG encoding and feature extraction on a multimedia subsystem (BeagleBone Linux computer) show that the optimal spatio-temporal settings derived by our framework allow for a substantial reduction of energy consumption in comparison with ad hoc settings.
Rate-energy-accuracy optimization of convolutional architectures for face recognition
Face recognition systems based on Convolutional Neural Networks (CNNs) or convolutional architectures currently represent the state of the art, achieving an accuracy comparable to that of humans. Nonetheless, there are two issues that might hinder their adoption on distributed battery-operated devices (e.g., visual sensor nodes, smartphones, and wearable devices). First, convolutional architectures are usually computationally demanding, especially when the depth of the network is increased to maximize accuracy. Second, transmitting the output features produced by a CNN might require a bitrate higher than the one needed for coding the input image. Therefore, in this paper we address the problem of optimizing the energy-rate-accuracy characteristics of a convolutional architecture for face recognition. We carefully profile a CNN implementation on a Raspberry Pi device and optimize the structure of the neural network, achieving a 17-fold speedup without significantly affecting recognition accuracy. Moreover, we propose a coding architecture custom-tailored to features extracted by such a model. (C) 2015 Elsevier Inc. All rights reserved. Funding: Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), and Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP), grant 2013/11359-0.
Compress-then-analyze vs. analyze-then-compress: Two paradigms for image analysis in visual sensor networks
We compare two paradigms for image analysis in visual sensor networks (VSNs). In the compress-then-analyze (CTA) paradigm, images acquired by camera nodes are compressed and sent to a central controller for further analysis. Conversely, in the analyze-then-compress (ATC) approach, camera nodes perform visual feature extraction and transmit a compressed version of these features to a central controller. We focus on state-of-the-art binary features, which are particularly suitable for resource-constrained VSNs, and we show that the "winning" paradigm depends primarily on the network conditions. Indeed, while the ATC approach might be the only possible way to perform analysis at low available bitrates, the CTA approach achieves the best results when the available bandwidth enables the transmission of high-quality images.
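The trade-off described above can be caricatured as a bandwidth threshold test: ship the image when the channel can carry it, fall back to compact features when it cannot. All numbers and function names below are made-up illustrations; the paper's conclusion rests on measured rate-accuracy curves, not a simple threshold.

```python
# Toy illustration of the CTA vs ATC trade-off. Rates are hypothetical;
# in practice the choice follows from rate-accuracy measurements.

def pick_paradigm(bandwidth_kbps, image_rate_kbps, feature_rate_kbps):
    """CTA needs enough bandwidth for a high-quality compressed image;
    otherwise only the compressed features (ATC) fit the channel."""
    if bandwidth_kbps >= image_rate_kbps:
        return "CTA"  # send the compressed image, analyze centrally
    if bandwidth_kbps >= feature_rate_kbps:
        return "ATC"  # extract binary features on the node, send those
    return "neither fits"

print(pick_paradigm(500, 400, 50))  # -> CTA
print(pick_paradigm(100, 400, 50))  # -> ATC
```

The sketch matches the abstract's qualitative claim: ATC wins at low bitrates because features are far smaller than images, while CTA wins once the channel can deliver high-quality images for centralized analysis.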
Search for anomalies in the νe appearance from a νμ beam
We report an updated result from the ICARUS experiment on the search for νμ → νe anomalies with the CNGS beam, produced at CERN with an average energy of 20 GeV and travelling 730 km to the Gran Sasso Laboratory. The present analysis is based on a total sample of 1995 events of CNGS neutrino interactions, which corresponds to an almost doubled sample with respect to the previously published result. Four clear νe events have been visually identified over the full sample, compared with an expectation of 6.4 ± 0.9 events from conventional sources. The result is compatible with the absence of additional anomalous contributions. At the 90% and 99% confidence levels the limits on possible oscillated events are 3.7 and 8.3 respectively; the corresponding limits on the oscillation probability are consequently 3.4 × 10^-3 and 7.6 × 10^-3. The present result confirms, with improved sensitivity, the earlier result already published by the ICARUS collaboration.
Precise 3D track reconstruction algorithm for the ICARUS T600 liquid argon time projection chamber detector
Liquid Argon Time Projection Chamber (LAr TPC) detectors offer charged particle imaging capability with remarkable spatial resolution. Precise event reconstruction procedures are critical in order to fully exploit the potential of this technology. In this paper we present a new, general approach to three-dimensional reconstruction for the LAr TPC, with a practical application to track reconstruction. The efficiency of the method is evaluated on a sample of simulated tracks. We also present the application of the method to the analysis of real data tracks collected during ICARUS T600 detector operation with the CNGS neutrino beam. Comment: Submitted to Advances in High Energy Physics.
