60,900 research outputs found
What Can Help Pedestrian Detection?
Aggregating extra features has been considered as an effective approach to
boost traditional pedestrian detection methods. However, there is still a lack
of studies on whether and how CNN-based pedestrian detectors can benefit from
these extra features. The first contribution of this paper is exploring this
issue by aggregating extra features into CNN-based pedestrian detection
framework. Through extensive experiments, we evaluate the effects of different
kinds of extra features quantitatively. Moreover, we propose a novel network
architecture, namely HyperLearner, to jointly learn pedestrian detection as
well as the given extra feature. By multi-task training, HyperLearner is able
to utilize the information of given features and improve detection performance
without extra inputs in inference. The experimental results on multiple
pedestrian benchmarks validate the effectiveness of the proposed HyperLearner.Comment: Accepted to IEEE International Conference on Computer Vision and
Pattern Recognition (CVPR) 201
Computationally Efficient Target Classification in Multispectral Image Data with Deep Neural Networks
Detecting and classifying targets in video streams from surveillance cameras
is a cumbersome, error-prone and expensive task. Often, the incurred costs are
prohibitive for real-time monitoring. This leads to data being stored locally
or transmitted to a central storage site for post-incident examination. The
required communication links and archiving of the video data are still
expensive and this setup excludes preemptive actions to respond to imminent
threats. An effective way to overcome these limitations is to build a smart
camera that transmits alerts when relevant video sequences are detected. Deep
neural networks (DNNs) have come to outperform humans in visual classifications
tasks. The concept of DNNs and Convolutional Networks (ConvNets) can easily be
extended to make use of higher-dimensional input data such as multispectral
data. We explore this opportunity in terms of achievable accuracy and required
computational effort. To analyze the precision of DNNs for scene labeling in an
urban surveillance scenario we have created a dataset with 8 classes obtained
in a field experiment. We combine an RGB camera with a 25-channel VIS-NIR
snapshot sensor to assess the potential of multispectral image data for target
classification. We evaluate several new DNNs, showing that the spectral
information fused together with the RGB frames can be used to improve the
accuracy of the system or to achieve similar accuracy with a 3x smaller
computation effort. We achieve a very high per-pixel accuracy of 99.1%. Even
for scarcely occurring, but particularly interesting classes, such as cars, 75%
of the pixels are labeled correctly with errors occurring only around the
border of the objects. This high accuracy was obtained with a training set of
only 30 labeled images, paving the way for fast adaptation to various
application scenarios.Comment: Presented at SPIE Security + Defence 2016 Proc. SPIE 9997, Target and
Background Signatures I
ModDrop: adaptive multi-modal gesture recognition
We present a method for gesture detection and localisation based on
multi-scale and multi-modal deep learning. Each visual modality captures
spatial information at a particular spatial scale (such as motion of the upper
body or a hand), and the whole system operates at three temporal scales. Key to
our technique is a training strategy which exploits: i) careful initialization
of individual modalities; and ii) gradual fusion involving random dropping of
separate channels (dubbed ModDrop) for learning cross-modality correlations
while preserving uniqueness of each modality-specific representation. We
present experiments on the ChaLearn 2014 Looking at People Challenge gesture
recognition track, in which we placed first out of 17 teams. Fusing multiple
modalities at several spatial and temporal scales leads to a significant
increase in recognition rates, allowing the model to compensate for errors of
the individual classifiers as well as noise in the separate channels.
Futhermore, the proposed ModDrop training technique ensures robustness of the
classifier to missing signals in one or several channels to produce meaningful
predictions from any number of available modalities. In addition, we
demonstrate the applicability of the proposed fusion scheme to modalities of
arbitrary nature by experiments on the same dataset augmented with audio.Comment: 14 pages, 7 figure
Micro protocol engineering for unstructured carriers: On the embedding of steganographic control protocols into audio transmissions
Network steganography conceals the transfer of sensitive information within
unobtrusive data in computer networks. So-called micro protocols are
communication protocols placed within the payload of a network steganographic
transfer. They enrich this transfer with features such as reliability, dynamic
overlay routing, or performance optimization --- just to mention a few. We
present different design approaches for the embedding of hidden channels with
micro protocols in digitized audio signals under consideration of different
requirements. On the basis of experimental results, our design approaches are
compared, and introduced into a protocol engineering approach for micro
protocols.Comment: 20 pages, 7 figures, 4 table
- …