Simultaneous Feature Learning and Hash Coding with Deep Neural Networks
Similarity-preserving hashing is a widely used method for nearest neighbour
search in large-scale image retrieval tasks. For most existing hashing methods,
an image is first encoded as a vector of hand-engineered visual features,
followed by another separate projection or quantization step that generates
binary codes. However, such visual feature vectors may not be optimally
compatible with the coding process, thus producing sub-optimal hashing codes.
In this paper, we propose a deep architecture for supervised hashing, in which
images are mapped into binary codes via carefully designed deep neural
networks. The pipeline of the proposed deep architecture consists of three
building blocks: 1) a sub-network with a stack of convolution layers to produce
the effective intermediate image features; 2) a divide-and-encode module to
divide the intermediate image features into multiple branches, each encoded
into one hash bit; and 3) a triplet ranking loss designed to capture the
constraint that one image is more similar to a second image than to a third. Extensive
evaluations on several benchmark image datasets show that the proposed
simultaneous feature learning and hash coding pipeline brings substantial
improvements over other state-of-the-art supervised or unsupervised hashing
methods.
Comment: This paper has been accepted to IEEE International Conference on
Pattern Recognition and Computer Vision (CVPR), 201
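The triplet ranking loss named in the abstract above can be illustrated with a minimal sketch. The abstract does not give the formula, so the hinge form, the squared Euclidean distance, the `margin` parameter, and all function names below are assumptions chosen for illustration; the vectors stand in for relaxed (real-valued) hash codes.

```python
# Illustrative sketch only: a common hinge-style triplet ranking loss.
# The exact loss used in the paper is not stated in the abstract; the margin
# and the squared Euclidean distance are assumptions.

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length code vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_ranking_loss(query, similar, dissimilar, margin=1.0):
    """Zero once `similar` is closer to `query` than `dissimilar` by `margin`."""
    d_pos = squared_distance(query, similar)
    d_neg = squared_distance(query, dissimilar)
    return max(0.0, margin + d_pos - d_neg)


# Hypothetical relaxed 4-bit codes for three images.
query = [0.9, 0.1, 0.8, 0.2]
similar = [0.8, 0.2, 0.9, 0.1]
dissimilar = [0.1, 0.9, 0.2, 0.8]

loss_satisfied = triplet_ranking_loss(query, similar, dissimilar)
loss_violated = triplet_ranking_loss(query, dissimilar, similar)
```

With these values the correctly ordered triplet incurs no loss, while swapping the similar and dissimilar images produces a positive loss that training would push down.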
Learning Spatiotemporal Features for Infrared Action Recognition with 3D Convolutional Neural Networks
Infrared (IR) imaging has the potential to enable more robust action
recognition systems compared to visible spectrum cameras due to lower
sensitivity to lighting conditions and appearance variability. While the action
recognition task on videos collected from visible spectrum imaging has received
much attention, action recognition in IR videos is significantly less explored.
Our objective is to exploit imaging data in this modality for the action
recognition task. In this work, we propose a novel two-stream 3D convolutional
neural network (CNN) architecture by introducing the discriminative code layer
and the corresponding discriminative code loss function. The proposed network
processes sequences of IR images and of IR-based optical flow fields. We pretrain
the 3D CNN model on the visible spectrum Sports-1M action dataset and finetune
it on the Infrared Action Recognition (InfAR) dataset. To the best of our knowledge,
this is the first application of the 3D CNN to action recognition in the IR
domain. We conduct an elaborate analysis of different fusion schemes (weighted
average, single and double-layer neural nets) applied to different 3D CNN
outputs. Experimental results demonstrate that our approach achieves
state-of-the-art average precision (AP) on the InfAR dataset: (1)
the proposed two-stream 3D CNN achieves the best reported 77.5% AP, and (2) our
3D CNN model applied to the optical flow fields achieves the best reported
single-stream 75.42% AP.
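Of the fusion schemes listed in the abstract above, the weighted average is the simplest to sketch. The following is a minimal illustration, not the authors' implementation: the function names, the `ir_weight` parameter, and the example scores are all hypothetical.

```python
# Illustrative sketch only: weighted-average late fusion of two streams.
# Names, weight, and scores are hypothetical; the abstract does not specify them.

def weighted_average_fusion(ir_scores, flow_scores, ir_weight=0.5):
    """Blend per-class scores from the IR-image and optical-flow streams."""
    if len(ir_scores) != len(flow_scores):
        raise ValueError("both streams must score the same set of classes")
    if not 0.0 <= ir_weight <= 1.0:
        raise ValueError("ir_weight must lie in [0, 1]")
    return [ir_weight * a + (1.0 - ir_weight) * b
            for a, b in zip(ir_scores, flow_scores)]


def predict(scores):
    """Index of the highest-scoring class."""
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical class scores for one clip from each stream.
ir_stream = [0.6, 0.3, 0.1]
flow_stream = [0.2, 0.7, 0.1]

fused = weighted_average_fusion(ir_stream, flow_stream, ir_weight=0.4)
label = predict(fused)
```

Here the IR stream alone would pick class 0, but weighting the optical-flow stream more heavily moves the fused decision to class 1.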
A parallel Viterbi decoder for block cyclic and convolution codes
We present a parallel version of Viterbi's decoding procedure and demonstrate that the resulting task graph has restricted complexity: for BCH codes, the number of communications to or from any processor cannot exceed 4. The algorithm works in lock step, making it suitable for implementation on a systolic processor array. We have implemented such an array on a field programmable gate array and demonstrate the perfect scaling of the algorithm for two exemplar BCH codes. The parallelisation strategy is applicable to all cyclic codes and convolution codes. We also present a novel method for generating the state transition diagrams for these codes.
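For orientation, the sequential procedure that the abstract above parallelises can be sketched as follows. This is the textbook hard-decision Viterbi decoder, not the authors' parallel or systolic algorithm; the rate-1/2 code with generators (7, 5) in octal, the zero-terminated message, and all names are assumptions chosen for illustration.

```python
# Illustrative sketch only: sequential hard-decision Viterbi decoding of a
# rate-1/2, constraint-length-3 convolutional code. The generator choice
# (7, 5 octal) is an assumption, not taken from the paper.

K = 3                       # constraint length
G = (0b111, 0b101)          # generator polynomials
N_STATES = 1 << (K - 1)     # the state is the last K-1 input bits


def parity(x):
    return bin(x).count("1") & 1


def step(state, bit):
    """Return (next_state, output_bits) for one input bit."""
    reg = (bit << (K - 1)) | state          # newest bit in the top position
    return reg >> 1, tuple(parity(reg & g) for g in G)


def encode(bits):
    state, out = 0, []
    for b in bits:
        state, symbols = step(state, b)
        out.extend(symbols)
    return out


def viterbi_decode(received):
    """Decode a sequence that starts and ends in state 0 (zero-terminated)."""
    inf = float("inf")
    metrics = [0] + [inf] * (N_STATES - 1)  # encoder starts in state 0
    paths = [[] for _ in range(N_STATES)]
    for i in range(0, len(received), len(G)):
        pair = received[i:i + len(G)]
        new_metrics = [inf] * N_STATES
        new_paths = [None] * N_STATES
        for s in range(N_STATES):
            if metrics[s] == inf:
                continue                    # state not reachable yet
            for b in (0, 1):
                ns, out = step(s, b)
                m = metrics[s] + sum(x != y for x, y in zip(out, pair))
                if m < new_metrics[ns]:     # keep the survivor path
                    new_metrics[ns] = m
                    new_paths[ns] = paths[s] + [b]
        metrics, paths = new_metrics, new_paths
    return paths[0]                         # terminated code ends in state 0


message = [1, 0, 1, 1, 0, 0, 1]
coded = encode(message + [0] * (K - 1))     # append K-1 zero tail bits
coded[5] ^= 1                               # inject a single bit error
decoded = viterbi_decode(coded)[:len(message)]
```

The inner loop over states is the part with independent per-state work at each time step, which is what makes a lock-step parallel formulation natural.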