2 research outputs found
Learning Multilayer Channel Features for Pedestrian Detection
Pedestrian detection that combines a Convolutional Neural Network (CNN) with
traditional handcrafted features (HOG+LUV) has achieved great success.
Generally, HOG+LUV features are used to generate candidate proposals, and the
CNN then classifies these proposals. Despite its success, there is still room
for improvement. For example, the CNN classifies the proposals using only the
fully connected layer features, while the proposal scores and the features in
the inner layers of the CNN are ignored. In this paper, we propose a unifying
framework called Multilayer Channel Features (MCF) to overcome this drawback.
It first integrates HOG+LUV with each layer of the CNN into multi-layer image
channels.
Based on the multi-layer image channels, a multi-stage cascade AdaBoost is then
learned. The weak classifiers in each stage of the multi-stage cascade are
learned from the image channels of the corresponding layer. With more abundant
features, MCF achieves state-of-the-art performance on the Caltech pedestrian
dataset (10.40% miss rate). Using new and accurate annotations, MCF achieves a
7.98% miss rate. As many non-pedestrian detection windows can be quickly
rejected by the first few stages, the cascade speeds up detection by 1.43 times.
By eliminating the highly overlapped detection windows with lower scores after
the first stage, it is 4.07 times faster with negligible performance loss.
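
The early-rejection idea behind the multi-stage cascade can be illustrated with
a minimal sketch. It is not the authors' implementation: the decision stumps,
the per-stage rejection thresholds, and the names `cascade_score` and `stump`
are hypothetical, and the rejection test is applied once per stage for
simplicity. Each stage reads only the features of its own layer, and a window
is dropped as soon as its accumulated score falls below that stage's threshold.

```python
from typing import Callable, List, Sequence, Tuple

# A weak classifier maps a feature vector to a real-valued vote.
WeakClassifier = Callable[[Sequence[float]], float]


def stump(index: int, cut: float, weight: float) -> WeakClassifier:
    """Hypothetical decision stump: votes +weight or -weight on one feature."""
    return lambda x: weight if x[index] > cut else -weight


def cascade_score(
    layer_features: List[Sequence[float]],
    stages: List[List[WeakClassifier]],
    reject_thresholds: List[float],
) -> Tuple[bool, float, int]:
    """Score one detection window with a multi-stage cascade.

    layer_features[i] holds the channel features used by stage i.
    Returns (accepted, score, number_of_stages_evaluated).
    """
    score = 0.0
    for i, (weak_classifiers, threshold) in enumerate(zip(stages, reject_thresholds)):
        features = layer_features[i]
        for weak in weak_classifiers:
            score += weak(features)
        # Early rejection: stop once the accumulated score is too low,
        # so later (more expensive) layers are never evaluated.
        if score < threshold:
            return False, score, i + 1
    return True, score, len(stages)


# Two toy stages, each with a single stump (assumed values).
stages = [[stump(0, 0.5, 1.0)], [stump(1, 0.2, 0.5)]]
thresholds = [-0.5, 0.0]

print(cascade_score([[0.1], [0.0, 0.9]], stages, thresholds))  # (False, -1.0, 1)
print(cascade_score([[0.9], [0.0, 0.9]], stages, thresholds))  # (True, 1.5, 2)
```

The first window is rejected after one stage, which is where the speed-up comes
from: most background windows never reach the stages built on deeper layers.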
Cascaded Subpatch Networks for Effective CNNs
Conventional Convolutional Neural Networks (CNNs) use either a linear or
non-linear filter to extract features from an image patch (region) of spatial
size $H \times W$ (typically, $H$ is small and equal to $W$, e.g., $H$ is 5 or
7). Generally, the size of the filter is equal to the size $H \times W$ of the
input patch. We argue that the representation ability of this equal-size
strategy is not strong enough. To overcome the drawback, we propose to use a
subpatch filter whose spatial size $h \times w$ is smaller than $H \times W$.
The proposed subpatch filter consists of two consecutive filters. The first one
is a linear filter of spatial size $h \times w$ and is aimed at extracting
features from the spatial domain. The second one is of spatial size
$1 \times 1$ and is used for strengthening the connection between different
input feature channels and for reducing the number of parameters. The subpatch filter
convolves with the input patch, and the resulting network is called a subpatch
network. Taking the output of one subpatch network as input, we repeatedly
construct further subpatch networks until the output contains only one neuron
in the spatial domain. These subpatch networks form a new network called a
Cascaded Subpatch Network (CSNet). The feature layer generated by CSNet is
called a csconv layer. For the whole input image, we construct a deep neural network by
stacking a sequence of csconv layers. Experimental results on four benchmark
datasets demonstrate the effectiveness and compactness of the proposed CSNet.
For example, our CSNet reaches a test error of on the CIFAR10
dataset without model averaging. To the best of our knowledge, this is the best
result ever obtained on the CIFAR10 dataset.
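
The cascade of subpatch networks can be traced with a short sketch. It is only
an illustration under stated assumptions: square patches and filters, stride 1,
no padding, a second filter of spatial size 1 x 1, and biases ignored. The
function names and the channel widths (64 everywhere) are hypothetical and do
not come from the paper.

```python
from typing import List


def cascade_sizes(patch: int, sub: int) -> List[int]:
    """Spatial size after each subpatch stage, for a square patch.

    Assumes stride 1 and no padding, so a k x k filter shrinks the size
    by k - 1. The last stage uses a smaller filter if needed so that the
    cascade ends at exactly one neuron in the spatial domain.
    """
    if not 1 < sub < patch:
        raise ValueError("subpatch size must be > 1 and smaller than the patch")
    sizes = [patch]
    while sizes[-1] > 1:
        k = min(sub, sizes[-1])
        sizes.append(sizes[-1] - k + 1)
    return sizes


def subpatch_params(sub: int, c_in: int, c_mid: int, c_out: int) -> int:
    """Weights in one subpatch filter: a sub x sub filter, then a 1 x 1 filter."""
    return sub * sub * c_in * c_mid + c_mid * c_out


# A 5 x 5 patch covered by cascaded 3 x 3 subpatch filters.
sizes = cascade_sizes(5, 3)
print(sizes)  # [5, 3, 1]

# Illustrative weight counts with 64 channels everywhere (assumed widths).
stages = len(sizes) - 1
equal_size = 5 * 5 * 64 * 64
cascaded = stages * subpatch_params(3, 64, 64, 64)
print(equal_size, cascaded)  # 102400 81920
```

With these assumed widths the two-stage cascade uses fewer weights than a
single equal-size 5 x 5 filter, but the exact saving depends on the channel
counts chosen for each stage.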