2 research outputs found

    Learning Multilayer Channel Features for Pedestrian Detection

    Full text link
    Pedestrian detection based on the combination of Convolutional Neural Network (i.e., CNN) and traditional handcrafted features (i.e., HOG+LUV) has achieved great success. Generally, HOG+LUV are used to generate the candidate proposals and then CNN classifies these proposals. Despite its success, there is still room for improvement. For example, CNN classifies these proposals by the full-connected layer features while proposal scores and the features in the inner-layers of CNN are ignored. In this paper, we propose a unifying framework called Multilayer Channel Features (MCF) to overcome the drawback. It firstly integrates HOG+LUV with each layer of CNN into a multi-layer image channels. Based on the multi-layer image channels, a multi-stage cascade AdaBoost is then learned. The weak classifiers in each stage of the multi-stage cascade is learned from the image channels of corresponding layer. With more abundant features, MCF achieves the state-of-the-art on Caltech pedestrian dataset (i.e., 10.40% miss rate). Using new and accurate annotations, MCF achieves 7.98% miss rate. As many non-pedestrian detection windows can be quickly rejected by the first few stages, it accelerates detection speed by 1.43 times. By eliminating the highly overlapped detection windows with lower scores after the first stage, it's 4.07 times faster with negligible performance loss

    Cascaded Subpatch Networks for Effective CNNs

    Full text link
    Conventional Convolutional Neural Networks (CNNs) use either a linear or non-linear filter to extract features from an image patch (region) of spatial size H×W H\times W (Typically, H H is small and is equal to W W, e.g., H H is 5 or 7). Generally, the size of the filter is equal to the size H×W H\times W of the input patch. We argue that the representation ability of equal-size strategy is not strong enough. To overcome the drawback, we propose to use subpatch filter whose spatial size h×w h\times w is smaller than H×W H\times W . The proposed subpatch filter consists of two subsequent filters. The first one is a linear filter of spatial size h×w h\times w and is aimed at extracting features from spatial domain. The second one is of spatial size 1×1 1\times 1 and is used for strengthening the connection between different input feature channels and for reducing the number of parameters. The subpatch filter convolves with the input patch and the resulting network is called a subpatch network. Taking the output of one subpatch network as input, we further repeat constructing subpatch networks until the output contains only one neuron in spatial domain. These subpatch networks form a new network called Cascaded Subpatch Network (CSNet). The feature layer generated by CSNet is called csconv layer. For the whole input image, we construct a deep neural network by stacking a sequence of csconv layers. Experimental results on four benchmark datasets demonstrate the effectiveness and compactness of the proposed CSNet. For example, our CSNet reaches a test error of 5.68% 5.68\% on the CIFAR10 dataset without model averaging. To the best of our knowledge, this is the best result ever obtained on the CIFAR10 dataset
    corecore