344,247 research outputs found

    An Analysis of Scale Invariance in Object Detection - SNIP

    Full text link
    An analysis of different techniques for recognizing and detecting objects under extreme scale variation is presented. Scale specific and scale invariant design of detectors are compared by training them with different configurations of input data. By evaluating the performance of different network architectures for classifying small objects on ImageNet, we show that CNNs are not robust to changes in scale. Based on this analysis, we propose to train and test detectors on the same scales of an image-pyramid. Since small and large objects are difficult to recognize at smaller and larger scales respectively, we present a novel training scheme called Scale Normalization for Image Pyramids (SNIP) which selectively back-propagates the gradients of object instances of different sizes as a function of the image scale. On the COCO dataset, our single model performance is 45.7% and an ensemble of 3 networks obtains an mAP of 48.3%. We use off-the-shelf ImageNet-1000 pre-trained models and only train with bounding box supervision. Our submission won the Best Student Entry in the COCO 2017 challenge. Code will be made available at \url{http://bit.ly/2yXVg4c}.Comment: CVPR 2018, camera ready versio

    A Novel Weight-Shared Multi-Stage CNN for Scale Robustness

    Get PDF
    Convolutional neural networks (CNNs) have demonstrated remarkable results in image classification for benchmark tasks and practical applications. The CNNs with deeper architectures have achieved even higher performance recently thanks to their robustness to the parallel shift of objects in images as well as their numerous parameters and the resulting high expression ability. However, CNNs have a limited robustness to other geometric transformations such as scaling and rotation. This limits the performance improvement of the deep CNNs, but there is no established solution. This study focuses on scale transformation and proposes a network architecture called the weight-shared multi-stage network (WSMS-Net), which consists of multiple stages of CNNs. The proposed WSMS-Net is easily combined with existing deep CNNs such as ResNet and DenseNet and enables them to acquire robustness to object scaling. Experimental results on the CIFAR-10, CIFAR-100, and ImageNet datasets demonstrate that existing deep CNNs combined with the proposed WSMS-Net achieve higher accuracies for image classification tasks with only a minor increase in the number of parameters and computation time.Comment: accepted version, 13 page

    ELASTIC: Improving CNNs with Dynamic Scaling Policies

    Full text link
    Scale variation has been a challenge from traditional to modern approaches in computer vision. Most solutions to scale issues have a similar theme: a set of intuitive and manually designed policies that are generic and fixed (e.g. SIFT or feature pyramid). We argue that the scaling policy should be learned from data. In this paper, we introduce ELASTIC, a simple, efficient and yet very effective approach to learn a dynamic scale policy from data. We formulate the scaling policy as a non-linear function inside the network's structure that (a) is learned from data, (b) is instance specific, (c) does not add extra computation, and (d) can be applied on any network architecture. We applied ELASTIC to several state-of-the-art network architectures and showed consistent improvement without extra (sometimes even lower) computation on ImageNet classification, MSCOCO multi-label classification, and PASCAL VOC semantic segmentation. Our results show major improvement for images with scale challenges. Our code is available here: https://github.com/allenai/elasticComment: CVPR 2019 oral, code available https://github.com/allenai/elasti
    • …
    corecore