    Hierarchical shot detector

    The single shot detector simultaneously predicts object categories and regression offsets of the default boxes. Despite its high efficiency, this structure has two design flaws: (1) during inference, the classification result of the default box is improperly assigned to the regressed box, and (2) regressing only once is not sufficient for accurate object detection. To solve the first problem, a novel reg-offset-cls (ROC) module is proposed. It contains three hierarchical steps: box regression, feature sampling location prediction, and classification of the regressed box using the features at the offset locations. To solve the second problem, a hierarchical shot detector (HSD) is proposed, which stacks two ROC modules and one feature enhanced module. The second ROC takes the regressed boxes and the feature sampling locations from the first ROC as its inputs. Meanwhile, the feature enhanced module inserted between the two ROCs aims to extract local and non-local context. Experiments on the MS COCO and PASCAL VOC datasets demonstrate the superiority of the proposed HSD. Without bells and whistles, HSD outperforms all one-stage methods at real-time speed.
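
    As a rough illustration of what "regression offsets of the default boxes" means, the sketch below decodes one default box with predicted offsets. It assumes the common center-size parametrization without variance scaling; the function name `decode_box` and the tuple layout are hypothetical and are not taken from the HSD paper.

```python
import math

def decode_box(default_box, offsets):
    """Apply regression offsets to a default box.

    default_box: (cx, cy, w, h) center and size of the default box.
    offsets:     (tx, ty, tw, th) predicted offsets (assumed parametrization).
    Returns the regressed box as (cx, cy, w, h).
    """
    cx, cy, w, h = default_box
    tx, ty, tw, th = offsets
    new_cx = cx + tx * w        # shift center x by a fraction of the width
    new_cy = cy + ty * h        # shift center y by a fraction of the height
    new_w = w * math.exp(tw)    # scale width in log space
    new_h = h * math.exp(th)    # scale height in log space
    return (new_cx, new_cy, new_w, new_h)

# Zero offsets leave the default box unchanged.
regressed = decode_box((0.5, 0.5, 0.2, 0.4), (0.0, 0.0, 0.0, 0.0))
```

    In a plain single shot detector the class score is computed from features at the default box location, while the box reported at inference is the regressed one; this mismatch is the first problem the abstract describes.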

    MSB R‐CNN: A multi‐stage balanced defect detection network

    Deep learning networks are applied for defect detection, among which Cascade R‐CNN is a multi‐stage object detection network and is state of the art in terms of accuracy and efficiency. However, it is still a challenge for Cascade R‐CNN to deal with complex and diverse defects, as the widely varied shapes of defects make it hard for the traditional convolution filter to extract features. Additionally, imbalances in features, losses, and samples lower accuracy. To address these challenges, this paper proposes a multi‐stage balanced R‐CNN (MSB R‐CNN) for defect detection based on Cascade R‐CNN. First, deformable convolution is adopted in different stages of the backbone network to improve its adaptability to the varying shapes of defects. Then, the features obtained by the backbone network are refined and enhanced by the balanced feature pyramid. To overcome the imbalance between classification and regression loss, the balanced L1 loss is applied at different stages. Finally, for sample selection, the intersection over union (IoU) balanced sampler and the online hard example mining (OHEM) sampler are combined at different stages to make the sampling more reasonable, which improves the accuracy and convergence of the model. Experiments on the DAGM2007 dataset show that MSB R‐CNN achieves a mean average precision (mAP) of 67.5%, an increase of 1.5% mAP compared to Cascade R‐CNN.
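
    For readers unfamiliar with the IoU measure that the balanced sampler relies on, here is a minimal sketch. It assumes axis-aligned boxes given as (x1, y1, x2, y2) corner coordinates; the function name `iou` is illustrative and not taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle, if any.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes have zero intersection area.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    # Guard against degenerate (zero-area) boxes.
    return inter / union if union > 0 else 0.0

# Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1).
overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))
```

    An IoU-balanced sampler would use values like `overlap` to bin candidate boxes by their overlap with the ground truth before drawing samples from each bin; the binning itself is not shown here.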

    YOLO-Former: YOLO Shakes Hand With ViT

    The proposed YOLO-Former method integrates ideas from the transformer and YOLOv4 to create a highly accurate and efficient object detection system. The method leverages the fast inference speed of YOLOv4 and incorporates the advantages of the transformer architecture through the integration of convolutional attention and transformer modules. The results demonstrate the effectiveness of the proposed approach, with a mean average precision (mAP) of 85.76% on the Pascal VOC dataset, while maintaining high prediction speed with a frame rate of 10.85 frames per second. The contribution of this work lies in demonstrating how the combination of these two state-of-the-art techniques can lead to further improvements in the field of object detection.

    A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection

    We propose average Localisation-Recall-Precision (aLRP), a unified, bounded, balanced and ranking-based loss function for both classification and localisation tasks in object detection. aLRP extends the Localisation-Recall-Precision (LRP) performance metric (Oksuz et al., 2018), inspired by how Average Precision (AP) Loss extends precision to a ranking-based loss function for classification (Chen et al., 2020). aLRP has the following distinct advantages: (i) aLRP is the first ranking-based loss function for both classification and localisation tasks. (ii) Thanks to using ranking for both tasks, aLRP naturally enforces high-quality localisation for high-precision classification. (iii) aLRP provides provable balance between positives and negatives. (iv) Compared to on average ~6 hyperparameters in the loss functions of state-of-the-art detectors, aLRP Loss has only one hyperparameter, which we did not tune in practice. On the COCO dataset, aLRP Loss improves its ranking-based predecessor, AP Loss, up to around 5 AP points, achieves 48.9 AP without test time augmentation and outperforms all one-stage detectors. Code available at: https://github.com/kemaloksuz/aLRPLoss. Comment: NeurIPS 2020 spotlight paper