83,025 research outputs found

    Small Object Detection Based on Two-Stage Calculation Transformer

    Although small object detection has improved significantly, it still suffers from several problems. For example, small objects carry little information in a scene, so their features are hard to extract and the original feature information may be lost, resulting in poor detection results. To address this problem, this paper proposes a small object detection network based on a two-stage calculation Transformer (TCT). First, a two-stage calculation Transformer is embedded in the backbone feature extraction network for feature enhancement. Building on the traditional Transformer value computation, multiple 1D dilated convolutional layer branches with different feature fusions are used to implement global self-attention, improving feature representation and information interaction. Second, this paper proposes an effective residual connection module to improve the inefficient convolution and activation of the current CSPLayer, which helps advance the information flow and learn richer contextual details. Finally, this paper proposes a feature fusion and refinement module for fusing multi-scale features and improving the target feature representation capability. Quantitative and qualitative experiments on the PASCAL VOC2007+2012, COCO2017, and TinyPerson datasets show that the proposed algorithm has better target feature extraction ability and higher detection accuracy for small targets, compared with YOLOX
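
    The abstract above mentions branches of 1D dilated convolutions but does not say how they are parameterized or fused. The following is only a minimal NumPy sketch of the building block itself: a "same"-length 1D dilated convolution (cross-correlation form) and several branches with different dilation rates combined by averaging. The function names, the shared kernel, the dilation rates, and the averaging fusion are illustrative assumptions, not the paper's TCT design.

```python
import numpy as np

def dilated_conv1d(x, w, dilation):
    """'Same'-length 1D dilated cross-correlation with zero padding."""
    x = np.asarray(x, dtype=np.float64)
    w = np.asarray(w, dtype=np.float64)
    span = (len(w) - 1) * dilation          # receptive field minus one
    pad_left = span // 2
    pad_right = span - pad_left
    xp = np.pad(x, (pad_left, pad_right))   # zero padding on both sides
    out = np.zeros(len(x))
    for i in range(len(w)):
        # Tap i reads the input shifted by i * dilation samples.
        out += w[i] * xp[i * dilation : i * dilation + len(x)]
    return out

def multi_branch_dilated(x, w, dilations=(1, 2, 4)):
    """Hypothetical fusion: average the outputs of several dilation branches."""
    branches = [dilated_conv1d(x, w, d) for d in dilations]
    return sum(branches) / len(branches)
```

    Larger dilation rates widen the receptive field without adding weights, which is why several rates are often combined when both local and long-range context matter.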

    Enhanced Augmented Reality Framework for Sports Entertainment Applications

    Augmented Reality (AR) superimposes virtual information on real-world data, such as displaying useful information on videos or images of a scene. This dissertation presents an Enhanced AR (EAR) framework for displaying useful information on images of a sports game. The challenge in such applications is robust object detection and recognition, which becomes even harder in strong sunlight. We address the case where a captured image is degraded by strong sunlight. The developed framework includes an image enhancement technique that improves the accuracy of subsequent player and face detection. Image enhancement is followed by player detection, face detection, recognition of players, and display of players' personal information. First, an algorithm based on Multi-Scale Retinex (MSR) is proposed for image enhancement. For player and face detection, we use an adaptive boosting algorithm with Haar-like features for both feature selection and classification. The player face recognition algorithm uses adaptive boosting with LDA for feature selection and a nearest neighbor classifier for classification. The framework can be deployed in any sport where a viewer captures images. Displaying player-specific information enhances the end-user experience. Detailed experiments are performed on 2096 diverse images captured using a digital camera and a smartphone. The images contain players in different poses, expressions, and illuminations. The player face recognition module requires players' faces to be frontal or within ±35° of pose variation. The work demonstrates the great potential of computer vision based approaches for future development of AR applications. COMSATS Institute of Information Technolog
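
    The abstract above says the enhancement step is based on Multi-Scale Retinex but gives no details of the proposed variant. As a rough illustration, the textbook form of MSR averages, over several Gaussian scales, the difference between the log image and the log of its blurred version. The sketch below implements that textbook form for a single-channel image with NumPy only; the function names, the scale values, the `+ 1.0` offset, and the final min-max rescaling are assumptions for illustration, not the dissertation's algorithm.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(img, sigma):
    """Separable Gaussian blur of a 2D array with edge-replicated borders."""
    k = gaussian_kernel(sigma)
    radius = len(k) // 2
    padded = np.pad(img, radius, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def multi_scale_retinex(img, sigmas=(2.0, 8.0, 32.0)):
    """Textbook MSR on a single-channel image, rescaled to [0, 1]."""
    img = np.asarray(img, dtype=np.float64) + 1.0   # offset avoids log(0)
    log_img = np.log(img)
    msr = np.zeros_like(img)
    for sigma in sigmas:
        # Single-scale Retinex: log(image) - log(estimated illumination).
        msr += log_img - np.log(gaussian_blur(img, sigma))
    msr /= len(sigmas)
    lo, hi = msr.min(), msr.max()
    if hi - lo < 1e-12:                             # flat image: nothing to stretch
        return np.zeros_like(msr)
    return (msr - lo) / (hi - lo)
```

    For a color image, one simple option is to apply `multi_scale_retinex` to each channel separately; color-restoration variants exist but are outside this sketch.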

    PestNet: an end-to-end deep learning approach for large-scale multi-class pest detection and classification

    Multi-class pest detection is a crucial component of pest management. It involves localization in addition to classification, which is much more difficult than generic object detection because of the apparent differences among pest species. This paper proposes a region-based end-to-end approach named PestNet for large-scale multi-class pest detection and classification based on deep learning. PestNet consists of three major parts. First, a novel channel-spatial attention (CSA) module is fused into the convolutional neural network (CNN) backbone for feature extraction and enhancement. Second, a region proposal network (RPN) is adopted to provide region proposals as potential pest positions based on feature maps extracted from images. Third, a position-sensitive score map (PSSM) replaces the fully connected (FC) layers for pest classification and bounding box regression. Furthermore, we apply contextual regions of interest (RoIs) as contextual information of pest features to improve detection accuracy. We evaluate PestNet on our newly collected large-scale pest image dataset, Multi-class Pests Dataset 2018 (MPD2018), captured by our purpose-built image acquisition equipment. It covers more than 80k images with over 580k pests labeled by agricultural experts and categorized into 16 classes. The experimental results show that the proposed PestNet performs well on multi-class pest detection with 75.46% mean average precision (mAP), which outperforms the state-of-the-art methods
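
    The abstract above reports mean average precision (mAP) without stating which AP protocol was used. As a reminder of what the metric measures, here is a minimal sketch of per-class average precision using all-point interpolation of the precision-recall curve; mAP is then the mean over classes. The function names and inputs are hypothetical, the IoU matching that decides true versus false positives is assumed to have been done already, and the interpolation choice is an assumption rather than the paper's evaluation procedure.

```python
import numpy as np

def average_precision(scores, is_true_positive, num_ground_truth):
    """AP for one class: area under the interpolated precision-recall curve."""
    order = np.argsort(-np.asarray(scores, dtype=np.float64))
    tp = np.asarray(is_true_positive, dtype=np.float64)[order]
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(1.0 - tp)
    recall = cum_tp / num_ground_truth          # assumes num_ground_truth > 0
    precision = cum_tp / (cum_tp + cum_fp)
    # Add sentinels, then make precision non-increasing from right to left.
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Sum rectangle areas wherever recall changes.
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))

def mean_average_precision(per_class_ap):
    """mAP: unweighted mean of the per-class AP values."""
    return float(np.mean(per_class_ap))
```

    For example, three detections scored 0.9, 0.8, and 0.7, of which the first and third are correct, against two ground-truth objects give an AP of 5/6 under this interpolation.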