
    Spatio-temporal attention model for foreground detection in cross-scene surveillance videos

    No full text
    Foreground detection is an important theme in video surveillance. Conventional background modeling approaches build sophisticated temporal statistical models to detect foreground from low-level features, while modern semantic/instance segmentation approaches generate high-level foreground annotations but ignore the temporal relevance among consecutive frames. In this paper, we propose a Spatio-Temporal Attention Model (STAM) for cross-scene foreground detection. To fill the semantic gap between low- and high-level features, appearance and optical flow features are synthesized by attention modules during the feature learning procedure. Experimental results on the CDnet 2014 benchmark validate the model, which outperforms many state-of-the-art methods on seven evaluation metrics. With the attention modules and optical flow, its F-measure increases by 9% and 6%, respectively. Without any tuning, the model shows cross-scene generalization on the Wallflower and PETS datasets. The processing speed is 10.8 fps at a frame size of 256 by 256.
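    The abstract does not specify how the attention modules combine the two feature streams; below is a minimal sketch, assuming a PyTorch-style channel gate over concatenated appearance and optical-flow features (the AttentionFusion module, its layers, and all parameter names are illustrative, not taken from the paper).

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Fuse appearance and optical-flow feature maps with a learned channel gate (sketch)."""
    def __init__(self, channels):
        super().__init__()
        # squeeze the concatenated features into per-channel weights for each modality
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, 2 * channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, appearance, flow):
        # appearance, flow: (N, C, H, W) feature maps from the two streams
        weights = self.gate(torch.cat([appearance, flow], dim=1))
        w_app, w_flow = weights.chunk(2, dim=1)
        return w_app * appearance + w_flow * flow

# usage: fuse two hypothetical 64-channel feature maps of a 256x256 frame downsampled 4x
fusion = AttentionFusion(channels=64)
fused = fusion(torch.randn(1, 64, 64, 64), torch.randn(1, 64, 64, 64))
print(fused.shape)  # torch.Size([1, 64, 64, 64])
```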

    Robust Cross-Scene Foreground Segmentation in Surveillance Video

    Full text link
    Training a single deep model for large-scale cross-scene video foreground segmentation is challenging because off-the-shelf deep learning based segmentors rely on scene-specific structural information. This results in deep models that are scene-biased and evaluations that are scene-influenced. In this paper, we integrate dual modalities (foreground motion and appearance) and then eliminate features that are not representative of the foreground through attention-module-guided selective-connection structures. The model is trained end-to-end and achieves scene adaptation in a plug-and-play style. Experiments indicate the proposed method significantly outperforms state-of-the-art deep models and background subtraction methods on untrained scenes (LIMU and LASIESTA). Codes and dataset will be available after the anonymous stage.
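    The exact form of the attention-guided selective connection is not given in the abstract; one plausible minimal sketch, assuming a per-pixel sigmoid gate applied to an encoder skip connection (module and argument names are hypothetical), is:

```python
import torch
import torch.nn as nn

class SelectiveConnection(nn.Module):
    """Gate a skip connection so features with weak foreground evidence are suppressed (sketch)."""
    def __init__(self, channels):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, skip_feat, decoder_feat):
        # skip_feat comes from the encoder, decoder_feat from the up-sampling path
        gate = self.attn(decoder_feat)          # per-pixel, per-channel weights in [0, 1]
        return decoder_feat + gate * skip_feat  # pass through only the selected encoder features

out = SelectiveConnection(32)(torch.randn(1, 32, 64, 64), torch.randn(1, 32, 64, 64))
print(out.shape)  # torch.Size([1, 32, 64, 64])
```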

    Grayscale-thermal tracking via inverse sparse representation-based collaborative encoding

    Full text link
    Grayscale-thermal tracking has attracted a great deal of attention due to its capability of fusing two different yet complementary target observations. Existing methods often treat extracting discriminative target information and exploring the target correlation among different images as two separate issues, ignoring their interdependence. This may cause tracking drift in challenging video pairs. This paper presents a collaborative encoding model called joint correlation and discriminant analysis based inverse sparse representation (JCDA-InvSR) to jointly encode the target candidates in the grayscale and thermal video sequences. In particular, we develop a multi-objective program to integrate feature selection and multi-view correlation analysis into a unified optimization problem in JCDA-InvSR, which can simultaneously highlight the special characteristics of the grayscale and thermal targets by alternately optimizing two aspects: the target discrimination within a given image and the target correlation across different images. For robust grayscale-thermal tracking, we also incorporate prior knowledge of the target candidate codes into an SVM based target classifier to overcome the overfitting caused by limited training labels. Extensive experiments on the GTOT and RGBT234 datasets illustrate the promising performance of our tracking framework.
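    The abstract does not spell out the inverse sparse representation; the sketch below only shows the generic idea, in which the target template is encoded over a dictionary built from the candidates so the sparse codes point at the best-matching candidate. The solver, dimensions, and the candidate/template setup are assumptions for illustration; JCDA-InvSR additionally couples the grayscale and thermal views and performs feature selection, which is not reproduced here.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Hypothetical illustration of inverse sparse representation: the template t is
# encoded over the candidate matrix X, so the sparse coefficients indicate which
# candidates best reconstruct the target.
rng = np.random.default_rng(0)
d, n = 256, 50                                 # feature dimension, number of candidates
X = rng.standard_normal((d, n))                # columns = candidate features
t = X[:, 7] + 0.05 * rng.standard_normal(d)    # template is close to candidate 7

coder = Lasso(alpha=0.05, fit_intercept=False, max_iter=5000)
coder.fit(X, t)                                # solves min_c ||t - X c||^2 + alpha * ||c||_1
codes = coder.coef_
print("best candidate:", int(np.argmax(np.abs(codes))))   # -> 7
```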

    TIB-Net: Drone Detection Network with Tiny Iterative Backbone

    No full text
    With the widespread application of drones in commercial and industrial fields, drone detection has received increasing attention for public safety and other purposes. However, due to the varied appearance of small-size drones, changeable and complex environments, and the limited memory resources of edge computing devices, drone detection remains a challenging task. Although deep convolutional neural networks (CNNs) have shown powerful performance in object detection in recent years, most existing CNN-based methods cannot balance detection performance and model size well. To solve this problem, we develop a drone detection network with a tiny iterative backbone named TIB-Net. In this network, we propose a structure called the cyclic pathway, which enhances the capability to extract effective features of small objects, and integrate it into the existing efficient method Extremely Tiny Face Detector (EXTD). This not only significantly improves the accuracy of drone detection but also keeps the model size at an acceptable level. Furthermore, we integrate a spatial attention module into our network backbone to emphasize information of small objects, which better locates small-size drones and further improves detection performance. In addition, we present extensive manual annotations of object bounding boxes for our collected 2860 drone images as a drone benchmark dataset, which is now publicly available. We conduct a series of experiments on this dataset to evaluate TIB-Net; the results show that our proposed method achieves a mean average precision of 89.2% with a model size of 697.0 KB, which is state-of-the-art compared with existing methods.
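    The abstract names a spatial attention module but gives no formula; a minimal sketch, assuming a common CBAM-style design (channel pooling followed by a convolution that produces a per-pixel weight map; the exact form in TIB-Net may differ), is:

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Emphasize spatial locations likely to contain small objects (CBAM-style sketch)."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        # pool along the channel axis, then predict a per-pixel attention weight
        avg_map = x.mean(dim=1, keepdim=True)
        max_map, _ = x.max(dim=1, keepdim=True)
        attn = torch.sigmoid(self.conv(torch.cat([avg_map, max_map], dim=1)))
        return x * attn

feat = torch.randn(1, 32, 80, 80)
print(SpatialAttention()(feat).shape)  # torch.Size([1, 32, 80, 80])
```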

    Deeper SSD: Simultaneous Up-sampling and Down-sampling for Drone Detection

    Full text link
    Drone detection can be considered a specific sort of small object detection, which has always been challenging because of drones' small size and few features. To improve the detection rate of drones, we design a Deeper SSD network, which uses large-scale input images and a deeper convolutional network to obtain more features that benefit small object classification. At the same time, to improve object classification performance, we implement up-sampling modules to increase the number of features in the low-level feature maps. In addition, to improve object localization performance, we adopt down-sampling modules so that context information can be used by the high-level feature maps directly. Our proposed Deeper SSD and its variants are successfully applied to our self-designed drone datasets. Our experiments demonstrate the effectiveness of Deeper SSD and its variants, which are useful for small-drone detection and recognition. The proposed methods can also detect small and large objects simultaneously.
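    As a rough illustration of the up-sampling idea described above (high-level features interpolated and merged into a low-level map), here is a minimal sketch; the module name, projection layer, and additive fusion are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UpSampleFuse(nn.Module):
    """Add up-sampled high-level features to a low-level map to aid small-object classification."""
    def __init__(self, high_ch, low_ch):
        super().__init__()
        self.proj = nn.Conv2d(high_ch, low_ch, kernel_size=1)  # match channel counts

    def forward(self, low, high):
        high = F.interpolate(self.proj(high), size=low.shape[-2:],
                             mode="bilinear", align_corners=False)
        return low + high

low = torch.randn(1, 128, 64, 64)    # low-level, fine resolution
high = torch.randn(1, 256, 16, 16)   # high-level, coarse resolution
print(UpSampleFuse(256, 128)(low, high).shape)  # torch.Size([1, 128, 64, 64])
```

    A down-sampling module would do the reverse: strided pooling or convolution of the low-level map so its detail can be concatenated into the high-level map used for localization.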

    C2DAN: an Improved Deep Adaptation Network with Domain Confusion and Classifier Adaptation

    No full text
    Deep neural networks have been successfully applied to domain adaptation, which uses the labeled data of a source domain to supplement useful information for a target domain. Deep Adaptation Network (DAN) is one of these efficient frameworks; it utilizes Multi-Kernel Maximum Mean Discrepancy (MK-MMD) to align feature distributions in a reproducing kernel Hilbert space. However, DAN does not perform very well in feature-level transfer, and the assumption that the source and target domains share classifiers is too strict in different adaptation scenarios. In this paper, we further improve the adaptability of DAN by incorporating Domain Confusion (DC) and Classifier Adaptation (CA). To achieve this, we propose a novel domain adaptation method named C2DAN. Our approach first enables Domain Confusion by using a domain discriminator for adversarial training. For Classifier Adaptation, a residual block is added to the source domain classifier in order to learn the difference between the source classifier and the target classifier. Beyond validating our framework on the standard domain adaptation dataset Office-31, we also introduce and evaluate it on the Comprehensive Cars (CompCars) dataset; the experimental results demonstrate the effectiveness of the proposed framework C2DAN.
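    To make the two added ingredients concrete, here is a minimal sketch of (a) a domain discriminator trained adversarially through a gradient-reversal layer, and (b) a residual block on top of the source classifier; layer sizes, the gradient-reversal mechanism, and class counts are assumptions for illustration, not C2DAN's published configuration.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; reversed, scaled gradient in the backward pass."""
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lamb * grad, None

class DomainDiscriminator(nn.Module):
    """Predicts source vs. target; the reversed gradient confuses the feature extractor."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 2))

    def forward(self, feat, lamb=1.0):
        return self.net(GradReverse.apply(feat, lamb))

class ResidualClassifier(nn.Module):
    """Target classifier = source classifier + a small learned residual correction."""
    def __init__(self, feat_dim=256, num_classes=31):
        super().__init__()
        self.source_fc = nn.Linear(feat_dim, num_classes)
        self.residual = nn.Sequential(nn.Linear(num_classes, num_classes), nn.ReLU(),
                                      nn.Linear(num_classes, num_classes))

    def forward(self, feat):
        src_logits = self.source_fc(feat)
        return src_logits, src_logits + self.residual(src_logits)

feats = torch.randn(8, 256)
domain_logits = DomainDiscriminator()(feats)           # adversarial domain confusion head
src_logits, tgt_logits = ResidualClassifier()(feats)   # source and adapted target predictions
```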

    Learning Calibrated-Guidance for Object Detection in Aerial Images

    Full text link
    Object detection is one of the most fundamental yet challenging research topics in computer vision. Recently, work on this topic in aerial images has made tremendous progress. However, complex backgrounds and poor imaging quality are notable problems in aerial object detection. Most state-of-the-art approaches develop elaborate attention mechanisms for space-time feature calibration at considerable computational cost, while surprisingly ignoring the importance of channel-wise feature calibration. In this work, we propose a simple yet effective Calibrated-Guidance (CG) scheme to enhance channel communications in a feature-transformer fashion, which can adaptively determine the calibration weights for each channel based on global feature affinity correlations. Specifically, for a given set of feature maps, CG first computes the feature similarity between each channel and the remaining channels as the intermediary calibration guidance. Then, each channel is re-represented by aggregating all the channels, weighted via the guidance operation. CG is a general module that can be plugged into any deep neural network, yielding what we call CG-Net. To demonstrate its effectiveness and efficiency, extensive experiments are carried out on both oriented and horizontal object detection tasks in aerial images. Experimental results on two challenging benchmarks (i.e., DOTA and HRSC2016) demonstrate that CG-Net achieves new state-of-the-art accuracy with fair computational overhead. The source code has been open sourced at https://github.com/WeiZongqi/CG-Ne
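    Following the abstract's description (channel-to-channel similarity used as guidance to re-aggregate channels), a minimal sketch of that computation is given below; the cosine similarity, softmax normalization, and residual connection are assumptions chosen to keep the example self-contained, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def calibrated_guidance(x):
    """Re-weight each channel by its affinity to every other channel (hedged sketch)."""
    n, c, h, w = x.shape
    flat = F.normalize(x.view(n, c, -1), dim=-1)          # (N, C, H*W), unit-norm channels
    affinity = torch.bmm(flat, flat.transpose(1, 2))      # (N, C, C) channel-affinity guidance
    guidance = torch.softmax(affinity, dim=-1)
    calibrated = torch.bmm(guidance, x.view(n, c, -1))    # aggregate all channels per the guidance
    return calibrated.view(n, c, h, w) + x                # residual connection (assumption)

print(calibrated_guidance(torch.randn(2, 16, 32, 32)).shape)  # torch.Size([2, 16, 32, 32])
```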

    Anchor Retouching via Model Interaction for Robust Object Detection in Aerial Images

    Full text link
    Object detection has made tremendous strides in computer vision. Small object detection with appearance degradation is a prominent challenge, especially for aerial observations. To collect sufficient positive/negative samples for heuristic training, most object detectors preset region anchors and compute Intersection-over-Union (IoU) against the ground-truth data. In this setting, small objects are frequently abandoned or mislabeled. In this paper, we present an effective Dynamic Enhancement Anchor (DEA) network to construct a novel training sample generator. Different from other state-of-the-art techniques, the proposed network leverages a sample discriminator to realize interactive sample screening between an anchor-based unit and an anchor-free unit, thereby generating eligible samples. Besides, multi-task joint training with a conservative anchor-based inference scheme enhances the performance of the proposed model while reducing computational complexity. The proposed scheme supports both oriented and horizontal object detection tasks. Extensive experiments on two challenging aerial benchmarks (i.e., DOTA and HRSC2016) indicate that our method achieves state-of-the-art accuracy with moderate inference speed and training overhead. On DOTA, DEA-Net surpasses the other state-of-the-art methods by 0.40% mean Average Precision (mAP) for oriented object detection with a weaker backbone (ResNet-101 vs. ResNet-152) and by 3.08% mAP for horizontal object detection with the same backbone. On HRSC2016, it surpasses the previous best model by 1.1% using only 3 horizontal anchors.
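    To see why IoU-based anchor assignment tends to abandon small objects, consider the short worked example below; the box sizes and the 0.5 positive threshold are illustrative assumptions, not values from the paper.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection-over-Union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# An 8x8 ground-truth object fully contained in a 32x32 preset anchor centered on it:
print(iou([12, 12, 20, 20], [0, 0, 32, 32]))  # 0.0625 -- far below a typical 0.5 positive threshold
```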

    Score-specific Non-maximum Suppression and Coexistence Prior for Multi-scale Face Detection

    Full text link
    Face detection is a fundamental component supporting various facial-related visual tasks. However, detecting faces with extremely low resolution or heavy occlusion is still an open problem. In this paper, we propose a two-step general approach to refine the performance of modern face detectors, inspired by humans' high-level context-aware ability. First, we propose Score-specific Non-Maximum Suppression (SNMS) to preserve overlapped faces. Second, we exploit the coexistence prior among faces in a scene, which raises the sensitivity of face detection in crowds. When our approach is integrated into existing face detectors, most of them obtain better results on a challenging benchmark (WIDER FACE) and on our newly proposed dataset (Faces in Crowd, FIC). Codes are available at https://github.com/AIoTP/SNMSandCoexistence
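    The abstract does not define the SNMS rule; the sketch below is one plausible reading, a greedy NMS whose suppression threshold rises with the neighbouring detection's score so that confident overlapped faces survive. The threshold schedule, parameter names, and values are assumptions for illustration, not the published algorithm (see the linked repository for the authors' code).

```python
import numpy as np

def iou_matrix(box, boxes):
    """IoU between one box and an array of boxes, all in [x1, y1, x2, y2] format."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(np.asarray(box)) + area(boxes) - inter)

def score_specific_nms(boxes, scores, base_thresh=0.4, relax=0.3):
    """Greedy NMS whose overlap threshold grows with the neighbour's score (illustrative guess)."""
    boxes, scores = np.asarray(boxes, float), np.asarray(scores, float)
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size:
        i, rest = order[0], order[1:]
        keep.append(int(i))
        overlaps = iou_matrix(boxes[i], boxes[rest])
        thresh = base_thresh + relax * scores[rest]   # per-neighbour suppression threshold
        order = rest[overlaps <= thresh]
    return keep

boxes = [[0, 0, 10, 10], [2, 2, 12, 12], [20, 20, 30, 30]]
scores = [0.9, 0.85, 0.6]
# Box 1 overlaps box 0 with IoU ~0.47: plain NMS at 0.4 would drop it, but its high
# score raises the threshold to ~0.66, so it is kept.
print(score_specific_nms(boxes, scores))  # [0, 1, 2]
```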