1,292 research outputs found

    SSDA-YOLO: Semi-supervised Domain Adaptive YOLO for Cross-Domain Object Detection

    Full text link
    Domain adaptive object detection (DAOD) aims to alleviate transfer performance degradation caused by the cross-domain discrepancy. However, most existing DAOD methods are dominated by outdated and computationally intensive two-stage Faster R-CNN, which is not the first choice for industrial applications. In this paper, we propose a novel semi-supervised domain adaptive YOLO (SSDA-YOLO) based method to improve cross-domain detection performance by integrating the compact one-stage stronger detector YOLOv5 with domain adaptation. Specifically, we adapt the knowledge distillation framework with the Mean Teacher model to assist the student model in obtaining instance-level features of the unlabeled target domain. We also utilize the scene style transfer to cross-generate pseudo images in different domains for remedying image-level differences. In addition, an intuitive consistency loss is proposed to further align cross-domain predictions. We evaluate SSDA-YOLO on public benchmarks including PascalVOC, Clipart1k, Cityscapes, and Foggy Cityscapes. Moreover, to verify its generalization, we conduct experiments on yawning detection datasets collected from various real classrooms. The results show considerable improvements of our method in these DAOD tasks, which reveals both the effectiveness of proposed adaptive modules and the urgency of applying more advanced detectors in DAOD. Our code is available on \url{https://github.com/hnuzhy/SSDA-YOLO}.Comment: submitted to CVI

    Pseudo-labels for Supervised Learning on Dynamic Vision Sensor Data, Applied to Object Detection under Ego-motion

    Full text link
    In recent years, dynamic vision sensors (DVS), also known as event-based cameras or neuromorphic sensors, have seen increased use due to various advantages over conventional frame-based cameras. Using principles inspired by the retina, its high temporal resolution overcomes motion blurring, its high dynamic range overcomes extreme illumination conditions and its low power consumption makes it ideal for embedded systems on platforms such as drones and self-driving cars. However, event-based data sets are scarce and labels are even rarer for tasks such as object detection. We transferred discriminative knowledge from a state-of-the-art frame-based convolutional neural network (CNN) to the event-based modality via intermediate pseudo-labels, which are used as targets for supervised learning. We show, for the first time, event-based car detection under ego-motion in a real environment at 100 frames per second with a test average precision of 40.3% relative to our annotated ground truth. The event-based car detector handles motion blur and poor illumination conditions despite not explicitly trained to do so, and even complements frame-based CNN detectors, suggesting that it has learnt generalized visual representations

    Reverse Knowledge Distillation: Training a Large Model using a Small One for Retinal Image Matching on Limited Data

    Full text link
    Retinal image matching plays a crucial role in monitoring disease progression and treatment response. However, datasets with matched keypoints between temporally separated pairs of images are not available in abundance to train transformer-based model. We propose a novel approach based on reverse knowledge distillation to train large models with limited data while preventing overfitting. Firstly, we propose architectural modifications to a CNN-based semi-supervised method called SuperRetina that help us improve its results on a publicly available dataset. Then, we train a computationally heavier model based on a vision transformer encoder using the lighter CNN-based model, which is counter-intuitive in the field knowledge-distillation research where training lighter models based on heavier ones is the norm. Surprisingly, such reverse knowledge distillation improves generalization even further. Our experiments suggest that high-dimensional fitting in representation space may prevent overfitting unlike training directly to match the final output. We also provide a public dataset with annotations for retinal image keypoint detection and matching to help the research community develop algorithms for retinal image applications

    Semi-Supervised Domain Generalization for Object Detection via Language-Guided Feature Alignment

    Full text link
    Existing domain adaptation (DA) and generalization (DG) methods in object detection enforce feature alignment in the visual space but face challenges like object appearance variability and scene complexity, which make it difficult to distinguish between objects and achieve accurate detection. In this paper, we are the first to address the problem of semi-supervised domain generalization by exploring vision-language pre-training and enforcing feature alignment through the language space. We employ a novel Cross-Domain Descriptive Multi-Scale Learning (CDDMSL) aiming to maximize the agreement between descriptions of an image presented with different domain-specific characteristics in the embedding space. CDDMSL significantly outperforms existing methods, achieving 11.7% and 7.5% improvement in DG and DA settings, respectively. Comprehensive analysis and ablation studies confirm the effectiveness of our method, positioning CDDMSL as a promising approach for domain generalization in object detection tasks.Comment: Accepted at BMVC 202
    • …
    corecore