
    PointOBB: Learning Oriented Object Detection via Single Point Supervision

    Single point-supervised object detection is gaining attention due to its cost-effectiveness. However, existing approaches focus on generating horizontal bounding boxes (HBBs) while ignoring oriented bounding boxes (OBBs) commonly used for objects in aerial images. This paper proposes PointOBB, the first single Point-based OBB generation method, for oriented object detection. PointOBB operates through the collaborative utilization of three distinctive views: an original view, a resized view, and a rotated/flipped (rot/flp) view. Upon the original view, we leverage the resized and rot/flp views to build a scale augmentation module and an angle acquisition module, respectively. In the former module, a Scale-Sensitive Consistency (SSC) loss is designed to enhance the deep network's ability to perceive the object scale. For accurate object angle predictions, the latter module incorporates self-supervised learning to predict angles, which is associated with a scale-guided Dense-to-Sparse (DS) matching strategy for aggregating dense angles corresponding to sparse objects. The resized and rot/flp views are switched using a progressive multi-view switching strategy during training to achieve coupled optimization of scale and angle. Experimental results on the DIOR-R and DOTA-v1.0 datasets demonstrate that PointOBB achieves promising performance, and significantly outperforms potential point-supervised baselines. Comment: 11 pages, 5 figures, 6 tables. Code: https://github.com/Luo-Z13/pointob
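
    The abstract leaves the view construction and the SSC loss abstract. Below is a minimal PyTorch sketch of the multi-view idea only: build_views and scale_consistency_loss are hypothetical names, and the simple MSE term merely gestures at what the paper's Scale-Sensitive Consistency loss does; it is not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def build_views(img: torch.Tensor, scale: float = 0.5):
    """Build three PointOBB-style views from an image batch (N, C, H, W):
    the original, a resized copy, and a rotated/flipped (rot/flp) copy."""
    resized = F.interpolate(img, scale_factor=scale, mode="bilinear",
                            align_corners=False)
    rot_flp = torch.flip(torch.rot90(img, k=1, dims=(2, 3)), dims=(3,))
    return img, resized, rot_flp

def scale_consistency_loss(feat_orig: torch.Tensor,
                           feat_resized: torch.Tensor) -> torch.Tensor:
    """Toy consistency term: downsample the original view's feature map to
    the resized view's resolution and penalize the gap, nudging the network
    toward scale-aware features (a stand-in for the paper's SSC loss)."""
    target = F.interpolate(feat_orig, size=feat_resized.shape[-2:],
                           mode="bilinear", align_corners=False)
    return F.mse_loss(feat_resized, target)
```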

    DoubleAUG: Single-domain Generalized Object Detector in Urban via Color Perturbation and Dual-style Memory

    Object detection in urban scenarios is crucial for autonomous driving in intelligent traffic systems. However, unlike conventional object detection tasks, urban-scene images vary greatly in style. For example, images taken on sunny days differ significantly from those taken on rainy days. Therefore, models trained on sunny-day images may not generalize well to rainy-day images. In this paper, we aim to solve the single-domain generalizable object detection task in urban scenarios, meaning that a model trained on images from one weather condition should be able to perform well on images from any other weather condition. To address this challenge, we propose a novel Double AUGmentation (DoubleAUG) method that includes image- and feature-level augmentation schemes. In the image-level augmentation, we consider the variation in color information across different weather conditions and propose a Color Perturbation (CP) method that randomly exchanges the RGB channels to generate varied images. In the feature-level augmentation, we propose to utilize a Dual-Style Memory (DSM) to explore the diverse style information across the entire dataset, further enhancing the model's generalization capability. Extensive experiments demonstrate that our proposed method outperforms state-of-the-art methods. Furthermore, ablation studies confirm the effectiveness of each module in our proposed method. Moreover, our method is plug-and-play and can be integrated into existing methods to further improve model performance. Comment: Accepted by ACM Transactions on Multimedia Computing, Communications, and Applications
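
    Of the two schemes, Color Perturbation is simple enough to sketch. The NumPy snippet below is one plausible reading of "randomly exchanges the RGB channels"; the function name and the uniform choice over all six channel permutations are our assumptions, not the authors' code.

```python
import numpy as np

def color_perturbation(image: np.ndarray,
                       rng: np.random.Generator) -> np.ndarray:
    """Randomly permute the RGB channels of an (H, W, 3) image.

    Swapping channels shifts color statistics while leaving structure
    intact, loosely simulating appearance changes across weather styles.
    """
    perm = rng.permutation(3)   # e.g. [2, 0, 1]: output R<-B, G<-R, B<-G
    return image[:, :, perm]

# Usage during training: perturb each sampled image independently.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(512, 512, 3), dtype=np.uint8)
aug = color_perturbation(img, rng)
```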

    A survey on generative adversarial networks for imbalance problems in computer vision tasks

    Any computer vision application starts by acquiring images and data, followed by preprocessing and pattern-recognition steps to perform a task. When the acquired images are highly imbalanced or inadequate, the desired task may not be achievable. Unfortunately, the occurrence of imbalance problems in acquired image datasets for certain complex real-world problems, such as anomaly detection, emotion recognition, medical image analysis, fraud detection, metallic surface defect detection, and disaster prediction, is inevitable. The performance of computer vision algorithms can deteriorate significantly when the training dataset is imbalanced. In recent years, Generative Adversarial Networks (GANs) have gained immense attention from researchers across a variety of application domains due to their capability to model complex real-world image data. Notably, GANs can not only generate synthetic images; their adversarial learning scheme has also shown good potential for restoring balance in imbalanced datasets. In this paper, we examine the most recent developments in GAN-based techniques for addressing imbalance problems in image data. The real-world challenges and implementations of GAN-based synthetic image generation are extensively covered in this survey. Our survey first introduces the various imbalance problems in computer vision tasks and their existing solutions, and then examines key concepts such as deep generative image models and GANs. We then propose a taxonomy that organizes GAN-based techniques for addressing imbalance problems in computer vision tasks into three major categories: (1) image-level imbalances in classification, (2) object-level imbalances in object detection, and (3) pixel-level imbalances in segmentation tasks. We elaborate on the imbalance problems of each group and provide GAN-based solutions for each. Readers will understand how GAN-based techniques can handle imbalance problems and boost the performance of computer vision algorithms.
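
    As a toy instance of the balance-restoring idea the survey examines, the PyTorch sketch below trains a minimal GAN on flattened minority-class images and then samples synthetic examples from the generator to top up the class. The architecture, hyperparameters, and 28x28 image size are illustrative assumptions only, not from the survey.

```python
import torch
import torch.nn as nn

latent_dim = 64

# Tiny MLP generator and discriminator for flattened 28x28 images.
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                  nn.Linear(256, 28 * 28), nn.Tanh())
D = nn.Sequential(nn.Linear(28 * 28, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def gan_step(real: torch.Tensor) -> None:
    """One adversarial update on a batch of flattened minority-class images."""
    n = real.size(0)
    fake = G(torch.randn(n, latent_dim))

    # Discriminator: push real toward 1, generated toward 0.
    d_loss = (bce(D(real), torch.ones(n, 1)) +
              bce(D(fake.detach()), torch.zeros(n, 1)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: try to make the discriminator score fakes as real.
    g_loss = bce(D(fake), torch.ones(n, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

# After training, synthesize extra minority samples to rebalance the set.
synthetic = G(torch.randn(1000, latent_dim)).detach()
```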

    Offline-to-Online Knowledge Distillation for Video Instance Segmentation

    In this paper, we present offline-to-online knowledge distillation (OOKD) for video instance segmentation (VIS), which transfers a wealth of video knowledge from an offline model to an online model for consistent prediction. Unlike previous methods that adopt either an online or an offline model, our single online model takes advantage of both by distilling offline knowledge. To transfer knowledge correctly, we propose query filtering and association (QFA), which filters irrelevant queries to exact instances. Our KD with QFA increases the robustness of feature matching by encoding object-centric features from a single frame supplemented by long-range global information. We also propose a simple data augmentation scheme for knowledge distillation in the VIS task that fairly transfers the knowledge of all classes into the online model. Extensive experiments show that our method significantly improves performance in video instance segmentation, especially on challenging datasets with long, dynamic sequences. Our method also achieves state-of-the-art performance on the YTVIS-21, YTVIS-22, and OVIS datasets, with mAP scores of 46.1%, 43.6%, and 31.1%, respectively.
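
    The abstract only names QFA; the toy PyTorch sketch below shows one way offline-to-online query distillation could look: cosine-matching each online (student) query to an offline (teacher) query, keeping the best-matched fraction as a crude stand-in for query filtering, and regressing students toward their teachers. The function name, the matching rule, and the keep ratio are all assumptions, not the paper's method.

```python
import torch
import torch.nn.functional as F

def distill_queries(online_q: torch.Tensor, offline_q: torch.Tensor,
                    keep: float = 0.5) -> torch.Tensor:
    """Toy offline-to-online distillation over decoder query embeddings.

    online_q, offline_q: (num_queries, dim) tensors from the online
    (student) and offline (teacher) models on the same frame.
    """
    # Cosine similarity between every student/teacher query pair.
    sim = F.normalize(online_q, dim=-1) @ F.normalize(offline_q, dim=-1).T
    best_sim, match = sim.max(dim=1)            # teacher index per student

    # Keep only the most confidently matched students ("filtering").
    k = max(1, int(keep * online_q.size(0)))
    kept = best_sim.topk(k).indices

    # Regress kept student queries toward their (frozen) teachers.
    return F.mse_loss(online_q[kept], offline_q[match[kept]].detach())
```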

    Optimizing The Design Of Multimodal User Interfaces

    Due to a current lack of principle-driven multimodal user interface design guidelines, designers may encounter difficulties when choosing the most appropriate display modality for given users or specific tasks (e.g., verbal versus spatial tasks). The development of multimodal display guidelines from both a user and task-domain perspective is thus critical to the achievement of successful human-system interaction. Specifically, there is a need to determine how to design task information presentation (e.g., via which modalities) to capitalize on an individual operator's information processing capabilities and the inherent efficiencies associated with redundant sensory information, thereby alleviating information overload. The present effort addresses this issue by proposing a theoretical framework (Architecture for Multi-Modal Optimization, AMMO) from which multimodal display design guidelines and adaptive automation strategies may be derived. The foundation of the proposed framework is based on extending, at a functional working memory (WM) level, existing information processing theories and models with the latest findings in cognitive psychology, neuroscience, and other allied sciences. The utility of AMMO lies in its ability to provide designers with strategies for directing system design, as well as dynamic adaptation strategies (i.e., multimodal mitigation strategies) in support of real-time operations. In an effort to validate specific components of AMMO, a subset of AMMO-derived multimodal design guidelines was evaluated in a simulated weapons-control-system multitasking environment. The results of this study demonstrated significant performance improvements in user response time and accuracy when multimodal display cues were used (i.e., auditory and tactile, individually and in combination) to augment the visual display of information, thereby distributing human information processing across multiple sensory and WM resources. These results provide initial empirical support for validation of the overall AMMO model and a subset of the principle-driven multimodal design guidelines derived from it. The empirically validated multimodal design guidelines may be applicable to a wide range of information-intensive, computer-based multitasking environments.