
    Online Domain Adaptation for Multi-Object Tracking

    Automatically detecting, labeling, and tracking objects in videos depends first and foremost on accurate category-level object detectors. These might, however, not always be available in practice, as acquiring high-quality, large-scale labeled training datasets is either too costly or impractical for all possible real-world application scenarios. A scalable solution is to re-use object detectors pre-trained on generic datasets. This work is the first to investigate the problem of on-line domain adaptation of object detectors for causal multi-object tracking (MOT). We propose to alleviate the dataset bias by adapting detectors from category to instances, and back: (i) we jointly learn all target models by adapting them from the pre-trained one, and (ii) we also adapt the pre-trained model on-line. We introduce an on-line multi-task learning algorithm to efficiently share parameters and reduce drift, while gradually improving recall. Our approach is applicable to any linear object detector, and we evaluate both cheap "mini-Fisher Vectors" and expensive "off-the-shelf" ConvNet features. We quantitatively measure the benefit of our domain adaptation strategy on the KITTI tracking benchmark and on a new dataset (PASCAL-to-KITTI) that we introduce to study the domain mismatch problem in MOT. Comment: To appear at BMVC 201
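    A minimal sketch of the category-to-instance idea, assuming a mean-regularized multi-task formulation with a hinge loss (this update rule is our assumption; the abstract does not specify the algorithm). Each target model is initialized from the shared category model and regularized toward it; the shared model in turn steps toward the mean of the target models and is anchored to the pre-trained weights to limit drift. The class name OnlineMultiTaskDetector and the parameters lam, mu, and lr are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


class OnlineMultiTaskDetector:
    """Hypothetical sketch: online multi-task linear classifiers that share a category model."""

    def __init__(self, pretrained: Vector, lam: float = 0.1, mu: float = 0.1, lr: float = 0.1):
        self.pretrained = list(pretrained)   # generic, pre-trained detector weights
        self.shared = list(pretrained)       # category-level model, adapted on-line
        self.tasks: Dict[int, Vector] = {}   # one linear model per tracked target
        self.lam, self.mu, self.lr = lam, mu, lr

    def add_target(self, tid: int) -> None:
        # A new instance model starts from the current category model.
        self.tasks[tid] = list(self.shared)

    def score(self, tid: int, x: Vector) -> float:
        return dot(self.tasks[tid], x)

    def update(self, tid: int, x: Vector, y: int) -> None:
        """One SGD step on sample x with label y in {+1, -1} for target tid."""
        w = self.tasks[tid]
        margin = y * dot(w, x)
        for j in range(len(w)):
            # Pull the instance model toward the shared category model.
            g = self.lam * (w[j] - self.shared[j])
            if margin < 1.0:  # hinge loss is active
                g -= y * x[j]
            w[j] -= self.lr * g
        # Move the shared model toward the mean of all target models,
        # while anchoring it to the pre-trained weights to reduce drift.
        n = len(self.tasks)
        for j in range(len(self.shared)):
            mean_j = sum(t[j] for t in self.tasks.values()) / n
            g0 = self.lam * (self.shared[j] - mean_j) + self.mu * (self.shared[j] - self.pretrained[j])
            self.shared[j] -= self.lr * g0
```

    The mu term is what keeps the category model from following noisy instance models too far; setting mu to zero would let it track the targets freely.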

    Deep learning with very few and no labels

    Deep neural networks have achieved remarkable performance in many computer vision applications such as image classification, object detection, instance segmentation, image retrieval, and person re-identification. However, to achieve the desired performance, deep neural networks often need a tremendously large set of labeled training samples to learn their huge network models. Labeling a large dataset is labor-intensive, time-consuming, and sometimes requires expert knowledge. In this research, we study the following important question: how can deep neural networks be trained with very few or even no labeled samples? This leads to research tasks in two major areas: semi-supervised and unsupervised learning. Specifically, for semi-supervised learning, we developed two major approaches. The first is the Snowball approach, which learns a deep neural network from very few samples based on iterative model evolution and confident sample discovery. The second is the learned model composition approach, which composes more efficient master networks from student models of past iterations through a network learning process. Critical sample discovery is developed to discover new critical unlabeled samples near the model's decision boundary and to give the master model lookahead access to these samples, enhancing its guidance capability. For unsupervised learning, we explored two major ideas. The first is transformed attention consistency, where the network is learned from self-supervision information across images instead of within a single image. The second is spatial assembly networks for image representation learning. We introduce a new learnable module, called the spatial assembly network (SAN), which performs a learned re-organization and assembly of feature points and improves the network's ability to handle spatial variations and structural changes in the image scene. Our experimental results on benchmark datasets demonstrate that the proposed methods significantly improve the state of the art in semi-supervised and unsupervised learning, outperforming existing methods by large margins. Includes bibliographical references.
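    The idea of iterative model evolution with confident sample discovery resembles generic self-training. The sketch below is a stand-in under that assumption, not the Snowball method itself: a nearest-centroid classifier is refit each round, unlabeled points whose softmax confidence reaches a hypothetical threshold are pseudo-labeled and added to the labeled set, and ambiguous points stay in the pool. All function names are illustrative.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def fit_centroids(points: List[Point], labels: List[str]) -> Dict[str, Point]:
    """Compute the mean point of each class."""
    sums: Dict[str, Tuple[float, float, int]] = {}
    for (x, y), c in zip(points, labels):
        sx, sy, n = sums.get(c, (0.0, 0.0, 0))
        sums[c] = (sx + x, sy + y, n + 1)
    return {c: (sx / n, sy / n) for c, (sx, sy, n) in sums.items()}


def predict_with_confidence(centroids: Dict[str, Point], p: Point) -> Tuple[str, float]:
    """Softmax over negative squared distances; returns (label, confidence)."""
    scores = {c: -((p[0] - cx) ** 2 + (p[1] - cy) ** 2) for c, (cx, cy) in centroids.items()}
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {c: math.exp(s - m) for c, s in scores.items()}
    z = sum(exps.values())
    best = max(exps, key=exps.get)
    return best, exps[best] / z


def self_train(labeled: List[Point], labels: List[str], unlabeled: List[Point],
               threshold: float = 0.9, rounds: int = 5):
    """Grow the labeled set with confident pseudo-labels; return final model and leftover pool."""
    labeled, labels, pool = list(labeled), list(labels), list(unlabeled)
    for _ in range(rounds):
        centroids = fit_centroids(labeled, labels)
        confident, rest = [], []
        for p in pool:
            c, conf = predict_with_confidence(centroids, p)
            (confident if conf >= threshold else rest).append((p, c))
        if not confident:
            break  # nothing new to add; the model has stopped evolving
        for p, c in confident:
            labeled.append(p)
            labels.append(c)
        pool = [p for p, _ in rest]
    return fit_centroids(labeled, labels), pool
```

    Points left in the returned pool are the low-confidence ones near the decision boundary, which is the region the critical sample discovery step described above targets.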

    Hierarchical Disentanglement-Alignment Network for Robust SAR Vehicle Recognition

    Vehicle recognition is a fundamental problem in SAR image interpretation. However, robustly recognizing vehicle targets in SAR is challenging due to large intraclass variations and small interclass variations. Additionally, the lack of large datasets further complicates the task. Inspired by the analysis of target signature variations and deep learning explainability, this paper proposes a novel domain alignment framework, the Hierarchical Disentanglement-Alignment Network (HDANet), to achieve robustness under various operating conditions. In brief, HDANet integrates feature disentanglement and alignment into a unified framework with three modules: domain data generation, multitask-assisted mask disentanglement, and domain alignment of target features. The first module generates diverse data for alignment, using three simple but effective data augmentation methods designed to simulate target signature variations. The second module disentangles the target features from background clutter using the multitask-assisted mask, preventing clutter from interfering with subsequent alignment. The third module employs a contrastive loss for domain alignment to extract robust target features from the generated diverse data and disentangled features. Finally, the proposed method demonstrates strong robustness across nine operating conditions in the MSTAR dataset, and extensive qualitative and quantitative analyses validate the effectiveness of the framework.
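    The third module relies on a contrastive loss for alignment, but its exact form is not given here, so the sketch below uses the standard InfoNCE loss as an assumed stand-in: the feature of sample i should score higher against the feature of its own augmented view than against any other view in the batch. The names info_nce and temperature are illustrative.

```python
import math
from typing import List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """L2-normalize a feature vector (zero vectors are left unchanged)."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def info_nce(anchors: List[Vector], positives: List[Vector], temperature: float = 0.1) -> float:
    """Mean InfoNCE loss; anchors[i] and positives[i] are two views of the same target."""
    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity to every positive in the batch, scaled by temperature.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))  # stable log-sum-exp
        total += log_z - logits[i]  # cross-entropy with the matching view as the target
    return total / len(a)
```

    Minimizing this loss pulls features of the original and augmented versions of the same target together while pushing apart features of different targets, which is the sense in which a contrastive objective aligns domains.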