
    Sample Imbalance Adjustment and Similar Object Exclusion in Underwater Object Tracking

    Although modern trackers exhibit competitive performance in open-air tracking scenarios, two problems remain when they are applied to underwater object tracking (UOT). A single-object tracker is trained on open-air datasets, which results in a serious sample imbalance between underwater objects and open-air objects when it is applied to UOT. Moreover, underwater targets such as fish and dolphins usually have similar appearances, and it is challenging for models to discriminate targets from such weakly discriminative features. Existing detection-based post-processing approaches struggle to distinguish a tracked target from nearby similar objects. In this study, UOSTrack is proposed, which combines underwater image and open-air sequence hybrid training (UOHT) with motion-based post-processing (MBPP). The UOHT training paradigm is designed to train the sample-imbalanced underwater tracker: underwater object detection (UOD) images are converted into image pairs through customised data augmentation, so that the tracker is exposed to more underwater-domain training samples and learns the feature expressions of underwater objects. The MBPP paradigm is proposed to exclude similar objects near the target: it uses the estimation box predicted by a Kalman filter together with the candidate boxes in each frame to reconfirm a lost target hidden in the candidate area. UOSTrack provides an average performance improvement of 3.5 % over OSTrack on the similar-object challenge attribute in UOT100 and UTB180, and average overall improvements of 1 % and 3 % on the two benchmarks, respectively. The results on the two UOT benchmarks demonstrate that UOSTrack sets a new state of the art, confirm the effectiveness of UOHT and MBPP, and show the generalisation and applicability of MBPP in UOT.
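
    The candidate-reconfirmation step of MBPP lends itself to a short sketch. The following minimal Python illustration is based only on the description above, not on the authors' released code; the box format, function names, and the IoU threshold are assumptions made for illustration.

    # Minimal sketch of the MBPP idea: when the target is lost, compare the
    # motion-predicted box (e.g. from a Kalman filter) with the candidate
    # boxes of the frame and reconfirm the best-overlapping candidate.
    from typing import List, Optional, Tuple

    Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

    def iou(a: Box, b: Box) -> float:
        """Intersection-over-union of two axis-aligned boxes."""
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2, y2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union > 0 else 0.0

    def reconfirm_target(estimated_box: Box, candidates: List[Box],
                         iou_threshold: float = 0.5) -> Optional[Box]:
        """Return the candidate that best overlaps the motion-predicted box,
        or None when no candidate overlaps enough (target still lost)."""
        best, best_iou = None, iou_threshold
        for cand in candidates:
            score = iou(estimated_box, cand)
            if score >= best_iou:
                best, best_iou = cand, score
        return best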

    Semi-Supervised Visual Tracking of Marine Animals using Autonomous Underwater Vehicles

    In-situ visual observations of marine organisms are crucial to developing behavioural understanding and its relation to the surrounding ecosystem. Typically, these observations are collected via divers, tags, and remotely operated or human-piloted vehicles. Recently, however, autonomous underwater vehicles equipped with cameras and embedded computers with GPU capabilities are being developed for a variety of applications and, in particular, can supplement these existing data collection mechanisms where human operation or tagging is difficult. Existing approaches have focused on fully-supervised tracking methods, but labelled data for many underwater species are severely lacking. Semi-supervised trackers may offer alternative tracking solutions because they require less data than their fully-supervised counterparts. However, because no realistic underwater tracking datasets exist, the performance of semi-supervised tracking algorithms in the marine domain is not well understood. To better evaluate their performance and utility, in this paper we provide (1) a novel dataset specific to marine animals, located at http://warp.whoi.edu/vmat/, (2) an evaluation of state-of-the-art semi-supervised algorithms in the context of underwater animal tracking, and (3) an evaluation of real-world performance through demonstrations using a semi-supervised algorithm on board an autonomous underwater vehicle to track marine animals in the wild.
    Comment: To appear in IJCV SI: Animal Tracking

    Lightweight Full-Convolutional Siamese Tracker

    Although single-object trackers have achieved advanced performance, their large-scale models hinder deployment on resource-limited platforms. Moreover, existing lightweight trackers balance only two or three of the four factors of parameters, performance, FLOPs, and FPS. To achieve an optimal balance among all of them, this paper proposes a lightweight full-convolutional Siamese tracker called LightFC. LightFC employs a novel efficient cross-correlation module (ECM) and a novel efficient rep-center head (ERH) to improve the feature representation of the convolutional tracking pipeline. The ECM uses an attention-like module design that performs spatial and channel linear fusion of the correlated features and enhances their nonlinearity. It also draws on successful factors of current lightweight trackers by introducing skip connections and the reuse of search-area features. The ERH reparameterizes the feature-dimension stage of the standard center head and introduces channel attention to relieve the bottleneck in key feature flows. Comprehensive experiments show that LightFC achieves the optimal balance between performance, parameters, FLOPs, and FPS. The precision score of LightFC outperforms MixFormerV2-S on LaSOT and TNL2K by 3.7 % and 6.5 %, respectively, while using 5x fewer parameters and 4.6x fewer FLOPs. Besides, LightFC runs 2x faster than MixFormerV2-S on CPUs. In addition, a higher-performance version named LightFC-vit is proposed by replacing the backbone with a more powerful network. The code and raw results can be found at https://github.com/LiYunfengLYF/LightFC.
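
    As a rough illustration of what an ECM-style fusion block might look like, the PyTorch sketch below follows only the description in the abstract (attention-like spatial and channel linear fusion, added nonlinearity, and a skip connection). It is not the released LightFC code; the class name, channel count, and layer choices are illustrative assumptions.

    import torch
    import torch.nn as nn

    class ECMSketch(nn.Module):
        """Attention-like fusion block: channel and spatial linear fusion
        with a skip connection, loosely following the ECM description."""
        def __init__(self, channels: int):
            super().__init__()
            # Channel-wise linear fusion: 1x1 conv mixes channels per location.
            self.channel_fuse = nn.Conv2d(channels, channels, kernel_size=1)
            # Spatial linear fusion: depthwise 3x3 conv mixes nearby locations.
            self.spatial_fuse = nn.Conv2d(channels, channels, kernel_size=3,
                                          padding=1, groups=channels)
            self.norm = nn.BatchNorm2d(channels)
            self.act = nn.GELU()  # enhances nonlinearity of the fused features

        def forward(self, fused: torch.Tensor) -> torch.Tensor:
            out = self.spatial_fuse(self.channel_fuse(fused))
            # Skip connection, one of the "successful factors" the abstract cites.
            return self.act(self.norm(out) + fused)

    # Stand-in for correlated template/search features; shape is preserved.
    fused = torch.randn(1, 96, 16, 16)
    print(ECMSketch(96)(fused).shape)  # torch.Size([1, 96, 16, 16])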

    Forward-Looking Sonar Patch Matching: Modern CNNs, Ensembling, and Uncertainty

    Applications of underwater robots are on the rise, and most of them depend on sonar for underwater vision, but their lack of strong perception capabilities limits them in this task. An important problem in sonar perception is matching image patches, which can enable other techniques such as localization, change detection, and mapping. There is a rich literature on this problem for color images, but for acoustic images it is lacking, owing to the physics that produce these images. In this paper we improve on our previous results for this problem (Valdenegro-Toro et al., 2017): instead of modeling features manually, a Convolutional Neural Network (CNN) learns a similarity function and predicts whether two input sonar images are similar or not. With the objective of further improving sonar image matching, state-of-the-art CNN architectures, namely DenseNet and VGG, are evaluated on the Marine Debris dataset in Siamese and two-channel configurations with contrastive loss. To ensure a fair evaluation of each network, thorough hyper-parameter optimization is performed. We find that the best-performing models are the DenseNet two-channel network with 0.955 AUC, VGG-Siamese with contrastive loss at 0.949 AUC, and DenseNet-Siamese with 0.921 AUC. Ensembling the top-performing DenseNet two-channel and DenseNet-Siamese models yields the highest overall result of 0.978 AUC, a large improvement over the 0.91 AUC of the previous state of the art.
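
    The two-channel formulation referred to above can be illustrated with a small PyTorch sketch: the two sonar patches are stacked as input channels and a single CNN scores their similarity. This is a generic illustration of the technique, not the paper's DenseNet or VGG models; the patch size and layer widths are assumptions.

    import torch
    import torch.nn as nn

    class TwoChannelMatcher(nn.Module):
        """Tiny two-channel patch-matching CNN: both patches enter as
        channels of one input, and the head outputs a match logit."""
        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
            )
            self.head = nn.Linear(64, 1)

        def forward(self, patch_a: torch.Tensor, patch_b: torch.Tensor):
            x = torch.cat([patch_a, patch_b], dim=1)  # stack as two channels
            return self.head(self.features(x).flatten(1))

    # Two batches of single-channel 96x96 sonar patches (random stand-ins);
    # a sigmoid turns the logit into a match probability.
    a, b = torch.randn(8, 1, 96, 96), torch.randn(8, 1, 96, 96)
    prob = torch.sigmoid(TwoChannelMatcher()(a, b))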