Sample Imbalance Adjustment and Similar Object Exclusion in Underwater Object Tracking
Although modern trackers exhibit competitive performance for underwater image
degradation assessment, two problems remain when they are applied to
underwater object tracking (UOT). A single-object tracker is trained on
open-air datasets, which results in a serious sample imbalance between
underwater objects and open-air objects when it is applied to UOT. Moreover,
underwater targets such as fish and dolphins usually have similar appearances,
making it challenging for models to discriminate targets with weakly
discriminative features.
Existing detection-based post-processing approaches struggle to distinguish a
tracked target from similar objects. In this study, UOSTrack is proposed,
which combines underwater image and open-air sequence hybrid training (UOHT)
with motion-based post-processing (MBPP). The UOHT training
paradigm is designed to train the sample-imbalanced underwater tracker. In
particular, underwater object detection (UOD) images are converted into image
pairs through customised data augmentation, such that the tracker is exposed to
more underwater domain training samples and learns the feature expressions of
underwater objects. The MBPP paradigm is proposed to exclude similar objects
near the target. In particular, it employs the estimation box predicted using a
Kalman filter and the candidate boxes in each frame to reconfirm the tracked
target that is hidden in the candidate area when it has been lost. UOSTrack
provides an average performance improvement of 3.5% over OSTrack on the
similar-object challenge attribute in UOT100 and UTB180, and average overall
improvements of 1% and 3% on the two benchmarks, respectively. The results on
the two UOT benchmarks demonstrate that UOSTrack sets a new state of the art,
confirm the effectiveness of UOHT and MBPP, and show the generalisation and
applicability of MBPP for use in UOT.
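The MBPP idea described above (reconfirming a lost target by comparing a
motion-estimated box against each frame's candidate boxes) can be sketched
roughly as follows. This is a minimal illustration, not the paper's
implementation: it takes the motion estimate as a given box rather than
running the full Kalman filter, and the function names and IoU threshold are
hypothetical.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def reconfirm_target(estimate_box, candidate_boxes, iou_thresh=0.5):
    """When the tracker reports the target lost, pick the candidate box
    whose IoU with the motion-estimated box is highest (and above the
    threshold); return None if no candidate is plausible."""
    best, best_iou = None, iou_thresh
    for cand in candidate_boxes:
        score = iou(estimate_box, cand)
        if score >= best_iou:
            best, best_iou = cand, score
    return best
```

A distractor far from the predicted motion is rejected, while a nearby
candidate is accepted as the re-found target.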
Semi-Supervised Visual Tracking of Marine Animals using Autonomous Underwater Vehicles
In-situ visual observations of marine organisms are crucial to developing an
understanding of their behaviour and its relation to their surrounding
ecosystem.
Typically, these observations are collected via divers, tags, and
remotely-operated or human-piloted vehicles. Recently, however, autonomous
underwater vehicles equipped with cameras and embedded computers with GPU
capabilities are being developed for a variety of applications, and in
particular, can be used to supplement these existing data collection mechanisms
where human operation or tags are more difficult. Existing approaches have
focused on using fully-supervised tracking methods, but labelled data for many
underwater species are severely lacking. Semi-supervised trackers may offer
alternative tracking solutions because they require less data than
fully-supervised counterparts. However, because no realistic underwater
tracking datasets exist, the performance of semi-supervised tracking
algorithms in the marine domain is not well understood. To better
evaluate their performance and utility, in this paper we provide (1) a novel
dataset specific to marine animals located at http://warp.whoi.edu/vmat/, (2)
an evaluation of state-of-the-art semi-supervised algorithms in the context of
underwater animal tracking, and (3) an evaluation of real-world performance
through demonstrations using a semi-supervised algorithm on-board an autonomous
underwater vehicle to track marine animals in the wild.
Comment: To appear in IJCV SI: Animal Tracking
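Tracker evaluations like the one described above are commonly summarised by
an overlap-based success score. The abstract does not state which metrics
were used, so as an illustrative assumption, here is a minimal numpy sketch
of the standard one-pass success curve and its area-under-curve score,
computed from per-frame IoU values:

```python
import numpy as np

def success_curve(ious, thresholds=np.linspace(0, 1, 21)):
    """Fraction of frames whose IoU exceeds each overlap threshold;
    the mean of this curve (its AUC) is the usual success score."""
    ious = np.asarray(ious, dtype=float)
    return np.array([(ious > t).mean() for t in thresholds])

def success_auc(ious):
    """Area under the success curve, averaged over the thresholds."""
    return success_curve(ious).mean()
```

A tracker that never overlaps the ground truth scores 0; a tracker with
perfect overlap on every frame scores just under 1 (the threshold t = 1 is
never exceeded).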
Lightweight Full-Convolutional Siamese Tracker
Although single object trackers have achieved advanced performance, their
large-scale models hinder their application on limited resources platforms.
Moreover, existing lightweight trackers achieve a balance among only two or
three of the four axes of parameters, performance, FLOPs, and FPS. To achieve
the optimal balance among all four, this paper proposes a lightweight
full-convolutional Siamese tracker called LightFC. LightFC employs a novel
efficient cross-correlation module (ECM) and a novel efficient rep-center head
(ERH) to improve the feature representation of the convolutional tracking
pipeline. The ECM uses an attention-like module design, which conducts spatial
and channel linear fusion of fused features and enhances the nonlinearity of
the fused features. Additionally, it refers to successful factors of current
lightweight trackers and introduces skip-connections and reuse of search area
features. The ERH reparameterizes the feature dimensional stage in the standard
center-head and introduces channel attention to optimize the bottleneck of key
feature flows. Comprehensive experiments show that LightFC achieves the
optimal balance between performance, parameters, FLOPs, and FPS. The precision
score of LightFC outperforms that of MixFormerV2-S on LaSOT and TNL2K by 3.7%
and 6.5%, respectively, while using 5x fewer parameters and 4.6x fewer FLOPs.
Besides,
LightFC runs 2x faster than MixFormerV2-S on CPUs. In addition, a
higher-performance version named LightFC-vit is proposed by replacing the
backbone with a more powerful network. The code and raw results can be found at
https://github.com/LiYunfengLYF/LightFC
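At the core of a Siamese tracking pipeline such as LightFC's is a
cross-correlation between template features and search-region features; the
ECM described above adds attention-like fusion on top of that step. As a
point of reference (this shows only the plain cross-correlation, not the
ECM itself), a minimal numpy sketch:

```python
import numpy as np

def xcorr(search, template):
    """Dense cross-correlation of a template feature map over a search
    feature map, the basic matching step of Siamese trackers.
    search: (C, Hs, Ws), template: (C, Ht, Wt)
    returns a response map of shape (Hs - Ht + 1, Ws - Wt + 1)."""
    C, Hs, Ws = search.shape
    _, Ht, Wt = template.shape
    out = np.empty((Hs - Ht + 1, Ws - Wt + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(search[:, i:i + Ht, j:j + Wt] * template)
    return out
```

The peak of the response map marks the location in the search region that
best matches the template, which is then refined by the tracker's head.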
Forward-Looking Sonar Patch Matching: Modern CNNs, Ensembling, and Uncertainty
Applications of underwater robots are on the rise; most of them depend on sonar for underwater vision, but their weak perception capabilities limit them in this task. An important problem in sonar perception is matching image patches, which can enable other techniques such as localization, change detection, and mapping. There is a rich literature on this problem for color images, but it is lacking for acoustic images, owing to the physics that produce them. In this paper we improve on our previous results for this problem (Valdenegro-Toro et al., 2017): instead of modeling features manually, a Convolutional Neural Network (CNN) learns a similarity function and predicts whether two input sonar images are similar or not. With the objective of further improving sonar image matching, state-of-the-art CNN architectures, namely DenseNet and VGG, are evaluated on the Marine Debris dataset with a siamese or two-channel architecture and a contrastive loss. To ensure a fair evaluation of each network, thorough hyper-parameter optimization is executed. We find that the best performing models are the DenseNet two-channel network with 0.955 AUC, VGG-Siamese with contrastive loss at 0.949 AUC, and DenseNet-Siamese with 0.921 AUC. By ensembling the top-performing DenseNet two-channel and DenseNet-Siamese models, the overall highest prediction accuracy obtained is 0.978 AUC, a large improvement over the 0.91 AUC of the previous state of the art.
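The siamese variants above are trained with a contrastive loss on labelled
patch pairs. As a minimal numpy sketch of the standard contrastive loss on
embedding distances (the margin value here is an illustrative default, not
the paper's setting):

```python
import numpy as np

def contrastive_loss(d, y, margin=1.0):
    """Contrastive loss on embedding distances d for pair labels y
    (1 = similar patches, 0 = dissimilar). Similar pairs are pulled
    together; dissimilar pairs are pushed apart up to the margin."""
    d = np.asarray(d, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.mean(y * d ** 2 + (1 - y) * np.maximum(margin - d, 0.0) ** 2)
```

A similar pair at zero distance and a dissimilar pair beyond the margin both
contribute zero loss; everything in between is penalised quadratically.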