Dense object counting, or crowd counting, has come a long way thanks to recent
developments in the vision community. However, indiscernible object counting,
which aims to count targets that blend into their surroundings, remains a
challenge. Publicly available object counting datasets are still predominantly
image-based. We therefore propose a large-scale video dataset, YoutubeFish-35,
which contains a total of 35 sequences of high-definition, high-frame-rate
video and more than 150,000 annotated center points across a variety of
carefully selected scenes. For benchmarking, we select three
mainstream methods for dense object counting and carefully evaluate them on the
newly collected dataset. We also propose TransVidCount, a new strong baseline
that combines density and regression branches along the temporal domain in a
unified framework, effectively tackling indiscernible object counting and
achieving state-of-the-art performance on the YoutubeFish-35 dataset.

Comment: Accepted by ICASSP 2024 (IEEE International Conference on Acoustics,
Speech, and Signal Processing).
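
As a rough illustration of the dual-branch design mentioned above, the sketch
below shows a toy counting head in PyTorch that pairs a density-map branch with
a count-regression branch over temporally aggregated features. Every concrete
choice here (the small convolutional backbone, attention-based temporal
aggregation, and fusion by averaging the two count estimates) is an assumption
made for illustration; the abstract does not specify the actual TransVidCount
architecture.

```python
# Minimal sketch of a dual-branch temporal counting head.
# All module names, shapes, and hyperparameters are illustrative assumptions,
# not the actual TransVidCount design.
import torch
import torch.nn as nn


class DualBranchTemporalCounter(nn.Module):
    """Toy counter: a shared backbone feeds a density-map branch and a
    count-regression branch, with features aggregated across frames."""

    def __init__(self, in_channels: int = 3, feat_dim: int = 64):
        super().__init__()
        # Per-frame feature extractor (stand-in for a real backbone).
        self.backbone = nn.Sequential(
            nn.Conv2d(in_channels, feat_dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, 3, padding=1), nn.ReLU(),
        )
        # Temporal aggregation over the frame axis (assumption: simple
        # self-attention over per-frame global descriptors).
        self.temporal = nn.MultiheadAttention(feat_dim, num_heads=4,
                                              batch_first=True)
        # Density branch: per-pixel density map; summing it gives one count.
        self.density_head = nn.Conv2d(feat_dim, 1, 1)
        # Regression branch: scalar count from the aggregated descriptor.
        self.count_head = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU(),
                                        nn.Linear(feat_dim, 1))

    def forward(self, clip: torch.Tensor):
        # clip: (B, T, C, H, W) video clip.
        b, t, c, h, w = clip.shape
        feats = self.backbone(clip.flatten(0, 1))        # (B*T, D, H, W)
        density = self.density_head(feats).relu()        # (B*T, 1, H, W)
        density = density.view(b, t, 1, h, w)

        # Global per-frame descriptors, refined across time.
        desc = feats.mean(dim=(2, 3)).view(b, t, -1)     # (B, T, D)
        desc, _ = self.temporal(desc, desc, desc)        # (B, T, D)
        reg_count = self.count_head(desc).squeeze(-1)    # (B, T)

        # Fuse the two branches, e.g. by averaging their count estimates.
        density_count = density.sum(dim=(2, 3, 4))       # (B, T)
        return density, 0.5 * (density_count + reg_count)


if __name__ == "__main__":
    model = DualBranchTemporalCounter()
    clip = torch.randn(2, 8, 3, 64, 64)                  # 2 clips of 8 frames
    density_maps, counts = model(clip)
    print(density_maps.shape, counts.shape)              # (2, 8, 1, 64, 64), (2, 8)
```

In this sketch the per-frame density maps provide one count estimate and the
temporally refined descriptors provide another; averaging them is just one
simple way to combine the two branches in a single forward pass.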