The Antillean manatee (\emph{Trichechus manatus}) is an endangered
herbivorous aquatic mammal whose role as an ecological balancer and umbrella
species underscores the importance of its conservation. An innovative approach
for monitoring manatee populations is passive acoustic monitoring (PAM), in which
vocalisations are extracted from underwater audio recordings. We propose a novel end-to-end
approach to detect manatee vocalisations building on the Audio Spectrogram
Transformer (AST). In the spirit of transfer learning, we fine-tune AST to detect
manatee calls by redesigning its filterbanks and adapting a real-world dataset
containing partial positive labels. Our experimental evaluation reveals two
key features of the proposed model: i) it performs on par with the state of the
art without requiring hand-tuned denoising or detection stages, and ii) it can
successfully identify missed vocalisations in the training dataset, thus
reducing the workload of expert bioacoustic labellers. This work is a
preliminary but relevant step towards developing novel, user-friendly tools for the
conservation of the different species of manatees.

Comment: Accepted at MLSP 202