Regularized Contrastive Pre-training for Few-shot Bioacoustic Sound Detection
Bioacoustic sound event detection allows for a better understanding of animal behavior and for better monitoring of biodiversity using audio. Deep learning systems can help achieve this goal; however, it is difficult to acquire sufficient annotated data to train these systems from scratch. To address this limitation, the Detection and Classification of Acoustic Scenes and Events (DCASE) community has recast the problem within the framework of few-shot learning and organizes an annual challenge for learning to detect animal sounds from only five annotated examples. In this work, we regularize supervised contrastive pre-training to learn features that transfer well to new target tasks with animal sounds unseen during training, achieving a high F-score of 61.52% (0.48) when no feature adaptation is applied, and an F-score of 68.19% (0.75) when we further adapt the learned features for each new target task. This work aims to lower the entry bar to few-shot bioacoustic sound event detection by proposing a simple yet effective framework for this task, and by providing open-source code.
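The pre-training objective named in this abstract is supervised contrastive learning. As a minimal NumPy sketch, the standard (unregularized) supervised contrastive loss can be written as below; the temperature value and all names are chosen for illustration, not taken from the paper:

```python
import numpy as np

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Standard supervised contrastive loss on L2-normalized embeddings:
    samples sharing a label are treated as positives and pulled together."""
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature                    # pairwise similarities
    n = len(labels)
    self_mask = np.eye(n, dtype=bool)
    logits = np.where(self_mask, -np.inf, sim)     # exclude self-pairs
    # log-softmax of each anchor over all other samples
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    pos = (labels[:, None] == labels[None, :]) & ~self_mask
    # average log-probability over each anchor's positives
    per_anchor = -np.where(pos, log_prob, 0.0).sum(axis=1) \
                 / np.maximum(pos.sum(axis=1), 1)
    return per_anchor.mean()
```

Intuitively, the loss is small when same-class embeddings cluster tightly and different-class embeddings are far apart.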
Leveraging label hierarchies for few-shot everyday sound recognition
Everyday sounds cover a considerable range of sound categories in our daily life, yet for certain sound categories it is hard to collect sufficient data. Although existing works have successfully applied few-shot learning paradigms to sound recognition, most of them have not exploited the relationship between labels in audio taxonomies. This work adopts a hierarchical prototypical network to leverage the knowledge rooted in audio taxonomies. Specifically, a VGG-like convolutional neural network is used to extract acoustic features. Prototypical nodes are then calculated at each level of the tree structure. A multi-level loss is obtained by weighting the per-level losses with a decay factor. Experimental results demonstrate that our hierarchical prototypical networks not only outperform prototypical networks with no hierarchy information but also yield better results than other state-of-the-art algorithms. Our code is available at: https://github.com/JinhuaLiang/HPNs_taggin
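A hedged sketch of the hierarchical prototype idea: prototypes are the per-class mean embeddings at each level of the label tree, and the per-level prototypical losses are combined with a decayed weight. The decay factor `gamma` and the exact weighting scheme are assumptions for illustration, not the paper's precise formulation:

```python
import numpy as np

def prototypes(support, labels):
    """Class prototypes = mean support embedding per class."""
    classes = np.unique(labels)
    return classes, np.stack([support[labels == c].mean(axis=0)
                              for c in classes])

def multilevel_loss(query, q_labels_per_level, support, s_labels_per_level,
                    gamma=0.5):
    """Sum of per-level prototypical losses, level l weighted by gamma**l.
    Each level uses the labels of that level of the taxonomy."""
    total = 0.0
    for level, (ql, sl) in enumerate(zip(q_labels_per_level,
                                         s_labels_per_level)):
        classes, protos = prototypes(support, sl)
        # squared Euclidean distance from each query to each prototype
        d = ((query[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
        log_p = -d - np.log(np.exp(-d).sum(axis=1, keepdims=True))
        idx = np.searchsorted(classes, ql)
        total += gamma ** level * (-log_p[np.arange(len(ql)), idx]).mean()
    return total
```

Coarser levels carry larger weight here (gamma < 1); the opposite convention is equally plausible.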
MetaAudio: A Few-Shot Audio Classification Benchmark
Currently available benchmarks for few-shot learning (machine learning with few training examples) are limited in the domains they cover, primarily focusing on image classification. This work aims to alleviate this reliance on image-based benchmarks by offering the first comprehensive, public and fully reproducible audio-based alternative, covering a variety of sound domains and experimental settings. We compare the few-shot classification performance of a variety of techniques on seven audio datasets (spanning environmental sounds to human speech). Extending this, we carry out in-depth analyses of joint training (where all datasets are used during training) and cross-dataset adaptation protocols, establishing the possibility of a generalised audio few-shot classification algorithm. Our experimentation shows that gradient-based meta-learning methods such as MAML and Meta-Curvature consistently outperform both metric and baseline methods. We also demonstrate that the joint training routine helps overall generalisation for the environmental sound databases included, as well as being a somewhat effective method of tackling the cross-dataset/domain setting. Comment: 9 pages with 1 figure and 2 main results tables. V1 Preprint
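To illustrate what "gradient-based meta-learning" means here, a toy first-order MAML (FOMAML) loop on synthetic 1-D linear-regression tasks can be sketched; this is a didactic example, not MetaAudio's benchmark code:

```python
import numpy as np

def fomaml(tasks, w0=0.0, inner_lr=0.05, meta_lr=0.05, iters=300):
    """First-order MAML on scalar linear regression y = w * x.
    Inner loop: one gradient step per task from the shared init w.
    Outer loop: update w with the gradient at the adapted parameters."""
    w = w0
    for _ in range(iters):
        meta_grad = 0.0
        for X, y in tasks:
            g = 2 * np.mean(X * (w * X - y))        # inner-loop gradient
            w_task = w - inner_lr * g               # one adaptation step
            # FOMAML: gradient of post-adaptation loss, evaluated at w_task
            meta_grad += 2 * np.mean(X * (w_task * X - y))
        w -= meta_lr * meta_grad / len(tasks)
    return w
```

With tasks y = 1·x and y = 3·x, the meta-learned initialization settles near w = 2, the point from which one gradient step adapts best to either task.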
Few-shot Bioacoustic Event Detection with Machine Learning Methods
Few-shot learning is a type of classification in which predictions are made based on a limited number of samples for each class. This type of classification is sometimes referred to as a meta-learning problem, in which the model learns how to learn to identify rare cases. We seek to extract information from five exemplar vocalisations of mammals or birds and to detect and classify these sounds in field recordings [2]. This task was provided in the Detection and Classification of Acoustic Scenes and Events (DCASE) Challenge of 2021. Rather than utilise deep learning, as is most commonly done, we formulated a novel solution using only classical machine learning methods. Various models were tested, and it was found that logistic regression outperformed both linear regression and template matching. However, all of these methods over-predicted the number of events in the field recordings. Comment: 7 pages, 6 tables, 1 figure
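A generic sketch of the frame-level approach such a system might take: fit logistic regression on per-frame features, score every frame of the field recording, then merge consecutive positive frames into events. The features, learning rate, and threshold below are illustrative assumptions, not the authors' settings:

```python
import numpy as np

def train_logreg(X, y, lr=0.5, epochs=500):
    """Plain logistic regression by gradient descent.
    X: per-frame feature matrix; y: 1 if the target event is present."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted probabilities
        g = p - y                                # gradient of log-loss
        w -= lr * X.T @ g / len(y)
        b -= lr * g.mean()
    return w, b

def detect_events(scores, threshold=0.5):
    """Merge runs of frames scoring above threshold into (onset, offset)."""
    events, start = [], None
    for i, s in enumerate(scores):
        if s >= threshold and start is None:
            start = i
        elif s < threshold and start is not None:
            events.append((start, i))
            start = None
    if start is not None:
        events.append((start, len(scores)))
    return events
```

The abstract's observation that all methods over-predict events corresponds, in this sketch, to `detect_events` firing on too many frame runs when the threshold is too permissive.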
Few-shot Class-incremental Audio Classification Using Adaptively-refined Prototypes
New classes of sounds constantly emerge with only a few samples, making it challenging for models to adapt to dynamic acoustic environments. This challenge motivates us to address the new problem of few-shot class-incremental audio classification. This study aims to enable a model to continuously recognize new classes of sounds from a few training samples of new classes while remembering the learned ones. To this end, we propose a method to generate discriminative prototypes and use them to expand the model's classifier for recognizing sounds of new and learned classes. The model is first trained with a random episodic training strategy, and then its backbone is used to generate the prototypes. A dynamic relation projection module refines the prototypes to enhance their discriminability. Results on two datasets (derived from the corpora of NSynth and FSD-MIX-CLIPS) show that the proposed method exceeds three state-of-the-art methods in average accuracy and performance-dropping rate. Comment: 5 pages, 2 figures, Accepted by Interspeech 202
Few-shot Class-incremental Audio Classification Using Stochastic Classifier
It is generally assumed in current audio classification methods that the number of classes is fixed and that the model can only recognize a predefined set of classes. When new classes emerge, the model needs to be retrained with adequate samples of all classes. If new classes continually emerge, these methods will not work well and may even become infeasible. In this study, we propose a method for few-shot class-incremental audio classification, which continually recognizes new classes and remembers old ones. The proposed model consists of an embedding extractor and a stochastic classifier. The former is trained in the base session and frozen in incremental sessions, while the latter is incrementally expanded in all sessions. Two datasets (NS-100 and LS-100) are built by choosing samples from the audio corpora of NSynth and LibriSpeech, respectively. Results show that our method exceeds four baseline methods in average accuracy and performance-dropping rate. Code is at https://github.com/vinceasvp/meta-sc. Comment: 5 pages, 3 figures, 4 tables. Accepted for publication in INTERSPEECH 202
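One common way to build a stochastic classifier is to sample each class weight vector from a learned Gaussian via the reparameterization trick and score inputs by cosine similarity. The sketch below follows that general construction, which may differ in detail from the paper's formulation:

```python
import numpy as np

def stochastic_classify(x, mu, log_sigma, rng=None):
    """Stochastic classifier sketch: class weights are resampled from
    N(mu_c, sigma_c) at every forward pass (reparameterization trick),
    then queries are scored by cosine similarity to the sampled weights."""
    if rng is None:
        rng = np.random.default_rng()
    w = mu + np.exp(log_sigma) * rng.normal(size=mu.shape)  # sample weights
    w = w / np.linalg.norm(w, axis=1, keepdims=True)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    return (x @ w.T).argmax(axis=1)
```

During incremental sessions, expanding the classifier amounts to appending new rows to `mu` and `log_sigma` for each new class while the embedding extractor stays frozen.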
Prototypical Networks for Domain Adaptation in Acoustic Scene Classification
Acoustic Scene Classification (ASC) refers to the task of assigning a semantic label to an audio stream that characterizes the environment in which it was recorded. In recent times, Deep Neural Networks (DNNs) have emerged as the model of choice for ASC. However, in real-world scenarios, domain adaptation remains a persistent problem for ASC models. In the search for an optimal solution to this problem, we explore a metric learning approach called prototypical networks using the TUT Urban Acoustic Scenes dataset, which consists of 10 different acoustic scenes recorded across 10 cities. In order to replicate the domain adaptation scenario, we divide the dataset into source domain data, consisting of samples from eight randomly selected cities, and target domain data, consisting of data from the remaining two cities. We evaluate the performance of the network against a selected baseline network under various experimental scenarios, and based on the results we conclude that metric learning is a promising approach towards addressing the domain adaptation problem in ASC.
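At inference time, a prototypical network reduces to nearest-class-mean classification in the embedding space. A minimal sketch, with embeddings assumed to be precomputed by the backbone:

```python
import numpy as np

def nearest_prototype(query, support, support_labels):
    """Classify each query embedding by the nearest class prototype,
    where a prototype is the mean of that class's support embeddings."""
    classes = np.unique(support_labels)
    protos = np.stack([support[support_labels == c].mean(axis=0)
                       for c in classes])
    # squared Euclidean distance from each query to each prototype
    d = ((query[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    return classes[d.argmin(axis=1)]
```

In the domain-adaptation setting described above, prototypes would be computed from source-city embeddings and queries drawn from the held-out target cities.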
Learning from Very Few Samples: A Survey
Few-sample learning (FSL) is significant and challenging in the field of machine learning. The capability of learning and generalizing successfully from very few samples is a noticeable demarcation separating artificial intelligence from human intelligence, since humans can readily establish cognition of novelty from just a single or a handful of examples, whereas machine learning algorithms typically require hundreds or thousands of supervised samples to guarantee generalization ability. Despite a long history dating back to the early 2000s and widespread attention in recent years with booming deep learning technologies, few surveys or reviews of FSL have been available until now. In this context, we extensively review 300+ FSL papers spanning from the 2000s to 2019 and provide a timely and comprehensive survey of FSL. In this survey, we review the evolution history as well as the current progress of FSL, categorize FSL approaches in principle into generative-model-based and discriminative-model-based kinds, and place particular emphasis on meta-learning-based FSL approaches. We also summarize several recently emerging extensions of FSL and review the latest advances on these topics. Furthermore, we highlight important FSL applications covering many research hotspots in computer vision, natural language processing, audio and speech, reinforcement learning and robotics, data analysis, etc. Finally, we conclude the survey with a discussion of promising trends, in the hope of providing guidance and insights for follow-up research. Comment: 30 pages