
    Regularized Contrastive Pre-training for Few-shot Bioacoustic Sound Detection

    Bioacoustic sound event detection allows for a better understanding of animal behavior and for better monitoring of biodiversity using audio. Deep learning systems can help achieve this goal; however, it is difficult to acquire sufficient annotated data to train these systems from scratch. To address this limitation, the Detection and Classification of Acoustic Scenes and Events (DCASE) community has recast the problem within the framework of few-shot learning and organizes an annual challenge for learning to detect animal sounds from only five annotated examples. In this work, we regularize supervised contrastive pre-training to learn features that transfer well to new target tasks with animal sounds unseen during training, achieving a high F-score of 61.52%(0.48) when no feature adaptation is applied, and an F-score of 68.19%(0.75) when we further adapt the learned features for each new target task. This work aims to lower the entry bar to few-shot bioacoustic sound event detection by proposing a simple yet effective framework for this task, and by also providing open-source code.
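    As a rough illustration of the kind of objective involved, a plain supervised contrastive loss can be sketched as follows (a minimal NumPy version of the standard SupCon formulation, not the regularized variant this paper proposes; it assumes pre-computed embeddings and at least two samples per class):

    ```python
    import numpy as np

    def supcon_loss(features, labels, temperature=0.1):
        """Supervised contrastive loss: pull together embeddings that share a
        label, push apart the rest. Assumes each class has >= 2 samples."""
        f = features / np.linalg.norm(features, axis=1, keepdims=True)
        sim = f @ f.T / temperature                     # pairwise similarities
        n = len(labels)
        self_mask = np.eye(n, dtype=bool)
        logits = np.where(self_mask, -np.inf, sim)      # exclude self-pairs
        # log-softmax over all other samples for each anchor
        log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        pos = (labels[:, None] == labels[None, :]) & ~self_mask
        # mean log-probability of the positives, averaged over anchors
        return -(np.where(pos, log_prob, 0.0).sum(axis=1) / pos.sum(axis=1)).mean()
    ```

    Well-separated class clusters yield a lower loss than mixed ones, which is the gradient signal driving the pre-training.
    
    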


    Leveraging label hierarchies for few-shot everyday sound recognition

    Everyday sounds cover a considerable range of sound categories in our daily life, yet for certain categories it is hard to collect sufficient data. Although existing works have successfully applied few-shot learning paradigms to sound recognition, most of them have not exploited the relationships between labels in audio taxonomies. This work adopts a hierarchical prototypical network to leverage the knowledge rooted in audio taxonomies. Specifically, a VGG-like convolutional neural network is used to extract acoustic features. Prototypical nodes are then calculated at each level of the tree structure. A multi-level loss is obtained by weighting the loss at each level with a decaying factor and summing. Experimental results demonstrate that our hierarchical prototypical networks not only outperform prototypical networks with no hierarchy information but also yield better results than other state-of-the-art algorithms. Our code is available at: https://github.com/JinhuaLiang/HPNs_taggin
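    The per-level prototype computation and weighted multi-level loss described above can be sketched roughly as follows (a hypothetical NumPy illustration, not the authors' implementation; the VGG-like extractor is replaced by pre-computed embeddings):

    ```python
    import numpy as np

    def prototypes(embeddings, labels):
        """Class prototype = mean embedding of that class's support samples."""
        classes = np.unique(labels)
        return classes, np.stack([embeddings[labels == c].mean(axis=0)
                                  for c in classes])

    def level_loss(support, query, support_labels, query_labels):
        """Prototypical-network cross-entropy at one level of the label tree."""
        classes, protos = prototypes(support, support_labels)
        d = ((query[:, None, :] - protos[None, :, :]) ** 2).sum(-1)  # sq. distances
        logp = -d - np.log(np.exp(-d).sum(axis=1, keepdims=True))    # log-softmax
        idx = np.searchsorted(classes, query_labels)
        return -logp[np.arange(len(query)), idx].mean()

    def multi_level_loss(support, query, levels, weights):
        """Weighted sum of per-level prototypical losses.
        levels: list of (support_labels, query_labels) pairs, fine to coarse."""
        return sum(w * level_loss(support, query, sl, ql)
                   for w, (sl, ql) in zip(weights, levels))
    ```

    With decaying weights such as (1.0, 0.5), the fine-grained level dominates while the coarse level regularizes towards the taxonomy.
    
    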

    MetaAudio: A Few-Shot Audio Classification Benchmark

    Currently available benchmarks for few-shot learning (machine learning with few training examples) are limited in the domains they cover, primarily focusing on image classification. This work aims to alleviate this reliance on image-based benchmarks by offering the first comprehensive, public and fully reproducible audio-based alternative, covering a variety of sound domains and experimental settings. We compare the few-shot classification performance of a variety of techniques on seven audio datasets (spanning environmental sounds to human speech). Extending this, we carry out in-depth analyses of joint training (where all datasets are used during training) and cross-dataset adaptation protocols, establishing the possibility of a generalised audio few-shot classification algorithm. Our experimentation shows that gradient-based meta-learning methods such as MAML and Meta-Curvature consistently outperform both metric and baseline methods. We also demonstrate that the joint training routine helps overall generalisation for the environmental sound databases included, as well as being a somewhat effective method of tackling the cross-dataset/domain setting. Comment: 9 pages with 1 figure and 2 main results tables. V1 Preprint
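    For readers unfamiliar with gradient-based meta-learning, a first-order MAML update can be sketched on a toy linear-regression task (a hypothetical NumPy illustration, far simpler than the audio models benchmarked here):

    ```python
    import numpy as np

    def loss_and_grad(w, X, y):
        # Mean squared error of a linear model y ≈ X @ w, and its gradient.
        err = X @ w - y
        return (err ** 2).mean(), 2.0 * X.T @ err / len(y)

    def maml_step(w, tasks, inner_lr=0.05, outer_lr=0.01):
        """One first-order MAML update: adapt on each task's support set,
        then move the meta-parameters along the query-set gradients taken
        at the adapted parameters."""
        meta_grad = np.zeros_like(w)
        for Xs, ys, Xq, yq in tasks:
            _, g = loss_and_grad(w, Xs, ys)
            w_adapted = w - inner_lr * g          # inner-loop adaptation
            _, gq = loss_and_grad(w_adapted, Xq, yq)
            meta_grad += gq                       # first-order approximation
        return w - outer_lr * meta_grad / len(tasks)
    ```

    Full MAML would differentiate through the inner step; the first-order variant shown here simply reuses the query gradient, which is cheaper and often performs comparably.
    
    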

    Few-shot Bioacoustic Event Detection with Machine Learning Methods

    Few-shot learning is a type of classification in which predictions are made based on a limited number of samples for each class. This type of classification is sometimes referred to as a meta-learning problem, in which the model learns how to learn to identify rare cases. We seek to extract information from five exemplar vocalisations of mammals or birds and to detect and classify these sounds in field recordings [2]. This task was provided in the Detection and Classification of Acoustic Scenes and Events (DCASE) Challenge of 2021. Rather than utilizing deep learning, as is most commonly done, we formulated a novel solution using only machine learning methods. Various models were tested, and it was found that logistic regression outperformed both linear regression and template matching. However, all of these methods over-predicted the number of events in the field recordings. Comment: 7 pages, 6 tables, 1 figure
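    A minimal version of such a pipeline, with logistic regression fitted by gradient descent on frame-level features, might look like this (a hypothetical sketch, not the authors' feature extraction or post-processing):

    ```python
    import numpy as np

    def train_logistic(X, y, lr=0.5, steps=500):
        """Plain logistic regression by gradient descent — a classical
        machine-learning model, no deep network involved."""
        w = np.zeros(X.shape[1])
        b = 0.0
        for _ in range(steps):
            p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # sigmoid probabilities
            g = p - y                                 # gradient of the log-loss
            w -= lr * X.T @ g / len(y)
            b -= lr * g.mean()
        return w, b

    def detect(frames, w, b, threshold=0.5):
        """Score each frame of a field recording; frames scoring above the
        threshold are predicted event frames."""
        p = 1.0 / (1.0 + np.exp(-(frames @ w + b)))
        return p > threshold
    ```

    The positive class would be trained from features of the five exemplar vocalisations; lowering the threshold trades missed events for the over-prediction the abstract reports.
    
    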

    Few-shot Class-incremental Audio Classification Using Adaptively-refined Prototypes

    New classes of sounds constantly emerge with only a few samples, making it challenging for models to adapt to dynamic acoustic environments. This challenge motivates us to address the new problem of few-shot class-incremental audio classification. This study aims to enable a model to continuously recognize new classes of sounds from a few training samples while remembering the previously learned ones. To this end, we propose a method that generates discriminative prototypes and uses them to expand the model's classifier for recognizing sounds of new and learned classes. The model is first trained with a random episodic training strategy, and then its backbone is used to generate the prototypes. A dynamic relation projection module refines the prototypes to enhance their discriminability. Results on two datasets (derived from the corpora of NSynth and FSD-MIX-CLIPS) show that the proposed method exceeds three state-of-the-art methods in average accuracy and performance dropping rate. Comment: 5 pages, 2 figures. Accepted by Interspeech 202

    Few-shot Class-incremental Audio Classification Using Stochastic Classifier

    Current audio classification methods generally assume that the number of classes is fixed, so the model can recognize only pre-given classes. When new classes emerge, the model needs to be retrained with adequate samples of all classes. If new classes continually emerge, these methods will not work well and may even become infeasible. In this study, we propose a method for few-shot class-incremental audio classification, which continually recognizes new classes while remembering old ones. The proposed model consists of an embedding extractor and a stochastic classifier. The former is trained in the base session and frozen in incremental sessions, while the latter is incrementally expanded in all sessions. Two datasets (NS-100 and LS-100) are built by choosing samples from the audio corpora of NSynth and LibriSpeech, respectively. Results show that our method exceeds four baseline methods in average accuracy and performance dropping rate. Code is at https://github.com/vinceasvp/meta-sc. Comment: 5 pages, 3 figures, 4 tables. Accepted for publication in INTERSPEECH 202
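    The expand-as-you-go classifier can be sketched as follows (a hypothetical NumPy illustration of the general idea, assuming the classifier keeps a weight mean and log-std per class and samples weights only during training):

    ```python
    import numpy as np

    class StochasticClassifier:
        """Each class holds a weight mean and log-std; logits use weights
        sampled from that Gaussian during training and the mean at inference.
        Incremental sessions simply append new class rows."""

        def __init__(self, dim, seed=0):
            self.mu = np.empty((0, dim))
            self.log_sigma = np.empty((0, dim))
            self.rng = np.random.default_rng(seed)

        def add_classes(self, class_means):
            # Expand the classifier with one (mu, log_sigma) row per new
            # class, initializing mu from the new classes' embedding means.
            self.mu = np.vstack([self.mu, class_means])
            self.log_sigma = np.vstack(
                [self.log_sigma, np.full_like(class_means, -2.0)])

        def logits(self, x, training=False):
            w = self.mu
            if training:  # weight perturbation acts as a regularizer
                w = w + np.exp(self.log_sigma) * self.rng.standard_normal(w.shape)
            return x @ w.T
    ```

    Because old rows are never overwritten, previously learned classes are remembered while new sessions only grow the weight matrix.
    
    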

    Prototypical Networks for Domain Adaptation in Acoustic Scene Classification

    Acoustic Scene Classification (ASC) refers to the task of assigning a semantic label to an audio stream that characterizes the environment in which it was recorded. In recent times, Deep Neural Networks (DNNs) have emerged as the model of choice for ASC. However, in real-world scenarios, domain adaptation remains a persistent problem for ASC models. In search of an optimal solution to this problem, we explore a metric learning approach called prototypical networks, using the TUT Urban Acoustic Scenes dataset, which consists of 10 different acoustic scenes recorded across 10 cities. To replicate the domain adaptation scenario, we divide the dataset into source-domain data from eight randomly selected cities and target-domain data from the remaining two cities. We evaluate the performance of the network against a selected baseline network under various experimental scenarios, and based on the results we conclude that metric learning is a promising approach to addressing the domain adaptation problem in ASC.

    Learning from Very Few Samples: A Survey

    Few-sample learning (FSL) is significant and challenging in the field of machine learning. The capability of successfully learning and generalizing from very few samples is a notable demarcation separating artificial intelligence from human intelligence, since humans can readily establish cognition of novel concepts from just a single or a handful of examples, whereas machine learning algorithms typically entail hundreds or thousands of supervised samples to guarantee generalization ability. Despite a long history dating back to the early 2000s and widespread attention in recent years with booming deep learning technologies, few surveys or reviews of FSL have been available until now. In this context, we extensively review 300+ papers on FSL spanning from the 2000s to 2019 and provide a timely and comprehensive survey of the field. In this survey, we review the evolution history as well as the current progress of FSL, categorize FSL approaches into generative-model-based and discriminative-model-based kinds in principle, and place particular emphasis on meta-learning-based FSL approaches. We also summarize several recently emerging extensional topics of FSL and review the latest advances on these topics. Furthermore, we highlight important FSL applications covering many research hotspots in computer vision, natural language processing, audio and speech, reinforcement learning and robotics, data analysis, etc. Finally, we conclude the survey with a discussion of promising trends, in the hope of providing guidance and insights for follow-up research. Comment: 30 pages