10 research outputs found

    Deep Learning for Audio Event Detection and Tagging on Low-Resource Datasets

    In training a deep learning system to perform audio transcription, two practical problems arise. Firstly, most datasets are weakly labelled, providing only a list of the events present in each recording without any temporal information for training. Secondly, deep neural networks need a very large amount of labelled training data to achieve good performance, yet in practice it is difficult to collect enough samples for most classes of interest. In this paper, we propose factorising the final task of audio transcription into multiple intermediate tasks in order to improve training performance when dealing with this kind of low-resource dataset. We evaluate three data-efficient approaches to training a stacked convolutional and recurrent neural network for the intermediate tasks. Our results show that the different training methods have different advantages and disadvantages.
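    The architecture is described only at a high level above; as a rough illustration, the following is a minimal PyTorch sketch of a stacked convolutional and recurrent network that maps a mel-spectrogram to per-frame event probabilities and pools them into clip-level (weak) tags. Layer sizes, the max-pooling aggregation and the class count are illustrative assumptions, not the authors' configuration.

    # Hypothetical CRNN sketch for weakly labelled audio tagging; shapes and sizes are assumptions.
    import torch
    import torch.nn as nn

    class CRNN(nn.Module):
        def __init__(self, n_mels=64, n_classes=10, hidden=64):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d((2, 1)),   # pool frequency only, keep time resolution for frame outputs
                nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d((2, 1)),
            )
            self.gru = nn.GRU(64 * (n_mels // 4), hidden, batch_first=True, bidirectional=True)
            self.frame_fc = nn.Linear(2 * hidden, n_classes)

        def forward(self, x):                                # x: (batch, 1, n_mels, time)
            h = self.conv(x)                                 # (batch, 64, n_mels // 4, time)
            h = h.permute(0, 3, 1, 2).flatten(2)             # (batch, time, features)
            h, _ = self.gru(h)
            frame_probs = torch.sigmoid(self.frame_fc(h))    # per-frame event activity
            clip_probs = frame_probs.max(dim=1).values       # weak (clip-level) tags via max pooling over time
            return frame_probs, clip_probs

    frame_probs, clip_probs = CRNN()(torch.randn(4, 1, 64, 500))

    Because only the clip-level tags are supervised under weak labels, a loss such as binary cross-entropy would be applied to clip_probs, while frame_probs provides the temporal transcription at inference time.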

    NIPS4Bplus: Transcriptions of NIPS4B 2013 Bird Challenge Training Dataset

    Created By
    ----------

    Veronica Morfi (1), Dan Stowell (1) and Hanna Pamula (2).

    (1): Machine Listening Lab, Centre for Digital Music (C4DM), Queen Mary University of London (QMUL), UK
    (2): AGH University of Science and Technology, Department of Mechanics and Vibroacoustics, Kraków, Poland

    Description
    -----------

    The zip file contains temporal annotations for 674 individual recordings from the training set of the NIPS4B 2013 bird song classification task (the original dataset contains 687 recordings).

    Task and dataset description: http://sabiod.univ-tln.fr/nips4b/challenge1.html
    Download the zip file of the dataset and weak annotations at: http://sabiod.univ-tln.fr/nips4b/media/birds/NIPS4B_BIRD_CHALLENGE_TRAIN_TEST_WAV.tar.gz

    Annotation Files
    ----------------

    Transcriptions were produced using Sonic Visualiser (https://www.sonicvisualiser.org/) by an experienced birdwatcher, Hanna Pamula.

    Number of missing annotations: 13 (6 of these files contained sounds that could not be unambiguously labelled, and the remaining 7 contained only insects).

    The original (weak) labels provided during the NIPS4B 2013 challenge were used for guidance; however, some files were judged to contain a slightly different set of species than given in the original metadata.

    An extra Unknown label was added to the dataset for vocalisations that could not be classified to a specific species, and an extra Human label was added for a few recordings in which human sounds were present.

    Transcription format:
    [Starting time (sec)],[Duration of event (sec)],[Label]
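    Given the transcription format above (one event per row: start time, duration, label), annotations could be read along the following lines. This is a minimal sketch: the file name is a placeholder, and the assumption that the CSV files carry no header row should be checked against the actual zip contents.

    # Hypothetical loader for one NIPS4Bplus annotation file.
    # Assumes plain CSV rows of: start_time_sec, duration_sec, label (no header row).
    import csv

    def load_annotations(path):
        events = []
        with open(path, newline="") as f:
            for start, duration, label in csv.reader(f):
                events.append({
                    "onset": float(start),
                    "offset": float(start) + float(duration),
                    "label": label,
                })
        return events

    # events = load_annotations("annotation_train001.csv")  # placeholder file name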

    Bird song comparison using deep learning trained from avian perceptual judgments

    Our understanding of bird song, a model system for animal communication and the neurobiology of learning, depends critically on making reliable, validated comparisons between the complex multidimensional syllables used in songs. However, most assessments of song similarity are based on human inspection of spectrograms, or on computational methods developed from human intuitions. Using a novel automated operant conditioning system, we collected a large corpus of zebra finches' (Taeniopygia guttata) decisions about song syllable similarity. We use this dataset to compare and externally validate similarity algorithms in widely used, publicly available software (Raven, Sound Analysis Pro, Luscinia). Although these methods all perform better than chance, they do not closely emulate the avian assessments. We then introduce a novel deep learning method, trained on such avian decisions, that can produce perceptual similarity judgements. We find that this new method outperforms the established methods in accuracy and more closely approaches the avian assessments. Inconsistent (hence ambiguous) decisions are a common occurrence in animal behavioural data; we show that a modification of the deep learning training that accommodates these leads to the strongest performance. We argue that this approach is the best way to validate methods for comparing song similarity, that our dataset can be used to validate novel methods, and that the general approach can easily be extended to other species.
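    The abstract does not spell out the training objective; one common way to learn an embedding from (anchor, positive, negative) judgements of this kind is a triplet margin loss, sketched below. This is a generic illustration, not the authors' exact method, and their modification for inconsistent decisions is not reproduced here; one plausible variant would down-weight triplets on which the birds' decisions were less consistent.

    # Generic triplet-margin objective for embeddings learned from
    # (anchor, positive, negative) similarity judgements. Illustrative only.
    import torch.nn.functional as F

    def triplet_loss(emb_anchor, emb_pos, emb_neg, margin=1.0):
        d_pos = F.pairwise_distance(emb_anchor, emb_pos)   # anchor should lie close to the positive
        d_neg = F.pairwise_distance(emb_anchor, emb_neg)   # ...and far from the negative
        return F.relu(d_pos - d_neg + margin).mean()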

    Machine Learning for Bird Song Learning (ML4BL) dataset

    General description

    This dataset contains zebra finch decisions about the perceptual similarity of song units. All the data and files are used for reproducing the results of the paper 'Bird song comparison using deep learning trained from avian perceptual judgments' by the same authors.

    Git repo on Zenodo: https://doi.org/10.5281/zenodo.5545932
    Git repo access: https://github.com/veronicamorfi/ml4bl/tree/v1.0.0

    Directory organisation:

    ML4BL_ZF
    |_files
      |_Final_probes_20200816.csv - all trials and decisions of the birds (aviary 1 cycle 1 data are removed from experiments)
      |_luscinia_triplets_filtered.csv - triplets to use for training
      |_mean_std_luscinia_pretraining.pckl - mean and std of the Luscinia triplets used for training
      |_*_cons_* - % side consistency on triplets (train/test) - train set contains both train and val splits
      |_*_gt_* - cycle accuracy for triplets of the specific bird (train/test) - train set contains both train and val splits
      |_*_trials_* - number of decisions made for a triplet (train/test) - train set contains both train and val splits
      |_*_triplets_* - triplet information (aviary_cycle-acc_birdID, POS, NEG, ANC) (train/test) - train set contains both train and val splits
      |_*_low_* - low-margin (ambiguous) triplets (train/val/test)
      |_*_high_* - high-margin (unambiguous) triplets (train/val/test)
      |_*_cycle_bird_keys_* - unique aviary_cycle-acc_birdID keys (train/test) - train set contains both train and val splits
      |_TunedLusciniaV1e.csv - pairwise distance of two recordings computed by Luscinia
      |_training_setup_1_ordered_acc_single_cons_50_70_trials.pckl - dictionary containing everything needed for training the model (keys: 'train_keys', 'train_triplets', 'val_keys', 'vali_triplets', 'test_triplets', 'test_keys', 'train_mean', 'train_std')
    |_melspecs
      |_*.pckl - mel spectrograms of recordings
    |_wavs
      |_*.wav - recordings
    |_README.txt

    Recordings

    887 syllables extracted from zebra finch song recordings, sampled at 48 kHz, high-pass filtered (100 Hz), with a 20 ms intro/outro fade.

    Decisions

    Triplets were created from the recordings and the birds made side-based decisions about their similarity (see 'Bird song comparison using deep learning trained from avian perceptual judgments' for further information).

    Training dictionary information

    Dictionary keys: 'train_keys', 'train_triplets', 'val_keys', 'vali_triplets', 'test_triplets', 'test_keys', 'train_mean', 'train_std'
    train_triplets/vali_triplets/test_triplets: Aviary_Cycle_birdID, POS, NEG, ANC, Decisions, Cycle_ACC(%), Consistency(%)
    train_keys/val_keys/test_keys: Aviary_Cycle_birdID
    train_mean/train_std: shape: (1, mel_bins)

    Open Access

    This dataset is available under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.

    Contact info

    Please send any questions about the recordings to: Lies Zandberg: [email protected]
    Please send any feedback or questions about the code and the rest of the data to: Veronica Morfi: [email protected]
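    The training dictionary and mel-spectrogram pickles listed above can presumably be loaded along the following lines. The dictionary keys follow the listing; the file paths and the normalisation step are assumptions.

    # Hypothetical sketch of loading the ML4BL training setup and one mel spectrogram.
    # Key names are taken from the listing above; file names below are placeholders.
    import pickle

    with open("files/training_setup_1_ordered_acc_single_cons_50_70_trials.pckl", "rb") as f:
        setup = pickle.load(f)

    train_triplets = setup["train_triplets"]              # Aviary_Cycle_birdID, POS, NEG, ANC, Decisions, Cycle_ACC(%), Consistency(%)
    mean, std = setup["train_mean"], setup["train_std"]   # each of shape (1, mel_bins)

    with open("melspecs/example_syllable.pckl", "rb") as f:   # placeholder file name
        melspec = pickle.load(f)

    melspec_norm = (melspec - mean) / std                 # assumed per-bin normalisation, not confirmed by the notes above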

    Few-shot bioacoustic event detection: A new task at the DCASE 2021 challenge

    Few-shot bioacoustic event detection is a novel area of research that emerged from a need in monitoring biodiversity and animal behaviour: the need to annotate long recordings for which experts can usually provide only very few annotations, because the task is specialist and labour-intensive. This paper presents an overview of the first evaluation of few-shot bioacoustic sound event detection, organised as a task of the DCASE 2021 Challenge. A set of datasets consisting of multi-species mammal and bird recordings in the wild, along with class-specific temporal annotations, was compiled for the challenge, both for training learning-based approaches and for evaluating the submissions in a few-shot labelled setting. This paper describes the task in detail, the datasets used for development and evaluation of the submitted systems, how system performance was ranked, and the characteristics of the best-performing submissions. Some common strategies used by the participating teams are discussed, including input features, model architectures, transfer of prior knowledge, use of public datasets and data augmentation. Ranking for the challenge was based on overall performance on the evaluation set; however, in this paper we also present results on each subset of the evaluation set. This new analysis reveals submissions that performed better on specific subsets and gives insight into the characteristics of the subsets that can influence performance.
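    As a rough illustration of the few-shot setting (not the official baseline or any particular submission), one common strategy builds a prototype embedding from the few annotated events and scores the rest of the recording by its distance to that prototype; the embedding model, the score and the threshold below are all assumptions.

    # Illustrative prototypical scoring for few-shot event detection.
    # `embed` is a placeholder for any frame- or clip-level embedding model.
    import numpy as np

    def detect_events(embed, support_clips, query_frames, threshold=0.5):
        # Prototype: mean embedding of the few annotated ("shot") examples of the target event.
        prototype = np.mean([embed(c) for c in support_clips], axis=0)
        decisions = []
        for frame in query_frames:
            dist = np.linalg.norm(embed(frame) - prototype)
            decisions.append(np.exp(-dist) > threshold)   # map distance to a similarity in (0, 1]
        return decisions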