11 research outputs found

    Deep Learning for Audio Event Detection and Tagging on Low-Resource Datasets

    No full text
    In training a deep learning system to perform audio transcription, two practical problems may arise. Firstly, most datasets are weakly labelled, having only a list of events present in each recording without any temporal information for training. Secondly, deep neural networks need a very large amount of labelled training data to achieve good performance, yet in practice it is difficult to collect enough samples for most classes of interest. In this paper, we propose factorising the final task of audio transcription into multiple intermediate tasks in order to improve training performance when dealing with this kind of low-resource dataset. We evaluate three data-efficient approaches to training a stacked convolutional and recurrent neural network for the intermediate tasks. Our results show that the different training methods have different advantages and disadvantages.
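    The abstract mentions a stacked convolutional and recurrent network trained on weakly labelled clips. The following is a minimal sketch of that general architecture (frame-level predictions pooled to clip-level tags), assuming illustrative layer sizes and max pooling over time; none of these choices are taken from the paper.

```python
import torch
import torch.nn as nn

class CRNNTagger(nn.Module):
    """Stacked convolutional-recurrent tagger for weakly labelled audio.

    Layer sizes and the number of classes are illustrative assumptions,
    not the configuration used in the paper.
    """
    def __init__(self, n_mels=64, n_classes=10):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)),
        )
        self.gru = nn.GRU(64 * (n_mels // 4), 64, batch_first=True, bidirectional=True)
        self.frame_fc = nn.Linear(128, n_classes)

    def forward(self, x):                        # x: (batch, 1, n_mels, n_frames)
        h = self.conv(x)                         # (batch, 64, n_mels // 4, n_frames)
        h = h.permute(0, 3, 1, 2).flatten(2)     # (batch, n_frames, features)
        h, _ = self.gru(h)
        frame_probs = torch.sigmoid(self.frame_fc(h))  # frame-level event activity
        clip_probs = frame_probs.max(dim=1).values     # pool to weak (clip-level) tags
        return frame_probs, clip_probs

model = CRNNTagger()
frame_probs, clip_probs = model(torch.randn(4, 1, 64, 200))
```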

    Performance MIDI-to-Score Conversion by Neural Beat Tracking: Pretrained models

    No full text
    These are the pre-trained models for the code in our paper:

    Lele Liu, Qiuqiang Kong, Veronica Morfi and Emmanouil Benetos, "Performance MIDI-to-Score Conversion by Neural Beat Tracking," in Proceedings of the 23rd International Society for Music Information Retrieval Conference, Bengaluru, India, Dec 2022. https://www.turing.ac.uk/sites/default/files/2022-09/midi_quantisation_paper_ismir_2022_0.pdf

    The code release is available at https://github.com/cheriell/PM2S

    Funding: L. Liu conducted this work as an intern at ByteDance. L. Liu is a research student at the UKRI Centre for Doctoral Training in Artificial Intelligence and Music, supported jointly by the China Scholarship Council and Queen Mary University of London. The work of L. Liu is supported by The Alan Turing Institute through an Enrichment Scheme.

    NIPS4Bplus: Transcriptions of NIPS4B 2013 Bird Challenge Training Dataset

    No full text
    Created By
    ----------
    Veronica Morfi (1), Dan Stowell (1) and Hanna Pamula (2).
    (1): Machine Listening Lab, Centre for Digital Music (C4DM), Queen Mary University of London (QMUL), UK
    (2): AGH University of Science and Technology, Department of Mechanics and Vibroacoustics, Kraków, Poland

    Description
    -----------
    The zip file contains temporal annotations for 674 individual recordings from the training set of the NIPS4B 2013 dataset used in the birdsong classification task (the original dataset contains 687 recordings).
    Task and dataset description can be found at: http://sabiod.univ-tln.fr/nips4b/challenge1.html
    Download the zip file of the dataset and weak annotations at: http://sabiod.univ-tln.fr/nips4b/media/birds/NIPS4B_BIRD_CHALLENGE_TRAIN_TEST_WAV.tar.gz

    Annotation Files
    ----------------
    Transcriptions were produced using Sonic Visualiser (https://www.sonicvisualiser.org/) by an experienced birdwatcher, Hanna Pamula.
    Number of missing annotations: 13 (6 of these files contained sounds that could not be unambiguously labelled, and the remaining 7 contained only insects).
    The original (weak) labels provided during the NIPS4B 2013 challenge were used for guidance; however, some files were judged to contain a slightly different set of species than given in the original metadata.
    An extra "Unknown" label was added for vocalisations that could not be attributed to a specific species, and an extra "Human" label was added for a few recordings containing human sounds.

    Transcription format:
    [Starting time (sec)],[Duration of event (sec)],[Label]
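    As a minimal illustration of this annotation format, the sketch below parses one such CSV-style annotation file into (onset, offset, label) events. The file path and the assumption that each file holds one annotated event per line are hypothetical; the listing does not spell out the file naming inside the zip.

```python
import csv
from pathlib import Path

def load_annotations(path):
    """Parse a NIPS4Bplus-style annotation file.

    Each line is expected to follow the documented format:
    [Starting time (sec)],[Duration of event (sec)],[Label]
    Returns a list of (onset_sec, offset_sec, label) tuples.
    """
    events = []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if not row:  # skip blank lines
                continue
            onset = float(row[0])
            duration = float(row[1])
            label = row[2].strip()
            events.append((onset, onset + duration, label))
    return events

if __name__ == "__main__":
    # Hypothetical file name; actual names depend on the released zip.
    for onset, offset, label in load_annotations(Path("annotations/nips4b_birds_trainfile001.csv")):
        print(f"{label}: {onset:.2f}-{offset:.2f} s")
```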

    Bird song comparison using deep learning trained from avian perceptual judgments

    No full text
    Our understanding of bird song, a model system for animal communication and the neurobiology of learning, depends critically on making reliable, validated comparisons between the complex multidimensional syllables that are used in songs. However, most assessments of song similarity are based on human inspection of spectrograms, or computational methods developed from human intuitions. Using a novel automated operant conditioning system, we collected a large corpus of zebra finches’ (Taeniopygia guttata) decisions about song syllable similarity. We use this dataset to compare and externally validate similarity algorithms in widely used publicly available software (Raven, Sound Analysis Pro, Luscinia). Although these methods all perform better than chance, they do not closely emulate the avian assessments. We then introduce a novel deep learning method, trained on such avian decisions, that produces perceptual similarity judgments. We find that this new method outperforms the established methods in accuracy and more closely approaches the avian assessments. Inconsistent (hence ambiguous) decisions are a common occurrence in animal behavioural data; we show that a modification of the deep learning training that accommodates these leads to the strongest performance. We argue this approach is the best way to validate methods to compare song similarity, that our dataset can be used to validate novel methods, and that the general approach can easily be extended to other species.
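    The abstract does not spell out the training objective, but the companion ML4BL dataset below organises the avian decisions as (anchor, positive, negative) triplets, so a triplet-margin embedding loss is one plausible way to train an embedding "from avian perceptual judgments". The sketch below is a hedged illustration of that idea in PyTorch; the encoder architecture, margin and input shapes are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Small convolutional encoder mapping a mel spectrogram to an embedding.
# Architecture and sizes are illustrative assumptions, not the paper's model.
class SyllableEncoder(nn.Module):
    def __init__(self, emb_dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(32, emb_dim)

    def forward(self, x):                      # x: (batch, 1, mel_bins, frames)
        h = self.conv(x).flatten(1)
        return nn.functional.normalize(self.fc(h), dim=1)

encoder = SyllableEncoder()
triplet_loss = nn.TripletMarginLoss(margin=0.2)  # margin is an assumed value

# Anchor/positive/negative mel spectrograms from bird-judged triplets (random stand-ins here).
anc = torch.randn(8, 1, 64, 128)
pos = torch.randn(8, 1, 64, 128)
neg = torch.randn(8, 1, 64, 128)
loss = triplet_loss(encoder(anc), encoder(pos), encoder(neg))
loss.backward()
```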

    Machine Learning for Bird Song Learning (ML4BL) dataset

    No full text
    General description
    This dataset contains zebra finch decisions about the perceptual similarity of song units. All the data and files are used for reproducing the results of the paper 'Bird song comparison using deep learning trained from avian perceptual judgments' by the same authors.
    Git repo on Zenodo: https://doi.org/10.5281/zenodo.5545932
    Git repo access: https://github.com/veronicamorfi/ml4bl/tree/v1.0.0

    Directory organisation:
    ML4BL_ZF
    |_files
    |_Final_probes_20200816.csv - all trials and decisions of the birds (aviary 1 cycle 1 data are removed from experiments)
    |_luscinia_triplets_filtered.csv - triplets to use for training
    |_mean_std_luscinia_pretraining.pckl - mean and std of Luscinia triplets used for training
    |_*_cons_* - % side consistency on triplets (train/test) - train set contains both train and val splits
    |_*_gt_* - cycle accuracy for triplets of the specific bird (train/test) - train set contains both train and val splits
    |_*_trials_* - number of decisions made for a triplet (train/test) - train set contains both train and val splits
    |_*_triplets_* - triplet information (aviary_cycle-acc_birdID, POS, NEG, ANC) (train/test) - train set contains both train and val splits
    |_*_low*_ - low-margin (ambiguous) triplets (train/val/test)
    |_*_high_ - high-margin (unambiguous) triplets (train/val/test)
    |_*_cycle_bird_keys_* - unique aviary_cycle-acc_birdID keys (train/test) - train set contains both train and val splits
    |_TunedLusciniaV1e.csv - pairwise distance of two recordings computed by Luscinia
    |_training_setup_1_ordered_acc_single_cons_50_70_trials.pckl - dictionary containing everything needed for training the model (keys: 'train_keys', 'train_triplets', 'val_keys', 'vali_triplets', 'test_triplets', 'test_keys', 'train_mean', 'train_std')
    |_melspecs - *.pckl - mel spectrograms of recordings
    |_wavs - *.wav - recordings
    |_README.txt

    Recordings
    887 syllables extracted from zebra finch song recordings, with a sampling rate of 48 kHz, high-pass filtered (100 Hz), with a 20 ms intro/outro fade.

    Decisions
    Triplets were created from the recordings and the birds made side-based decisions about their similarity (see 'Bird song comparison using deep learning trained from avian perceptual judgments' for further information).

    Training dictionary information
    Dictionary keys: 'train_keys', 'train_triplets', 'val_keys', 'vali_triplets', 'test_triplets', 'test_keys', 'train_mean', 'train_std'
    train_triplets/vali_triplets/test_triplets: Aviary_Cycle_birdID, POS, NEG, ANC, Decisions, Cycle_ACC(%), Consistency(%)
    train_keys/val_keys/test_keys: Aviary_Cycle_birdID
    train_mean/train_std: shape: (1, mel_bins)

    Open Access
    This dataset is available under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.

    Contact info
    Please send any questions about the recordings to: Lies Zandberg: [email protected]
    Please send any feedback or questions about the code and the rest of the data to: Veronica Morfi: [email protected]
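    As a hedged sketch of how the training dictionary and the per-recording mel spectrogram pickles might be loaded, assuming the documented dictionary keys and that each melspecs/*.pckl file holds a single array (the exact array layout and file naming are not stated in the listing):

```python
import pickle
from pathlib import Path

import numpy as np

ROOT = Path("ML4BL_ZF")  # unzipped dataset root; path is an assumption

# Load the training setup dictionary with the keys documented above.
with open(ROOT / "files" / "training_setup_1_ordered_acc_single_cons_50_70_trials.pckl", "rb") as f:
    setup = pickle.load(f)

train_triplets = setup["train_triplets"]      # Aviary_Cycle_birdID, POS, NEG, ANC, Decisions, Cycle_ACC(%), Consistency(%)
train_mean = np.asarray(setup["train_mean"])  # shape: (1, mel_bins)
train_std = np.asarray(setup["train_std"])    # shape: (1, mel_bins)

def load_melspec(recording_id):
    """Load one mel spectrogram and normalise it with the training statistics.

    The file naming and array layout are assumptions; the listing only states
    that melspecs/*.pckl contain mel spectrograms of the recordings.
    """
    with open(ROOT / "melspecs" / f"{recording_id}.pckl", "rb") as f:
        mel = np.asarray(pickle.load(f))
    return (mel - train_mean) / (train_std + 1e-8)
```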