
    Hypernetworks for sound event detection: a proof-of-concept

    Polyphonic sound event detection (SED) involves the prediction of the sound events present in an audio recording along with their onset and offset times. Recently, deep neural networks, specifically convolutional recurrent neural networks (CRNNs), have achieved impressive results for this task. The convolutional part of the architecture extracts translation-invariant features from the input, and the recurrent part learns the underlying temporal relationship between audio frames. Recent studies showed that the weight-sharing paradigm of recurrent networks can be a hindering factor for certain kinds of time series data, specifically where there is a temporal conditional shift, i.e. the conditional distribution of a label changes across the temporal scale. This warrants a relevant question: is there a similar phenomenon in polyphonic sound events due to the dynamic polyphony level across the temporal axis? In this work, we explore this question and ask whether relaxed weight sharing improves the performance of a CRNN for polyphonic SED. We propose to use hypernetworks to relax weight sharing in the recurrent part and show that the CRNN's performance improves by ~3% across two datasets, thus paving the way for further exploration of the existence of temporal conditional shift for polyphonic SED.
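The core idea of relaxing recurrent weight sharing with a hypernetwork can be sketched as follows. This is a minimal illustrative toy, not the paper's architecture: the time embedding, the scalar gating scheme, and all sizes are assumptions made for brevity.

```python
# Minimal sketch (assumed scheme, not the paper's exact design): a tiny
# hypernetwork produces time-step-specific recurrent weights, so the RNN's
# transition matrix W_t varies across the sequence instead of being shared.
import math
import random

random.seed(0)
H = 4  # hidden size (arbitrary for the sketch)

# Shared base recurrent weights plus a "delta" bank the hypernetwork scales.
W_base  = [[random.gauss(0, 0.1) for _ in range(H)] for _ in range(H)]
W_delta = [[random.gauss(0, 0.1) for _ in range(H)] for _ in range(H)]

def hypernet(t, T):
    """Map a time index to recurrent weights: a scalar time embedding
    modulates the delta bank, so W_t differs across time steps."""
    z = math.tanh(2.0 * t / T - 1.0)  # time embedding in [-1, 1]
    return [[W_base[i][j] + z * W_delta[i][j] for j in range(H)] for i in range(H)]

def step(h, x, W):
    """One vanilla RNN step using the time-varying recurrent weights W."""
    return [math.tanh(sum(W[i][j] * h[j] for j in range(H)) + x[i]) for i in range(H)]

T = 6
h = [0.0] * H
for t in range(T):
    x = [math.sin(t + i) for i in range(H)]  # stand-in for a CRNN frame feature
    h = step(h, x, hypernet(t, T))

weights_differ = hypernet(0, T) != hypernet(T - 1, T)
```

The recurrence itself stays ordinary; only the source of its weights changes, which is what "relaxed weight sharing" amounts to here.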

    Leveraging label hierarchies for few-shot everyday sound recognition

    Everyday sounds cover a considerable range of sound categories in our daily life, yet for certain categories it is hard to collect sufficient data. Although existing works have successfully applied few-shot learning paradigms to sound recognition, most of them have not exploited the relationships between labels in audio taxonomies. This work adopts a hierarchical prototypical network to leverage the knowledge rooted in audio taxonomies. Specifically, a VGG-like convolutional neural network is used to extract acoustic features. Prototypical nodes are then calculated at each level of the tree structure. A multi-level loss is obtained by weighting the per-level losses with a decaying factor. Experimental results demonstrate that our hierarchical prototypical networks not only outperform prototypical networks without hierarchy information but also yield better results than other state-of-the-art algorithms. Our code is available at https://github.com/JinhuaLiang/HPNs_taggin
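The per-level prototype and multi-level loss computation can be sketched like this. The two-level toy taxonomy, the embeddings, and the decay value are illustrative assumptions; only the general recipe (class-mean prototypes per level, softmax over negative squared distances, decayed sum of level losses) follows the abstract.

```python
# Hedged sketch of a hierarchical prototypical loss: prototypes are class
# means at each taxonomy level, and per-level losses are combined with a
# decaying weight. All concrete values here are assumed for illustration.
import math

def prototype(vectors):
    """Class prototype = mean of the support embeddings."""
    d = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(d)]

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def level_loss(query, protos, true_label):
    """Standard prototypical loss: cross-entropy over a softmax of
    negative squared distances to each class prototype."""
    logits = {c: -dist2(query, p) for c, p in protos.items()}
    m = max(logits.values())
    z = sum(math.exp(v - m) for v in logits.values())
    return -(logits[true_label] - m - math.log(z))

# Toy two-level taxonomy: {dog, cat} -> animal, {car} -> vehicle.
support = {"dog": [[0.0, 1.0]], "cat": [[0.2, 0.9]], "car": [[1.0, 0.0]]}
coarse_of = {"dog": "animal", "cat": "animal", "car": "vehicle"}

fine_protos = {c: prototype(v) for c, v in support.items()}
coarse_groups = {}
for c, vecs in support.items():
    coarse_groups.setdefault(coarse_of[c], []).extend(vecs)
coarse_protos = {c: prototype(v) for c, v in coarse_groups.items()}

query, fine_label = [0.1, 0.95], "dog"
decay = 0.5  # assumed per-level decay factor
multi_level_loss = (level_loss(query, fine_protos, fine_label)
                    + decay * level_loss(query, coarse_protos, coarse_of[fine_label]))
```

The coarse-level term rewards embeddings that are at least in the right branch of the taxonomy even when the fine label is ambiguous.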

    Perceptual musical similarity metric learning with graph neural networks

    Sound retrieval for assisted music composition depends on evaluating similarity between musical instrument sounds, which is partly influenced by playing techniques. Previous methods using Euclidean nearest neighbours over acoustic features show some limitations in retrieving sounds that share equivalent timbral properties but were potentially generated with a different instrument, playing technique, pitch or dynamic. In this paper, we present a metric learning system designed to approximate human similarity judgments between extended musical playing techniques using graph neural networks. Such structures are natural candidates for similarity retrieval tasks, yet have seen little application in modelling perceptual music similarity. We optimize a Graph Convolutional Network (GCN) over acoustic features via a proxy metric learning loss to learn embeddings that reflect perceptual similarities. Specifically, we construct the graph's adjacency matrix from the acoustic data manifold with an example-wise adaptive k-nearest neighbourhood graph: the Adaptive Neighbourhood Graph Neural Network (AN-GNN). Our approach achieves 96.4% retrieval accuracy, compared to 38.5% with a Euclidean metric and 86.0% with a multilayer perceptron (MLP), while effectively considering retrievals from playing techniques distinct from the query example.
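An example-wise adaptive k-nearest-neighbour adjacency can be built roughly as below. The specific adaptivity rule (a radius scaled from each example's own nearest-neighbour distance) and the parameter names `k_max` and `ratio` are assumptions for illustration, not the paper's definition.

```python
# Illustrative sketch: build an example-wise adaptive k-NN adjacency matrix
# over acoustic features, as a GCN front-end might. Each node keeps at most
# k_max neighbours, but only those within `ratio` times its own nearest-
# neighbour distance, so dense regions get more edges than sparse ones.
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def adaptive_knn_adjacency(feats, k_max=3, ratio=1.5):
    n = len(feats)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        nbrs = sorted((dist(feats[i], feats[j]), j) for j in range(n) if j != i)
        d1 = nbrs[0][0] or 1e-9  # node i's own local scale
        for d, j in nbrs[:k_max]:
            if d <= ratio * d1:   # adaptive, example-wise cutoff
                A[i][j] = 1.0
        A[i][i] = 1.0  # self-loop, standard for GCN-style propagation
    return A

# Toy acoustic features: two tight timbre clusters far apart.
feats = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
A = adaptive_knn_adjacency(feats)
```

Because the cutoff is relative to each example's local scale, the two clusters stay disconnected even though `k_max` would otherwise allow cross-cluster edges.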

    The αGal Epitope of the Histo-Blood Group Antigen Family Is a Ligand for Bovine Norovirus Newbury2 Expected to Prevent Cross-Species Transmission

    Among Caliciviridae, the norovirus genus encompasses enteric viruses that infect humans as well as several animal species, causing gastroenteritis. Porcine strains are classified together with human strains within genogroup II, whilst bovine norovirus strains represent genogroup III. Various GI and GII human strains bind to carbohydrates of the histo-blood group family, which may be shared among mammalian species. The genetic relatedness of human and animal strains, as well as the presence of potentially shared ligands, raises the possibility of norovirus cross-species transmission. In the present study, we identified a carbohydrate ligand for the prototype bovine norovirus strain Bo/Newbury2/76/UK (NB2). Attachment of virus-like particles (VLPs) of the NB2 strain to bovine gut tissue sections showed a complete match with the staining by reagents recognizing the Galα1,3 motif. Alpha-galactosidase treatment confirmed involvement of a terminal alpha-linked galactose. Specific binding of VLPs to the αGal epitope (Galα3Galβ4GlcNAcβ-R) was observed. The binding of Galα3GalαOMe to rNB2 VLPs was characterized at atomic resolution employing saturation transfer difference (STD) NMR experiments. Transfection of human cells with an α1,3galactosyltransferase cDNA allowed binding of NB2 VLPs, whilst inversely, attachment to porcine vascular endothelial cells was lost when the cells originated from an α1,3galactosyltransferase KO animal. The αGal epitope is expressed in all mammalian species with the exception of the Hominidae family, due to the inactivation of the α1,3galactosyltransferase gene (GGTA1). Accordingly, the NB2 carbohydrate ligand is absent from human tissues. Although it is expressed on porcine vascular endothelial cells, we observed that, unlike in cows, it is not present on gut epithelial cells, suggesting that neither man nor pig could be infected by the NB2 bovine strain.

    Genome sequences of a novel Vietnamese bat bunyavirus

    To document viral zoonotic risks in Vietnam, fecal samples were systematically collected from a number of mammals in southern Vietnam and subjected to agnostic deep sequencing. We describe here novel Vietnamese bunyavirus sequences detected in bat feces. The complete L and S segments from 14 viruses were determined.

    Complete genome characterization of two wild-type measles viruses from Vietnamese infants during the 2014 outbreak

    A large measles virus outbreak occurred across Vietnam in 2014. We identified and obtained complete measles virus genomes from stool samples collected from two diarrheal pediatric patients in Dong Thap Province. These are the first complete genome sequences of circulating measles viruses in Vietnam during the 2014 measles outbreak.

    ATGNN: audio tagging graph neural network

    Deep learning models such as CNNs and Transformers have achieved impressive performance for end-to-end audio tagging. Recent works have shown that, despite stacking multiple layers, the effective receptive field of CNNs remains severely limited. Transformers, on the other hand, can map global context through self-attention, but treat the spectrogram as a sequence of patches, which is not flexible enough to capture irregular audio objects. In this work, we treat the spectrogram in a more flexible way by considering it as a graph structure and process it with a novel graph neural architecture called ATGNN. ATGNN not only combines the capabilities of CNNs with the global information-sharing ability of graph neural networks, but also maps semantic relationships between learnable class embeddings and the corresponding spectrogram regions. We evaluate ATGNN on two audio tagging tasks, where it achieves 0.585 mAP on the FSD50K dataset and 0.335 mAP on the AudioSet-balanced dataset, comparable to Transformer-based models with a significantly smaller number of learnable parameters.
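The idea of relating learnable class embeddings to spectrogram regions can be sketched with one round of class-to-patch message passing. This is a toy illustration under assumed values, not ATGNN itself: the patch features, class embeddings, and single-round attention scheme are all placeholders.

```python
# Rough sketch (all values assumed): spectrogram patches become graph nodes,
# learnable class-embedding nodes attend over them, aggregate the attended
# patch features, and are scored against the aggregate to produce tag scores.
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

# Four hypothetical 3-d spectrogram "patch" features.
patches = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.9, 0.2]]
# Two hypothetical class-embedding nodes.
classes = {"siren": [1.0, 0.0, 0.0], "speech": [0.0, 1.0, 0.0]}

def class_scores(patches, classes):
    """One round of class-to-patch message passing: each class node attends
    over patch nodes, aggregates them, and a sigmoid of the class/aggregate
    match yields a multi-label tag score."""
    scores = {}
    for name, emb in classes.items():
        attn = softmax([dot(emb, p) for p in patches])
        agg = [sum(a * p[i] for a, p in zip(attn, patches)) for i in range(len(emb))]
        scores[name] = 1.0 / (1.0 + math.exp(-dot(emb, agg)))
    return scores

scores = class_scores(patches, classes)
```

Each class node ends up pooling mostly from the spectrogram regions it matches, which is the semantic mapping the abstract describes.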

    Learning from taxonomy: multi-label few-shot classification for everyday sound recognition

    Humans categorise and structure perceived acoustic signals into hierarchies of auditory objects. The semantics of these objects are thus informative for sound classification, especially in few-shot scenarios. However, existing works have only represented audio semantics as binary labels (e.g., whether a recording contains dog barking or not), and thus failed to learn more generic semantic relationships among labels. In this work, we introduce an ontology-aware framework to train multi-label few-shot audio networks with both relative and absolute relationships in an audio taxonomy. Specifically, we propose label-dependent prototypical networks (LaD-ProtoNet) to learn coarse-to-fine acoustic patterns by exploiting direct connections between parent and child classes of sound events. We also present a label smoothing method that incorporates taxonomic knowledge via the absolute distance between two labels w.r.t. the taxonomy. For evaluation in a real-world setting, we curate a new dataset, FSD-FS, based on the FSD50K dataset, and compare the proposed methods against other few-shot classifiers on this dataset. Experiments demonstrate that the proposed method outperforms non-ontology-based methods on the FSD-FS dataset.
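Taxonomy-aware label smoothing can be sketched as follows. The inverse-distance split, the two-level toy tree, and the `eps` value are illustrative assumptions; the abstract only states that smoothing depends on absolute taxonomic distance between labels.

```python
# Sketch (assumed scheme): move eps of the probability mass off the true
# label and split it among the other labels inversely to their hop distance
# in the taxonomy, so same-parent siblings receive more mass than unrelated
# classes.
taxonomy = {"dog": "animal", "cat": "animal", "car": "vehicle", "bus": "vehicle"}
labels = list(taxonomy)

def tree_distance(a, b):
    """Hop distance in the two-level tree: 0 for self, 2 via a shared
    parent, 4 via the root."""
    if a == b:
        return 0
    return 2 if taxonomy[a] == taxonomy[b] else 4

def smooth_label(true, eps=0.2):
    others = [l for l in labels if l != true]
    inv = [1.0 / tree_distance(true, l) for l in others]
    s = sum(inv)
    dist = {true: 1.0 - eps}
    for l, w in zip(others, inv):
        dist[l] = eps * w / s
    return dist

y = smooth_label("dog")
```

Training against `y` instead of a one-hot target penalises a "cat" prediction for a dog clip less than a "car" prediction, which is exactly the kind of coarse-to-fine signal the taxonomy provides.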