
    Spectrogram classification using dissimilarity space

    In this work, we combine a Siamese neural network with different clustering techniques to generate a dissimilarity space, which is then used to train an SVM for automated animal audio classification. The two animal audio datasets used, (i) bird sounds and (ii) cat sounds, are freely available. We use different clustering methods to reduce the spectrograms in the dataset to a number of centroids, and the Siamese network then generates the dissimilarity space from these centroids. Once the space is computed, we use it to generate a vector space representation of each pattern, which is then fed into a support vector machine (SVM) that classifies a spectrogram by its dissimilarity vector. Our study shows that the proposed dissimilarity-space approach performs well on both classification problems without ad-hoc optimization of the clustering methods. Moreover, the results show that a fusion of CNN-based approaches applied to the animal audio classification problem works better than the stand-alone CNNs.
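The projection step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Euclidean distance stands in for the dissimilarity that a trained Siamese network would output, the per-class mean stands in for the clustering step, and the function names and the toy two-dimensional "spectrogram features" are all hypothetical. The resulting vectors are what would then be passed to an SVM.

```python
import math

def euclidean(a, b):
    # Assumption: stand-in for the dissimilarity a trained Siamese network would output.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def class_centroids(patterns, labels):
    # Simplest supervised stand-in for clustering: one mean vector per class.
    sums, counts = {}, {}
    for pattern, label in zip(patterns, labels):
        acc = sums.setdefault(label, [0.0] * len(pattern))
        for i, value in enumerate(pattern):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    # Sort labels so the centroid order (and thus the vector layout) is stable.
    return [[v / counts[label] for v in sums[label]] for label in sorted(sums)]

def project(pattern, centroids, dissimilarity=euclidean):
    # Dissimilarity vector: one distance per centroid.
    return [dissimilarity(pattern, c) for c in centroids]

# Hypothetical toy features; real inputs would be spectrograms.
patterns = [[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]]
labels = ["bird", "bird", "cat", "cat"]

centroids = class_centroids(patterns, labels)
vectors = [project(p, centroids) for p in patterns]
```

Each pattern ends up with as many features as there are centroids, so the number of centroids directly sets the dimensionality of the classifier input.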

    Animal sound classification using dissimilarity spaces

    The classifier system proposed in this work combines the dissimilarity spaces produced by a set of Siamese neural networks (SNNs), designed using four different backbones, with different clustering techniques to train SVMs for automated animal audio classification. The system is evaluated on two animal audio datasets: one of cat vocalizations and one of bird vocalizations. The proposed approach uses clustering methods to determine a set of centroids (in both a supervised and an unsupervised fashion) from the spectrograms in the dataset. These centroids are used to generate the dissimilarity space through the Siamese networks. In addition to feeding the SNNs with spectrograms, experiments process the spectrograms using the heterogeneous auto-similarities of characteristics. Once the similarity spaces are computed, each pattern is “projected” into the space to obtain a vector space representation; this descriptor is then coupled to a support vector machine (SVM) to classify a spectrogram by its dissimilarity vector. Results demonstrate that the proposed approach performs competitively (without ad-hoc optimization of the clustering methods) on both animal vocalization datasets. To further demonstrate the power of the proposed system, the best standalone approach is also evaluated on the challenging Dataset for Environmental Sound Classification (ESC50).

    One-shot Learning with Siamese Networks for Environmental Audio

    In recent years, deep learning-based approaches have dominated many types of classification problems. These approaches usually require large amounts of training data to produce a model capable of generalizing to unseen data of the same type. However, in some applications it can be difficult to gather training data efficiently, and it would be beneficial to classify new samples using only a few training examples, or even a single one. Humans transfer knowledge from previously learned concepts to unfamiliar ones with relative ease, so many researchers have experimented with this idea in machine learning classification tasks. Using only a single labelled example to classify unseen data is known as one-shot learning, and it has been especially successful in computer vision. Many modern approaches to one-shot learning use a special neural network architecture called a Siamese network. This architecture can be trained to predict similarities between inputs and can be used for a metric-based approach to one-shot learning. Siamese networks have been used for different audio-related tasks before, but their use in one-shot learning for audio classification has received less attention than in computer vision. The purpose of this thesis is to extend the idea of one-shot learning to environmental audio classification and to see whether this approach is feasible. The proposed system was trained and evaluated on the ESC dataset, which consists of 50 different environmental audio categories. The final one-shot evaluation was performed on 5 completely unseen classes, using only a single example of each class when performing the classification. The results show that convolutional Siamese networks are indeed a valid approach to the difficult one-shot classification task for environmental audio.
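The metric-based decision rule behind one-shot classification can be sketched as follows. This is only an illustration under stated assumptions: the Euclidean distance is a placeholder for the similarity score of a trained Siamese network, and the five class labels and two-dimensional feature vectors are hypothetical, not taken from the thesis.

```python
import math

def distance(a, b):
    # Assumption: placeholder for the dissimilarity predicted by a trained Siamese network.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def one_shot_classify(query, support, dist=distance):
    # support maps each unseen class label to its single labelled example.
    # The query receives the label of the closest support example.
    return min(support, key=lambda label: dist(query, support[label]))

# Hypothetical 5-way support set: one example per class.
support = {
    "rain": [0.9, 0.1],
    "dog_bark": [0.1, 0.8],
    "siren": [0.5, 0.5],
    "clock_tick": [0.9, 0.9],
    "wind": [0.1, 0.1],
}

prediction = one_shot_classify([0.2, 0.7], support)
```

Because the rule only compares the query against the support examples, new classes can be added by extending the support set, without retraining the network that supplies the distance.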