    Predominant Musical Instrument Classification based on Spectral Features

    This work examines one of the cornerstone problems of Musical Instrument Retrieval (MIR), namely instrument classification. The IRMAS (Instrument Recognition in Musical Audio Signals) data set is chosen for this purpose. The data comprise musical clips recorded from various sources over the last century and therefore cover a wide range of audio quality. We present a concise summary of past work in this domain. Having implemented various supervised learning algorithms for this classification task, we find that an SVM classifier outperforms the other state-of-the-art models with an accuracy of 79%. We also implemented unsupervised techniques, of which hierarchical clustering performed well. Comment: Appeared in Proceedings of SPIN 202
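
    To make the pipeline concrete, here is a minimal sketch of a spectral-feature + SVM classifier of the kind described above, assuming IRMAS-style clips arranged in one folder per instrument; the paths, feature set and SVM settings are illustrative choices, not the authors' exact configuration.

```python
# Sketch: spectral features + SVM for predominant-instrument classification.
# Assumes a directory layout like irmas_train/<instrument>/<clip>.wav (hypothetical).
import glob, os
import numpy as np
import librosa
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def clip_features(path, sr=22050):
    """Summarise a clip with time-averaged spectral features (MFCCs, centroid, rolloff)."""
    y, sr = librosa.load(path, sr=sr, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)
    rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr)
    feats = np.concatenate([mfcc, centroid, rolloff], axis=0)
    return np.hstack([feats.mean(axis=1), feats.std(axis=1)])

X, y = [], []
for path in glob.glob("irmas_train/*/*.wav"):            # hypothetical path
    X.append(clip_features(path))
    y.append(os.path.basename(os.path.dirname(path)))    # folder name = instrument label

X_train, X_test, y_train, y_test = train_test_split(
    np.array(X), np.array(y), test_size=0.2, random_state=0)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0))
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```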

    Audio Mixing using Image Neural Style Transfer Networks

    Image style transfer networks are used to blend images, producing images that are a mix of the source images. The process is based on the controlled extraction of style and content aspects of images using pre-trained Convolutional Neural Networks (CNNs). Our interest lies in adopting these image style transfer networks for the purpose of transforming sounds. Audio signals can be represented as grey-scale images of audio spectrograms. The purpose of our work is to investigate whether audio spectrogram inputs can be used with image neural style transfer networks to produce new sounds. Using musical instrument sounds as source sounds, we apply and compare three existing image neural style transfer networks on the task of sound mixing. Our evaluation shows that all three networks succeed in producing consistent, new sounds based on the two source sounds. We use classification models to demonstrate that the new audio signals are consistent and distinguishable from the source instrument sounds. We further apply t-SNE cluster visualisation to the feature maps of the new sounds and the original source sounds, confirming that the new sounds form groups distinct from the source sounds. Our work paves the way for using CNNs for the creative and targeted production of new sounds from source sounds with specified source qualities, including pitch and timbre.
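
    To make the spectrogram-as-image representation concrete, the following sketch converts a recording into a grey-scale log-magnitude spectrogram and resynthesises audio from it with Griffin-Lim; it illustrates only the audio-image round trip, not the style transfer networks themselves, and the file names and STFT settings are assumptions.

```python
# Sketch: treating audio as a grey-scale spectrogram "image" and resynthesising audio.
# The input/output file names are placeholders.
import numpy as np
import librosa
import soundfile as sf

y, sr = librosa.load("violin_note.wav", sr=22050)        # hypothetical source sound
mag = np.abs(librosa.stft(y, n_fft=1024, hop_length=256))

# Grey-scale "image": log-magnitude rescaled to [0, 1], as fed to image networks.
log_mag = np.log1p(mag)
lo, hi = log_mag.min(), log_mag.max()
img = (log_mag - lo) / (hi - lo + 1e-8)

# ... an image style transfer network would transform `img` at this point ...
# Map the (possibly modified) image back to a magnitude spectrogram.
mag_out = np.expm1(img * (hi - lo) + lo)

# Recover a waveform from magnitude only; Griffin-Lim estimates the missing phase.
y_out = librosa.griffinlim(mag_out, n_fft=1024, hop_length=256)
sf.write("transformed.wav", y_out, sr)
```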

    Sound Transformation: Applying Image Neural Style Transfer Networks to Audio Spectrograms


    A new feature-based wavelet completed local ternary pattern (FEAT-WCLTP) for texture and medical image classification

    Nowadays, texture image descriptors are used in many important real-life applications, and the use of texture analysis in texture and medical image classification has attracted considerable attention. Local Binary Patterns (LBP) is one of the simplest yet effective texture descriptors, but it has limitations that may affect its accuracy, so different variants of LBP have been proposed to overcome its drawbacks and enhance its classification accuracy. The completed local ternary pattern (CLTP) is one of the significant LBP variants. However, CLTP suffers from two main limitations: the threshold value is selected manually, and its high dimensionality negatively affects the descriptor's performance and leads to high computational cost. This research aims to improve the classification accuracy of CLTP and overcome the computational limitation by proposing new descriptors inspired by CLTP. The research therefore makes two contributions. The first is a new descriptor that integrates the redundant discrete wavelet transform (RDWT) with the original CLTP, namely the wavelet completed local ternary pattern (WCLTP). Extracting CLTP in the wavelet domain helps increase the classification accuracy owing to the shift-invariance of RDWT: the image is first decomposed into four sub-bands (LL, LH, HL, HH) using RDWT, and CLTP is then extracted from the LL wavelet coefficients. The second is a reduction of the dimensionality of WCLTP together with a new texture descriptor, the feature-based wavelet completed local ternary pattern (Feat-WCLTP). The proposed Feat-WCLTP enhances CLTP's performance while reducing its high dimensionality: the mean and variance of the values of the selected texture pattern are used instead of the normal magnitude texture descriptor of CLTP. The performance of the proposed WCLTP and Feat-WCLTP was evaluated on four texture datasets (OuTex, CUReT, UIUC and Kylberg) and two medical datasets (2D HeLa and Breast Cancer) and compared with several well-known LBP variants. The proposed WCLTP outperformed the previous descriptors and achieved the highest classification accuracy in all experiments: 99.35% on OuTex, 96.57% on CUReT, 94.80% on UIUC and 99.88% on Kylberg for the texture datasets, and 84.19% on 2D HeLa and 92.14% on Breast Cancer for the medical datasets. The proposed Feat-WCLTP not only overcomes the dimensionality problem but also considerably improves the classification accuracy: 99.66% on OuTex, 96.89% on CUReT, 95.23% on UIUC and 99.92% on Kylberg for the texture datasets, and 84.42% on 2D HeLa and 89.12% on Breast Cancer for the medical datasets. Moreover, Feat-WCLTP reduces the size of the feature vector for texture pattern (1,8) to 160 bins instead of the 400 bins of WCLTP. The proposed WCLTP and Feat-WCLTP thus offer better classification accuracy and lower dimensionality than the original CLTP.
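
    The following is a simplified, illustrative sketch of the core idea: a sign-based local ternary pattern computed on the LL sub-band of a one-level stationary (redundant, shift-invariant) wavelet transform via PyWavelets. It is only a stand-in for WCLTP; the full CLTP magnitude and centre components, the threshold selection and the Feat-WCLTP statistics are not reproduced here.

```python
# Sketch: ternary texture pattern on the LL sub-band of a redundant (stationary) wavelet transform.
# Simplified stand-in for WCLTP: only the sign component of a local ternary pattern is shown.
import numpy as np
import pywt

def ll_subband(image):
    """One-level stationary 2-D wavelet transform; keep the LL (approximation) coefficients."""
    h, w = image.shape
    image = image[:h - h % 2, :w - w % 2]          # swt2 needs even dimensions at level 1
    (ll, (lh, hl, hh)), = pywt.swt2(image, "haar", level=1)
    return ll

def ternary_sign_histograms(ll, t=5.0):
    """Upper/lower binary patterns from a thresholded 8-neighbour comparison (edges wrap)."""
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    upper = np.zeros(ll.shape, dtype=np.int32)
    lower = np.zeros(ll.shape, dtype=np.int32)
    for bit, (dy, dx) in enumerate(offsets):
        neigh = np.roll(np.roll(ll, dy, axis=0), dx, axis=1)
        upper += (neigh >= ll + t).astype(np.int32) << bit
        lower += (neigh <= ll - t).astype(np.int32) << bit
    h_up = np.bincount(upper.ravel(), minlength=256)
    h_lo = np.bincount(lower.ravel(), minlength=256)
    return np.concatenate([h_up, h_lo]).astype(np.float64)

# Usage (hypothetical): `img` is a grey-scale texture image as a 2-D NumPy array.
# feats = ternary_sign_histograms(ll_subband(img.astype(np.float64)))
```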

    Extended playing techniques: The next milestone in musical instrument recognition

    The expressive variability in producing a musical note conveys information essential to the modeling of orchestration and style. As such, it plays a crucial role in computer-assisted browsing of massive digital music corpora. Yet, although the automatic recognition of a musical instrument from the recording of a single "ordinary" note is considered a solved problem, automatic identification of instrumental playing technique (IPT) remains largely underdeveloped. We benchmark machine listening systems for query-by-example browsing among 143 extended IPTs for 16 instruments, amounting to 469 triplets of instrument, mute, and technique. We identify and discuss three necessary conditions for significantly outperforming the traditional mel-frequency cepstral coefficient (MFCC) baseline: the addition of second-order scattering coefficients to account for amplitude modulation, the incorporation of long-range temporal dependencies, and metric learning using large-margin nearest neighbors (LMNN) to reduce intra-class variability. Evaluating on the Studio On Line (SOL) dataset, we obtain a precision at rank 5 of 99.7% for instrument recognition (baseline at 89.0%) and of 61.0% for IPT recognition (baseline at 44.5%). We interpret this gain through a qualitative assessment of practical usability and visualization using nonlinear dimensionality reduction. Comment: 10 pages, 9 figures. The source code to reproduce the experiments of this paper is made available at: https://www.github.com/mathieulagrange/dlfm201
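
    As a hedged illustration of the query-by-example setup, the sketch below scores precision at rank 5 for a time-averaged MFCC baseline with a cosine nearest-neighbour search; the scattering features, long-range temporal modelling and LMNN metric learning that produce the reported gains are not included, and the data-loading details are placeholders.

```python
# Sketch: query-by-example retrieval of playing techniques with an MFCC baseline,
# scored by precision at rank 5. Feature choice and data handling are illustrative.
import numpy as np
import librosa
from sklearn.neighbors import NearestNeighbors

def mfcc_embedding(path):
    """Time-averaged MFCCs (mean and std) as a fixed-length clip descriptor."""
    y, sr = librosa.load(path, sr=22050, mono=True)
    m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
    return np.hstack([m.mean(axis=1), m.std(axis=1)])

def precision_at_5(X, labels):
    """For each clip, the fraction of its 5 nearest neighbours sharing its IPT label."""
    nn = NearestNeighbors(n_neighbors=6, metric="cosine").fit(X)   # 6 = query itself + 5 results
    _, idx = nn.kneighbors(X)
    hits = [(labels[i] == labels[row[1:]]).mean() for i, row in enumerate(idx)]
    return float(np.mean(hits))

# Usage (hypothetical): `paths` is a list of audio files, `labels` their technique annotations.
# X = np.stack([mfcc_embedding(p) for p in paths])
# print("P@5:", precision_at_5(X, np.array(labels)))
```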

    AI and Tempo Estimation: A Review

    The author's goal in this paper is to explore how artificial intelligence (AI) has been utilised to inform our understanding of and ability to estimate at scale a critical aspect of musical creativity - musical tempo. The central importance of tempo to musical creativity can be seen in how it is used to express specific emotions (Eerola and Vuoskoski 2013), suggest particular musical styles (Li and Chan 2011), influence perception of expression (Webster and Weir 2005) and mediate the urge to move one's body in time to the music (Burger et al. 2014). Traditional tempo estimation methods typically detect signal periodicities that reflect the underlying rhythmic structure of the music, often using some form of autocorrelation of the amplitude envelope (Lartillot and Toiviainen 2007). Recently, AI-based methods utilising convolutional or recurrent neural networks (CNNs, RNNs) on spectral representations of the audio signal have enjoyed significant improvements in accuracy (Aarabi and Peeters 2022). Common AI-based techniques include those based on probability (e.g., Bayesian approaches, hidden Markov models (HMM)), classification and statistical learning (e.g., support vector machines (SVM)), and artificial neural networks (ANNs) (e.g., self-organising maps (SOMs), CNNs, RNNs, deep learning (DL)). The aim here is to provide an overview of some of the more common AI-based tempo estimation algorithms and to shine a light on notable benefits and potential drawbacks of each. Limitations of AI in this field in general are also considered, as is the capacity for such methods to account for idiosyncrasies inherent in tempo perception, i.e., how well AI-based approaches are able to think and act like humans. Comment: 9 page
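
    To illustrate the traditional approach mentioned above (detecting periodicities of the onset/amplitude envelope by autocorrelation), here is a minimal sketch; the input file and the 60-180 BPM search range are illustrative assumptions, and none of the reviewed AI-based methods are implemented.

```python
# Sketch: tempo estimation by autocorrelating an onset-strength envelope.
# The audio file and the 60-180 BPM search range are illustrative choices.
import numpy as np
import librosa

y, sr = librosa.load("song.wav", sr=22050)                 # hypothetical input
hop = 512
env = librosa.onset.onset_strength(y=y, sr=sr, hop_length=hop)
env = env - env.mean()

# Autocorrelation of the envelope: peaks appear at lags matching the beat period.
ac = np.correlate(env, env, mode="full")[len(env) - 1:]

frames_per_sec = sr / hop
min_lag = int(frames_per_sec * 60 / 180)                   # 180 BPM -> shortest beat period
max_lag = int(frames_per_sec * 60 / 60)                    # 60 BPM  -> longest beat period
best_lag = min_lag + int(np.argmax(ac[min_lag:max_lag + 1]))

tempo_bpm = 60.0 * frames_per_sec / best_lag
print(f"estimated tempo: {tempo_bpm:.1f} BPM")
```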

    Deep Learning for Real-Time Recognition of Instrumental Playing Techniques (Apprentissage profond pour la reconnaissance en temps réel des modes de jeu instrumentaux)

    In recent years, deep learning has established itself as the new reference method for audio classification problems, and in particular for instrument recognition. However, these models generally do not address the classification of extended playing techniques, a question that is nevertheless central to contemporary composition. The few existing studies restrict their evaluation to a single sample library, which gives no guarantee of generalisation to real-world data. In this article, we extend state-of-the-art methods to the real-time classification of instrumental playing techniques from recordings of soloists. We show that a combination of convolutional (CNN) and recurrent (RNN) networks yields excellent results on a homogeneous corpus drawn from 5 sample libraries. However, their performance drops noticeably on a heterogeneous corpus, which may indicate a limited ability to generalise to real-world data. We propose directions for addressing this problem. Finally, we describe several possible uses of our models in interactive systems.
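
    As a hedged sketch of the general CNN + RNN architecture described, the code below stacks a small convolutional front end over mel-spectrogram frames, a GRU over time and a per-frame classifier; the layer sizes, number of classes and input shape are assumptions rather than the authors' model.

```python
# Sketch: CNN front end + GRU for frame-level playing-technique classification.
# All hyperparameters (channels, hidden size, n_classes=12, 64 mel bands) are illustrative.
import torch
import torch.nn as nn

class CnnRnnIPT(nn.Module):
    def __init__(self, n_mels=64, n_classes=12, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),                      # pool over frequency only, keep time
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),
        )
        self.rnn = nn.GRU(input_size=32 * (n_mels // 4), hidden_size=hidden,
                          batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, mel):                            # mel: (batch, n_mels, time)
        x = self.cnn(mel.unsqueeze(1))                 # -> (batch, 32, n_mels // 4, time)
        x = x.permute(0, 3, 1, 2).flatten(2)           # -> (batch, time, 32 * n_mels // 4)
        x, _ = self.rnn(x)                             # -> (batch, time, hidden)
        return self.head(x)                            # per-frame class logits

# Usage with a dummy batch of log-mel spectrograms: 8 clips, 64 mel bands, 200 frames.
model = CnnRnnIPT()
logits = model(torch.randn(8, 64, 200))
print(logits.shape)                                    # torch.Size([8, 200, 12])
```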