
    A Feature Learning Siamese Model for Intelligent Control of the Dynamic Range Compressor

    In this paper, a siamese DNN model is proposed to learn the characteristics of the audio dynamic range compressor (DRC). This facilitates an intelligent control system that uses audio examples to configure the DRC, a widely used non-linear audio signal conditioning technique in music production, speech communication and broadcasting. Several alternative siamese DNN architectures are proposed to learn feature embeddings that can characterise the subtle effects of dynamic range compression. These models are compared with each other as well as with handcrafted features proposed in previous work. An evaluation of the relations between the DNN hyperparameters and the DRC parameters is also provided. The best model produces a universal feature embedding capable of predicting multiple DRC parameters simultaneously, a significant improvement over our previous research. The feature embedding outperforms handcrafted audio features when predicting DRC parameters for both mono-instrument audio loops and polyphonic music pieces.
    Comment: 8 pages, accepted at IJCNN 201
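    The core idea of a siamese model is a single shared encoder applied to both inputs of a pair, with a distance between the two embeddings. A minimal NumPy sketch of that structure (the encoder here is a hypothetical one-layer toy, not the paper's architecture):

```python
import numpy as np

def encoder(x, W):
    """Shared encoder: one linear layer + tanh, mapping a feature
    vector to a low-dimensional embedding. Both branches of the
    siamese pair reuse the same weights W."""
    return np.tanh(W @ x)

def siamese_distance(x_ref, x_comp, W):
    """Embed a reference clip's features and a compressed clip's
    features with the shared encoder, then return the Euclidean
    distance between the two embeddings."""
    return np.linalg.norm(encoder(x_ref, W) - encoder(x_comp, W))

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 40)) * 0.1   # 40-d input features -> 8-d embedding
x = rng.standard_normal(40)
print(siamese_distance(x, x, W))          # identical inputs -> 0.0
```

    Training would adjust W so that the distance correlates with differences in DRC settings; here W is random and serves only to illustrate the weight sharing.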

    Mel-Frequency Cepstral Coefficients and Convolutional Neural Network for Genre Classification of Indigenous Nigerian Music

    Music genre classification is a field of study within the broader domain of Music Information Retrieval (MIR) that is still an open problem. This study aims at classifying music by Nigerian artists into respective genres using Convolutional Neural Networks (CNNs) and audio features extracted from the songs. To achieve this, a dataset of 524 Nigerian songs was collected from different genres. Each downloaded music file was converted from standard MP3 to WAV format and then trimmed to 30 seconds. The Librosa library was used for the analysis, visualization and further pre-processing of the music files, which includes converting the audio signals to Mel-frequency cepstral coefficients (MFCCs). The MFCCs were obtained by performing a Discrete Cosine Transform on the logarithm of the Mel-scale filtered power spectrum of the audio signals. A CNN architecture with multiple convolutional and pooling layers was used to learn the relevant features and classify the genres. Six models were trained using a categorical cross-entropy loss function with different learning rates and optimizers. Performance of the models was evaluated using accuracy, precision, recall, and F1-score. The models returned varying results from the classification experiments; model 3, which was trained with the Adagrad optimizer and a learning rate of 0.01, achieved accuracy and recall of 75.1% and 84%, respectively. The results from the study demonstrate the effectiveness of MFCCs and CNNs in music genre classification, particularly for indigenous Nigerian artists.
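    The MFCC recipe the abstract describes (mel-filtered power spectrum, then log, then DCT) can be sketched for a single frame in plain NumPy. The filterbank below uses equal-width averaging bands as a simplification; a real implementation (e.g. Librosa's) would use mel-spaced triangular filters:

```python
import numpy as np

def dct_ii(x, n_coeff):
    """DCT-II of a 1-D signal -- the final step that turns
    log-mel energies into cepstral coefficients."""
    N = len(x)
    k = np.arange(n_coeff)[:, None]
    n = np.arange(N)[None, :]
    return np.cos(np.pi * k * (2 * n + 1) / (2 * N)) @ x

def mfcc_frame(power_spectrum, mel_fb, n_mfcc=13):
    """MFCCs for a single frame: mel-filter the power spectrum,
    take the logarithm, then apply a DCT."""
    log_mel = np.log(mel_fb @ power_spectrum + 1e-10)
    return dct_ii(log_mel, n_mfcc)

# Toy filterbank: equal-width averaging bands stand in for the
# mel-spaced triangular filters a real implementation would use.
n_fft_bins, n_mels = 64, 20
mel_fb = np.zeros((n_mels, n_fft_bins))
edges = np.linspace(0, n_fft_bins, n_mels + 1).astype(int)
for m in range(n_mels):
    mel_fb[m, edges[m]:edges[m + 1]] = 1.0 / (edges[m + 1] - edges[m])

frame = np.abs(np.random.default_rng(0).standard_normal(n_fft_bins)) ** 2
print(mfcc_frame(frame, mel_fb).shape)   # (13,)
```

    Stacking these 13-coefficient vectors over all frames of a 30-second clip yields the 2-D MFCC "image" that the CNN consumes.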

    Using Principal Paths to Walk Through Music and Visual Art Style Spaces Induced by Convolutional Neural Networks

    Computational intelligence, particularly deep learning, offers powerful tools for discriminating and generating samples such as images. Deep learning methods have been used in different artistic contexts for neural style transfer, artistic style recognition, and musical genre recognition. Using a constrained manifold analysis protocol, we discuss to what extent the spaces induced by deep-learning convolutional neural networks can capture historical/stylistic progressions in music and visual art. We apply a path-finding algorithm, called principal path, to move from one point to another in the vector space induced by convolutional neural networks. In experiments with visual artworks and songs restricted to a subset of classes, the principal path recovers a reasonable historical/stylistic progression in several cases, finding connections between visual artworks and songs from different styles/genres that respect their historical evolution. This approach could be used in many areas to extract evolutionary information from an arbitrary high-dimensional space and deliver interesting cognitive insights.
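    Walking from one point to another through a high-dimensional feature space can be illustrated with a much simpler stand-in than the actual principal-path algorithm (which optimizes a regularized cost over waypoints): a shortest path over a k-nearest-neighbour graph of the embeddings. The sketch below is that stand-in, not the paper's method:

```python
import heapq
import numpy as np

def knn_path(points, start, end, k=3):
    """Shortest path from points[start] to points[end] over a
    k-nearest-neighbour graph, with Euclidean edge weights.
    Illustrative stand-in only: the principal-path algorithm
    regularizes the waypoints rather than taking shortest paths."""
    dists = np.linalg.norm(points[:, None] - points[None, :], axis=2)
    nbrs = np.argsort(dists, axis=1)[:, 1:k + 1]   # skip self at index 0
    best, prev = {start: 0.0}, {}
    heap = [(0.0, start)]
    while heap:                                    # Dijkstra's algorithm
        d, u = heapq.heappop(heap)
        if u == end:
            break
        if d > best.get(u, np.inf):
            continue
        for v in nbrs[u]:
            nd = d + dists[u, v]
            if nd < best.get(v, np.inf):
                best[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    path, node = [end], end
    while node != start:                           # walk back via predecessors
        node = prev[node]
        path.append(node)
    return path[::-1]

# Points along a line: the path should traverse them in order.
pts = np.arange(6, dtype=float)[:, None]
path = knn_path(pts, 0, 5, k=2)
```

    In the paper's setting the points would be CNN feature vectors of artworks or songs, and the waypoints along the path are inspected for stylistic/historical ordering.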

    Convolution channel separation and frequency sub-bands aggregation for music genre classification

    In music, short-term features such as pitch and tempo constitute long-term semantic features such as melody and narrative. A music genre classification (MGC) system should be able to analyze these features. In this research, we propose a novel framework that can extract and aggregate both short- and long-term features hierarchically. Our framework is based on ECAPA-TDNN, where all the layers that extract short-term features are affected by the layers that extract long-term features because of back-propagation training. To prevent the distortion of short-term features, we devised a convolution channel separation technique that separates short-term features from the long-term feature extraction paths. To extract more diverse features from our framework, we incorporated a frequency sub-bands aggregation method, which divides the input spectrogram along frequency bandwidths and processes each segment separately. We evaluated our framework on the Melon Playlist dataset, a large-scale dataset containing 600 times more data than GTZAN, the dataset most widely used in MGC studies. As a result, our framework achieved 70.4% accuracy, a 16.9% improvement over a conventional framework.
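    The frequency sub-bands aggregation step — dividing the input spectrogram along the frequency axis and handing each segment to its own processing branch — reduces to a simple array split. A minimal sketch, with band count and spectrogram shape chosen arbitrarily for illustration:

```python
import numpy as np

def split_subbands(spectrogram, n_bands):
    """Split a (freq, time) spectrogram into n_bands equal-width
    frequency segments; in the proposed framework each segment
    would then be processed by a separate branch and the branch
    outputs aggregated."""
    return np.array_split(spectrogram, n_bands, axis=0)

spec = np.random.default_rng(1).random((128, 300))  # 128 mel bins, 300 frames
bands = split_subbands(spec, 4)
print([b.shape for b in bands])   # [(32, 300), (32, 300), (32, 300), (32, 300)]
```

    Splitting on the frequency axis (rather than time) lets each branch specialize on a register of the signal, e.g. bass versus treble content.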

    Deep Learning vs Spectral Clustering into an active clustering with pairwise constraints propagation

    In our data-driven world, categorization is of major importance to help end-users and decision makers understand information structures. Supervised learning techniques rely on annotated samples that are often difficult to obtain, and training often overfits. On the other hand, unsupervised clustering techniques study the structure of the data without requiring any training data. Given the difficulty of the task, supervised learning often outperforms unsupervised learning. A compromise is to use partial knowledge, selected in a smart way, in order to boost performance while minimizing learning costs; this is called semi-supervised learning. In this setting, Spectral Clustering has proved to be an efficient method. Deep Learning has also outperformed several state-of-the-art classification approaches, and it is interesting to test it in our context. In this paper, we first introduce Deep Learning into an active semi-supervised clustering process and compare it with Spectral Clustering. Secondly, we introduce constraint propagation and demonstrate how it maximizes partitioning quality while reducing annotation costs. Experimental validation is conducted on two different real datasets. The results show the potential of the clustering methods.
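    The way pairwise constraints enter spectral clustering can be sketched by folding them directly into the affinity matrix before the spectral step. This is a crude stand-in for the paper's full constraint propagation, which would also spread each constraint to neighbouring pairs; here a must-link pair simply gets affinity 1 and a cannot-link pair 0:

```python
import numpy as np

def constrained_spectral_2way(X, must_link=(), cannot_link=(), sigma=1.0):
    """Two-way spectral partition with pairwise constraints folded
    into the Gaussian affinity matrix. Clusters are read off the
    sign of the Fiedler vector (the eigenvector of the graph
    Laplacian's second-smallest eigenvalue)."""
    d2 = np.sum((X[:, None] - X[None, :]) ** 2, axis=2)
    A = np.exp(-d2 / (2 * sigma ** 2))
    for i, j in must_link:
        A[i, j] = A[j, i] = 1.0       # force the pair together
    for i, j in cannot_link:
        A[i, j] = A[j, i] = 0.0       # force the pair apart
    np.fill_diagonal(A, 0.0)
    L = np.diag(A.sum(axis=1)) - A    # unnormalized graph Laplacian
    _, vecs = np.linalg.eigh(L)
    return (vecs[:, 1] > 0).astype(int)

# Two well-separated blobs of three points each.
X = np.array([[0, 0], [0.1, 0], [0, 0.1],
              [5, 5], [5.1, 5], [5, 5.1]], dtype=float)
labels = constrained_spectral_2way(X)
```

    In the active setting, the pairs to annotate as must-link or cannot-link would be selected by the algorithm rather than fixed in advance, which is where the annotation-cost savings come from.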