55 research outputs found

    Music genre classification using On-line Dictionary Learning

    In this paper, an approach to music genre classification based on sparse representation using MARSYAS features is proposed. The MARSYAS feature descriptor, consisting of timbral texture, pitch, and beat-related features, is used for the classification of music genre. On-line Dictionary Learning (ODL) is used to achieve a sparse representation of the features for developing dictionaries for each musical genre. We demonstrate the efficacy of the proposed framework on the Latin Music Database (LMD), consisting of over 3000 tracks spanning 10 genres, namely Axé, Bachata, Bolero, Forró, Gaúcha, Merengue, Pagode, Salsa, Sertaneja, and Tango.
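    As a concrete illustration, here is a minimal Python sketch of per-genre dictionary learning with minimum-reconstruction-error classification. scikit-learn's MiniBatchDictionaryLearning stands in for the paper's ODL implementation, and the per-genre MARSYAS feature matrices (`feats_by_genre`) are a hypothetical input, not the authors' data pipeline.

```python
# Sketch: one dictionary per genre, classify by smallest reconstruction error.
# `feats_by_genre` is a hypothetical {genre: (n_tracks, n_features) array} map.
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

dicts = {}
for genre, X in feats_by_genre.items():
    dl = MiniBatchDictionaryLearning(n_components=50, alpha=1.0,
                                     transform_algorithm="lasso_lars")
    dicts[genre] = dl.fit(X)  # learn sparse atoms for this genre

def classify(x):
    # Assign the genre whose dictionary reconstructs the feature vector best
    errors = {}
    for genre, dl in dicts.items():
        code = dl.transform(x.reshape(1, -1))   # sparse code of the track
        recon = code @ dl.components_           # reconstruction from atoms
        errors[genre] = np.linalg.norm(x - recon)
    return min(errors, key=errors.get)
```

    Classifying by reconstruction error is the standard way to use class-specific dictionaries: the genre whose atoms explain a track's features most compactly wins.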

    Classification of EMI discharge sources using time–frequency features and multi-class support vector machine

    This paper introduces the first application of feature extraction and machine learning to Electromagnetic Interference (EMI) signals for discharge source classification in high-voltage power generating plants. This work investigates signals that represent different discharge sources, measured using EMI techniques from operating electrical machines within a power plant. The analysis involves time–frequency image calculation of EMI signals using General Linear Chirplet Analysis (GLCT), which reveals both time- and frequency-varying characteristics. Histograms of uniform Local Binary Patterns (LBP) are implemented as a feature reduction and extraction technique for the classification of discharge sources using a Multi-Class Support Vector Machine (MCSVM). The novelty this paper introduces is the combination of GLCT and LBP to develop a new feature extraction algorithm for EMI signal classification. The proposed algorithm is demonstrated to be successful, with excellent classification accuracy. For the first time, this work transfers experts' knowledge of EMI faults to an intelligent system that could potentially be exploited to develop an automatic condition monitoring system.
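    A minimal sketch of the image-texture pipeline follows, under stated assumptions: GLCT is not available in common Python libraries, so a standard STFT spectrogram stands in for the time-frequency image, uniform-LBP histograms come from scikit-image, and the multi-class SVM from scikit-learn; `signals`, `labels`, and the sample rate are hypothetical.

```python
# Sketch of time-frequency image -> uniform LBP histogram -> multi-class SVM.
# A plain spectrogram replaces GLCT here; signals/labels are hypothetical.
import numpy as np
from scipy.signal import spectrogram
from skimage.feature import local_binary_pattern
from sklearn.svm import SVC

P, R = 8, 1  # LBP neighbours and radius; "uniform" coding gives P + 2 bins

def lbp_histogram(sig, fs):
    _, _, Sxx = spectrogram(sig, fs=fs)          # time-frequency image
    img = np.log1p(Sxx)                          # compress dynamic range
    img = np.uint8(255 * (img - img.min()) / (np.ptp(img) + 1e-12))
    codes = local_binary_pattern(img, P, R, method="uniform")
    hist, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2), density=True)
    return hist                                  # fixed-length texture feature

# signals: 1-D EMI recordings; labels: discharge source classes (assumed)
X = np.array([lbp_histogram(s, fs=1_000_000) for s in signals])
clf = SVC(kernel="rbf", decision_function_shape="ovo").fit(X, labels)
```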

    Ensemble of convolutional neural networks to improve animal audio classification

    In this work, we present an ensemble for automated audio classification that fuses different types of features extracted from audio files. These features are evaluated, compared, and fused with the goal of producing better classification accuracy than other state-of-the-art approaches without ad hoc parameter optimization. We present an ensemble of classifiers that performs competitively on different types of animal audio datasets using the same set of classifiers and parameter settings. To produce this general-purpose ensemble, we ran a large number of experiments that fine-tuned pretrained convolutional neural networks (CNNs) for different audio classification tasks (bird, bat, and whale audio datasets). Six different CNNs were tested, compared, and combined. Moreover, a further CNN, trained from scratch, was tested and combined with the fine-tuned CNNs. To the best of our knowledge, this is the largest study on CNNs in animal audio classification. Our results show that several CNNs can be fine-tuned and fused for robust and generalizable audio classification. Finally, the ensemble of CNNs is combined with handcrafted texture descriptors obtained from spectrograms to further improve performance. The MATLAB code used in our experiments will be provided to other researchers for future comparisons at https://github.com/LorisNanni.
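    The released code is MATLAB; purely to illustrate the fusion scheme, the following PyTorch sketch fine-tunes two pretrained CNNs (stand-ins for the six used in the paper) and averages their softmax outputs. The architectures, class count, and omitted training loop are assumptions, not the authors' exact setup.

```python
# Sketch: fine-tune pretrained CNNs on spectrogram images, fuse by averaging
# the per-network class probabilities (sum rule). Illustrative, not the
# paper's MATLAB code.
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 10  # hypothetical number of animal classes

def make_finetuned(arch_fn, weights):
    net = arch_fn(weights=weights)
    net.fc = nn.Linear(net.fc.in_features, NUM_CLASSES)  # new class head
    return net

nets = [
    make_finetuned(models.resnet18, models.ResNet18_Weights.DEFAULT),
    make_finetuned(models.resnet50, models.ResNet50_Weights.DEFAULT),
]
# ... fine-tune each net on 3-channel spectrogram images, then:
for net in nets:
    net.eval()

@torch.no_grad()
def ensemble_predict(batch):  # batch: (N, 3, H, W) spectrogram images
    probs = torch.stack([net(batch).softmax(dim=1) for net in nets])
    return probs.mean(dim=0).argmax(dim=1)
```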

    Machine Learning-based Classification of Birds through Birdsong

    Audio sound recognition and classification is used for many tasks and applications, including human voice recognition, music recognition, and audio tagging. In this paper, we apply Mel-Frequency Cepstral Coefficients (MFCC) in combination with a range of machine learning models to identify (Australian) birds from publicly available audio files of their birdsong. We present the approaches used for data processing and augmentation and compare the results of various state-of-the-art machine learning models. We achieve an overall accuracy of 91% for the top-5 birds from the 30 selected as the case study. Applying the models to more challenging and diverse audio files comprising 152 bird species, we achieve an accuracy of 58%.
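    A minimal sketch of the MFCC-plus-classifier pipeline described above, assuming a list of audio `paths` and parallel `species` labels; the random-forest model and the time-axis pooling are illustrative choices, not necessarily those used in the paper.

```python
# Sketch: MFCC features per clip -> classical classifier. paths/species are
# hypothetical inputs; the model choice is illustrative.
import librosa
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def mfcc_features(path, n_mfcc=20):
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    # Pool over time so every clip yields a fixed-length vector
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

X = np.array([mfcc_features(p) for p in paths])
X_tr, X_te, y_tr, y_te = train_test_split(X, species, stratify=species,
                                          random_state=0)
model = RandomForestClassifier(n_estimators=300).fit(X_tr, y_tr)
print("accuracy:", model.score(X_te, y_te))
```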

    Mel-Frequency Cepstral Coefficients and Convolutional Neural Network for Genre Classification of Indigenous Nigerian Music

    Music genre classification is a field of study within the broader domain of Music Information Retrieval (MIR) that is still an open problem. This study aims at classifying music by Nigerian artists into respective genres using Convolutional Neural Networks (CNNs) and audio features extracted from the songs. To achieve this, a dataset of 524 Nigerian songs was collected from different genres. Each downloaded music file was converted from standard MP3 to WAV format and then trimmed to 30 seconds. The Librosa library was used for the analysis, visualization, and further pre-processing of the music files, which included converting the audio signals to Mel-frequency cepstral coefficients (MFCCs). The MFCCs were obtained by performing a Discrete Cosine Transform on the logarithm of the Mel-scale filtered power spectrum of the audio signals. A CNN architecture with multiple convolutional and pooling layers was used to learn the relevant features and classify the genres. Six models were trained using a categorical cross-entropy loss function with different learning rates and optimizers. The performance of the models was evaluated using accuracy, precision, recall, and F1-score. The models returned varying results from the classification experiments, but model 3, which was trained with an Adagrad optimizer and a learning rate of 0.01, achieved accuracy and recall of 75.1% and 84%, respectively. The results from the study demonstrate the effectiveness of MFCCs and CNNs in music genre classification, particularly for music by indigenous Nigerian artists.
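    As a rough illustration, the sketch below builds a small Keras CNN over MFCC inputs and compiles it with the Adagrad optimizer and categorical cross-entropy loss mentioned in the abstract; the input shape, layer sizes, and genre count are assumptions, not the paper's exact architecture.

```python
# Sketch: CNN over MFCC "images", compiled with Adagrad (lr=0.01) and
# categorical cross-entropy as in the abstract. Shapes/sizes are assumed.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_GENRES = 5               # hypothetical genre count
INPUT_SHAPE = (20, 1292, 1)  # e.g. 20 MFCCs x frames of a 30 s clip (assumed)

model = models.Sequential([
    tf.keras.Input(shape=INPUT_SHAPE),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(NUM_GENRES, activation="softmax"),
])

model.compile(
    optimizer=tf.keras.optimizers.Adagrad(learning_rate=0.01),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
```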