    MusCaps: generating captions for music audio

    Content-based music information retrieval has seen rapid progress with the adoption of deep learning. Current approaches to high-level music description typically rely on classification models, as in auto-tagging or genre and mood classification. In this work, we propose to address music description via audio captioning, defined as the task of generating a natural language description of music audio content in a human-like manner. To this end, we present the first music audio captioning model, MusCaps, consisting of an encoder-decoder with temporal attention. Our method combines convolutional and recurrent neural network architectures to jointly process audio-text inputs through a multimodal encoder, and leverages pre-training on audio data to obtain representations that effectively capture and summarise musical features in the input. Evaluation of the generated captions through automatic metrics shows that our method outperforms a baseline designed for non-music audio captioning. Through an ablation study, we show that this performance boost can be attributed mainly to pre-training of the audio encoder, while other design choices (modality fusion, decoding strategy and the use of attention) contribute only marginally. Our model represents a shift away from classification-based music description and combines tasks requiring both auditory and linguistic understanding to bridge the semantic gap in music information retrieval.
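As a rough illustration of what "temporal attention" over encoder outputs means, the sketch below weights a sequence of audio frame features by their relevance to the current decoder state and sums them into a context vector. The abstract does not say which scoring function MusCaps uses; the dot-product score, the function names and the toy vectors here are all assumptions made for the example.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def temporal_attention(decoder_state, encoder_frames):
    """Return (weights, context) for one decoding step.

    decoder_state: list of floats (hypothetical decoder hidden state).
    encoder_frames: list of per-time-step feature vectors of the same size.
    The dot-product score is an assumption, not the paper's stated choice.
    """
    scores = [sum(f * s for f, s in zip(frame, decoder_state))
              for frame in encoder_frames]
    weights = softmax(scores)
    dim = len(encoder_frames[0])
    # Context vector: attention-weighted sum of the frames over time.
    context = [sum(w * frame[k] for w, frame in zip(weights, encoder_frames))
               for k in range(dim)]
    return weights, context

frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy audio features
weights, context = temporal_attention([10.0, 0.0], frames)
print(weights)  # frames 0 and 2 receive almost all of the weight
```

The weights always sum to one, so the context vector stays on the same scale as the frame features regardless of how long the audio clip is.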

    Unsupervised automatic music genre classification

    Work presented within the Master's programme in Informatics Engineering (Mestrado em Engenharia Informática), in partial fulfilment of the requirements for the degree of Master in Informatics Engineering. In this study we explore automatic music genre recognition and classification of digital music. Music has always been a reflection of cultural differences and an influence on our society. The development of digital content has triggered the massive use of digital music. Nowadays, digital music is manually labeled without following a universal taxonomy, so the labeling process used for audio indexing is prone to errors. Human labeling will always be influenced by cultural differences, education, tastes, etc. Nonetheless, this indexing process is essential to guarantee the correct organization of huge databases that contain thousands of music titles. In this study, our interest is in music genre organization. We propose a learning and classification methodology for automatic genre classification that is able to group music samples based on their characteristics (achieved by the proposed learning process) and to classify a new test sample into the previously learned groups (achieved by the proposed classification process). The learning method groups the music samples into different clusters based only on audio features, without any prior knowledge of the genre of the samples, and therefore follows an unsupervised methodology. In addition, a model-based approach is used to generate the clusters, as we do not provide any information about the number of genres in the dataset. Features relate to rhythm analysis, timbre and melody, among others. The Mahalanobis distance was used so that the classification method can deal with non-spherical clusters. The proposed learning method achieves a clustering accuracy of 55% when the dataset contains 11 different music genres: Blues, Classical, Country, Disco, Fado, Hiphop, Jazz, Metal, Pop, Reggae and Rock. The clustering accuracy improves significantly when the number of genres is reduced; with 4 genres (Classical, Fado, Metal and Reggae), we obtain an accuracy of 100%. As for the classification process, 82% of the submitted music samples were correctly classified.
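To make the remark about non-spherical clusters concrete, here is a minimal sketch of assigning a sample to the nearest cluster by Mahalanobis distance. It is restricted to two features so that the covariance matrix can be inverted by hand; the function names, the cluster labels and the toy points are assumptions for illustration and do not come from the thesis.

```python
import math

def mean_and_cov_2d(points):
    """Return the mean and 2x2 sample covariance of a list of (x, y) points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))

def mahalanobis_2d(x, mean, cov):
    """Mahalanobis distance of x from a 2-D Gaussian (mean, cov)."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form with the closed-form inverse of a 2x2 matrix.
    q = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.sqrt(q)

def classify(x, clusters):
    """Return the label of the cluster with the smallest Mahalanobis distance."""
    stats = {name: mean_and_cov_2d(pts) for name, pts in clusters.items()}
    return min(stats, key=lambda name: mahalanobis_2d(x, *stats[name]))

# Toy data: cluster A is stretched along the x axis, cluster B is compact.
clusters = {
    "A": [(-4, 0.0), (-2, 0.2), (0, -0.2), (2, 0.2), (4, -0.2)],
    "B": [(4.5, 2.5), (5.5, 2.5), (4.5, 3.5), (5.5, 3.5), (5.0, 3.0)],
}
print(classify((5, 0), clusters))  # prints "A"
```

The test point (5, 0) is closer to the mean of B in Euclidean terms (3 versus 5), but it lies along the direction in which A is spread out, so the Mahalanobis distance assigns it to A. That is the behaviour a spherical distance cannot reproduce.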

    Learning a feature space for similarity in world music

    In this study we investigate computational methods for assessing music similarity in world music styles. We use state-of-the-art audio features to describe musical content in world music recordings. Our music collection is a subset of the Smithsonian Folkways Recordings with audio examples from 31 countries around the world. Using supervised and unsupervised dimensionality reduction techniques, we learn feature representations for music similarity. We evaluate how well music styles separate in this learned space with a classification experiment. We obtained moderate performance classifying the recordings by country. Analysis of misclassifications revealed cases of geographical or cultural proximity. We further evaluate the learned space by detecting outliers, i.e. identifying recordings that stand out in the collection. We use a data mining technique based on Mahalanobis distances to detect outliers and perform a listening experiment in the ‘odd one out’ style to evaluate our findings. We are able to detect, amongst others, recordings of non-musical content as outliers, as well as music with distinct timbral and harmonic content. The listening experiment reveals moderate agreement between subjects’ ratings and our outlier estimation.

    Exploring the Features to Classify the Musical Period of Western Classical Music

    Music Information Retrieval (MIR) focuses on extracting meaningful information from music content. MIR is a growing field of research with many applications such as music recommendation systems, fingerprinting, query-by-humming and music genre classification. This study aims to classify the styles of Western classical music, as this has not been explored to a great extent by MIR. In particular, this research evaluates the impact of different music characteristics on identifying the musical period: Baroque, Classical, Romantic or Modern. To make it easy to extract features related to music theory, symbolic representations (music scores) were used instead of audio. A collection of 870 Western classical piano scores was downloaded from different sources such as the KernScore library (humdrum format) and the Musescore community (MusicXML format). Several global features were constructed by parsing the files and accessing the symbolic information, including notes and durations. These features include melodic intervals, chord types, and pitch and rhythm histograms, and were based on previous studies and music theory research. Using a radial kernel support vector machine algorithm, different classification models were created to analyse the contribution of the main musical properties: rhythm, pitch, harmony and melody. The study findings revealed that the harmony features were significant predictors of the music styles. The research also confirmed that the musical styles evolved gradually and that changes in the tonal system through the years appeared to be the most significant factor for identifying the styles. This is consistent with the findings of other researchers. The model using all the available features achieved an overall accuracy of 84.3%. Of the four periods studied, music from the Modern period was the most difficult to classify.
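The global symbolic features mentioned above can be sketched in a few lines. The example assumes the score has already been parsed into a list of MIDI note numbers for a single melodic line; the parsing of humdrum or MusicXML files, the function names and the clipping of large intervals to one octave are assumptions of this sketch, not details given in the abstract.

```python
from collections import Counter

def pitch_class_histogram(pitches):
    """Relative frequency of each of the 12 pitch classes (C=0 ... B=11)."""
    if not pitches:
        return [0.0] * 12
    counts = Counter(p % 12 for p in pitches)
    return [counts.get(pc, 0) / len(pitches) for pc in range(12)]

def melodic_interval_histogram(pitches, max_interval=12):
    """Relative frequency of signed melodic intervals in semitones.

    Intervals larger than max_interval are clipped (an assumption made
    here to keep the feature vector bounded).
    """
    intervals = [b - a for a, b in zip(pitches, pitches[1:])]
    if not intervals:
        return {}
    clipped = [max(-max_interval, min(max_interval, i)) for i in intervals]
    counts = Counter(clipped)
    return {i: counts[i] / len(clipped) for i in sorted(counts)}

# Ascending C major scale from middle C, as MIDI note numbers.
scale = [60, 62, 64, 65, 67, 69, 71, 72]
print(pitch_class_histogram(scale)[0])    # 0.25 (C appears twice in 8 notes)
print(melodic_interval_histogram(scale))  # only steps of 1 and 2 semitones
```

Histograms like these have a fixed length regardless of the length of the piece, which is what makes them usable as input vectors for a classifier such as the support vector machine described in the abstract.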

    Content-based feature selection for music genre classification

    The most important aspect to consider in a content-based analysis study is the feature that represents the information. In music analysis, one should know the details of the music content that can be used to differentiate the songs. The selection of features to represent each music genre is an important step in identifying, labeling, and classifying songs according to their genres. This research investigates, analyzes, and selects timbre, rhythm, and pitch-based features to classify music genres. The features extracted from the songs relate to the singer's voice, the instruments and the melody. The feature selection process focuses on supervised and unsupervised methods in order to select significant generalized and specialized music features. Besides the selection process, two modules of the Negative Selection Algorithm, censoring and monitoring, are also highlighted in this work. We then propose a modified AIS-based (artificial immune system) classification algorithm to solve the music genre classification problem. The results of our experiments demonstrate that the feature selection process contributes significantly to the performance of the proposed modified AIS-based algorithm in classifying music genres.

    Automatic Genre Classification of Latin Music Using Ensemble of Classifiers

    This paper presents a novel approach to the task of automatic music genre classification which is based on ensemble learning. Feature vectors are extracted from three 30-second music segments taken from the beginning, middle and end of each music piece. An individual classifier is trained for each music segment. During classification, the outputs provided by the classifiers are combined with the aim of improving music genre classification accuracy. Experiments carried out on a dataset containing 600 music samples from two Latin genres (Tango and Salsa) have shown that, for the task of automatic music genre classification, the features extracted from the middle and end segments provide better results than those from the beginning segment. Furthermore, the proposed ensemble method provides better accuracy than single classifiers using any individual segment.
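One way to picture the combination step is sketched below. The abstract does not say which combination rule the authors use, so the two rules shown (majority vote over predicted labels and the sum rule over class probabilities) are common choices assumed here for illustration, and the per-segment probabilities are made-up values.

```python
from collections import Counter

def majority_vote(labels):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]

def sum_rule(prob_dicts):
    """Add per-class probabilities across classifiers and pick the maximum."""
    totals = Counter()
    for probs in prob_dicts:
        for label, p in probs.items():
            totals[label] += p
    return max(totals, key=totals.get)

# Hypothetical outputs of the three segment classifiers for one piece.
beginning = {"Tango": 0.6, "Salsa": 0.4}
middle = {"Tango": 0.3, "Salsa": 0.7}
end = {"Tango": 0.2, "Salsa": 0.8}
outputs = [beginning, middle, end]

# Each classifier's own decision is its highest-probability class.
labels = [max(p, key=p.get) for p in outputs]
print(majority_vote(labels))  # prints "Salsa"
print(sum_rule(outputs))      # prints "Salsa"
```

In this toy case the beginning-segment classifier votes for Tango but is outvoted by the middle and end segments, which mirrors the abstract's observation that the later segments are the more informative ones.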

    The GTZAN dataset: Its contents, its faults, their effects on evaluation, and its future use

    The GTZAN dataset appears in at least 100 published works, and is the most-used public dataset for evaluation in machine listening research for music genre recognition (MGR). Our recent work, however, shows GTZAN has several faults (repetitions, mislabelings, and distortions), which challenge the interpretability of any result derived using it. In this article, we disprove the claims that all MGR systems are affected in the same ways by these faults, and that the performances of MGR systems in GTZAN are still meaningfully comparable since they all face the same faults. We identify and analyze the contents of GTZAN, and provide a catalog of its faults. We review how GTZAN has been used in MGR research, and find few indications that its faults have been known and considered. Finally, we rigorously study the effects of its faults on evaluating five different MGR systems. The lesson is not to banish GTZAN, but to use it with consideration of its contents. Comment: 29 pages, 7 figures, 6 tables, 128 references