8 research outputs found
Exploring new features for music classification
Automatic music classification aims at grouping unknown songs into predefined categories such as music genre or induced emotion. To obtain perceptually relevant results, appropriate features must be designed that carry important information for semantic inference. In this paper, we explore novel features and evaluate them in an automatic music tagging task. The proposed features span various aspects of the music: timbre, textual metadata, visual descriptors of cover art, and features characterizing the lyrics of sung music. The merit of these novel features is then evaluated using a classification system based on a boosting algorithm on binary decision trees. Their effectiveness for the task at hand is discussed with reference to the very common Mel Frequency Cepstral Coefficients features. We show that some of these features alone bring useful information, and that the classification system takes great advantage of a description covering such diverse aspects of songs.
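As a rough illustration of boosting on binary decision trees, the sketch below implements discrete AdaBoost over one-split decision stumps in plain Python. The abstract does not say which boosting variant or tree depth the authors used, so the choice of AdaBoost, the stump depth, the toy feature values, and all function names here are assumptions made for illustration only.

```python
import math

def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, sign) of the best one-split tree."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                preds = [sign if row[j] > thr else -sign for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def stump_predict(stump, row):
    _, j, thr, sign = stump
    return sign if row[j] > thr else -sign

def adaboost(X, y, rounds=5):
    """Discrete AdaBoost with labels in {-1, +1}; returns [(alpha, stump), ...]."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        # Clamp the error so the log below stays finite on separable data.
        err = min(max(stump[0], 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Raise the weight of misclassified examples, then renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical toy data: each row is [timbre_feature, lyrics_feature];
# the label says whether a tag applies (+1) or not (-1).
X = [[0.1, 5.0], [0.2, 1.0], [0.8, 4.0], [0.9, 2.0]]
y = [-1, -1, 1, 1]
ensemble = adaboost(X, y, rounds=3)
```

Because each stump splits on a single feature, the features selected across rounds give a crude view of which descriptor types the ensemble relies on.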
Combining Sources of Description for Approximating Music Similarity Ratings
In this paper, we compare the effectiveness of basic acoustic features and genre annotations when adapting a music similarity model to user ratings. We use the Metric Learning to Rank algorithm to learn a Mahalanobis metric from comparative similarity ratings in the MagnaTagATune database. Using common formats for feature data, our approach can easily be transferred to other existing databases. Our results show that genre data allow more effective learning of a metric than simple audio features, but a combination of both feature sets clearly outperforms either individual set.
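To make the notion of a Mahalanobis metric over combined feature sets concrete, the sketch below computes d_W(x, y) = sqrt((x - y)^T W (x - y)) on concatenated audio and genre vectors. It only evaluates the distance for a given matrix W; the learning step (Metric Learning to Rank) is not shown. The feature values, the hand-picked W, and the helper names are assumptions for illustration.

```python
import math

def mahalanobis(x, y, W):
    """Distance sqrt((x - y)^T W (x - y)) for a positive semidefinite matrix W."""
    d = [a - b for a, b in zip(x, y)]
    Wd = [sum(W[i][j] * d[j] for j in range(len(d))) for i in range(len(d))]
    return math.sqrt(sum(di * wi for di, wi in zip(d, Wd)))

def combine(audio, genre):
    """Concatenate two feature sets into one description vector."""
    return list(audio) + list(genre)

# Hypothetical songs: two audio features followed by a two-genre indicator.
song_a = combine([0.2, 0.7], [1.0, 0.0])
song_b = combine([0.3, 0.5], [0.0, 1.0])

# With W equal to the identity, the metric reduces to Euclidean distance.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

# A hand-picked diagonal W that weights the genre dimensions more heavily;
# a learned metric would supply these entries (and off-diagonal terms) instead.
W = [row[:] for row in identity]
W[2][2] = W[3][3] = 4.0

d_euclid = mahalanobis(song_a, song_b, identity)
d_weighted = mahalanobis(song_a, song_b, W)
```

A diagonal W only rescales individual features, whereas a full matrix can also account for correlations between audio and genre dimensions.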
Learning Combinations of Multiple Feature Representations for Music Emotion Prediction
Music consists of several structures and patterns evolving through time, which greatly influence the human decoding of higher-level cognitive aspects of music such as the emotions it expresses. For tasks such as genre, tag, and emotion recognition, these structures have often been identified and used as individual, non-temporal features and representations. In this work, we test the hypothesis that using multiple temporal and non-temporal representations of different features is beneficial for modeling music structure, with the aim of predicting the emotions expressed in music. We test this hypothesis by representing temporal and non-temporal structures using generative models of multiple audio features. The representations are used in a discriminative setting via the Product Probability Kernel and the Gaussian Process model, enabling Multiple Kernel Learning that finds optimized combinations of both features and temporal/non-temporal representations. We show the increased predictive performance obtained by combining different features and representations, along with the great interpretive prospects of this approach.
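The idea of comparing generative models with a kernel and then combining several such kernels can be sketched as follows. The kernel named in the abstract is usually written "probability product kernel" in the literature, k(p, q) = ∫ p(x)^ρ q(x)^ρ dx. For ρ = 1 and two one-dimensional Gaussians it has the closed form N(m1; m2, v1 + v2). The 1-D Gaussian models, the fixed weights, and the function names are assumptions; the paper uses richer generative models and learns the combination within a Gaussian Process.

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def expected_likelihood_kernel(p, q):
    """Probability product kernel with rho = 1 for 1-D Gaussians given as (mean, var)."""
    (m1, v1), (m2, v2) = p, q
    # Closed form of the integral of N(x; m1, v1) * N(x; m2, v2) over x.
    return gaussian_pdf(m1, m2, v1 + v2)

def combined_kernel(models_a, models_b, weights):
    """Non-negative weighted sum of per-feature kernels, which is again a valid kernel."""
    return sum(w * expected_likelihood_kernel(pa, pb)
               for w, pa, pb in zip(weights, models_a, models_b))

# Hypothetical excerpts, each described by one Gaussian per audio feature.
excerpt_a = [(0.0, 1.0), (2.0, 0.5)]
excerpt_b = [(0.5, 1.0), (1.0, 0.5)]
weights = [0.7, 0.3]  # fixed here; multiple kernel learning would optimise these

k_ab = combined_kernel(excerpt_a, excerpt_b, weights)
k_ba = combined_kernel(excerpt_b, excerpt_a, weights)
k_self = expected_likelihood_kernel((0.0, 1.0), (0.0, 1.0))
```

Inspecting the learned weights is one way such a combination can be interpreted, since a larger weight indicates that a feature or representation contributes more to the prediction.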
The GTZAN dataset: Its contents, its faults, their effects on evaluation, and its future use
The GTZAN dataset appears in at least 100 published works, and is the
most-used public dataset for evaluation in machine listening research for music
genre recognition (MGR). Our recent work, however, shows GTZAN has several
faults (repetitions, mislabelings, and distortions), which challenge the
interpretability of any result derived using it. In this article, we disprove
the claims that all MGR systems are affected in the same ways by these faults,
and that the performances of MGR systems in GTZAN are still meaningfully
comparable since they all face the same faults. We identify and analyze the
contents of GTZAN, and provide a catalog of its faults. We review how GTZAN has
been used in MGR research, and find few indications that its faults have been
known and considered. Finally, we rigorously study the effects of its faults on
evaluating five different MGR systems. The lesson is not to banish GTZAN, but
to use it with consideration of its contents.
Comment: 29 pages, 7 figures, 6 tables, 128 references