Revisiting the problem of audio-based hit song prediction using convolutional neural networks
Being able to predict whether a song can be a hit has important applications
in the music industry. Although it is true that the popularity of a song can
be greatly affected by external factors such as social and commercial
influences, to what degree audio features computed from musical signals
(which we regard as internal factors) can predict song popularity is an
interesting research question on its own. Motivated by the recent success of
deep learning techniques, we attempt to extend previous work on hit song
prediction by jointly learning the audio features and prediction models using
deep learning. Specifically, we experiment with a convolutional neural
network model that takes the primitive mel-spectrogram as the input for
feature learning, a more advanced JYnet model that uses an external song
dataset for supervised pre-training and auto-tagging, and the combination of
these two models. We also consider the inception model to characterize audio
information at different scales. Our experiments suggest that deep structures
are indeed more accurate than shallow structures in predicting the popularity
of either Chinese or Western Pop songs in Taiwan. We also use the tags
predicted by JYnet to gain insights into the results of the different models.
Comment: To appear in the proceedings of the 2017 IEEE International
Conference on Acoustics, Speech and Signal Processing (ICASSP).
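As an illustrative sketch only (not the authors' architecture, whose filter shapes, depth, and training procedure are in the paper), the "mel-spectrogram in, popularity score out" pipeline can be reduced to a single convolutional layer with ReLU, global max pooling, and a logistic output. All weights below are random and untrained:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d_valid(spec, kernel):
    """Naive 2-D valid convolution of a spectrogram with one kernel."""
    kh, kw = kernel.shape
    h, w = spec.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(spec[i:i + kh, j:j + kw] * kernel)
    return out

def hit_score(mel_spec, kernels, w_out, b_out):
    """One conv layer -> ReLU -> global max pooling -> logistic score."""
    feats = np.array([conv2d_valid(mel_spec, k).clip(min=0).max()
                      for k in kernels])
    return 1.0 / (1.0 + np.exp(-(feats @ w_out + b_out)))

# Toy 40-band x 100-frame "mel-spectrogram" and random, untrained weights.
mel = rng.random((40, 100))
kernels = rng.standard_normal((4, 8, 8)) * 0.1
score = hit_score(mel, kernels,
                  w_out=rng.standard_normal(4) * 0.1, b_out=0.0)
```

A real model would stack several such layers and learn the kernels by gradient descent against popularity labels; the logistic output keeps the score interpretable as a probability.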
Transfer learning by supervised pre-training for audio-based music classification
Very few large-scale music research datasets are publicly available. There is an increasing need for such datasets, because the shift from physical to digital distribution in the music industry has given the listener access to a large body of music, which needs to be cataloged efficiently and be easily browsable. Additionally, deep learning and feature learning techniques are becoming increasingly popular for music information retrieval applications, and they typically require large amounts of training data to work well. In this paper, we propose to exploit an available large-scale music dataset, the Million Song Dataset (MSD), for classification tasks on other datasets, by reusing models trained on the MSD for feature extraction. This transfer learning approach, which we refer to as supervised pre-training, was previously shown to be very effective for computer vision problems. We show that features learned from MSD audio fragments in a supervised manner, using tag labels and user listening data, consistently outperform features learned in an unsupervised manner in this setting, provided that the learned feature extractor is of limited complexity. We evaluate our approach on the GTZAN, 1517-Artists, Unique and Magnatagatune datasets.
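A minimal sketch of the supervised pre-training idea, under heavy simplification (a least-squares linear tag predictor stands in for the model trained on the MSD, and a nearest-centroid classifier stands in for the target task):

```python
import numpy as np

rng = np.random.default_rng(1)

# "Pre-training": fit a linear tag predictor on a large source dataset
# (a stand-in for the MSD), then freeze it as a feature extractor.
X_src = rng.standard_normal((500, 20))        # source audio descriptors
W_true = rng.standard_normal((20, 5))
Y_src = (X_src @ W_true > 0).astype(float)    # binary "tag" labels
W_pre, *_ = np.linalg.lstsq(X_src, Y_src, rcond=None)

def features(x):
    """Transfer: project target audio through the pre-trained model."""
    return x @ W_pre

# Target task: nearest-centroid classification on transferred features.
X_tgt = rng.standard_normal((40, 20))
y_tgt = (X_tgt[:, 0] > 0).astype(int)         # toy target labels
cents = np.array([features(X_tgt[y_tgt == c]).mean(axis=0)
                  for c in (0, 1)])
pred = np.argmin(np.linalg.norm(features(X_tgt)[:, None] - cents, axis=2),
                 axis=1)
acc = float((pred == y_tgt).mean())
```

The point mirrored from the abstract is the shape of the pipeline: supervision happens once on the large source dataset, and the target task only sees the extracted features.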
Revisit Behavior in Social Media: The Phoenix-R Model and Discoveries
How many listens will an artist receive on an online radio? How about plays
on a YouTube video? How many of these visits are new or returning users?
Modeling and mining popularity dynamics of social activity has important
implications for researchers, content creators and providers. Here we
investigate the effect of revisits (successive visits from a single user) on
content popularity. Using four datasets of social activity, with up to tens
of millions of media objects (e.g., YouTube videos, Twitter hashtags or
LastFM artists), we show the effect of revisits on the popularity evolution
of such objects. Secondly, we propose the Phoenix-R model, which captures the
popularity dynamics of individual objects. Phoenix-R has two desirable
properties: (1) it is parsimonious, being based on the minimum description
length principle while achieving lower root mean squared error than
state-of-the-art baselines; and (2) it is applicable, being effective for
predicting future popularity values of objects.
Comment: To appear in the European Conference on Machine Learning and
Principles and Practice of Knowledge Discovery in Databases 201
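The abstract does not give the Phoenix-R equations, but its "parsimonious via minimum description length" claim can be illustrated generically: score each candidate model by a two-part MDL cost (fit plus parameter count) and keep the cheapest. The polynomial family below is purely a stand-in:

```python
import numpy as np

rng = np.random.default_rng(2)

def mdl_score(y, y_hat, n_params):
    """Two-part MDL surrogate: data cost (log residual) + model cost."""
    n = len(y)
    rss = float(np.sum((y - y_hat) ** 2)) + 1e-12
    return 0.5 * n * np.log(rss / n) + 0.5 * n_params * np.log(n)

# A popularity time series with two activity bursts plus noise.
t = np.arange(100, dtype=float)
y = (np.exp(-0.5 * ((t - 20) / 5) ** 2)
     + 0.6 * np.exp(-0.5 * ((t - 70) / 8) ** 2)
     + 0.01 * rng.standard_normal(100))

# Pick model complexity by MDL instead of raw fit error, which would
# always prefer the most complex model.
ts = t / t.max()  # rescale for numerical stability of polyfit
best_deg = min(range(1, 12),
               key=lambda d: mdl_score(
                   y, np.polyval(np.polyfit(ts, y, d), ts), d + 1))
```

Phoenix-R itself selects among pulse-like revisit components rather than polynomial degrees, but the selection principle is the same.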
Large-Scale User Modeling with Recurrent Neural Networks for Music Discovery on Multiple Time Scales
The amount of content on online music streaming platforms is immense, and
most users only access a tiny fraction of this content. Recommender systems are
the application of choice to open up the collection to these users.
Collaborative filtering has the disadvantage that it relies on explicit
ratings, which are often unavailable, and generally disregards the temporal
nature of music consumption. On the other hand, item co-occurrence algorithms,
such as the recently introduced word2vec-based recommenders, are typically left
without an effective user representation. In this paper, we present a new
approach to model users through recurrent neural networks by sequentially
processing consumed items, represented by any type of embeddings and other
context features. This way we obtain semantically rich user representations,
which capture a user's musical taste over time. Our experimental analysis on
large-scale user data shows that our model can be used to predict future songs
a user will likely listen to, both in the short and long term.
Comment: Author pre-print version, 20 pages, 6 figures, 4 tables
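The core mechanism described above can be sketched as follows (a plain untrained tanh RNN stands in for the trained model; embedding sizes, the ranking rule, and the context features are simplifications):

```python
import numpy as np

rng = np.random.default_rng(3)
DIM = 16  # shared size of item embeddings and the hidden user state

# Untrained parameters of a plain tanh RNN.
W_in = rng.standard_normal((DIM, DIM)) * 0.2
W_h = rng.standard_normal((DIM, DIM)) * 0.2

def user_state(listened_items):
    """Fold a user's listening sequence into a single taste vector
    by processing consumed items one at a time."""
    h = np.zeros(DIM)
    for item in listened_items:
        h = np.tanh(item @ W_in + h @ W_h)
    return h

def rank_candidates(h, catalog):
    """Score candidate songs by dot product with the user state."""
    return np.argsort(-(catalog @ h))

catalog = rng.standard_normal((50, DIM))          # item embeddings
history = catalog[rng.integers(0, 50, size=30)]   # 30 consumed items
top5 = rank_candidates(user_state(history), catalog)[:5]
```

The sequential fold is what gives the representation its temporal character: recent listens influence the state more than old ones, matching the short- vs long-term distinction in the abstract.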
The Skipping Behavior of Users of Music Streaming Services and its Relation to Musical Structure
The behavior of users of music streaming services is investigated from the
point of view of the temporal dimension of individual songs; specifically, the
main object of the analysis is the point in time within a song at which users
stop listening and start streaming another song ("skip"). The main contribution
of this study is the ascertainment of a correlation between the distribution in
time of skipping events and the musical structure of songs. It is also shown
that such a distribution is not only specific to individual songs, but also
independent of the cohort of users and, under stationary conditions, of the
date of observation. Finally, user behavioral data is used to train a
predictor of the musical structure of a song solely from its acoustic
content; it is shown that the use of such data, available in large quantities
to music streaming services, yields significant improvements in accuracy over
the customary fashion of training this class of algorithms, in which only
smaller amounts of hand-labeled data are available.
FMA: A Dataset For Music Analysis
We introduce the Free Music Archive (FMA), an open and easily accessible
dataset suitable for evaluating several tasks in MIR, a field concerned with
browsing, searching, and organizing large music collections. The community's
growing interest in feature and end-to-end learning is however restrained by
the limited availability of large audio datasets. The FMA aims to overcome this
hurdle by providing 917 GiB and 343 days of Creative Commons-licensed audio
from 106,574 tracks from 16,341 artists and 14,854 albums, arranged in a
hierarchical taxonomy of 161 genres. It provides full-length and high-quality
audio, pre-computed features, together with track- and user-level metadata,
tags, and free-form text such as biographies. We here describe the dataset and
how it was created, propose a train/validation/test split and three subsets,
discuss some suitable MIR tasks, and evaluate some baselines for genre
recognition. Code, data, and usage examples are available at
https://github.com/mdeff/fma
Comment: ISMIR 2017 camera-ready
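One reason the abstract stresses a proposed train/validation/test split is artist leakage: if the same artist's tracks land in both train and test, genre recognition scores inflate. A hedged sketch of an artist-conditional split (this is a hypothetical helper, not the FMA's own split code, and the toy metadata below replaces the real `tracks.csv`):

```python
import random

def split_by_artist(track_to_artist, ratios=(0.8, 0.1, 0.1), seed=0):
    """Assign whole artists to train/val/test so that no artist's
    tracks span two subsets, avoiding the 'artist effect'."""
    artists = sorted(set(track_to_artist.values()))
    random.Random(seed).shuffle(artists)
    cut1 = int(ratios[0] * len(artists))
    cut2 = cut1 + int(ratios[1] * len(artists))
    subset = {a: ("train" if i < cut1 else "val" if i < cut2 else "test")
              for i, a in enumerate(artists)}
    split = {"train": [], "val": [], "test": []}
    for track, artist in track_to_artist.items():
        split[subset[artist]].append(track)
    return split

# Toy metadata: track id -> artist id.
meta = {track_id: f"artist_{track_id % 7}" for track_id in range(100)}
split = split_by_artist(meta)
```

With the real dataset one would read track and artist ids from the distributed metadata instead of generating them.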
To what extent homophily and influencer networks explain song popularity
Forecasting the popularity of new songs has become a standard practice in the
music industry and provides a comparative advantage for those that do it well.
Considerable efforts were put into machine learning prediction models for that
purpose. It is known that in these models, relevant predictive parameters
include intrinsic lyrical and acoustic characteristics, extrinsic factors
(e.g., publisher influence and support), and the previous popularity of the
artists. Much less attention was given to the social components of the
spreading of song popularity. Recently, evidence for musical homophily - the
tendency that people who are socially linked also share musical tastes - was
reported. Here we determine how musical homophily can be used to predict song
popularity. The study is based on an extensive dataset from the last.fm online
music platform from which we can extract social links between listeners and
their listening patterns. To quantify the importance of networks in the
spreading of songs that eventually determines their popularity, we use musical
homophily to design a predictive influence parameter and show that its
inclusion in state-of-the-art machine learning models enhances predictions of
song popularity. The influence parameter improves the prediction precision
(TP/(TP+FN)) by about 50% from 0.14 to 0.21, indicating that the social
component in the spreading of music plays at least as significant a role as the
artist's popularity or the impact of the genre.
Comment: 7 pages, 3 figures
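A minimal sketch of the two quantities the abstract combines: a homophily-based influence feature and the TP/(TP+FN) rate it reports. The `friends` and `listen_time` structures are hypothetical stand-ins for what would be extracted from the last.fm data:

```python
def influence(user, song, friends, listen_time):
    """Share of a user's friends who played the song before the user's
    (potential) first play -- a simple musical-homophily feature."""
    fr = friends.get(user, [])
    if not fr:
        return 0.0
    t0 = listen_time.get((user, song), float("inf"))
    return sum(listen_time.get((f, song), float("inf")) < t0
               for f in fr) / len(fr)

def tp_over_tp_fn(tp, fn):
    """The TP/(TP+FN) rate quoted in the abstract."""
    return tp / (tp + fn)

# Tiny example: u2 listened to song "s" before u1 did, u3 did not.
friends = {"u1": ["u2", "u3"], "u2": ["u1"]}
listen_time = {("u2", "s"): 1.0, ("u3", "s"): 5.0, ("u1", "s"): 3.0}
inf_u1 = influence("u1", "s", friends, listen_time)
```

In the paper this influence value enters an existing machine-learning popularity model as one additional feature; here it is computed in isolation.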