Sample-level CNN Architectures for Music Auto-tagging Using Raw Waveforms
Recent work has shown that the end-to-end approach using a convolutional neural
network (CNN) is effective in various types of machine learning tasks. For
audio signals, the approach takes raw waveforms as input using a 1-D
convolution layer. In this paper, we improve the 1-D CNN architecture for music
auto-tagging by adopting building blocks from state-of-the-art image
classification models, ResNets and SENets, and adding multi-level feature
aggregation to it. We compare different combinations of the modules in building
CNN architectures. The results show that they achieve significant improvements
over previous state-of-the-art models on the MagnaTagATune dataset and
comparable results on the Million Song Dataset. Furthermore, we analyze and
visualize our model to show how the 1-D CNN operates. Comment: Accepted for publication at ICASSP 201
Collaborative Deep Learning for Recommender Systems
Collaborative filtering (CF) is a successful approach commonly used by many
recommender systems. Conventional CF-based methods use the ratings given to
items by users as the sole source of information for learning to make
recommendations. However, the ratings are often very sparse in many
applications, causing CF-based methods to degrade significantly in their
recommendation performance. To address this sparsity problem, auxiliary
information such as item content information may be utilized. Collaborative
topic regression (CTR) is an appealing recent method taking this approach which
tightly couples the two components that learn from two different sources of
information. Nevertheless, the latent representation learned by CTR may not be
very effective when the auxiliary information is very sparse. To address this
problem, we generalize recent advances in deep learning from i.i.d. input to
non-i.i.d. (CF-based) input and propose in this paper a hierarchical Bayesian
model called collaborative deep learning (CDL), which jointly performs deep
representation learning for the content information and collaborative filtering
for the ratings (feedback) matrix. Extensive experiments on three real-world
datasets from different domains show that CDL can significantly advance the
state of the art.
An open dataset for research on audio field recording archives: freefield1010
We introduce a free and open dataset of 7690 audio clips sampled from the
field-recording tag in the Freesound audio archive. The dataset is designed for
use in research related to data mining in audio archives of field recordings /
soundscapes. Audio is standardised, and audio and metadata are Creative Commons
licensed. We describe the data preparation process, characterise the dataset
descriptively, and illustrate its use through an auto-tagging experiment.
Cold-start music recommendation using a hybrid representation.
Digital music systems are a new and exciting way to discover, share, and listen to new music. Their success is so great that digital downloads are now included alongside traditional record sales in many official music charts [10]. In the past, listeners would rely on magazine, radio, and friends' reviews to decide on the music they listen to and purchase. In the internet age, this style of finding music is being superseded by music recommender systems. The shift from listening to hard copies of music, such as CDs, to online copies like MP3s presents the interesting new challenge of how to recommend music to a listener. In such recommender systems, a user will typically provide a track that they like as a query, often implicitly as they listen to the track. The system must then provide a list of further tracks that the user will want to listen to. Many websites provide such recommender systems, and many of the systems provide very good recommendations. However, there are still scenarios that these systems struggle to handle, and where recommendations can be unreliable. Online music systems allow users to tag any track with a free-text description. A recommender system can then determine the similarity between tracks based on these tags, and make recommendations. However, when a track is new to the system it will have no tags. This means that the track is never recommended and, in turn, is very unlikely to be tagged. Turnbull et al. [11] show that social tags tend to be very sparse, and that a huge popularity bias exists. This is further confirmed by data released by Last.fm [7] as part of the Million Song Dataset [3]: from a vocabulary of over 500,000 tags, each track, on average, has only 17 tags; 46% of tracks have no tags at all. This scenario is often referred to as the cold-start problem; as a result, large volumes of music are excluded from recommendations, even though they may be excellent recommendations. The aim of our hybrid representation is to reduce the effects of the cold-start problem, thereby increasing the recommendation quality of the overall system.
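The cold-start effect described in this abstract can be illustrated with a minimal sketch. It assumes tracks are represented as plain sets of tags and compared with cosine similarity; the function names, the toy catalogue, and the choice of cosine similarity are illustrative assumptions, not the method used in the paper.

```python
import math


def tag_cosine(tags_a, tags_b):
    """Cosine similarity between two tracks, each given as a set of tags."""
    if not tags_a or not tags_b:
        # A track without tags has a zero vector, so similarity is defined as 0.
        return 0.0
    return len(tags_a & tags_b) / math.sqrt(len(tags_a) * len(tags_b))


def recommend(query, catalogue, k=2):
    """Rank the other tracks by tag similarity to the query track.

    Tracks with zero similarity are dropped, which is why an untagged
    track can never be recommended by a purely tag-based system.
    """
    scores = {
        name: tag_cosine(catalogue[query], tags)
        for name, tags in catalogue.items()
        if name != query
    }
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(name, score) for name, score in ranked if score > 0][:k]


# Hypothetical toy catalogue; "new_track" has just been added and has no tags.
catalogue = {
    "track_a": {"rock", "guitar", "90s"},
    "track_b": {"rock", "guitar", "live"},
    "track_c": {"jazz", "piano"},
    "new_track": set(),
}

if __name__ == "__main__":
    for query in catalogue:
        print(query, recommend(query, catalogue))
```

In this toy setting `new_track` is never returned for any query, and querying with `new_track` itself returns nothing, which mirrors the cold-start problem the abstract describes. A hybrid representation would need some additional, non-tag signal to break this cycle.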
- …