
    Sample-level CNN Architectures for Music Auto-tagging Using Raw Waveforms

    Recent work has shown that the end-to-end approach using convolutional neural networks (CNNs) is effective in various types of machine learning tasks. For audio signals, the approach takes raw waveforms as input using a 1-D convolution layer. In this paper, we improve the 1-D CNN architecture for music auto-tagging by adopting building blocks from state-of-the-art image classification models, ResNets and SENets, and adding multi-level feature aggregation to it. We compare different combinations of the modules in building CNN architectures. The results show that they achieve significant improvements over previous state-of-the-art models on the MagnaTagATune dataset and comparable results on the Million Song Dataset. Furthermore, we analyze and visualize our model to show how the 1-D CNN operates. Comment: Accepted for publication at ICASSP 201

    Collaborative Deep Learning for Recommender Systems

    Collaborative filtering (CF) is a successful approach commonly used by many recommender systems. Conventional CF-based methods use the ratings given to items by users as the sole source of information for learning to make recommendations. However, the ratings are often very sparse in many applications, causing CF-based methods to degrade significantly in their recommendation performance. To address this sparsity problem, auxiliary information such as item content information may be utilized. Collaborative topic regression (CTR) is an appealing recent method taking this approach, which tightly couples the two components that learn from two different sources of information. Nevertheless, the latent representation learned by CTR may not be very effective when the auxiliary information is very sparse. To address this problem, we generalize recent advances in deep learning from i.i.d. input to non-i.i.d. (CF-based) input and propose in this paper a hierarchical Bayesian model called collaborative deep learning (CDL), which jointly performs deep representation learning for the content information and collaborative filtering for the ratings (feedback) matrix. Extensive experiments on three real-world datasets from different domains show that CDL can significantly advance the state of the art.

    An open dataset for research on audio field recording archives: freefield1010

    We introduce a free and open dataset of 7690 audio clips sampled from the field-recording tag in the Freesound audio archive. The dataset is designed for use in research related to data mining in audio archives of field recordings / soundscapes. Audio is standardised, and audio and metadata are Creative Commons licensed. We describe the data preparation process, characterise the dataset descriptively, and illustrate its use through an auto-tagging experiment.

    Cold-start music recommendation using a hybrid representation.

    Digital music systems are a new and exciting way to discover, share, and listen to new music. Their success is so great that digital downloads are now included alongside traditional record sales in many official music charts [10]. In the past, listeners would rely on magazine, radio, and friends' reviews to decide on the music they listen to and purchase. In the internet age, this style of finding music is being superseded by music recommender systems. The shift from listening to hard copies of music, such as CDs, to online copies like MP3s, presents the interesting new challenge of how to recommend music to a listener. In such recommender systems, a user will typically provide a track that they like as a query, often implicitly as they listen to the track. The system must then provide a list of further tracks that the user will want to listen to. Many websites exist that provide such recommender systems, and many of the systems provide very good recommendations. However, there are still scenarios that these systems struggle to handle, and where recommendations can be unreliable. Online music systems allow users to tag any track with a free-text description. A recommender system can then determine the similarity between tracks based on these tags, and make recommendations. However, when a track is new to the system it will have no tags. This means that the track is never recommended, and in turn, the track is very unlikely to be tagged. Turnbull et al. [11] show that social tags tend to be very sparse, and that a huge popularity bias exists. This is further confirmed by data released by Last.fm [7] as part of the Million Song Dataset [3]: from a vocabulary of over 500000 tags, each track, on average, has only 17 tags; 46% of tracks have no tags at all. This scenario is often referred to as the cold-start problem, the result of which is that large volumes of music are excluded from recommendations, even when they would be excellent recommendations. The aim of our hybrid representation is to reduce the effects of the cold-start problem, therefore increasing the recommendation quality of the overall system.
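
    The cold-start effect described in this abstract can be illustrated with a minimal sketch. It assumes tag-based similarity is measured with Jaccard overlap between tag sets, which is one common choice and not necessarily the measure used in the paper. The function names, track names, and tags are all hypothetical.

```python
def jaccard(tags_a, tags_b):
    """Jaccard similarity between two tag sets; 0.0 when both are empty."""
    union = tags_a | tags_b
    if not union:
        return 0.0
    return len(tags_a & tags_b) / len(union)


def recommend(query, catalogue, k=2):
    """Rank the other tracks by tag similarity to the query, keeping nonzero matches only."""
    scores = [
        (jaccard(catalogue[query], tags), name)
        for name, tags in catalogue.items()
        if name != query
    ]
    ranked = sorted((s for s in scores if s[0] > 0), reverse=True)
    return [name for _, name in ranked[:k]]


# Hypothetical catalogue: track names and tags are invented for illustration.
catalogue = {
    "track_a": {"rock", "guitar", "90s"},
    "track_b": {"rock", "guitar"},
    "track_c": {"jazz", "piano"},
    "new_track": set(),  # newly added track with no tags yet
}

print(recommend("track_a", catalogue))    # the untagged track never appears here
print(recommend("new_track", catalogue))  # and it yields no recommendations itself
```

    Under this assumption an untagged track has zero similarity to every other track, so it is never recommended and cannot act as a useful query, which is the gap a hybrid representation aims to close.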