9,814 research outputs found
Sample-level CNN Architectures for Music Auto-tagging Using Raw Waveforms
Recent work has shown that the end-to-end approach using convolutional neural
network (CNN) is effective in various types of machine learning tasks. For
audio signals, the approach takes raw waveforms as input using an 1-D
convolution layer. In this paper, we improve the 1-D CNN architecture for music
auto-tagging by adopting building blocks from state-of-the-art image
classification models, ResNets and SENets, and adding multi-level feature
aggregation to it. We compare different combinations of the modules in building
CNN architectures. The results show that they achieve significant improvements
over previous state-of-the-art models on the MagnaTagATune dataset and
comparable results on Million Song Dataset. Furthermore, we analyze and
visualize our model to show how the 1-D CNN operates.Comment: Accepted for publication at ICASSP 201
Learning to rank music tracks using triplet loss
Most music streaming services rely on automatic recommendation algorithms to
exploit their large music catalogs. These algorithms aim at retrieving a ranked
list of music tracks based on their similarity with a target music track. In
this work, we propose a method for direct recommendation based on the audio
content without explicitly tagging the music tracks. To that aim, we propose
several strategies to perform triplet mining from ranked lists. We train a
Convolutional Neural Network to learn the similarity via triplet loss. These
different strategies are compared and validated on a large-scale experiment
against an auto-tagging based approach. The results obtained highlight the
efficiency of our system, especially when associated with an Auto-pooling
layer
A Feature Learning Siamese Model for Intelligent Control of the Dynamic Range Compressor
In this paper, a siamese DNN model is proposed to learn the characteristics
of the audio dynamic range compressor (DRC). This facilitates an intelligent
control system that uses audio examples to configure the DRC, a widely used
non-linear audio signal conditioning technique in the areas of music
production, speech communication and broadcasting. Several alternative siamese
DNN architectures are proposed to learn feature embeddings that can
characterise subtle effects due to dynamic range compression. These models are
compared with each other as well as handcrafted features proposed in previous
work. The evaluation of the relations between the hyperparameters of DNN and
DRC parameters are also provided. The best model is able to produce a universal
feature embedding that is capable of predicting multiple DRC parameters
simultaneously, which is a significant improvement from our previous research.
The feature embedding shows better performance than handcrafted audio features
when predicting DRC parameters for both mono-instrument audio loops and
polyphonic music pieces.Comment: 8 pages, accepted in IJCNN 201
- …