4 research outputs found
Toward Interpretable Music Tagging with Self-Attention
Self-attention is an attention mechanism that learns a representation by
relating different positions in a sequence. The Transformer, a sequence model
based solely on self-attention, and its variants have achieved
state-of-the-art results on many natural language processing tasks. Since
music builds its semantics from relations between components at sparse
positions, adopting the self-attention mechanism for music information
retrieval (MIR) problems can be beneficial. Hence, we propose a
self-attention-based deep sequence model for music tagging. The proposed architecture consists
of shallow convolutional layers followed by stacked Transformer encoders.
Compared to conventional approaches using fully convolutional or recurrent
neural networks, our model is more interpretable while achieving competitive
results. We validate the performance of our model on the MagnaTagATune and
the Million Song Dataset. In addition, we demonstrate the interpretability of
the proposed architecture with a heat map visualization.
Comment: 13 pages, 12 figures; code:
https://github.com/minzwon/self-attention-music-taggin
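The core mechanism the abstract refers to, relating every position in a sequence to every other position, can be sketched in a few lines. This is a simplified illustration only: it omits the learned query/key/value projections and multiple heads that the paper's stacked Transformer encoders would use.

```python
import numpy as np

def self_attention(x):
    """Scaled dot-product self-attention over a sequence.

    x: (seq_len, d) array. Each position attends to every other
    position, so distant components can be related in a single step.
    """
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                 # pairwise similarity of positions
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ x, weights                   # weighted mix of all positions

# Toy sequence of 4 feature frames with 3 features each; frames 0 and 3
# are identical, so they produce identical attention distributions.
x = np.array([[1., 0., 0.],
              [0., 1., 0.],
              [0., 0., 1.],
              [1., 0., 0.]])
out, w = self_attention(x)
```

The attention matrix `w` is what makes such a model inspectable: each row is a distribution over sequence positions, which is exactly what a heat map visualization like the one described above would display.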
Deep Layered Learning in MIR
Deep learning has boosted the performance of many music information retrieval
(MIR) systems in recent years. Yet, the complex hierarchical arrangement of
music makes end-to-end learning hard for some MIR tasks: a very deep and
flexible processing chain is necessary to model some aspects of music audio.
Representations involving tones, chords, and rhythm are fundamental building
blocks of music. This paper discusses how these can be used as intermediate
targets and priors in MIR to deal with structurally complex learning problems,
with learning modules connected in a directed acyclic graph. It is suggested
that this strategy for inference, referred to as deep layered learning (DLL),
can help generalization by (1) enforcing the validity and invariance of
intermediate representations during processing, and by (2) letting the
inferred representations establish the musical organization to support
higher-level invariant processing. A background to modular music processing is
provided together with an overview of previous publications. Relevant concepts
from information processing, such as pruning, skip connections, and performance
supervision are reviewed within the context of DLL. A test is finally
performed, showing how layered learning affects pitch tracking. The results
indicate that offsets, in particular, are easier to detect when guided by
extracted framewise fundamental frequencies.
Comment: Submitted for publication. Feedback always welcome.
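As a toy illustration of the layered-learning idea, the pitch-tracking test mentioned above can be caricatured as two modules in a chain: one infers an intermediate representation (a framewise f0 track), and a later module uses that representation, rather than raw audio, to detect note offsets. All function names, the threshold, and the data below are illustrative assumptions, not the paper's implementation.

```python
def estimate_f0(frames):
    """Stand-in for a learned framewise f0 estimator: returns the
    strongest bin per frame (1-based), or 0 when no bin is salient
    enough (unvoiced). Threshold 0.5 is an arbitrary assumption."""
    track = []
    for f in frames:
        peak = max(f)
        track.append(1 + f.index(peak) if peak > 0.5 else 0)
    return track

def detect_offsets(f0_track):
    """Later module in the DAG: an offset is where a voiced f0 track
    goes silent. It operates on the inferred intermediate
    representation, not on the raw frames."""
    return [i for i in range(1, len(f0_track))
            if f0_track[i] == 0 and f0_track[i - 1] != 0]

# Two toy notes: bin saliences per frame; notes end at frames 2 and 4.
frames = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.2], [0.1, 0.9], [0.2, 0.1]]
f0 = estimate_f0(frames)          # → [1, 1, 0, 2, 0]
offsets = detect_offsets(f0)      # → [2, 4]
```

The point of the sketch is structural: because the second module only ever sees a valid, invariant f0 representation, its task is simpler, which is the generalization argument (1)-(2) above.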
Deep Learning-Based Automatic Downbeat Tracking: A Brief Review
As an important form of multimedia, music is part of almost everyone's
life. Automatically analyzing music is a significant step toward satisfying
people's need for effortless music retrieval and recommendation. Among these
tasks, downbeat tracking has been a fundamental and long-standing problem in
the Music Information Retrieval (MIR) field. Despite significant research
efforts, downbeat tracking still remains a challenge. Previous studies either
focus on feature engineering (extracting certain features by signal
processing, which yields semi-automatic solutions) or suffer from
limitations: they can only model music audio recordings within limited time
signatures and tempo ranges. Recently, deep learning has surpassed
traditional machine learning methods and has become the primary approach to
feature learning; combining traditional and deep learning methods has also
yielded better performance. In this paper, we begin with a background
introduction to the downbeat tracking problem.
Then, we give detailed discussions of the following topics: system
architecture, feature extraction, deep neural network algorithms, datasets, and
evaluation strategy. In addition, we take a look at the results from the annual
benchmark evaluation--Music Information Retrieval Evaluation eXchange
(MIREX)--as well as the developments in software implementations. Although much
has been achieved in the area of automatic downbeat tracking, some problems
still remain. We point out these problems and conclude with possible directions
and challenges for future research.
Comment: 22 pages, 7 figures. arXiv admin note: text overlap with
arXiv:1605.08396 by other authors.
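The limitation the abstract attributes to earlier methods, handling only fixed time signatures and tempo ranges, can be made concrete with a deliberately naive inference step: if the bar length in frames is constant and known, downbeat selection reduces to choosing the best phase of a periodic grid over a downbeat activation curve. Everything below is an illustrative sketch (the activation values are hand-made, standing in for a network's output), not any system from the review.

```python
def pick_downbeats(activation, bar_len):
    """Pick the bar phase whose periodic frames best match the
    activation curve, then return those frames as downbeats.
    Assumes a constant, known bar length -- exactly the kind of
    restriction that limits a system to fixed time signatures
    and tempo ranges."""
    best = max(range(bar_len),
               key=lambda phase: sum(activation[phase::bar_len]))
    return list(range(best, len(activation), bar_len))

# Hand-made "downbeat activation", one value per frame; peaks every
# 4 frames mark the downbeats of a 4-beat bar at constant tempo.
act = [0.9, 0.1, 0.2, 0.1, 0.8, 0.2, 0.1, 0.1, 0.95, 0.0, 0.1, 0.2]
downbeats = pick_downbeats(act, bar_len=4)  # → [0, 4, 8]
```

Real systems replace this fixed grid with probabilistic decoders (e.g. HMMs or dynamic Bayesian networks) over the learned activation, which is what lets them track varying tempi and meters.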
Artificial Musical Intelligence: A Survey
Computers have been used to analyze and create music since they were first
introduced in the 1950s and 1960s. Beginning in the late 1990s, the rise of the
Internet and large scale platforms for music recommendation and retrieval have
made music an increasingly prevalent domain of machine learning and artificial
intelligence research. While the field is still nascent, several different
approaches have been employed to tackle what may broadly be referred to as
"musical intelligence." This article provides a definition of musical
intelligence, introduces a taxonomy of its constituent components, and
surveys the wide range of AI methods that can be, and have been, brought to
bear in its pursuit, with a particular emphasis on machine learning methods.
Comment: 99 pages, 5 figures, preprint: currently under review.