
    Toward Interpretable Music Tagging with Self-Attention

    Self-attention is an attention mechanism that learns a representation by relating different positions within a sequence. The Transformer, a sequence model based solely on self-attention, and its variants have achieved state-of-the-art results in many natural language processing tasks. Since music builds its semantics from relations between components at sparsely distributed positions, adopting the self-attention mechanism for music information retrieval (MIR) problems can be beneficial. Hence, we propose a self-attention-based deep sequence model for music tagging. The proposed architecture consists of shallow convolutional layers followed by stacked Transformer encoders. Compared to conventional approaches using fully convolutional or recurrent neural networks, our model is more interpretable while reporting competitive results. We validate the performance of our model on the MagnaTagATune dataset and the Million Song Dataset. In addition, we demonstrate the interpretability of the proposed architecture with a heat-map visualization.
    Comment: 13 pages, 12 figures; code: https://github.com/minzwon/self-attention-music-taggin
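    The mechanism the abstract describes, each position attending to every other position in the sequence, can be sketched in a few lines of NumPy. This is a minimal single-head illustration with assumed dimensions (8 time steps, 16 features, as if produced by a shallow convolutional front-end), not the authors' implementation:

    ```python
    import numpy as np

    def self_attention(x, wq, wk, wv):
        """Single-head self-attention: each position attends to all others."""
        q, k, v = x @ wq, x @ wk, x @ wv
        scores = q @ k.T / np.sqrt(k.shape[-1])         # pairwise relevance
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # softmax over positions
        return weights @ v, weights

    rng = np.random.default_rng(0)
    seq_len, d = 8, 16   # assumed sizes for illustration only
    x = rng.normal(size=(seq_len, d))
    wq, wk, wv = (rng.normal(size=(d, d)) for _ in range(3))
    out, attn = self_attention(x, wq, wk, wv)
    print(out.shape, attn.shape)  # (8, 16) (8, 8)
    ```

    The `attn` matrix is exactly the kind of quantity a heat-map visualization would display: row i shows how strongly position i attends to every other position.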

    Deep Layered Learning in MIR

    Deep learning has boosted the performance of many music information retrieval (MIR) systems in recent years. Yet the complex hierarchical arrangement of music makes end-to-end learning hard for some MIR tasks: a very deep and flexible processing chain is necessary to model some aspects of music audio. Representations involving tones, chords, and rhythm are fundamental building blocks of music. This paper discusses how these can be used as intermediate targets and priors in MIR to deal with structurally complex learning problems, with learning modules connected in a directed acyclic graph. It is suggested that this strategy for inference, referred to as deep layered learning (DLL), can aid generalization by (1) enforcing the validity and invariance of intermediate representations during processing, and (2) letting the inferred representations establish the musical organization needed to support higher-level invariant processing. A background on modular music processing is provided, together with an overview of previous publications. Relevant concepts from information processing, such as pruning, skip connections, and performance supervision, are reviewed within the context of DLL. Finally, a test is performed showing how layered learning affects pitch tracking; the results indicate that offsets in particular are easier to detect when guided by extracted framewise fundamental frequencies.
    Comment: Submitted for publication. Feedback always welcom
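    The core idea, learning modules connected in a directed acyclic graph, where intermediate representations (here, framewise f0) feed higher-level modules (onset/offset detection), can be sketched as a topologically ordered module graph. The module names and toy functions below are illustrative assumptions, not the paper's actual modules:

    ```python
    from graphlib import TopologicalSorter

    # Each module: (names of inputs it depends on, function computing it).
    # Toy stand-ins: "f0" quantizes audio values; "onsets"/"offsets" are guided
    # by the extracted framewise f0, as in the paper's pitch-tracking test.
    modules = {
        "f0":      (["audio"], lambda audio: [round(a) for a in audio]),
        "onsets":  (["f0"],    lambda f0: [i for i in range(1, len(f0))
                                           if f0[i] != f0[i - 1]]),
        "offsets": (["f0", "onsets"], lambda f0, on: list(on)),  # f0-guided
    }

    def run_dag(modules, inputs):
        """Evaluate learning modules in topological (dependency) order."""
        deps = {name: set(d) for name, (d, _) in modules.items()}
        results = dict(inputs)
        for name in TopologicalSorter(deps).static_order():
            if name in results:           # externally provided input
                continue
            args, fn = modules[name]
            results[name] = fn(*(results[a] for a in args))
        return results

    res = run_dag(modules, {"audio": [60.1, 60.2, 62.0, 62.1]})
    print(res["onsets"], res["offsets"])  # [2] [2]
    ```

    The DAG structure is what lets intermediate targets act as priors: downstream modules only ever see validated intermediate representations rather than raw audio.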

    Deep Learning-Based Automatic Downbeat Tracking: A Brief Review

    As an important format of multimedia, music is part of almost everyone's life. Automatically analyzing music is a significant step toward satisfying people's needs for music retrieval and music recommendation in an effortless way. Among such tasks, downbeat tracking has been a fundamental and long-standing problem in the Music Information Retrieval (MIR) area. Despite significant research efforts, downbeat tracking still remains a challenge. Previous research either focuses on feature engineering (extracting certain features by signal processing, yielding semi-automatic solutions) or has notable limitations: such systems can only model music audio recordings within limited time signatures and tempo ranges. Recently, deep learning has surpassed traditional machine learning methods and has become the primary approach to feature learning; combining traditional and deep learning methods has also improved performance. In this paper, we begin with a background introduction to the downbeat tracking problem. Then, we give detailed discussions of the following topics: system architecture, feature extraction, deep neural network algorithms, datasets, and evaluation strategy. In addition, we look at results from the annual benchmark evaluation, the Music Information Retrieval Evaluation eXchange (MIREX), as well as developments in software implementations. Although much has been achieved in automatic downbeat tracking, some problems remain; we point these out and conclude with possible directions and challenges for future research.
    Comment: 22 pages, 7 figures. arXiv admin note: text overlap with arXiv:1605.08396 by other author
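    A common system architecture surveyed in such reviews is a neural network that outputs a framewise downbeat activation function, followed by a post-processing stage. The sketch below shows only the post-processing step with simple thresholded peak picking; the activation is faked, and the threshold and minimum-gap values are assumptions (real systems typically use a dynamic Bayesian network instead):

    ```python
    import numpy as np

    def pick_downbeats(activation, fps=100, threshold=0.5, min_gap_s=1.0):
        """Return frame indices where the activation exceeds `threshold`,
        enforcing a minimum gap between successive downbeats."""
        min_gap = int(min_gap_s * fps)
        picks, last = [], -min_gap
        for i, a in enumerate(activation):
            if a >= threshold and i - last >= min_gap:
                picks.append(i)
                last = i
        return picks

    act = np.zeros(400)
    act[[0, 100, 200, 300]] = 0.9      # fake network output: one downbeat per second
    print(pick_downbeats(act))         # [0, 100, 200, 300]
    ```

    The minimum-gap constraint is a crude stand-in for the tempo and time-signature priors that the review notes limit many earlier systems.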

    Artificial Musical Intelligence: A Survey

    Computers have been used to analyze and create music since they were first introduced in the 1950s and 1960s. Beginning in the late 1990s, the rise of the Internet and of large-scale platforms for music recommendation and retrieval has made music an increasingly prevalent domain of machine learning and artificial intelligence research. While the field is still nascent, several different approaches have been employed to tackle what may broadly be referred to as "musical intelligence." This article provides a definition of musical intelligence, introduces a taxonomy of its constituent components, and surveys the wide range of AI methods that can be, and have been, brought to bear in its pursuit, with a particular emphasis on machine learning methods.
    Comment: 99 pages, 5 figures, preprint: currently under revie