5,317 research outputs found
An end-to-end machine learning system for harmonic analysis of music
We present a new system for simultaneous estimation of keys, chords, and bass
notes from music audio. It makes use of a novel chromagram representation of
audio that takes perception of loudness into account. Furthermore, it is fully
based on machine learning (instead of expert knowledge), such that it is
potentially applicable to a wider range of genres as long as training data is
available. As compared to other models, the proposed system is fast and memory
efficient, while achieving state-of-the-art performance.Comment: MIREX report and preparation of Journal submissio
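The abstract does not specify how the loudness-aware chromagram is computed; a minimal numpy sketch of the general idea (folding a magnitude spectrum into 12 pitch classes after a dB-style compression as a crude stand-in for loudness perception) — function name, reference tuning, and the +120 dB floor are illustrative assumptions, not the paper's method:

```python
import numpy as np

def loudness_chroma(mag_spectrum, sr=22050, n_fft=2048, ref=440.0):
    """Fold an rFFT magnitude spectrum (length n_fft//2 + 1) into a 12-bin
    chroma vector, compressing magnitudes to a dB-like scale so quiet
    partials contribute less, as in loudness perception (sketch only)."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    level = 20.0 * np.log10(np.maximum(mag_spectrum, 1e-6))  # dB compression
    level = np.maximum(level + 120.0, 0.0)   # shift so near-silence maps to 0
    chroma = np.zeros(12)
    for f, l in zip(freqs[1:], level[1:]):   # skip the DC bin
        pitch_class = int(round(12 * np.log2(f / ref))) % 12  # 0 = A
        chroma[pitch_class] += l
    return chroma / max(chroma.sum(), 1e-9)  # normalize to a distribution
```

A spectrum with a single peak near 440 Hz should produce a chroma vector dominated by the A bin.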
The Audio Degradation Toolbox and its Application to Robustness Evaluation
We introduce the Audio Degradation Toolbox (ADT) for the controlled degradation of audio signals, and propose its usage as a means of evaluating and comparing the robustness of audio processing algorithms. Music recordings encountered in practical applications are subject to varied, sometimes unpredictable degradation. For example, audio is degraded by low-quality microphones, noisy recording environments, MP3 compression, dynamic compression in broadcasting or vinyl decay. In spite of this, no standard software for the degradation of audio exists, and music processing methods are usually evaluated against clean data. The ADT fills this gap by providing Matlab scripts that emulate a wide range of degradation types. We describe 14 degradation units, and how they can be chained to create more complex, 'real-world' degradations. The ADT also provides functionality to adjust existing ground-truth, correcting for temporal distortions introduced by degradation. Using four different music informatics tasks, we show that performance strongly depends on the combination of method and degradation applied. We demonstrate that specific degradations can reduce or even reverse the performance difference between two competing methods. ADT source code, sounds, impulse responses and definitions are freely available for download.
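The ADT itself is a set of Matlab scripts; the chaining idea it describes — each degradation unit maps a signal to a degraded signal, and complex degradations are compositions of units — can be sketched in Python. The unit names and parameters below are hypothetical illustrations, not the toolbox's actual units:

```python
import numpy as np

def add_noise(snr_db):
    """Degradation unit: additive white noise at a given SNR in dB."""
    def unit(x, rng=np.random.default_rng(0)):
        sig_power = np.mean(x ** 2)
        noise_power = sig_power / (10 ** (snr_db / 10))
        return x + rng.normal(0.0, np.sqrt(noise_power), x.shape)
    return unit

def clip(threshold):
    """Degradation unit: hard clipping, as from a cheap microphone preamp."""
    def unit(x):
        return np.clip(x, -threshold, threshold)
    return unit

def chain(*units):
    """Chain degradation units into one more complex 'real-world' degradation."""
    def degrade(x):
        for u in units:
            x = u(x)
        return x
    return degrade

# e.g. a noisy recording environment followed by clipping on playback
degrade = chain(add_noise(snr_db=20), clip(threshold=0.5))
```

Each unit keeps the signal length unchanged, which is what lets units compose freely; time-stretching units would additionally need the ground-truth adjustment the ADT provides.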
Chord Label Personalization through Deep Learning of Integrated Harmonic Interval-based Representations
The increasing accuracy of automatic chord estimation systems, the
availability of vast amounts of heterogeneous reference annotations, and
insights from annotator subjectivity research make chord label personalization
increasingly important. Nevertheless, automatic chord estimation systems are
historically exclusively trained and evaluated on a single reference
annotation. We introduce a first approach to automatic chord label
personalization by modeling subjectivity through deep learning of a harmonic
interval-based chord label representation. After integrating these
representations from multiple annotators, we can accurately personalize chord
labels for individual annotators from a single model and the annotators' chord
label vocabulary. Furthermore, we show that chord personalization using
multiple reference annotations outperforms using a single reference annotation.
Comment: Proceedings of the First International Conference on Deep Learning and Music, Anchorage, US, May 2017 (arXiv:1706.08675v1 [cs.NE])
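The paper's exact harmonic interval-based representation is not given in the abstract; a minimal sketch of the general idea — encode a chord label as its root plus a binary vector of intervals above the root, so that chords of the same quality share one root-invariant vector — with a deliberately tiny, hypothetical vocabulary:

```python
# Hypothetical sketch of an interval-based chord encoding; the paper's
# actual representation and vocabulary may differ.
PITCH_CLASSES = {'C': 0, 'C#': 1, 'D': 2, 'D#': 3, 'E': 4, 'F': 5,
                 'F#': 6, 'G': 7, 'G#': 8, 'A': 9, 'A#': 10, 'B': 11}
QUALITY_INTERVALS = {'maj': (0, 4, 7), 'min': (0, 3, 7),
                     'dim': (0, 3, 6), 'maj7': (0, 4, 7, 11)}

def encode_chord(label):
    """Encode e.g. 'C:maj' as (root pitch class, 12-dim interval vector).
    The interval vector is root-invariant: 'C:maj' and 'G:maj' share it."""
    root_name, quality = label.split(':')
    root = PITCH_CLASSES[root_name]
    vec = [0] * 12
    for interval in QUALITY_INTERVALS[quality]:
        vec[interval] = 1
    return root, vec
```

Separating root from interval structure is one way such a representation can be integrated across annotators who use different chord label vocabularies.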
Computational Tonality Estimation: Signal Processing and Hidden Markov Models
PhD thesis
This thesis investigates computational musical tonality estimation from an audio signal. We
present a hidden Markov model (HMM) in which relationships between chords and keys are
expressed as probabilities of emitting observable chords from a hidden key sequence. The model
is tested first using symbolic chord annotations as observations, and gives excellent global key
recognition rates on a set of Beatles songs.
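The modelling idea above — keys as hidden states, chords as observed emissions — can be sketched with a toy two-key HMM and Viterbi decoding. All probabilities below are illustrative stand-ins, not the thesis's learned values:

```python
import numpy as np

# Toy HMM in the spirit of the thesis: hidden states are keys, observations
# are chord symbols. Diatonic chords get high emission probability.
KEYS = ['C major', 'G major']
CHORDS = ['C', 'F', 'G', 'D']
B = np.array([[0.40, 0.30, 0.25, 0.05],   # C major mostly emits C, F, G
              [0.30, 0.05, 0.40, 0.25]])  # G major mostly emits G, C, D
A = np.array([[0.9, 0.1],                 # keys are stable: strong self-transitions
              [0.1, 0.9]])
pi = np.array([0.5, 0.5])

def viterbi(obs):
    """Most likely hidden key sequence for an observed chord sequence."""
    o = [CHORDS.index(c) for c in obs]
    delta = pi * B[:, o[0]]
    back = []
    for t in o[1:]:
        scores = delta[:, None] * A * B[:, t][None, :]  # scores[i, j]: i -> j
        back.append(scores.argmax(axis=0))
        delta = scores.max(axis=0)
    path = [int(delta.argmax())]
    for bp in reversed(back):
        path.append(int(bp[path[-1]]))
    return [KEYS[k] for k in reversed(path)]
```

With these numbers, a chord progression drawn from one key's diatonic set decodes to that key throughout, which is the behaviour the global-key experiments rely on.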
The initial model is extended for audio input by using an existing chord recognition algorithm,
which allows it to be tested on a much larger database. We show that a simple model of the
upper partials in the signal improves percentage scores. We also present a variant of the HMM
which has a continuous observation probability density, but show that the discrete version gives
better performance.
Then follows a detailed analysis of the effects on key estimation and computation time of
changing the low level signal processing parameters. We find that much of the high frequency
information can be omitted without loss of accuracy, and significant computational savings can
be made by applying a threshold to the transform kernels. Results show that there is no single
ideal set of parameters for all music, but that tuning the parameters can make a difference to
accuracy.
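The kernel-thresholding saving mentioned above can be sketched as follows: the spectral kernels of a constant-Q transform are strongly concentrated, so zeroing entries below a threshold leaves a sparse matrix that is much cheaper to apply. Sizes, tuning, and the threshold below are illustrative assumptions, not the thesis's parameters:

```python
import numpy as np

def cq_kernels(sr=11025, n_fft=2048, fmin=110.0, n_bins=24, bpo=12):
    """Dense spectral kernels of a constant-Q transform (illustrative sizes)."""
    Q = 1.0 / (2 ** (1.0 / bpo) - 1)
    K = np.zeros((n_bins, n_fft), dtype=complex)
    for k in range(n_bins):
        f = fmin * 2 ** (k / bpo)
        N = min(n_fft, int(round(Q * sr / f)))
        n = np.arange(N)
        K[k, :N] = (np.hanning(N) / N) * np.exp(2j * np.pi * Q * n / N)
    return np.fft.fft(K, axis=1)  # each row is concentrated around one frequency

def sparsify(K, threshold=1e-3):
    """Zero out entries with magnitude below the threshold, so the kernel
    matrix can be stored and applied sparsely."""
    Ks = K.copy()
    Ks[np.abs(Ks) < threshold] = 0.0
    return Ks

K = cq_kernels()
Ks = sparsify(K)
```

The error introduced is bounded by the threshold per entry, while the number of nonzero entries — and hence the per-frame cost of the transform — drops sharply.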
We discuss methods of evaluating more complex tonal changes than a single global key, and
compare a metric that measures similarity to a ground truth to metrics that are rooted in music
retrieval. We show that the two measures give different results, and so recommend that the choice
of evaluation metric is determined by the intended application.
Finally we draw together our conclusions and use them to suggest areas for continuation of this
research, in the areas of tonality model development, feature extraction, evaluation methodology,
and applications of computational tonality estimation.Engineering and Physical
Sciences Research Council (EPSRC)
Deep Learning and Music Adversaries
An {\em adversary} is essentially an algorithm intent on making a classification system perform in some particular way given an input, e.g., increase the probability of a false negative. Recent work builds adversaries for deep learning systems applied to image object recognition: they exploit the parameters of the system to find the minimal perturbation of the input image such that the network misclassifies it with high confidence. We adapt this approach to construct and deploy an adversary of deep learning systems applied to music content analysis. In our case, however, the input to the systems is magnitude spectral frames, which requires special care in order to produce valid input audio signals from network-derived perturbations. For two different train-test partitionings of two benchmark datasets, and two different deep architectures, we find that this adversary is very effective in defeating the resulting systems. We find, however, that the convolutional networks are more robust than systems based on a majority vote over individually classified audio frames. Furthermore, we integrate the adversary into the training of new deep systems, but do not find that this improves their resilience against the same adversary.
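The constraint the abstract highlights — perturbed magnitude spectral frames must remain valid, i.e. non-negative — can be shown in a toy gradient-sign (FGSM-style) sketch. The logistic classifier below is an illustrative stand-in for the deep networks in the paper, not their architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
w, b = rng.normal(size=64), 0.0   # toy logistic "music classifier" (illustrative)

def predict(x):
    """P(class 1) for one magnitude spectral frame x."""
    return 1.0 / (1.0 + np.exp(-(w @ x + b)))

def adversarial_frame(x, y, eps=0.3):
    """One FGSM-style step that increases the logistic loss, then clipping at
    zero so the result stays a valid (non-negative) magnitude spectrum --
    the 'special care' the abstract mentions, in toy form."""
    p = predict(x)
    grad_x = (p - y) * w               # gradient of cross-entropy w.r.t. x
    x_adv = x + eps * np.sign(grad_x)  # step in the loss-increasing direction
    return np.maximum(x_adv, 0.0)      # enforce a valid magnitude frame

x = np.abs(rng.normal(size=64))        # a fake magnitude frame
y = 1.0 if predict(x) > 0.5 else 0.0   # the label the model currently assigns
x_adv = adversarial_frame(x, y)
```

Because the clipping only shortens, never reverses, each coordinate's step, the perturbed frame still moves the classifier's confidence away from its original decision.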
Deep Neural Network-based Automatic Music Lead Sheet Transcription and Melody Similarity Assessment
Thesis (PhD) -- Seoul National University Graduate School: College of Engineering, Department of Industrial Engineering, February 2023. Advisor: ์ด๊ฒฝ์.
Since the composition, arrangement, and distribution of music became convenient thanks to the digitization of the music industry, the number of newly supplied music recordings is increasing. Recently, due to platform environments being established whereby anyone can become a creator, user-created music such as their songs, cover songs, and remixes is being distributed through YouTube and TikTok. With such a large volume of musical recordings, the demand to transcribe music into sheet music has always existed for musicians.
However, it requires musical knowledge and is time-consuming.
This thesis studies automatic lead sheet transcription using deep neural networks. The development of transcription artificial intelligence (AI) can greatly reduce the time and cost for people in the music industry to find or transcribe sheet music. In addition, since audio recordings can be converted into digital sheet music, applications such as music plagiarism detection and music composition AI become possible.
The thesis first proposes a model recognizing chords from audio signals. Chord recognition is an important task in music information retrieval since chords are highly abstract and descriptive features of music. We utilize a self-attention mechanism for chord recognition to focus on certain regions of chords. Through an attention map analysis, we visualize how attention is performed. It turns out that the model is able to divide segments of chords by utilizing the adaptive receptive field of the attention mechanism.
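The self-attention computation behind the chord model above can be sketched in numpy: each output frame is a content-dependent weighted mix of all frames, and the attention map rows are the distributions the thesis visualizes. Dimensions and weights below are illustrative, not the thesis's trained model:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention over a sequence of feature frames X (T x d).
    The attention map A gives each frame an adaptive receptive field, e.g.
    concentrating weight within one chord segment."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])            # scaled dot-product
    A = np.exp(scores - scores.max(axis=1, keepdims=True))
    A /= A.sum(axis=1, keepdims=True)                 # rows are distributions
    return A @ V, A

rng = np.random.default_rng(0)
T, d = 8, 4
X = rng.normal(size=(T, d))                           # fake frame features
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Y, A = self_attention(X, Wq, Wk, Wv)
```

Inspecting `A` row by row is exactly the kind of attention map analysis the thesis uses to see how the model divides chord segments.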
This thesis proposes a note-level singing melody transcription model using sequence-to-sequence transformers. Overlapping decoding is introduced to solve the problem of the context between segments being broken. Applying pitch augmentation and adding a noisy dataset with data cleansing turns out to be effective in preventing overfitting and generalizing the model performance. Ablation studies demonstrate the effects of the proposed techniques in note-level singing melody transcription, both quantitatively and qualitatively. The proposed model outperforms other models in note-level singing melody transcription performance for all the metrics considered. Finally, subjective human evaluation demonstrates that the results of the proposed models are perceived as more accurate than the results of a previous study.
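The overlapping decoding idea above is not fully specified in the abstract; one plausible sketch decodes overlapping segments and, for each frame, keeps the prediction from the segment whose center is nearest, so no output is taken from a poorly contextualized segment boundary. The stitching rule here is an assumption about the scheme, not the thesis's exact algorithm:

```python
def overlapping_decode(frames, seg_len, hop, decode):
    """Decode overlapping segments and stitch per-frame outputs, preferring
    the prediction whose segment center is closest to each frame position."""
    n = len(frames)
    out = [None] * n
    best_dist = [float('inf')] * n
    for start in range(0, max(n - seg_len, 0) + 1, hop):
        seg_pred = decode(frames[start:start + seg_len])  # one prediction per frame
        center = start + seg_len / 2
        for i, pred in enumerate(seg_pred):
            pos = start + i
            dist = abs(pos - center)
            if dist < best_dist[pos]:                     # keep best-contextualized
                best_dist[pos], out[pos] = dist, pred
    return out
```

With `hop <= seg_len` every frame is covered, and interior frames are always taken from a segment that saw context on both sides.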
Utilizing the above research results, we introduce the entire process of an automatic music lead sheet transcription. By combining various music information recognized from audio signals, we show that it is possible to transcribe lead sheets that express the core of popular music. Furthermore, we compare the results with lead sheets transcribed by musicians.
Finally, we propose a melody similarity assessment method based on self-supervised learning by applying the automatic lead sheet transcription. We present convolutional neural networks that express the melody of lead sheet transcription results in embedding space. To apply self-supervised learning, we introduce methods of generating training data by musical data augmentation techniques. Furthermore, a loss function is presented to utilize the training data. Experimental results demonstrate that the proposed model is able to detect similar melodies of popular music from plagiarism and cover song cases.
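The self-supervised setup above — augmented versions of a melody should embed close together, other melodies far apart — can be sketched with a triplet-style loss. The random-projection `embed` below is only a placeholder for the thesis's convolutional network, and the transposition augmentation is one of the musical augmentations such a pipeline might use:

```python
import numpy as np

def embed(melody, dim=8, seed=0):
    """Stand-in embedding: a fixed random projection of a pitch sequence,
    normalized to unit length. In the thesis this is a trained CNN."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(dim, len(melody)))
    v = W @ np.asarray(melody, dtype=float)
    return v / np.linalg.norm(v)

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Pull augmented versions of a melody together, push other melodies
    apart, up to a margin (deep metric learning objective, sketched)."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(d_pos - d_neg + margin, 0.0)

melody = [60, 62, 64, 65, 67, 65, 64, 62]       # MIDI pitches
transposed = [p + 2 for p in melody]            # augmentation: whole-tone shift
other = [60, 60, 67, 67, 69, 69, 67, 67]
loss = triplet_loss(embed(melody), embed(transposed), embed(other))
```

Training drives this loss toward zero, after which embedding distance can rank candidate melodies for plagiarism and cover song detection.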
As the digitization of the music industry has made composing, arranging, and distributing music convenient, the number of newly supplied recordings keeps increasing. Recently, with platform environments in which anyone can become a creator, user-created music such as original songs, cover songs, and remixes is distributed through YouTube and TikTok. Given this large volume of recordings, the demand among musicians to transcribe music into sheet music has always existed. However, transcription requires musical knowledge and consumes a great deal of time and cost.
This thesis studies techniques for automatic music lead sheet transcription using deep neural networks. Transcription artificial intelligence can greatly reduce the time and cost that music industry professionals and performers spend finding or producing sheet music. Moreover, since audio recordings can be converted into digital sheet music, various applications become possible, such as plagiarism detection and training composition AI.
For lead sheet transcription, we first propose a model that recognizes chords from audio signals. Chords are concise and expressive features of music, so recognizing them is very important. For chord segment recognition, we propose a Transformer-based model that uses the attention mechanism. Through attention map analysis, we visualize how attention is actually applied and examine how the model divides and recognizes chord segments.
We then propose a note-level singing melody transcription model using sequence-to-sequence Transformers. To solve the problem of context being lost between segments during decoding, we introduce overlapping decoding. As data augmentation techniques, we present pitch shifting and a method of adding training data through data cleansing of a noisy dataset. Quantitative and qualitative comparisons confirm that the proposed techniques help improve performance, and the proposed model achieves the best note-level singing melody transcription performance on the MIR-ST500 dataset. In addition, a subjective human evaluation confirms that the transcription results of the proposed model are perceived as more accurate than those of a previous model.
Using these results, we present the entire process of automatic music lead sheet transcription. By combining the various musical information recognized from audio signals, we show that it is possible to transcribe lead sheets that express the core of popular music audio, and we analyze the results by comparing them with lead sheets transcribed by professionals.
Finally, applying the automatic lead sheet transcription technique, we propose a melody similarity assessment method based on self-supervised learning. We propose a convolutional neural network model that represents the melodies of lead sheet transcription results in an embedding space. To apply self-supervised learning, we present a method of generating training data with musical data augmentation techniques, and we design a deep metric learning loss function that uses the prepared training data. Analysis of the experimental results confirms that the proposed model can detect melodies similar to popular music in plagiarism and cover song cases.
Chapter 1 Introduction 1
1.1 Background and Motivation 1
1.2 Objectives 4
1.3 Thesis Outline 6
Chapter 2 Literature Review 7
2.1 Attention Mechanism and Transformers 7
2.1.1 Attention-based Models 7
2.1.2 Transformers with Musical Event Sequence 8
2.2 Chord Recognition 11
2.3 Note-level Singing Melody Transcription 13
2.4 Musical Key Estimation 15
2.5 Beat Tracking 17
2.6 Music Plagiarism Detection and Cover Song Identification 19
2.7 Deep Metric Learning and Triplet Loss 21
Chapter 3 Problem Definition 23
3.1 Lead Sheet Transcription 23
3.1.1 Chord Recognition 24
3.1.2 Singing Melody Transcription 25
3.1.3 Post-processing for Lead Sheet Representation 26
3.2 Melody Similarity Assessment 28
Chapter 4 A Bi-directional Transformer for Musical Chord Recognition 29
4.1 Methodology 29
4.1.1 Model Architecture 29
4.1.2 Self-attention in Chord Recognition 33
4.2 Experiments 35
4.2.1 Datasets 35
4.2.2 Preprocessing 35
4.2.3 Evaluation Metrics 36
4.2.4 Training 37
4.3 Results 38
4.3.1 Quantitative Evaluation 38
4.3.2 Attention Map Analysis 41
Chapter 5 Note-level Singing Melody Transcription 44
5.1 Methodology 44
5.1.1 Monophonic Note Event Sequence 44
5.1.2 Audio Features 45
5.1.3 Model Architecture 46
5.1.4 Autoregressive Decoding and Monophonic Masking 47
5.1.5 Overlapping Decoding 47
5.1.6 Pitch Augmentation 49
5.1.7 Adding Noisy Dataset with Data Cleansing 50
5.2 Experiments 51
5.2.1 Dataset 51
5.2.2 Experiment Configurations 52
5.2.3 Evaluation Metrics 53
5.2.4 Comparison Models 54
5.2.5 Human Evaluation 55
5.3 Results 56
5.3.1 Ablation Study 56
5.3.2 Note-level Transcription Model Comparison 59
5.3.3 Transcription Performance Distribution Analysis 59
5.3.4 Fundamental Frequency (F0) Metric Evaluation 60
5.4 Qualitative Analysis 62
5.4.1 Visualization of Ablation Study 62
5.4.2 Spectrogram Analysis 65
5.4.3 Human Evaluation 67
Chapter 6 Automatic Music Lead Sheet Transcription 68
6.1 Post-processing for Lead Sheet Representation 68
6.2 Lead Sheet Transcription Results 71
Chapter 7 Melody Similarity Assessment with Self-supervised Convolutional Neural Networks 77
7.1 Methodology 77
7.1.1 Input Data Representation 77
7.1.2 Data Augmentation 78
7.1.3 Model Architecture 82
7.1.4 Loss Function 84
7.1.5 Definition of Distance between Songs 85
7.2 Experiments 87
7.2.1 Dataset 87
7.2.2 Training 88
7.2.3 Evaluation Metrics 88
7.3 Results 89
7.3.1 Quantitative Evaluation 89
7.3.2 Qualitative Evaluation 99
Chapter 8 Conclusion 107
8.1 Summary and Contributions 107
8.2 Limitations and Future Research 110
Bibliography 111
Abstract (in Korean) 126
- …