349 research outputs found

    Guitar Chords Classification Using Uncertainty Measurements of Frequency Bins

    This paper presents a method for chord classification from recorded audio. The signal harmonics are obtained with the Fast Fourier Transform, and timbral information is suppressed by spectral whitening. Multiple fundamental frequency estimation on the whitened data is achieved by adding attenuated harmonics through a weighting function. The paper proposes feature selection by thresholding the uncertainty of all frequency bins: measurements under the threshold are removed from the signal in the frequency domain. This removes 95.53% of the signal characteristics, and the remaining 4.47% of frequency bins are used as enhanced input for the classifier. An Artificial Neural Network classifies four chord types: major, minor, major 7th, and minor 7th; played on each of the twelve root notes, these give a total of 48 different chords. Two reference methods based on Hidden Markov Models were compared with the proposed method using the same evaluation database. In most of the performed tests, the proposed method achieved a reasonably high performance, with an accuracy of 93%.
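    The bin-selection step can be sketched in a few lines. This is a minimal illustration, assuming whitening is approximated by dividing by a moving-average spectral envelope and per-bin uncertainty by the variance of whitened magnitudes across training examples; the paper's exact measures may differ.

        import numpy as np

        def whiten(mag, kernel=32):
            # Suppress timbre by dividing the FFT magnitude by a smoothed envelope.
            envelope = np.convolve(mag, np.ones(kernel) / kernel, mode="same")
            return mag / (envelope + 1e-9)

        def select_bins(train_mags, keep_ratio=0.0447):
            # Keep the bins whose whitened magnitude varies most across examples,
            # discarding the rest (~95.53% of all bins, as in the paper).
            whitened = np.array([whiten(m) for m in train_mags])
            uncertainty = whitened.var(axis=0)
            n_keep = max(1, int(keep_ratio * uncertainty.size))
            return np.argsort(uncertainty)[-n_keep:]

        # mags = [np.abs(np.fft.rfft(x)) for x in recordings]
        # bins = select_bins(mags)                        # indices of retained bins
        # features = np.array([whiten(m)[bins] for m in mags])  # input to the ANN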

    Sparse and structured decomposition of audio signals on hybrid dictionaries using musical priors

    This paper investigates the use of musical priors for sparse expansion of music audio signals on an overcomplete dual-resolution dictionary, the union of two orthonormal bases that can describe both the transient and the tonal components of a music audio signal. More specifically, chord and metrical structure information is used to build a structured model that takes into account dependencies between coefficients of the decomposition, for both the tonal and the transient layer. A denoising task provides a proof of concept for the proposed musical priors, and several configurations of the model are analyzed. Evaluation on monophonic and complex polyphonic excerpts of real music signals shows that the proposed approach yields results whose quality, measured by signal-to-noise ratio, is competitive with state-of-the-art approaches, and more coherent with the semantic content of the signal. A detailed analysis of the model in terms of sparsity and interpretability of the representation is also provided, showing that the model can give a relevant and legible representation of Western tonal music audio signals.
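    As a rough proof of concept of decomposition on a union of two bases, the sketch below separates tonal and transient layers by iterative hard thresholding; a long-window DCT stands in for the tonal basis and the time-domain (identity) basis for the transient one, and the chord and metrical priors that are this paper's actual contribution are omitted.

        import numpy as np
        from scipy.fft import dct, idct

        def hard_threshold(c, lam):
            return np.where(np.abs(c) >= lam, c, 0.0)

        def tonal_transient_split(x, lam=0.1, n_iter=20):
            tonal = np.zeros_like(x)
            transient = np.zeros_like(x)
            for _ in range(n_iter):
                # Tonal layer: sparse in a (long-window) DCT basis.
                r = x - transient
                tonal = idct(hard_threshold(dct(r, norm="ortho"), lam), norm="ortho")
                # Transient layer: sparse in the time domain (identity basis proxy).
                r = x - tonal
                transient = hard_threshold(r, lam)
            return tonal, transient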

    Computational Tonality Estimation: Signal Processing and Hidden Markov Models

    This PhD thesis investigates computational musical tonality estimation from an audio signal. We present a hidden Markov model (HMM) in which relationships between chords and keys are expressed as probabilities of emitting observable chords from a hidden key sequence. The model is tested first using symbolic chord annotations as observations, and gives excellent global key recognition rates on a set of Beatles songs. The initial model is extended for audio input by using an existing chord recognition algorithm, which allows it to be tested on a much larger database. We show that a simple model of the upper partials in the signal improves recognition scores. We also present a variant of the HMM with a continuous observation probability density, but show that the discrete version performs better. There follows a detailed analysis of the effects of the low-level signal processing parameters on key estimation accuracy and computation time. We find that much of the high-frequency information can be omitted without loss of accuracy, and that significant computational savings can be made by applying a threshold to the transform kernels. Results show that there is no single ideal set of parameters for all music, but that tuning the parameters can make a difference to accuracy. We discuss methods of evaluating tonal changes more complex than a single global key, and compare a metric that measures similarity to a ground truth with metrics rooted in music retrieval. The two measures give different results, so we recommend that the choice of evaluation metric be determined by the intended application. Finally, we draw our conclusions together and use them to suggest areas for continuing this research: tonality model development, feature extraction, evaluation methodology, and applications of computational tonality estimation. Funded by the Engineering and Physical Sciences Research Council (EPSRC).
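    The key-from-chords HMM reduces to a standard Viterbi decode over hidden keys. The sketch below assumes log-domain initial, transition, and emission tables; in the thesis these are learned or musically derived rather than the placeholders implied here.

        import numpy as np

        def viterbi(obs, log_init, log_trans, log_emit):
            # Most probable hidden key sequence for an observed chord sequence.
            # obs: chord indices; log_trans: (K, K) key-to-key transitions;
            # log_emit: (K, C) log-probability of each chord given each key.
            K = log_trans.shape[0]
            T = len(obs)
            delta = np.zeros((T, K))
            psi = np.zeros((T, K), dtype=int)
            delta[0] = log_init + log_emit[:, obs[0]]
            for t in range(1, T):
                scores = delta[t - 1][:, None] + log_trans      # (K, K)
                psi[t] = scores.argmax(axis=0)
                delta[t] = scores.max(axis=0) + log_emit[:, obs[t]]
            path = [int(delta[-1].argmax())]
            for t in range(T - 1, 0, -1):
                path.append(int(psi[t][path[-1]]))
            return path[::-1]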

    Automatic chord transcription from audio using computational models of musical context

    This PhD thesis is concerned with the automatic transcription of chords from audio, with an emphasis on modern popular music. Musical context such as the key and the structural segmentation aids the interpretation of chords in human listeners, and in this thesis we propose computational models that integrate such musical context into the automatic chord estimation process. We present a novel dynamic Bayesian network (DBN) which integrates models of metric position, key, chord, bass note and two beat-synchronous audio features (bass and treble chroma) into a single high-level musical context model, and we simultaneously infer the most probable sequence of metric positions, keys, chords and bass notes via Viterbi inference. Several experiments with real-world data show that adding context parameters results in a significant increase in chord recognition accuracy and in the faithfulness of chord segmentation. The most complex of the proposed methods transcribes chords with a state-of-the-art accuracy of 73% on the song collection used for the 2009 MIREX Chord Detection tasks. This method serves as the baseline for two further enhancements. Firstly, we aim to improve chord confusion behaviour by modifying the audio front end: we compare the effect of learning chord profiles as Gaussian mixtures with that of using chromagrams generated by an approximate pitch transcription method, and show that chromagrams from approximate transcription yield the most substantial increase in accuracy. The best method achieves 79% accuracy and significantly outperforms the state of the art. Secondly, we propose a method by which chromagram information is shared between repeated structural segments (such as verses) in a song; this can be done fully automatically using a novel structural segmentation algorithm tailored to the task. We show that the technique leads to a significant increase in accuracy and readability, and the segmentation algorithm itself also obtains state-of-the-art results. A method that combines both of the above enhancements reaches an accuracy of 81%, a statistically significant improvement over the best result (74%) in the 2009 MIREX Chord Detection tasks. Funded by the Engineering and Physical Sciences Research Council (EPSRC).
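    The beat-synchronous bass and treble chroma features that the DBN observes can be approximated with librosa; the register boundaries below are illustrative assumptions, not the thesis's exact front end.

        import numpy as np
        import librosa

        y, sr = librosa.load("song.wav")
        _, beats = librosa.beat.beat_track(y=y, sr=sr)

        # Bass chroma from the low register; treble chroma from the octaves above.
        bass = librosa.feature.chroma_cqt(y=y, sr=sr,
                                          fmin=librosa.note_to_hz("A1"), n_octaves=2)
        treble = librosa.feature.chroma_cqt(y=y, sr=sr,
                                            fmin=librosa.note_to_hz("A3"), n_octaves=4)

        # Median-aggregate each chroma vector between consecutive beats.
        bass_sync = librosa.util.sync(bass, beats, aggregate=np.median)
        treble_sync = librosa.util.sync(treble, beats, aggregate=np.median)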

    Towards automatic extraction of harmony information from music signals

    In this PhD thesis we address the automatic extraction of harmony information from audio recordings, focusing on chord symbol recognition and on methods for evaluating algorithms designed to perform that task. We present a novel six-dimensional model for equal-tempered pitch space based on concepts from neo-Riemannian music theory; this model is employed as the basis of a harmonic change detection function which we use to improve the performance of a chord recognition algorithm. We develop a machine-readable text syntax for chord symbols and present a hand-labelled chord transcription collection of 180 Beatles songs annotated using this syntax. The collection has been made publicly available and is already widely used for evaluation purposes in the research community. We also introduce methods for comparing chord symbols, which we subsequently use to analyse the statistics of the transcription collection. To ensure that researchers can use our transcriptions with confidence, we demonstrate a novel alignment algorithm based on simple audio fingerprints that automatically and accurately aligns local copies of the Beatles audio files to our transcriptions. Evaluation methods for chord symbol recall and segmentation measures are discussed in detail, and we use our chord comparison techniques as the basis for a novel dictionary-based chord symbol recall calculation. At the end of the thesis, we evaluate the performance of fifteen chord recognition algorithms (three of our own and twelve entrants to the 2009 MIREX chord detection evaluation) on the Beatles collection. Results are presented for several different evaluation measures over a range of evaluation parameters; the algorithms are compared with each other in terms of performance, but we also pay special attention to analysing and discussing the benefits and drawbacks of the different evaluation methods used.
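    A duration-weighted chord symbol recall in the spirit of the evaluation methods discussed can be sketched as follows, with the dictionary-based symbol comparison simplified to exact label matching on Harte-style symbols.

        def chord_recall(reference, estimate):
            # reference/estimate: lists of (start, end, label) segments in seconds.
            boundaries = sorted({t for s, e, _ in reference + estimate for t in (s, e)})

            def label_at(segs, t):
                return next((lab for s, e, lab in segs if s <= t < e), "N")

            correct = total = 0.0
            for s, e in zip(boundaries, boundaries[1:]):
                mid, dur = (s + e) / 2, e - s
                total += dur
                if label_at(reference, mid) == label_at(estimate, mid):
                    correct += dur
            return correct / total if total else 0.0

        # chord_recall([(0, 2, "C:maj")], [(0, 1.5, "C:maj"), (1.5, 2, "A:min")])
        # -> 0.75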

    Visualizing music structure using Spotify data


    Deep Neural Network-Based Automatic Music Lead Sheet Transcription and Melody Similarity Assessment

    PhD thesis, Seoul National University Graduate School, College of Engineering, Department of Industrial Engineering, February 2023 (advisor: ์ด๊ฒฝ์‹). Since the composition, arrangement, and distribution of music became convenient thanks to the digitization of the music industry, the number of newly released recordings keeps increasing. Recently, with platform environments in place where anyone can become a creator, user-created music such as original songs, cover songs, and remixes is distributed through YouTube and TikTok. Given such a large volume of recordings, musicians have always had a demand for transcribing music into sheet music; transcription, however, requires musical knowledge and is time-consuming. This thesis studies automatic lead sheet transcription using deep neural networks. Transcription AI can greatly reduce the time and cost for people in the music industry to find or produce sheet music, and because audio recordings can be converted into digital score form, it also enables applications such as music plagiarism detection and AI music composition. The thesis first proposes a model that recognizes chords from audio signals. Chord recognition is an important task in music information retrieval, since chords are highly abstract and descriptive features of music. We utilize a self-attention mechanism for chord recognition to focus on certain regions of chords, and through an attention map analysis we visualize how attention is performed; the model turns out to be able to divide chord segments by exploiting the adaptive receptive field of the attention mechanism. The thesis then proposes a note-level singing melody transcription model using sequence-to-sequence transformers. Overlapping decoding is introduced to address the loss of context at segment boundaries, and applying pitch augmentation and adding a noisy dataset with data cleansing proves effective in preventing overfitting and in generalizing model performance. Ablation studies demonstrate the effects of the proposed techniques both quantitatively and qualitatively, and the proposed model outperforms other models in note-level singing melody transcription on all the metrics considered; a subjective human evaluation further shows that its results are perceived as more accurate than those of a previous study. Building on these results, we present the entire process of automatic lead sheet transcription: by combining the various kinds of music information recognized from audio signals, we show that it is possible to transcribe lead sheets that capture the core of popular music, and we compare the results with lead sheets transcribed by musicians. Finally, we propose a melody similarity assessment method based on self-supervised learning that applies the automatic lead sheet transcription. We present convolutional neural networks that embed the melodies of lead sheet transcription results, introduce musical data augmentation techniques for generating training data for self-supervised learning, and design a loss function that exploits that data.
Experimental results demonstrate that the proposed model is able to detect similar melodies of popular music in plagiarism and cover song cases.
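    A skeletal PyTorch rendering of the self-attention chord tagger described above; the layer sizes and the use of nn.TransformerEncoder are assumptions rather than the thesis's exact architecture.

        import torch
        import torch.nn as nn

        class ChordTagger(nn.Module):
            def __init__(self, n_bins=144, d_model=128, n_classes=25):
                super().__init__()
                self.proj = nn.Linear(n_bins, d_model)
                layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
                self.encoder = nn.TransformerEncoder(layer, num_layers=4)
                # One chord label per frame, e.g. 24 maj/min chords + no-chord.
                self.head = nn.Linear(d_model, n_classes)

            def forward(self, spec):            # spec: (batch, time, n_bins)
                return self.head(self.encoder(self.proj(spec)))

        # logits = ChordTagger()(torch.randn(1, 100, 144))   # -> (1, 100, 25)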
    Table of contents:
    Chapter 1 Introduction: Background and Motivation; Objectives; Thesis Outline
    Chapter 2 Literature Review: Attention Mechanism and Transformers (Attention-based Models; Transformers with Musical Event Sequence); Chord Recognition; Note-level Singing Melody Transcription; Musical Key Estimation; Beat Tracking; Music Plagiarism Detection and Cover Song Identification; Deep Metric Learning and Triplet Loss
    Chapter 3 Problem Definition: Lead Sheet Transcription (Chord Recognition; Singing Melody Transcription; Post-processing for Lead Sheet Representation); Melody Similarity Assessment
    Chapter 4 A Bi-directional Transformer for Musical Chord Recognition: Methodology (Model Architecture; Self-attention in Chord Recognition); Experiments (Datasets; Preprocessing; Evaluation Metrics; Training); Results (Quantitative Evaluation; Attention Map Analysis)
    Chapter 5 Note-level Singing Melody Transcription: Methodology (Monophonic Note Event Sequence; Audio Features; Model Architecture; Autoregressive Decoding and Monophonic Masking; Overlapping Decoding; Pitch Augmentation; Adding Noisy Dataset with Data Cleansing); Experiments (Dataset; Experiment Configurations; Evaluation Metrics; Comparison Models; Human Evaluation); Results (Ablation Study; Note-level Transcription Model Comparison; Transcription Performance Distribution Analysis; Fundamental Frequency (F0) Metric Evaluation); Qualitative Analysis (Visualization of Ablation Study; Spectrogram Analysis; Human Evaluation)
    Chapter 6 Automatic Music Lead Sheet Transcription: Post-processing for Lead Sheet Representation; Lead Sheet Transcription Results
    Chapter 7 Melody Similarity Assessment with Self-supervised Convolutional Neural Networks: Methodology (Input Data Representation; Data Augmentation; Model Architecture; Loss Function; Definition of Distance between Songs); Experiments (Dataset; Training; Evaluation Metrics); Results (Quantitative Evaluation; Qualitative Evaluation)
    Chapter 8 Conclusion: Summary and Contributions; Limitations and Future Research
    Bibliography; Abstract in Korean (๊ตญ๋ฌธ์ดˆ๋ก)
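    Finally, the self-supervised melody-similarity training idea (Chapter 7) in sketch form: a small CNN embeds piano-roll-like melody matrices, and a triplet loss pulls a musically augmented (here, transposed) copy toward its anchor; the architecture and augmentation are illustrative assumptions.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class MelodyEncoder(nn.Module):
            def __init__(self, emb_dim=64):
                super().__init__()
                self.conv = nn.Sequential(
                    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1))
                self.fc = nn.Linear(32, emb_dim)

            def forward(self, x):               # x: (batch, 1, pitch, time)
                return F.normalize(self.fc(self.conv(x).flatten(1)), dim=1)

        def transpose(roll, semitones=2):
            # Pitch-shift a melody roll along its pitch axis (a simple augmentation).
            return torch.roll(roll, semitones, dims=2)

        enc = MelodyEncoder()
        anchor = torch.rand(8, 1, 48, 64)       # a batch of melody rolls
        loss = nn.TripletMarginLoss(margin=0.3)(
            enc(anchor),                        # anchor embeddings
            enc(transpose(anchor)),             # positives: augmented copies
            enc(anchor[torch.randperm(8)]))     # negatives: other songs in the batch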
    • โ€ฆ
    corecore