2,374 research outputs found

    A Fully Convolutional Deep Auditory Model for Musical Chord Recognition

    Chord recognition systems depend on robust feature extraction pipelines. While these pipelines are traditionally hand-crafted, recent advances in end-to-end machine learning have begun to inspire researchers to explore data-driven methods for such tasks. In this paper, we present a chord recognition system that uses a fully convolutional deep auditory model for feature extraction. The extracted features are processed by a Conditional Random Field that decodes the final chord sequence. Both processing stages are trained automatically and do not require expert knowledge for optimising parameters. We show that the learned auditory system extracts musically interpretable features, and that the proposed chord recognition system achieves results on par with or better than state-of-the-art algorithms. Comment: In Proceedings of the 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP), Vietri sul Mare, Italy.
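    For illustration only, the sketch below runs a Viterbi search over frame-wise chord scores with a hand-set transition matrix, mirroring the idea of decoding a chord sequence from frame-level features. The chord vocabulary size, the logits, and the transition scores are placeholders, not the paper's trained auditory model or CRF parameters.

import numpy as np

# Minimal sketch of CRF-style decoding over frame-wise chord scores.
# The unary scores and transition matrix below are random placeholders;
# in the paper they would come from a trained fully convolutional
# auditory model and a learned Conditional Random Field.
def viterbi_decode(frame_logits, transitions):
    """Return the highest-scoring chord index sequence.

    frame_logits: (T, C) unary scores per frame and chord class.
    transitions:  (C, C) pairwise scores for moving from chord i to chord j.
    """
    T, C = frame_logits.shape
    score = frame_logits[0].copy()           # best score ending in each class at frame 0
    backptr = np.zeros((T, C), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + transitions  # cand[i, j]: previous chord i, current chord j
        backptr[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + frame_logits[t]
    path = [int(score.argmax())]             # trace back the best path
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]

rng = np.random.default_rng(0)
n_frames, n_chords = 100, 25                 # e.g. 24 major/minor chords plus "no chord"
logits = rng.normal(size=(n_frames, n_chords))
trans = np.full((n_chords, n_chords), -1.0)
np.fill_diagonal(trans, 0.0)                 # mild preference for staying on the same chord
chord_sequence = viterbi_decode(logits, trans)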

    Automatic Chord Estimation Based on a Frame-wise Convolutional Recurrent Neural Network with Non-Aligned Annotations

    This paper describes a weakly-supervised approach to the Automatic Chord Estimation (ACE) task, which aims to estimate a sequence of chords from a given music audio signal at the frame level, under the realistic condition that only non-aligned chord annotations are available. In conventional studies assuming the availability of time-aligned chord annotations, Deep Neural Networks (DNNs) that learn frame-wise mappings from acoustic features to chords have attained excellent performance. The major drawback of such frame-wise models is that they cannot be trained without the time alignment information. Inspired by a common approach in automatic speech recognition based on non-aligned speech transcriptions, we propose a two-step method that trains a Hidden Markov Model (HMM) for the forced alignment between chord annotations and music signals, and then trains a powerful frame-wise DNN model for ACE. Experimental results show that although the frame-level accuracy of the forced alignment was just under 90%, the performance of the proposed method was degraded only slightly from that of the DNN model trained using the ground-truth alignment data. Furthermore, using a sufficient amount of easily collected non-aligned data, the proposed method is able to reach or even outperform the conventional methods based on ground-truth time-aligned annotations.
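    As a rough sketch of the first step, the code below force-aligns a non-aligned chord annotation to frame-wise acoustic scores with a monotonic dynamic-programming search; the resulting frame labels would then serve as pseudo-targets for training the frame-wise DNN. The emission scores and the chord sequence are illustrative placeholders, not the paper's HMM or data.

import numpy as np

# Minimal sketch of forced alignment between a non-aligned chord annotation
# and frame-wise acoustic scores. Each annotated chord must occupy a
# contiguous, in-order block of frames.
def force_align(emission_logp, label_seq):
    """Return, for each frame, the index of the annotation position it is aligned to.

    emission_logp: (T, C) log-probabilities of each chord class per frame.
    label_seq:     chord class indices in annotation order (length S <= T).
    """
    T, S = emission_logp.shape[0], len(label_seq)
    NEG = -1e9
    dp = np.full((T, S), NEG)
    back = np.zeros((T, S), dtype=int)
    dp[0, 0] = emission_logp[0, label_seq[0]]
    for t in range(1, T):
        for s in range(min(t + 1, S)):
            stay = dp[t - 1, s]                            # keep the same annotated chord
            advance = dp[t - 1, s - 1] if s > 0 else NEG   # move to the next annotated chord
            if advance > stay:
                dp[t, s], back[t, s] = advance, s - 1
            else:
                dp[t, s], back[t, s] = stay, s
            dp[t, s] += emission_logp[t, label_seq[s]]
    pos = np.zeros(T, dtype=int)
    pos[-1] = S - 1                                        # the alignment must end on the last chord
    for t in range(T - 1, 0, -1):
        pos[t - 1] = back[t, pos[t]]
    return pos

rng = np.random.default_rng(1)
logp = np.log(rng.dirichlet(np.ones(25), size=200))        # 200 frames, 25 chord classes (placeholder)
annotation = [0, 5, 3, 5, 12]                              # non-aligned chord sequence (placeholder)
frame_labels = [annotation[p] for p in force_align(logp, annotation)]  # pseudo-labels for the DNN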

    Automatic Determination of Vocal Ensemble Intonation (Lauluyhtyeen intonaation automaattinen määritys)

    The objective of this study is a specific music signal processing task, primarily intended to help vocal ensemble singers practice their intonation. Here intonation is defined as small deviations of pitch, less than a semitone, relative to the note written in the score; these can be either intentional or unintentional. Practicing intonation is typically challenging without an external ear. The algorithm developed in this thesis, combined with the presented application concept, can act as that external ear, providing real-time information on intonation to support practicing. The method can be applied to the analysis of recorded material as well. The music signal generated by a vocal ensemble is polyphonic: it contains multiple simultaneous tones with partly or completely overlapping harmonic partials. From this signal we need to estimate the fundamental frequency of each tone, which in turn indicates the pitch sung by each singer. Our experiments show that the Fourier-analysis-based fundamental frequency estimation method developed in this thesis can be applied to the automatic intonation analysis of vocal ensembles when the chord written in the score is used as prior information for the analysis. A sufficient frequency resolution can be achieved without compromising the time resolution too much by using an adequately sized window. Accuracy and robustness can be further increased by taking advantage of solitary partials. The greatest challenge turned out to be the estimation of tones in octave and unison relationships, intervals which are fairly common in tonal music. This question requires further investigation or another type of approach.
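    A minimal sketch of the underlying idea follows: take a windowed FFT of one analysis frame and pick the strongest spectral peak within half a semitone of the pitch expected from the score, refining it with parabolic interpolation. The 44.1 kHz sample rate, the 8192-sample window (about 186 ms, giving roughly 5.4 Hz bin spacing), and the search band are illustrative assumptions rather than the thesis's exact settings.

import numpy as np

# Sketch: estimate the fundamental frequency near a pitch expected from the score.
def estimate_f0_near(frame, sr, expected_hz, semitone_range=0.5):
    window = np.hanning(len(frame))
    spectrum = np.abs(np.fft.rfft(frame * window))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    lo = expected_hz * 2 ** (-semitone_range / 12)          # search only within +- half a semitone
    hi = expected_hz * 2 ** (semitone_range / 12)
    band = np.where((freqs >= lo) & (freqs <= hi))[0]
    k = band[np.argmax(spectrum[band])]                     # strongest bin inside the band
    if 0 < k < len(spectrum) - 1:                           # parabolic interpolation for sub-bin accuracy
        a, b, c = spectrum[k - 1], spectrum[k], spectrum[k + 1]
        delta = 0.5 * (a - c) / (a - 2 * b + c)
    else:
        delta = 0.0
    return (k + delta) * sr / len(frame)

sr, n = 44100, 8192                                          # ~186 ms window, ~5.4 Hz bin spacing
t = np.arange(n) / sr
tone = np.sin(2 * np.pi * 442.0 * t)                         # a slightly sharp A4
print(estimate_f0_near(tone, sr, expected_hz=440.0))         # prints a value close to 442 Hz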

    Deep Neural Network Based Automatic Music Lead Sheet Transcription and Melody Similarity Assessment (심층 신경망 기반의 음악 리드 시트 자동 채보 및 멜로디 유사도 평가)

    Doctoral dissertation (Ph.D.) -- Seoul National University Graduate School: Department of Industrial Engineering, College of Engineering, February 2023. Advisor: 이경식. Since the composition, arrangement, and distribution of music became convenient thanks to the digitization of the music industry, the number of newly supplied music recordings is increasing. Recently, platform environments have been established whereby anyone can become a creator, and user-created music such as original songs, cover songs, and remixes is distributed through YouTube and TikTok. With such a large volume of musical recordings, the demand to transcribe music into sheet music has always existed among musicians; however, transcription requires musical knowledge and is time-consuming. This thesis studies automatic lead sheet transcription using deep neural networks. The development of transcription artificial intelligence (AI) can greatly reduce the time and cost for people in the music industry to find or transcribe sheet music. In addition, since audio recordings can be converted into digital sheet music, applications such as music plagiarism detection and music composition AI become possible. The thesis first proposes a model that recognizes chords from audio signals. Chord recognition is an important task in music information retrieval since chords are highly abstract and descriptive features of music. We utilize a self-attention mechanism for chord recognition to focus on certain regions of chords. Through an attention map analysis, we visualize how attention is performed; it turns out that the model is able to divide segments of chords by utilizing the adaptive receptive field of the attention mechanism. The thesis then proposes a note-level singing melody transcription model using sequence-to-sequence transformers. Overlapping decoding is introduced to solve the problem of the context between segments being broken. Applying pitch augmentation and adding a noisy dataset with data cleansing turn out to be effective in preventing overfitting and generalizing the model performance. Ablation studies demonstrate the effects of the proposed techniques in note-level singing melody transcription, both quantitatively and qualitatively. The proposed model outperforms other models in note-level singing melody transcription on the MIR-ST500 dataset for all the metrics considered, and subjective human evaluation shows that its results are perceived as more accurate than those of a previous study. Utilizing the above research results, we introduce the entire process of automatic music lead sheet transcription. By combining various kinds of music information recognized from audio signals, we show that it is possible to transcribe lead sheets that express the core of popular music, and we compare the results with lead sheets transcribed by musicians. Finally, we propose a melody similarity assessment method based on self-supervised learning that applies the automatic lead sheet transcription. We present convolutional neural networks that express the melody of lead sheet transcription results in an embedding space. To apply self-supervised learning, we introduce methods of generating training data by musical data augmentation techniques, together with a loss function that utilizes the training data. Experimental results demonstrate that the proposed model is able to detect similar melodies of popular music in plagiarism and cover song cases.
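    For illustration of the melody similarity stage, the sketch below pairs a small convolutional encoder with a triplet loss, using random pitch transposition of a piano-roll segment as the self-supervised positive example. The tiny network, the piano-roll dimensions, and the augmentation are stand-ins for the models and musical data augmentation techniques described in the thesis, whose literature review covers deep metric learning and the triplet loss.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MelodyEncoder(nn.Module):
    """Toy CNN mapping a piano-roll melody segment to a unit-length embedding."""
    def __init__(self, emb_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(32, emb_dim)

    def forward(self, x):                       # x: (batch, 1, pitch, time) piano roll
        z = self.conv(x).flatten(1)
        return F.normalize(self.fc(z), dim=1)

def transpose(piano_roll, semitones):
    """Self-supervised positive: the same melody shifted along the pitch axis."""
    return torch.roll(piano_roll, shifts=semitones, dims=2)

encoder = MelodyEncoder()
loss_fn = nn.TripletMarginLoss(margin=0.3)      # pull augmented views together, push others apart

anchor = torch.rand(8, 1, 48, 64)               # batch of 8 melody segments (placeholder data)
positive = transpose(anchor, semitones=2)       # augmented view of the same melody
negative = torch.rand(8, 1, 48, 64)             # unrelated melodies

loss = loss_fn(encoder(anchor), encoder(positive), encoder(negative))
loss.backward()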

    Table of contents:
    Chapter 1 Introduction: 1.1 Background and Motivation; 1.2 Objectives; 1.3 Thesis Outline
    Chapter 2 Literature Review: 2.1 Attention Mechanism and Transformers (2.1.1 Attention-based Models; 2.1.2 Transformers with Musical Event Sequence); 2.2 Chord Recognition; 2.3 Note-level Singing Melody Transcription; 2.4 Musical Key Estimation; 2.5 Beat Tracking; 2.6 Music Plagiarism Detection and Cover Song Identification; 2.7 Deep Metric Learning and Triplet Loss
    Chapter 3 Problem Definition: 3.1 Lead Sheet Transcription (3.1.1 Chord Recognition; 3.1.2 Singing Melody Transcription; 3.1.3 Post-processing for Lead Sheet Representation); 3.2 Melody Similarity Assessment
    Chapter 4 A Bi-directional Transformer for Musical Chord Recognition: 4.1 Methodology (4.1.1 Model Architecture; 4.1.2 Self-attention in Chord Recognition); 4.2 Experiments (4.2.1 Datasets; 4.2.2 Preprocessing; 4.2.3 Evaluation Metrics; 4.2.4 Training); 4.3 Results (4.3.1 Quantitative Evaluation; 4.3.2 Attention Map Analysis)
    Chapter 5 Note-level Singing Melody Transcription: 5.1 Methodology (5.1.1 Monophonic Note Event Sequence; 5.1.2 Audio Features; 5.1.3 Model Architecture; 5.1.4 Autoregressive Decoding and Monophonic Masking; 5.1.5 Overlapping Decoding; 5.1.6 Pitch Augmentation; 5.1.7 Adding Noisy Dataset with Data Cleansing); 5.2 Experiments (5.2.1 Dataset; 5.2.2 Experiment Configurations; 5.2.3 Evaluation Metrics; 5.2.4 Comparison Models; 5.2.5 Human Evaluation); 5.3 Results (5.3.1 Ablation Study; 5.3.2 Note-level Transcription Model Comparison; 5.3.3 Transcription Performance Distribution Analysis; 5.3.4 Fundamental Frequency (F0) Metric Evaluation); 5.4 Qualitative Analysis (5.4.1 Visualization of Ablation Study; 5.4.2 Spectrogram Analysis; 5.4.3 Human Evaluation)
    Chapter 6 Automatic Music Lead Sheet Transcription: 6.1 Post-processing for Lead Sheet Representation; 6.2 Lead Sheet Transcription Results
    Chapter 7 Melody Similarity Assessment with Self-supervised Convolutional Neural Networks: 7.1 Methodology (7.1.1 Input Data Representation; 7.1.2 Data Augmentation; 7.1.3 Model Architecture; 7.1.4 Loss Function; 7.1.5 Definition of Distance between Songs); 7.2 Experiments (7.2.1 Dataset; 7.2.2 Training; 7.2.3 Evaluation Metrics); 7.3 Results (7.3.1 Quantitative Evaluation; 7.3.2 Qualitative Evaluation)
    Chapter 8 Conclusion: 8.1 Summary and Contributions; 8.2 Limitations and Future Research
    Bibliography; Korean Abstract (국문초록)

    Liszt's Étude S.136 no.1: audio data analysis of two different piano recordings

    In this paper, we review the main signal processing tools of Music Information Retrieval (MIR) from audio data, and we apply them to two recordings (by Leslie Howard and Thomas Rajna) of Franz Liszt's Étude S.136 no.1, with the aim of uncovering the macro-formal structure and comparing the interpretative styles of the two performers. In particular, after a thorough spectrogram analysis, we perform a segmentation based on the degree of novelty, in the sense of spectral dissimilarity, calculated frame by frame via the cosine distance. We then compare the metrical, temporal and timbral features of the two executions using MIR tools. With this method, we are able to identify in a data-driven way the different moments of the piece according to their melodic and harmonic content, and to find that Rajna's execution is faster and less varied, in terms of intensity and timbre, than Howard's. This enquiry represents a case study showing the potential of MIR from audio data to support traditional music score analyses and to provide objective information for statistically founded analyses of musical execution.
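    A minimal sketch of the novelty measure mentioned above, assuming a plain magnitude STFT and a synthetic two-section signal in place of the actual recordings and analysis settings:

import numpy as np

# Sketch: frame-by-frame novelty as one minus the cosine similarity between
# consecutive magnitude-spectrogram frames.
def stft_mag(x, n_fft=2048, hop=512):
    window = np.hanning(n_fft)
    frames = [np.abs(np.fft.rfft(window * x[i:i + n_fft]))
              for i in range(0, len(x) - n_fft, hop)]
    return np.array(frames)                                  # (frames, bins)

def novelty_curve(spec, eps=1e-9):
    a, b = spec[:-1], spec[1:]
    cos = np.sum(a * b, axis=1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + eps)
    return 1.0 - cos                                          # high values mark spectral change

sr = 22050
t = np.arange(4 * sr) / sr
# Toy two-section signal: the pitch change at 2 s should appear as a novelty peak,
# analogous to a section boundary in the recordings.
x = np.where(t < 2.0, np.sin(2 * np.pi * 220 * t), np.sin(2 * np.pi * 330 * t))
boundary_frame = int(np.argmax(novelty_curve(stft_mag(x))))   # frame index near the 2 s boundary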