393 research outputs found

    Music Information Retrieval for Irish Traditional Music: Automatic Analysis of Harmonic, Rhythmic, and Melodic Features for Efficient Key-Invariant Tune Recognition

    Music making and listening practices increasingly rely on technology, and, as a consequence, techniques developed in music information retrieval (MIR) research are more readily available to end users, in particular via online tools and smartphone apps. However, the majority of MIR research focuses on Western pop and classical music, and thus does not address specificities of other musical idioms. Irish traditional music (ITM) is popular across the globe, with regular sessions organised on all continents. ITM is a distinctive musical idiom, particularly in terms of heterophony and modality, and these characteristics can constitute challenges for existing MIR algorithms. The benefits of developing MIR methods specifically tailored to ITM are evidenced by Tunepal, a query-by-playing tool that has become popular among ITM practitioners since its release in 2009. As of today, Tunepal is the state of the art for tune recognition in ITM. The research in this thesis addresses existing limitations of Tunepal. The main goal is to find solutions to add key-invariance to the tune recognition system, an important feature that is currently missing in Tunepal. Techniques from digital signal processing and machine learning are used and adapted to the specificities of ITM to extract harmonic and temporal features, respectively with improvements on existing key detection methods, and a novel method for rhythm classification. These features are then used to develop a key-invariant tune recognition system that is computationally efficient while maintaining retrieval accuracy at a level comparable to that of the existing system.
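
    The key-invariance idea at the heart of this work can be illustrated with a small sketch: encoding a melody as its successive pitch intervals discards the key entirely, so a query matches a stored tune regardless of transposition. The toy corpus, tune names, and edit-distance matcher below are hypothetical illustrations, not Tunepal's actual retrieval pipeline.

```python
# A minimal sketch of key-invariant tune matching via interval encoding.
# The corpus and queries are toy stand-ins for illustration only.

def to_intervals(midi_pitches):
    """Encode a melody as successive pitch intervals, discarding the key."""
    return tuple(b - a for a, b in zip(midi_pitches, midi_pitches[1:]))

def edit_distance(a, b):
    """Plain Levenshtein distance between two interval sequences."""
    dp = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, y in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (x != y))
    return dp[-1]

def recognise(query_pitches, corpus):
    """Return corpus tunes ranked by interval-level edit distance."""
    q = to_intervals(query_pitches)
    return sorted(corpus, key=lambda item: edit_distance(q, to_intervals(item[1])))

# Hypothetical toy corpus: (title, MIDI pitch sequence).
corpus = [
    ("The Kesh Jig", [67, 67, 69, 71, 72, 74, 71, 67]),
    ("Drowsy Maggie", [64, 71, 71, 69, 71, 64, 71, 71]),
]
# The same phrase transposed up a whole tone still matches "The Kesh Jig" first.
print(recognise([69, 69, 71, 73, 74, 76, 73, 69], corpus)[0][0])
```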

    Automatic recognition of Persian musical modes in audio musical signals

    This research proposes new approaches for the computational identification of Persian musical modes. This involves constructing a database of audio musical files and developing computer algorithms to perform a musical analysis of the samples. Essential features, namely the spectral average, chroma, and pitch histograms, and the use of symbolic data, are discussed and compared. A tonic detection algorithm is developed to align the feature vectors and to make the mode recognition methods independent of changes in tonality. Subsequently, similarity between a signal and a set of templates constructed in the training phase, in which data-driven patterns are made for each dastgāh (Persian mode), is gauged using either a geometric distance measure (the Manhattan distance, which is preferred, or cross-correlation) or a machine learning method (Gaussian Mixture Models). The effects of the following parameters are considered and assessed: the amount of training data; the parts of the frequency range to be used for training; downsampling; tone resolution (12-TET, 24-TET, 48-TET and 53-TET); the use of overlapping or non-overlapping frames; and silence and high-energy suppression in pre-processing. The santur (hammered string instrument), which is used extensively in the musical database samples, is described and its physical properties are characterised; its characteristic pitch and harmonic deviations are measured; and the inharmonicity factor of the instrument is calculated for the first time. The results are applicable to Persian music and to other closely related musical traditions of the Mediterranean and the Near East. This approach enables content-based analyses and searches of musical archives. Potential applications of this research include: music information retrieval, audio thumbnailing, music archiving and access to archival content, audio compression and coding, association of images with audio content, music transcription, music synthesis, music editors, music instruction, automatic music accompaniment, and setting new standards and symbols for musical notation.
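
    The template-matching scheme described above might look roughly like the following sketch, assuming 24-TET pitch-class histograms: the histogram is rotated so the detected tonic sits at bin 0, then compared to per-dastgāh templates by Manhattan distance. The templates here are random stand-ins, not measured distributions from the thesis.

```python
import numpy as np

BINS = 24  # quarter-tone (24-TET) pitch-class resolution

def align_to_tonic(hist, tonic_bin):
    """Rotate a pitch-class histogram so the detected tonic sits at bin 0."""
    return np.roll(hist, -tonic_bin)

def classify(hist, tonic_bin, templates):
    """Pick the dastgah whose template is nearest in Manhattan distance."""
    h = align_to_tonic(hist, tonic_bin)
    h = h / h.sum()                      # compare shapes, not energies
    return min(templates, key=lambda name: np.abs(h - templates[name]).sum())

rng = np.random.default_rng(0)
templates = {}
for name in ["Shur", "Mahur", "Segah"]:  # hypothetical template set
    t = rng.random(BINS)
    templates[name] = t / t.sum()

query = rng.random(BINS)                 # stand-in for an analysed recording
print(classify(query, tonic_bin=5, templates=templates))
```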

    Automatic transcription of traditional Turkish art music recordings: A computational ethnomusicology approach

    Thesis (Doctoral) -- Izmir Institute of Technology, Electronics and Communication Engineering, Izmir, 2012. Includes bibliographical references (leaves 96-109). Text in English; abstract in Turkish and English. xi, 131 leaves.
    Music Information Retrieval (MIR) is a recent research field that emerged from the revolutionary change in the distribution of, and access to, music recordings. Although MIR research already covers a wide range of applications, MIR methods are primarily developed for Western music. Since the most important dimensions of music differ fundamentally between Western and non-Western musics, developing MIR methods for non-Western musics is a challenging task. On the other hand, the discipline of ethnomusicology supplies useful insights for computational studies of non-Western musics. This thesis therefore tackles the task within the framework of computational ethnomusicology, a newly emerging interdisciplinary research domain. The main contribution of this study is the development of an automatic transcription system for traditional Turkish art music (Turkish music), the first in the literature. In order to develop such a system for Turkish music, several subjects are also studied for the first time in the literature, constituting further contributions of the thesis: the automatic music transcription problem is considered from the perspective of ethnomusicology, an automatic makam recognition system is developed, and the scale theory of Turkish music is evaluated computationally for nine makamlar in order to determine whether it can be used for makam detection. Furthermore, a wide geographical region spanning the Middle East, North Africa and Asia shares musical similarities with Turkish music, so this study also provides techniques and methods more relevant than those in the general MIR literature for the study of these non-Western musics.
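
    A computational scale-theory check of the kind described above could be sketched as follows: a tonic-aligned pitch histogram in 53-TET (Holdrian commas, the resolution of Turkish theory) is scored by how much of its mass falls near a makam's theoretical scale degrees. The comma positions below are approximate illustrations, not authoritative Arel-Ezgi values.

```python
import numpy as np

COMMAS = 53  # Holdrian commas per octave

MAKAM_DEGREES = {                 # scale degrees as commas above the tonic
    "Rast":  [0, 9, 17, 22, 31, 40, 48],   # illustrative, approximate
    "Hicaz": [0, 5, 17, 22, 31, 35, 48],   # illustrative, approximate
}

def scale_support(hist, degrees, tolerance=1):
    """Fraction of histogram mass within `tolerance` commas of a scale degree."""
    mask = np.zeros(COMMAS, dtype=bool)
    for d in degrees:
        for off in range(-tolerance, tolerance + 1):
            mask[(d + off) % COMMAS] = True
    return hist[mask].sum() / hist.sum()

def detect_makam(hist):
    """Pick the candidate makam whose scale best explains the histogram."""
    return max(MAKAM_DEGREES, key=lambda m: scale_support(hist, MAKAM_DEGREES[m]))

# A toy tonic-aligned histogram concentrated on the Rast degrees.
hist = np.zeros(COMMAS)
hist[[0, 9, 17, 22, 31, 40, 48]] = 1.0
print(detect_makam(hist))  # -> "Rast"
```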

    Extraction and representation of semantic information in digital media


    Computational Tonality Estimation: Signal Processing and Hidden Markov Models

    PhD thesis.
    This thesis investigates computational musical tonality estimation from an audio signal. We present a hidden Markov model (HMM) in which relationships between chords and keys are expressed as probabilities of emitting observable chords from a hidden key sequence. The model is tested first using symbolic chord annotations as observations, and gives excellent global key recognition rates on a set of Beatles songs. The initial model is extended for audio input by using an existing chord recognition algorithm, which allows it to be tested on a much larger database. We show that a simple model of the upper partials in the signal improves percentage scores. We also present a variant of the HMM which has a continuous observation probability density, but show that the discrete version gives better performance. There follows a detailed analysis of the effects of changing the low-level signal processing parameters on key estimation and computation time. We find that much of the high-frequency information can be omitted without loss of accuracy, and that significant computational savings can be made by applying a threshold to the transform kernels. Results show that there is no single ideal set of parameters for all music, but that tuning the parameters can make a difference to accuracy. We discuss methods of evaluating more complex tonal changes than a single global key, and compare a metric that measures similarity to a ground truth with metrics that are rooted in music retrieval. We show that the two measures give different results, and so recommend that the choice of evaluation metric be determined by the intended application. Finally, we draw together our conclusions and use them to suggest areas for continuation of this research in tonality model development, feature extraction, evaluation methodology, and applications of computational tonality estimation. This work was supported by the Engineering and Physical Sciences Research Council (EPSRC).
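
    The chord-emitting key HMM can be sketched directly: hidden states are keys, observations are chord symbols, and Viterbi decoding recovers the most likely key sequence. The keys, chords, and probabilities below are illustrative toy values, not the thesis's trained parameters.

```python
import numpy as np

# Toy state and observation spaces: two keys, four chord symbols.
keys = ["C major", "G major"]
chords = ["C", "F", "G", "D"]

start = np.log(np.array([0.5, 0.5]))
trans = np.log(np.array([[0.98, 0.02],      # keys tend to persist
                         [0.02, 0.98]]))
emit = np.log(np.array([[0.40, 0.25, 0.30, 0.05],   # P(chord | C major)
                        [0.30, 0.05, 0.35, 0.30]])) # P(chord | G major)

def viterbi(observations):
    """Most likely hidden key sequence for an observed chord sequence."""
    obs = [chords.index(c) for c in observations]
    T, n = len(obs), len(keys)
    delta = np.zeros((T, n))                # best log-probability so far
    psi = np.zeros((T, n), dtype=int)       # backpointers
    delta[0] = start + emit[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + trans   # scores[i, j]: from i to j
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + emit[:, obs[t]]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):            # follow backpointers
        path.append(int(psi[t][path[-1]]))
    return [keys[s] for s in reversed(path)]

print(viterbi(["C", "F", "C", "G", "D", "G", "D"]))
```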

    Music information retrieval: conceptual framework, annotation and user behaviour

    Understanding music is a process both based on and influenced by the knowledge and experience of the listener. Although content-based music retrieval has been given increasing attention in recent years, much of the research still focuses on bottom-up retrieval techniques. In order to make a music information retrieval system appealing and useful to the user, more effort should be spent on constructing systems that both operate directly on the encoding of the physical energy of music and are flexible with respect to users' experiences. This thesis is based on a user-centred approach, taking into account the mutual relationship between music as an acoustic phenomenon and as an expressive phenomenon. The issues it addresses are: the lack of a conceptual framework, the shortage of annotated musical audio databases, the lack of understanding of the behaviour of system users, and the shortage of user-dependent knowledge with respect to high-level features of music. In the theoretical part of this thesis, a conceptual framework for content-based music information retrieval is defined. The proposed conceptual framework, the first of its kind, is conceived as a coordinating structure between the automatic description of low-level music content and the description of high-level content by the system users. A general framework for the manual annotation of musical audio is outlined as well. A new methodology for the manual annotation of musical audio is introduced and tested in case studies. The results from these studies show that manually annotated music files can be of great help in the development of accurate analysis tools for music information retrieval. Empirical investigation is the foundation on which the aforementioned theoretical framework is built. Two elaborate studies involving different experimental issues are presented. In the first study, elements of signification related to spontaneous user behaviour are clarified. In the second study, a global profile of music information retrieval system users is given and their description of high-level content is discussed. This study has uncovered relationships between the users' demographic background and their perception of expressive and structural features of music. Such a multi-level approach is exceptional, as it included a large sample of the population of real users of interactive music systems. Tests have shown that the findings of this study are representative of the targeted population. Finally, the multi-purpose material provided by the theoretical background and the results from the empirical investigations are put into practice in three music information retrieval applications: a prototype of a user interface based on a taxonomy, an annotated database of experimental findings, and a prototype semantic user recommender system. Results are presented and discussed for all methods used. They show that, if reliably generated, the use of knowledge about users can significantly improve the quality of music content analysis. This thesis demonstrates that an informed knowledge of human approaches to music information retrieval provides valuable insights, which may be of particular assistance in the development of user-friendly, content-based access to digital music collections.

    Automatic Music Lead Sheet Transcription and Melody Similarity Assessment Based on Deep Neural Networks

    Thesis (PhD) -- Seoul National University Graduate School: Department of Industrial Engineering, College of Engineering, February 2023. Advisor: 이경식.
    Since the composition, arrangement, and distribution of music became convenient thanks to the digitization of the music industry, the number of newly released music recordings keeps increasing. Recently, with platform environments in place whereby anyone can become a creator, user-created music (original songs, cover songs, and remixes) is being distributed through YouTube and TikTok. Given such a large volume of musical recordings, the demand among musicians to transcribe music into sheet music has always existed; however, transcription requires musical knowledge and is time-consuming. This thesis studies automatic lead sheet transcription using deep neural networks. The development of transcription artificial intelligence (AI) can greatly reduce the time and cost for people in the music industry to find or transcribe sheet music. In addition, since conversion from audio sources to digital sheet music becomes possible, applications such as music plagiarism detection and music composition AI open up. The thesis first proposes a model that recognizes chords from audio signals. Chord recognition is an important task in music information retrieval, since chords are highly abstract and descriptive features of music. We utilize a self-attention mechanism for chord recognition to focus on certain regions of chords. Through an attention map analysis, we visualize how attention is performed; it turns out that the model is able to divide segments of chords by utilizing the adaptive receptive field of the attention mechanism. The thesis then proposes a note-level singing melody transcription model using sequence-to-sequence transformers. Overlapping decoding is introduced to solve the problem of the context between segments being broken. Applying pitch augmentation and adding a noisy dataset with data cleansing turn out to be effective in preventing overfitting and in generalizing model performance. Ablation studies demonstrate the effects of the proposed techniques in note-level singing melody transcription, both quantitatively and qualitatively, and the proposed model outperforms other models on all the metrics considered. Finally, subjective human evaluation demonstrates that the results of the proposed model are perceived as more accurate than those of a previous study. Building on these results, we introduce the entire process of automatic music lead sheet transcription: by combining the various kinds of music information recognized from audio signals, we show that it is possible to transcribe lead sheets that express the core of popular music, and we compare the results with lead sheets transcribed by musicians. Finally, we propose a melody similarity assessment method based on self-supervised learning that applies the automatic lead sheet transcription: convolutional neural networks embed the melodies of lead sheet transcription results in an embedding space, training data are generated by musical data augmentation techniques, and a loss function is presented to utilize these training data. Experimental results demonstrate that the proposed model is able to detect similar melodies of popular music in plagiarism and cover song cases.
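
    The self-supervised melody-similarity component might be sketched as below: a small convolutional encoder embeds piano-roll-like melody matrices, and a triplet loss pulls a pitch-shifted copy of a melody towards its original and away from an unrelated melody. The architecture, tensor shapes, and augmentation are assumptions for illustration, not the thesis's exact configuration.

```python
import torch
import torch.nn as nn

class MelodyEncoder(nn.Module):
    """Toy CNN that maps a melody roll to a unit-length embedding."""
    def __init__(self, emb_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, emb_dim),
        )

    def forward(self, x):                         # x: (batch, 1, pitch, time)
        return nn.functional.normalize(self.net(x), dim=1)

def pitch_shift(roll, semitones=2):
    """Self-supervised positive: roll the pitch axis to transpose the melody."""
    return torch.roll(roll, shifts=semitones, dims=2)

encoder = MelodyEncoder()
loss_fn = nn.TripletMarginLoss(margin=0.3)

anchor = torch.rand(8, 1, 88, 64)                 # toy batch of melody rolls
positive = pitch_shift(anchor)                    # augmented views of the same tunes
negative = torch.rand(8, 1, 88, 64)               # unrelated melodies
loss = loss_fn(encoder(anchor), encoder(positive), encoder(negative))
loss.backward()                                   # one self-supervised training step
```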

    Sequential decision making in artificial musical intelligence

    Over the past 60 years, artificial intelligence has grown from a largely academic field of research to a ubiquitous array of tools and approaches used in everyday technology. Despite its many recent successes and growing prevalence, certain meaningful facets of computational intelligence have not been as thoroughly explored. Such additional facets cover a wide array of complex mental tasks which humans carry out easily, yet which are difficult for computers to mimic. A prime example of a domain in which human intelligence thrives, but machine understanding is still fairly limited, is music. Over the last decade, many researchers have applied computational tools to carry out tasks such as genre identification, music summarization, music database querying, and melodic segmentation. While these are all useful algorithmic solutions, we are still a long way from constructing complete music agents, able to mimic (at least partially) the complexity with which humans approach music. One key aspect which has not been sufficiently studied is that of sequential decision making in musical intelligence. This thesis strives to answer the following question: can a sequential decision making perspective guide us in the creation of better music agents, and social agents in general? And if so, how? More specifically, this thesis focuses on two aspects of musical intelligence: music recommendation and human-agent (and more generally agent-agent) interaction in the context of music. The key contributions of this thesis are the design of better music playlist recommendation algorithms; the design of algorithms for tracking user preferences over time; new approaches for modeling people's behavior in situations that involve music; and the design of agents capable of meaningful interaction with humans and other agents in a setting where music plays a role (either directly or indirectly). Though motivated primarily by music-related tasks, and focusing largely on people's musical preferences, this thesis also establishes that insights from music-specific case studies can be applicable in other concrete social domains, such as different types of content recommendation. Showing the generality of insights from musical data in other contexts serves as evidence for the utility of music domains as testbeds for the development of general artificial intelligence techniques. Ultimately, this thesis demonstrates the overall usefulness of taking a sequential decision making approach in settings previously unexplored from this perspective.
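
    The sequential decision-making framing can be illustrated with a toy sketch: an epsilon-greedy bandit that recommends songs and tracks drifting listener preferences with a constant-step-size update. The song labels and reward model are hypothetical; the agents developed in the thesis are far richer.

```python
import random

random.seed(0)
SONGS = ["reel", "ballad", "jig", "air"]   # hypothetical catalogue

class PlaylistAgent:
    """Epsilon-greedy bandit with an exponential-moving-average value update."""
    def __init__(self, epsilon=0.1, step=0.2):
        self.epsilon, self.step = epsilon, step
        self.value = {s: 0.0 for s in SONGS}   # running preference estimates

    def pick(self):
        if random.random() < self.epsilon:     # explore occasionally
            return random.choice(SONGS)
        return max(self.value, key=self.value.get)

    def update(self, song, reward):
        # A constant step size keeps adapting when preferences drift over time.
        self.value[song] += self.step * (reward - self.value[song])

agent = PlaylistAgent()
for t in range(200):
    song = agent.pick()
    # Simulated listener whose favourite switches from "reel" to "ballad" at t=100.
    liked = 1.0 if song == ("reel" if t < 100 else "ballad") else 0.0
    agent.update(song, liked)
print(max(agent.value, key=agent.value.get))   # typically the new favourite
```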