
    Progressive Filtering Using Multiresolution Histograms for Query by Humming System

    The growing availability of digital music demands effective categorization and retrieval methods. Real-world collections are vast and contain both relevant and irrelevant songs with respect to the user's input. The primary goal of this work is to counter the impact of non-relevant songs through Progressive Filtering (PF) for a Query by Humming (QBH) system. PF solves the retrieval problem by progressively reducing the search space. This paper presents the concept of PF and an efficient design based on Multi-Resolution Histograms (MRH) that searches in multiple passes. Initially the entire music database is searched at a coarse resolution to obtain a high recall rate and a narrowed search space; later passes perform a slower, finer search over the reduced candidate set to gain additional accuracy. Experiments on a large music database using recursive programming substantiate the potential of the method and show that MRH effectively locate the query patterns. Distances between histograms at a lower resolution are lower bounds of the distances at a higher resolution, which guarantees that no true matches are falsely dismissed during PF. The proposed method thus strikes a balance between efficiency and effectiveness; as an added advantage, the system scales to large music retrieval systems and is data-driven for performance optimization.

    Comment: 12 pages, 6 figures. Full version of the paper published at ICMCCA-2012 with the same title. Link: http://link.springer.com/chapter/10.1007/978-81-322-1143-3_2
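    The pruning guarantee mentioned in the abstract follows directly from the lower-bound property and can be sketched compactly. Below is a minimal illustration in Python, assuming L1 distances between numpy histograms whose finest length is a multiple of each coarsening factor; the function names, the level factors, and the single-threshold scheme are illustrative assumptions, not the paper's actual design.

```python
import numpy as np

def coarsen(hist, factor):
    """Merge groups of `factor` adjacent bins. Since |sum(a) - sum(b)|
    <= sum(|a - b|), the L1 distance between coarsened histograms is a
    lower bound on the L1 distance between the originals."""
    if factor == 1:
        return hist
    return hist.reshape(-1, factor).sum(axis=1)

def progressive_filter(query, database, threshold, factors=(4, 2, 1)):
    """database: list of (song_id, histogram) pairs at the finest
    resolution; `threshold` is the admissible L1 distance at that
    resolution. A candidate is discarded as soon as its coarse-level
    distance exceeds the threshold; because the coarse distance never
    exceeds the fine one, no true match can be falsely dismissed."""
    candidates = database
    for factor in factors:
        q = coarsen(query, factor)
        candidates = [(sid, h) for sid, h in candidates
                      if np.abs(coarsen(h, factor) - q).sum() <= threshold]
    return [sid for sid, _ in candidates]
```

    Merging adjacent bins can only cancel differences, never create them, so a candidate whose coarse distance already exceeds the threshold could not have passed the final fine-resolution test either; only the surviving candidates pay the cost of finer comparisons.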

    Organizing digital music for use: an examination of personal music collections

    Current research on music information retrieval and music digital libraries focuses on providing access to huge, public music collections. In this paper we consider a different but related problem: supporting an individual in maintaining and using a personal music collection. We analyze the organization and access techniques used to manage personal music collections (primarily CDs and MP3 files) and, from these behaviors, suggest the user behaviors that should be supported in a personal music digital library (that is, a digital library of an individual's personal music collection).

    Deep Neural Network-Based Automatic Music Lead Sheet Transcription and Melody Similarity Assessment

    Doctoral dissertation -- Seoul National University Graduate School: College of Engineering, Department of Industrial Engineering, February 2023. Advisor: ์ด๊ฒฝ์‹.

    As the digitization of the music industry has made composing, arranging, and distributing music convenient, the number of newly released recordings keeps growing. Recently, with platform environments in which anyone can become a creator, user-created music such as original songs, cover songs, and remixes is distributed through YouTube and TikTok. Given this volume of recordings, musicians have always had a demand for transcribing music into sheet music; however, transcription requires musical knowledge and is time-consuming. This thesis studies automatic lead sheet transcription using deep neural networks. Transcription AI can greatly reduce the time and cost that people in the music industry spend finding or transcribing sheet music, and since audio recordings can then be converted into digital sheet music, it enables further applications such as music plagiarism detection and music composition AI.

    The thesis first proposes a model that recognizes chords from audio signals. Chord recognition is an important task in music information retrieval, since chords are highly abstract and descriptive features of music. We utilize a self-attention mechanism for chord recognition so that the model can focus on the relevant regions of chords, and we visualize how attention is performed through an attention map analysis. It turns out that the model is able to divide chord segments by exploiting the adaptive receptive field of the attention mechanism.

    The thesis then proposes a note-level singing melody transcription model using sequence-to-sequence transformers. Overlapping decoding is introduced to solve the problem of context being broken between decoded segments. Applying pitch augmentation and adding a noisy dataset with data cleansing prove effective in preventing overfitting and improving generalization. Ablation studies demonstrate the effects of the proposed techniques both quantitatively and qualitatively. The proposed model outperforms other models in note-level singing melody transcription on the MIR-ST500 dataset for all the metrics considered, and a subjective human evaluation shows that its results are perceived as more accurate than those of a previous study.

    Building on these results, we present the entire process of automatic lead sheet transcription. By combining the various kinds of musical information recognized from audio signals, we show that it is possible to transcribe lead sheets that capture the core of popular music, and we compare the results with lead sheets transcribed by musicians. Finally, we propose a melody similarity assessment method based on self-supervised learning that applies the automatic lead sheet transcription. We present convolutional neural networks that express the melodies of lead sheet transcription results in an embedding space, introduce methods for generating training data through musical data augmentation, and design a deep metric learning loss function to utilize the generated data. Experimental results demonstrate that the proposed model is able to detect similar melodies of popular music in plagiarism and cover song cases.

    Musical audio-mining


    Singing Voice Recognition for Music Information Retrieval

    This thesis proposes signal processing methods for the analysis of singing voice audio signals, with the objective of obtaining information about the identity of the singer and the lyrics content of the singing. Two main topics are presented: singer identification in monophonic and polyphonic music, and lyrics transcription and alignment. The information automatically extracted from the singing voice is intended for applications such as music classification, sorting and organizing music databases, and music information retrieval. For singer identification, the thesis introduces methods from general audio classification as well as specific methods for dealing with the presence of accompaniment. The emphasis is on singer identification in polyphonic audio, where the singing voice is present along with musical accompaniment. The presence of instruments is detrimental to voice identification performance, and eliminating the effect of instrumental accompaniment is an important aspect of the problem. The study of singer identification centers on the degradation of classification performance in the presence of instruments and on separation of the vocal line to improve performance. For the study, monophonic singing was mixed with instrumental accompaniment at different signal-to-noise (singing-to-accompaniment) ratios, and classification was performed both on the polyphonic mixture and on the vocal line separated from it. Classification that includes a vocal separation step significantly improves performance compared to classifying the polyphonic mixtures, although it does not reach the performance of classifying the monophonic singing itself. Nevertheless, the results show that singing voices can be classified robustly in polyphonic music when source separation is used. For lyrics transcription, the thesis introduces the general speech recognition framework and the various adjustments that can be made before applying its methods to the singing voice. The variability of phonation in singing poses a significant challenge to the speech recognition approach. The thesis proposes using phoneme models trained on speech data and adapted to singing voice characteristics for recognizing phonemes and words from a singing voice signal. Language models and adaptation techniques are an important aspect of the recognition process. There are two ways of recognizing the phonemes in the audio: alignment, where the true transcription is known and the phonemes only have to be located, and recognition, where both the transcription and the locations of the phonemes have to be found. Alignment is, obviously, a simplified form of the recognition task. Alignment of textual lyrics to music audio is performed by aligning the phonetic transcription of the lyrics with the vocal line separated from the polyphonic mixture, using a collection of commercial songs. Word recognition is tested for transcribing lyrics from monophonic singing. The performance of the proposed system for automatic alignment of lyrics and audio is sufficient to facilitate applications such as automatic karaoke annotation or song browsing. The word recognition accuracy of lyrics transcription from singing is quite low, but it is shown to be useful in a query-by-singing application for performing a textual search based on the words recognized from the query: when some key words in the query are recognized, the song can be identified reliably.
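    The mixing protocol described above (monophonic singing combined with accompaniment at controlled singing-to-accompaniment ratios) reduces to a few lines of signal arithmetic. This is a generic sketch assuming equal-length numpy arrays; the function name and parameters are illustrative, not the thesis's experimental code.

```python
import numpy as np

def mix_at_ratio(vocals, accompaniment, ratio_db):
    """Mix singing over accompaniment at a target singing-to-accompaniment
    ratio in dB, by rescaling the accompaniment to the required power."""
    p_voc = np.mean(vocals ** 2)
    p_acc = np.mean(accompaniment ** 2)
    # Choose gain g so that 10 * log10(p_voc / (g**2 * p_acc)) == ratio_db.
    g = np.sqrt(p_voc / (p_acc * 10.0 ** (ratio_db / 10.0)))
    return vocals + g * accompaniment
```

    Classification can then be run both on the mixture and on the vocal line that a source-separation front end recovers from it, reproducing the comparison the abstract describes.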

    Music information retrieval: conceptual framework, annotation and user behaviour

    Understanding music is a process both based on and influenced by the knowledge and experience of the listener. Although content-based music retrieval has received increasing attention in recent years, much of the research still focuses on bottom-up retrieval techniques. To make a music information retrieval system appealing and useful to its users, more effort should be spent on constructing systems that both operate directly on the encoding of the physical energy of music and are flexible with respect to users' experiences. This thesis takes a user-centred approach, accounting for the mutual relationship between music as an acoustic phenomenon and as an expressive phenomenon. The issues it addresses are: the lack of a conceptual framework, the shortage of annotated musical audio databases, the lack of understanding of the behaviour of system users, and the shortage of user-dependent knowledge with respect to high-level features of music. In the theoretical part of this thesis, a conceptual framework for content-based music information retrieval is defined. The proposed conceptual framework, the first of its kind, is conceived as a coordinating structure between the automatic description of low-level music content and the description of high-level content by system users. A general framework for the manual annotation of musical audio is outlined as well. A new methodology for the manual annotation of musical audio is introduced and tested in case studies; the results show that manually annotated music files can be of great help in developing accurate analysis tools for music information retrieval. Empirical investigation is the foundation on which the aforementioned theoretical framework is built. Two elaborate studies involving different experimental issues are presented. In the first study, elements of signification related to spontaneous user behaviour are clarified. In the second study, a global profile of music information retrieval system users is given and their description of high-level content is discussed. This study uncovered relationships between users' demographic backgrounds and their perception of expressive and structural features of music. Such a multi-level approach is exceptional in that it included a large sample of the population of real users of interactive music systems, and tests have shown that its findings are representative of the targeted population. Finally, the multi-purpose material provided by the theoretical background and the results of the empirical investigations is put into practice in three music information retrieval applications: a prototype of a taxonomy-based user interface, an annotated database of experimental findings, and a prototype semantic user recommender system. Results are presented and discussed for all methods used. They show that, if reliably generated, knowledge about users can significantly improve the quality of music content analysis. This thesis demonstrates that informed knowledge of human approaches to music information retrieval provides valuable insights, which may be of particular assistance in the development of user-friendly, content-based access to digital music collections.