
    Leveraging repetition for improved automatic lyric transcription in popular music

    Transcribing lyrics from musical audio is a challenging research problem which has not benefited from many advances made in the related field of automatic speech recognition, owing to the prevalent musical accompaniment and differences between the spoken and sung voice. However, one aspect of this problem which has yet to be exploited by researchers is that significant portions of the lyrics will be repeated throughout the song. In this paper we investigate how this information can be leveraged to form a consensus transcription with improved consistency and accuracy. Our results show that improvements can be gained using a variety of techniques, and that relative gains are largest under the most challenging and realistic experimental conditions.
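    To make the consensus idea concrete, the following is a minimal sketch of one possible consensus step, assuming the repeated lyric sections have already been detected and transcribed to the same number of words: a word-level majority vote across the repeats. The example transcriptions are invented for illustration; the paper's actual alignment and voting techniques may differ.

```python
# Word-level majority vote across transcriptions of repeated lyric sections.
# The repeated sections are assumed to be pre-aligned and equal in word count.
from collections import Counter

def consensus(transcripts):
    """Return a consensus line by majority vote at each word position."""
    out = []
    for words in zip(*(t.split() for t in transcripts)):
        out.append(Counter(words).most_common(1)[0][0])
    return " ".join(out)

# hypothetical transcriptions of three repetitions of the same lyric line
repeats = [
    "i can see clearly now the rain has gone",
    "i can sing clearly now the rain has gone",
    "i can see clearly now the brain has gone",
]
print(consensus(repeats))  # -> "i can see clearly now the rain has gone"
```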

    An algorithm for multi tempo music lyric transcription

    Applied Thesis submitted to the Department of Computer Science, Ashesi University, in partial fulfillment of the Bachelor of Science degree in Computer Science, April 2018. This paper documents an attempt to create an algorithm for multi-tempo music lyric transcription. It reviews music information retrieval as a field of study and identifies music lyric transcription as a subset of that field. The difficulties of music lyric transcription are highlighted and a gap in knowledge is identified: there are no algorithms for music transcription that are applicable to all forms of music; they are usually specialised by instrument or by genre. The author attempts to fill this gap by creating a method for multi-tempo music lyric transcription. The methodology is a three-step process: taking audio as input, processing it with the REPET separation technique, and transcribing the separated audio file. The result was a relative success, with the music separated successfully and the lyrics transcribed, but with some loss of accuracy.
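    The REPET step in this pipeline exploits the repeating musical background. The sketch below is a heavily simplified, REPET-like separation, assuming librosa for the STFT and a placeholder input file; the thesis's actual implementation and parameter choices may differ.

```python
# A heavily reduced REPET-like separation: estimate the repeating period from
# the spectrogram, build a repeating background model by taking a median over
# period-spaced frames, and mask the mixture. File name, STFT settings, and the
# period-search heuristic are illustrative assumptions, not the thesis's code.
import numpy as np
import librosa

def repet_simple(y, sr, n_fft=1024, hop=256):
    D = librosa.stft(y, n_fft=n_fft, hop_length=hop)
    V = np.abs(D)
    # crude repeating-period estimate: autocorrelation of per-frame energy
    e = V.sum(axis=0) - V.sum(axis=0).mean()
    ac = np.correlate(e, e, mode="full")[len(e) - 1:]
    lag_min = int(0.8 * sr / hop)                  # ignore periods shorter than ~0.8 s
    period = lag_min + int(np.argmax(ac[lag_min:]))
    n_seg = V.shape[1] // period
    # median across period-spaced frames gives the repeating (music) model
    segs = V[:, :n_seg * period].reshape(V.shape[0], n_seg, period)
    model = np.tile(np.median(segs, axis=1), n_seg)
    model = np.minimum(model, V[:, :n_seg * period])
    mask = model / (V[:, :n_seg * period] + 1e-8)  # soft mask for the accompaniment
    music = librosa.istft(mask * D[:, :n_seg * period], hop_length=hop)
    voice = librosa.istft((1 - mask) * D[:, :n_seg * period], hop_length=hop)
    return voice, music

y, sr = librosa.load("song.wav", sr=None)          # placeholder file name
voice, music = repet_simple(y, sr)
# "voice" would then be handed to the transcription stage of the pipeline
```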

    ๊ณ ์œ  ํŠน์„ฑ์„ ํ™œ์šฉํ•œ ์Œ์•…์—์„œ์˜ ๋ณด์ปฌ ๋ถ„๋ฆฌ

    Doctoral thesis (Ph.D.), Seoul National University, Graduate School of Convergence Science and Technology, Department of Transdisciplinary Studies, February 2018. Advisor: Kyogu Lee. Singing voice separation (SVS) refers to the task, or the method, of decomposing a music signal into the singing voice and its accompanying instruments. It has various uses, from a preprocessing step for extracting the musical features implied in a target source to applications in their own right, such as vocal training. This thesis aims to discover the common properties of singing voice and accompaniment and to apply them to advance state-of-the-art SVS algorithms. In particular, the thesis concentrates on a separation approach named 'characteristics-based', under the following assumptions. First, the music signal is assumed to be provided in monaural form, i.e., as a single-channel recording. This is a more difficult condition than the multichannel case, since spatial information cannot be exploited in the separation procedure. The thesis also focuses on an unsupervised approach that does not use machine learning to estimate the source models from training data; the models are instead derived from low-level characteristics and incorporated into the objective function. Finally, no external information such as lyrics, a score, or user guidance is provided. Unlike blind source separation problems, however, the classes of the target sources, singing voice and accompaniment, are known in the SVS problem, which makes it possible to analyze their respective properties. Three characteristics are primarily discussed. Continuity, in the spectral or temporal dimension, refers to the smoothness of a source in that dimension: spectral continuity relates to timbre, while temporal continuity represents the stability of a sound over time. Low-rankness refers to how well structured a signal is and whether it can be represented as low-rank data, and sparsity represents how rarely the sounds in a signal occur in time and frequency.
This thesis discusses two SVS approaches based on these characteristics. The first is based on continuity and sparsity and extends harmonic-percussive sound separation (HPSS): while the conventional algorithm separates the singing voice using two-stage HPSS, the proposed one uses a single-stage procedure with an additional sparse residual term in the objective function. The second approach is based on low-rankness and sparsity. Assuming that the accompaniment can be represented by a low-rank model whereas the singing voice has a sparse distribution, the conventional algorithm decomposes the sources using robust principal component analysis (RPCA). This thesis discusses generalizations and extensions of RPCA aimed specifically at SVS performance, including replacing the trace norm and l1 norm with the Schatten p-norm and lp-norm, scale compression, and the use of the spectral distribution. The presented algorithms were evaluated on various datasets and challenges (including MIR-1k, the Beach Boys dataset, iKala in MIREX 2014, DSD100, and SiSEC 2016) and achieved results better than or comparable to state-of-the-art SVS algorithms.
Contents: Chapter 1. Introduction -- Chapter 2. Background -- Chapter 3. Characteristics of music sources -- Chapter 4. Singing voice separation using continuity and sparsity -- Chapter 5. Singing voice separation using low-rankness and sparsity -- Chapter 6. Conclusion and future work -- Bibliography -- Abstract (in Korean)
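    As an illustration of the RPCA baseline that this thesis generalizes, the following is a minimal sketch assuming librosa and a placeholder file name: the magnitude spectrogram is split into a low-rank part (accompaniment) and a sparse part (vocals) with an inexact augmented Lagrangian solver, and the sparse part defines a soft vocal mask. The Schatten p-/lp-norm, scale compression, and weighted variants proposed in the thesis are not shown.

```python
# RPCA-based singing voice separation (baseline): |M| ~ L (low-rank) + S (sparse),
# solved with a standard inexact augmented Lagrangian multiplier iteration.
import numpy as np
import librosa

def rpca_ialm(M, lam=None, tol=1e-6, max_iter=100):
    """Decompose a nonnegative matrix M into low-rank L and sparse S."""
    m, n = M.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    Y = M / max(np.linalg.norm(M, 2), np.linalg.norm(M.ravel(), np.inf) / lam)
    mu, rho = 1.25 / np.linalg.norm(M, 2), 1.5
    L, S = np.zeros_like(M), np.zeros_like(M)
    for _ in range(max_iter):
        # singular-value thresholding for the low-rank term
        U, sig, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = (U * np.maximum(sig - 1.0 / mu, 0)) @ Vt
        # soft thresholding for the sparse term
        T = M - L + Y / mu
        S = np.sign(T) * np.maximum(np.abs(T) - lam / mu, 0)
        Z = M - L - S
        Y = Y + mu * Z
        mu = min(mu * rho, 1e7)
        if np.linalg.norm(Z, "fro") / np.linalg.norm(M, "fro") < tol:
            break
    return L, S

y, sr = librosa.load("mix.wav", sr=None)            # placeholder file name
D = librosa.stft(y, n_fft=1024, hop_length=256)
L, S = rpca_ialm(np.abs(D))
mask = np.abs(S) / (np.abs(S) + np.abs(L) + 1e-8)   # soft mask for the vocals
vocals = librosa.istft(mask * D, hop_length=256)
accomp = librosa.istft((1 - mask) * D, hop_length=256)
```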

    Musical Contour Regulation Facilitation (MCRF) to Support Emotion Regulation Development in Preschoolers: A Mixed Methods Feasibility Study

    Title from PDF of title page, viewed on July 7, 2015. Dissertation advisor: William Everett. Vita. Includes bibliographic references (pages 216-236). Thesis (Ph.D.)--Conservatory of Music and Dance and School of Education, University of Missouri--Kansas City, 2015. Emotion regulation (ER) is the ability of a person to maintain a comfortable state of arousal by controlling and shifting his or her emotional experiences and expressions. The emergence of maladaptive ER occurs in childhood and is one characteristic often shared by several disorders. Maladaptive ER can significantly affect multiple areas of child development, such as the ability to learn in school, to form and maintain healthy relationships with peers and adults, and to manage and inhibit behavioral responses. Interventions for children at risk of developing maladaptive ER skills are limited and need further exploration. Based on limitations noted in existing treatment options, this study provided a preliminary examination of the utility of a music-based approach. An embedded convergent mixed methods research design was used to explore the feasibility of a Musical Contour Regulation Facilitation (MCRF) intervention. The MCRF intervention was developed to improve ER abilities in children by providing opportunities to practice real-time management of high and low arousal experiences. Typically developing preschool-aged children (n = 8) participated in 11 MCRF sessions over four weeks. Data to assess ER skills and related behaviors were collected pre- and post-MCRF treatment; current regulatory levels were assessed and self-reported at the beginning and end of each MCRF session. In addition, parent and teacher interviews and questionnaires were conducted post-treatment. Grounded theory-based qualitative analysis suggests that most parents and both teachers noted emotional changes in the children following MCRF treatment. Perhaps more importantly, all interviewees believed in the importance and helpfulness of music for developmental outcomes, even if they did not note changes in the children or they recognized that other factors may have contributed to perceived changes. Quantitative data analysis indicated clinically significant improvements in ER skills in the children following MCRF treatment. Convergent mixed methods analyses further support the efficacy and acceptability of the MCRF intervention. Together, these findings endorse future normative and clinical study of the MCRF intervention as a way to facilitate ER development, especially as this medium is highly desired by parents and teachers and can easily be integrated into a preschool setting. Contents: Introduction -- Emotion regulation and musical contour regulation facilitation in theory and practice: an integrated literature review -- The effectiveness of MCRF in facilitating emotion regulation: methodology for a feasibility study -- The effectiveness of MCRF in facilitating emotion regulation: mixed methods results -- The feasibility of MCRF in facilitating emotion regulation development in preschoolers: discussion, implications, and recommendations -- Appendix A. Musical contour regulation facilitation (MCRF) intervention manual -- Appendix B. MCRF intervention pilot assessment -- Appendix C. Recruitment materials -- Appendix D. Informed consent and child assent -- Appendix E. Study measure

    Application of automatic speech recognition technologies to singing

    The research field of Music Information Retrieval is concerned with the automatic analysis of musical characteristics. One aspect that has not received much attention so far is the automatic analysis of sung lyrics. On the other hand, the field of Automatic Speech Recognition has produced many methods for the automatic analysis of speech, but those have rarely been employed for singing. This thesis analyzes the feasibility of applying various speech recognition methods to singing, and suggests adaptations. In addition, the routes to practical applications for these systems are described. Five tasks are considered: phoneme recognition, language identification, keyword spotting, lyrics-to-audio alignment, and retrieval of lyrics from sung queries. The main bottleneck in almost all of these tasks lies in the recognition of phonemes from sung audio. Conventional models trained on speech do not perform well when applied to singing, and training models on singing is difficult due to a lack of annotated data. This thesis offers two approaches for generating such data sets. In the first, speech recordings are made more "song-like"; in the second, textual lyrics are automatically aligned to an existing singing data set. In both cases, these new data sets are then used for training new acoustic models, offering considerable improvements over models trained on speech. Building on these improved acoustic models, speech recognition algorithms for the individual tasks were adapted to singing, either by improving their robustness to the differing characteristics of singing or by exploiting the specific features of singing performances. Examples of improved robustness include the use of keyword-filler HMMs for keyword spotting, an i-vector approach for language identification, and a method for alignment and lyrics retrieval that allows highly varying durations. Features of singing are utilized in various ways: in an approach for language identification that is well suited for long recordings; in a method for keyword spotting based on phoneme durations in singing; and in an algorithm for alignment and retrieval that exploits known phoneme confusions in singing.
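    As a small illustration of the first data-generation approach (making speech more "song-like"), the sketch below slows down and pitch-shifts a speech recording, assuming librosa/soundfile and a placeholder file path; the exact transformations used in the thesis may differ.

```python
# Toy "song-like" augmentation of a speech recording: stretch durations and
# raise the pitch. Input/output paths and parameter values are placeholders.
import librosa
import soundfile as sf

y, sr = librosa.load("speech.wav", sr=16000)            # placeholder path

stretched = librosa.effects.time_stretch(y, rate=0.7)   # longer, more sung-like phone durations
shifted = librosa.effects.pitch_shift(stretched, sr=sr, n_steps=4)  # raise pitch ~4 semitones

sf.write("speech_songlike.wav", shifted, sr)             # placeholder output path
```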

    "Integrated Arts" Pedagogy and Philosophy

    This dissertation proposes and discusses the pedagogy and philosophy behind an original method titled IAM (Integrated Arts Method), an alternative experiential, integrated, conducive, and student-centered music and arts pedagogical method that can facilitate effective teaching for effective learning. Three key philosophical principles and a number of general techniques and attitudes are suggested as contributing factors to the observed success of IAM. These generalized contributing factors, the IAM pedagogy and philosophy, are comparable with existing related pedagogical methods, which offer indirect explanation of and support for the success of IAM. In turn, these pedagogical principles and attitudes allow generalization of IAM's advantages to other subject areas. IAM pedagogy contributes to the field of education generally, and arts education specifically, with original music and interdisciplinary programs, materials, compositions, and procedures, which represent practical and effective tools for both educators' and students' success. The three key principles of IAM pedagogy suggest that learning can be effective if it is: 1) based on physical experience, 2) taught with synthesis of related subjects, and 3) taught in a positive and stimulating atmosphere. Accurate facilitation of these key principles involves techniques, aspects, and pedagogical attitudes which this dissertation specifies and explains. IAM pedagogy is embodied through a set of general principles, specific attitudes, and practical tools for achieving the required emotional, mental, physiological, and psychological functioning of both learner and teacher for their mutual effectiveness. Results of real-world IAM programs in the subject areas of music and arts suggest that the pedagogy of IAM contributes to effective, enjoyable, and memorable education. Explanation and support of this contribution stem from the scientific, educational, sociological, philosophical, neuroscience, cognition, and music and arts literature. Hence the research conducted in this dissertation has moved from real-world practice toward grounded theory. IAM programs have been conducted successfully seven times (in 2007-2009, in public school-based extracurricular settings), with the aid of original pedagogical programs and materials developed for them. These specific programs, their parameters, and their materials are offered in the dissertation as concrete sample pedagogical solutions to the practical application of the proposed principles in music and integrated arts education.

    Functional Scaffolding for Musical Composition: A New Approach in Computer-Assisted Music Composition

    While it is important for systems intended to enhance musical creativity to define and explore musical ideas conceived by individual users, many limit musical freedom by focusing on maintaining musical structure, thereby impeding the user's freedom to explore his or her individual style. This dissertation presents a comprehensive body of work that introduces a new musical representation that allows users to explore a space of musical rules created from their own melodies. This representation, called functional scaffolding for musical composition (FSMC), exploits a simple yet powerful property of multipart compositions: the patterns of notes and rhythms in different instrumental parts of the same song are functionally related. That is, in principle, one part can be expressed as a function of another. Music in FSMC is represented accordingly as a functional relationship between an existing human composition, or scaffold, and an additional generated voice. This relationship is encoded by a type of artificial neural network called a compositional pattern producing network (CPPN). A human user without any musical expertise can then explore how these additional generated voices should relate to the scaffold through an interactive evolutionary process akin to animal breeding. The utility of this insight is validated by two implementations of FSMC, called NEAT Drummer and MaestroGenesis, which respectively help users tailor drum patterns and complete multipart arrangements from as little as a single original monophonic track. The five major contributions of this work address the overarching hypothesis of this dissertation that functional relationships alone, rather than specialized music theory, are sufficient for generating plausible additional voices. First, to validate FSMC and determine whether plausible generated voices result from the human-composed scaffold or from intrinsic properties of the CPPN, drum patterns are created with NEAT Drummer to accompany several different polyphonic pieces. Extending the FSMC approach to generate pitched voices, the second contribution reinforces the importance of functional transformations through quality assessments indicating that some partially FSMC-generated pieces are indistinguishable from those that are fully human-composed. While the third contribution focuses on constructing and exploring a space of plausible voices with MaestroGenesis, the fourth presents results from a two-year study in which students discuss their creative experience with the program. Finally, the fifth contribution is a plugin for MaestroGenesis called MaestroGenesis Voice (MG-V) that provides users a more natural way to incorporate MaestroGenesis into their creative endeavors by allowing scaffold creation through the human voice. Together, the chapters in this dissertation constitute a comprehensive approach to assisted music generation, enabling creativity without the need for musical expertise.
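    As a toy illustration of the functional-relationship idea behind FSMC (not the actual NEAT Drummer or MaestroGenesis implementation), the sketch below uses a tiny fixed CPPN-like network to compute a generated voice as a function of a scaffold melody; in the real systems the network's topology and weights are evolved interactively by the user. The scaffold melody, weights, and pitch mapping here are invented placeholders.

```python
# The generated voice is expressed as a function of the scaffold: a small
# network with mixed activation functions (CPPN-style) maps (time, scaffold
# pitch) to a pitch offset for an accompanying voice.
import math

scaffold = [60, 62, 64, 65, 67, 65, 64, 62]    # hypothetical MIDI melody (the scaffold)

def cppn(t, pitch):
    # two hidden units with different activation functions, CPPN-style
    h1 = math.sin(0.9 * t + 0.05 * pitch)
    h2 = 1.0 / (1.0 + math.exp(-(0.3 * pitch - 18.0)))
    return math.tanh(1.2 * h1 - 0.8 * h2)       # output in [-1, 1]

generated = []
for t, p in enumerate(scaffold):
    offset = round(7 * cppn(t, p))              # map network output to a pitch offset
    generated.append(p - 12 + offset)           # accompaniment roughly an octave below

print(list(zip(scaffold, generated)))
```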

    Modelling Digital Media Objects

    • โ€ฆ
    corecore