
    Singing voice correction using canonical time warping

    Expressive singing voice correction is an appealing but challenging problem. A robust time-warping algorithm that synchronizes two singing recordings can provide a promising solution. We therefore propose to address the problem with canonical time warping (CTW), which aligns amateur singing recordings to professional ones. A new pitch contour is generated from the alignment information, and pitch-corrected singing is synthesized back through the vocoder. The objective evaluation shows that CTW is robust against pitch-shifting and time-stretching effects, and the subjective test demonstrates that CTW prevails over the other methods, including DTW and commercial auto-tuning software. Finally, we demonstrate the applicability of the proposed method in a practical, real-world scenario.
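    The alignment step can be illustrated with plain dynamic time warping, the baseline this abstract compares against. A minimal sketch on 1-D pitch contours follows; CTW additionally learns feature projections before warping, which is omitted here, and the function name is mine, not the paper's:

```python
import numpy as np

def dtw_align(x, y):
    """Dynamic time warping between two 1-D pitch contours.

    Returns the optimal monotonic alignment path as a list of
    (i, j) index pairs. This is plain DTW, not the paper's CTW,
    which also learns a linear projection of the features.
    """
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])          # local distance
            cost[i, j] = d + min(cost[i - 1, j],      # step in x only
                                 cost[i, j - 1],      # step in y only
                                 cost[i - 1, j - 1])  # diagonal step
    # Backtrack from (n, m) to recover the warping path
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]
```

    Given such a path, each frame of the amateur contour is mapped to a frame of the professional one, from which a corrected pitch contour can be resampled.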

    Analysis on Using Synthesized Singing Techniques in Assistive Interfaces for Visually Impaired to Study Music

    Tactile and auditory senses are the basic channels through which visually impaired people sense the world, and their interaction with assistive technologies likewise focuses mainly on tactile and auditory interfaces. This paper discusses the validity of using the most appropriate singing synthesis techniques as a mediator in assistive technologies built specifically to address their music learning needs involving music scores and lyrics. Music scores with notations and lyrics are considered the main mediators in the musical communication channel that lies between a composer and a performer. Visually impaired music lovers have little opportunity to access this main mediator, since most scores exist only in a visual format. In a music score, the vocal performer's melody is married to all the pleasant sound producible in the form of singing. Singing best fits a format in the temporal domain, compared to a tactile format in the spatial domain. Therefore, conversion of the existing visual format to a singing output is the most appropriate lossless transition, as shown by the initial research on an adaptive music score trainer for the visually impaired [1]. To extend that initial research, this study surveys existing singing synthesis techniques and research on auditory interfaces.

    Automatic lyric alignment in Cantonese popular music

    Wong Chi Hang. Thesis submitted in October 2005. Thesis (M.Phil.)--Chinese University of Hong Kong, 2006. Includes bibliographical references (leaves 89-94). Abstracts in English and Chinese. Contents: 1. Introduction; 2. Literature Review (LyricAlly; singing voice detection; singing transcription system); 3. Background and System Overview (audio mixing practices of the popular music industry; Cantonese lyric writer practice; system overview); 4. Vocal Signal Enhancement (non-center signal estimation; center signal estimation; bass and drum reduction; experimental results); 5. Onset Detection (envelope extraction; relative difference function; post-processing; experimental results); 6. Non-vocal Pruning (vocal feature selection; feed-forward neural network; experimental results); 7. Lyric Feature Extraction (relative pitch feature; time distance feature; pitch extraction: f0 detection algorithms, post-processing, experimental results); 8. Lyrics Alignment (dynamic time warping; experimental results); 9. Conclusion and Future Work; Appendices: Publications; Symbol Table; Bibliography.

    Singing Voice Recognition for Music Information Retrieval

    This thesis proposes signal processing methods for the analysis of singing voice audio signals, with the objective of obtaining information about the identity and lyrics content of the singing. Two main topics are presented: singer identification in monophonic and polyphonic music, and lyrics transcription and alignment. The information automatically extracted from the singing voice is meant to be used in applications such as music classification, sorting and organizing music databases, and music information retrieval. For singer identification, the thesis introduces methods from general audio classification as well as specific methods for dealing with the presence of accompaniment. The emphasis is on singer identification in polyphonic audio, where the singing voice is present along with musical accompaniment. The presence of instruments is detrimental to voice identification performance, and eliminating the effect of instrumental accompaniment is an important aspect of the problem. The study of singer identification centers on the degradation of classification performance in the presence of instruments, and on separation of the vocal line to improve performance. For the study, monophonic singing was mixed with instrumental accompaniment at different signal-to-noise (singing-to-accompaniment) ratios, and the classification process was performed both on the polyphonic mixture and on the vocal line separated from it. The classification method that includes a vocal separation step significantly improves performance compared to classifying the polyphonic mixtures directly, though it does not approach the performance achieved on the monophonic singing itself. Nevertheless, the results show that singing voices can be classified robustly in polyphonic music when source separation is used.
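    The mixing procedure described above, combining monophonic singing with accompaniment at a chosen singing-to-accompaniment ratio, can be sketched as follows. The function name and the assumption of equal-length mono float signals are mine, not the thesis's:

```python
import numpy as np

def mix_at_snr(vocals, accomp, snr_db):
    """Scale `accomp` so the singing-to-accompaniment ratio of the
    mix equals `snr_db`, then sum the two signals.

    Illustrative sketch of the experimental mixing step; both
    inputs are assumed to be mono float arrays of equal length.
    """
    p_voc = np.mean(vocals ** 2)   # vocal signal power
    p_acc = np.mean(accomp ** 2)   # accompaniment power
    # Gain making 10*log10(p_voc / (gain**2 * p_acc)) == snr_db
    gain = np.sqrt(p_voc / (p_acc * 10.0 ** (snr_db / 10.0)))
    return vocals + gain * accomp
```

    Sweeping `snr_db` over a range of values reproduces the kind of controlled degradation the experiments rely on.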
    In the problem of lyrics transcription, the thesis introduces the general speech recognition framework and the various adjustments that can be made before applying these methods to the singing voice. The variability of phonation in singing poses a significant challenge to the speech recognition approach. The thesis proposes using phoneme models trained on speech data and adapted to singing voice characteristics for recognizing phonemes and words from a singing voice signal. Language models and adaptation techniques are an important aspect of the recognition process. There are two different ways of recognizing the phonemes in the audio: one is alignment, where the true transcription is known and the phonemes only have to be located; the other is recognition, where both the transcription and the locations of the phonemes have to be found. Alignment is thus a simplified form of the recognition task. Alignment of textual lyrics to music audio is performed by aligning the phonetic transcription of the lyrics with the vocal line separated from the polyphonic mixture, using a collection of commercial songs. Word recognition is tested for transcription of lyrics from monophonic singing. The performance of the proposed system for automatic alignment of lyrics and audio is sufficient to facilitate applications such as automatic karaoke annotation or song browsing. The word recognition accuracy of lyrics transcription from singing is quite low, but it is shown to be useful in a query-by-singing application for performing a textual search based on the words recognized from the query: when some key words in the query are recognized, the song can be reliably identified.
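    The word recognition accuracy discussed here is conventionally reported as word error rate, the Levenshtein distance between reference and hypothesis word sequences normalized by reference length. A minimal sketch of that standard metric (not code from the thesis):

```python
def word_error_rate(reference, hypothesis):
    """Word error rate: (substitutions + insertions + deletions)
    divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i               # deleting i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j               # inserting j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[-1][-1] / len(ref)
```

    A WER of 0.0 means a perfect transcription; values above 1.0 are possible when the hypothesis inserts many spurious words.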

    Automatic transcription of traditional Turkish art music recordings: A computational ethnomusicology approach

    Thesis (Doctoral)--Izmir Institute of Technology, Electronics and Communication Engineering, Izmir, 2012. Includes bibliographical references (leaves 96-109). Text in English; abstracts in Turkish and English. xi, 131 leaves. Music Information Retrieval (MIR) is a recent research field that emerged from the revolutionary change in the distribution of, and access to, music recordings. Although MIR research already covers a wide range of applications, MIR methods are primarily developed for western music. Since the most important dimensions of music are fundamentally different in western and non-western musics, developing MIR methods for non-western musics is a challenging task. On the other hand, the discipline of ethnomusicology supplies useful insights for computational studies of non-western musics. This thesis therefore addresses the task within the framework of computational ethnomusicology, a newly emerging interdisciplinary research domain. The main contribution of this study is the development of an automatic transcription system for traditional Turkish art music (Turkish music), the first in the literature. In order to develop such a system, several subjects are also studied for the first time in the literature, constituting further contributions of the thesis: the automatic music transcription problem is considered from the perspective of ethnomusicology, an automatic makam recognition system is developed, and the scale theory of Turkish music is evaluated computationally for nine makamlar to determine whether it can be used for makam detection. Furthermore, a wide geographical region including the Middle East, North Africa and Asia shares musical similarities with Turkish music, so this study also provides techniques and methods more relevant than those of the general MIR literature for the study of these non-western musics.
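    A common building block for computational scale and makam analysis is an octave-folded pitch histogram relative to the tonic. The sketch below uses the 53-comma division of the octave associated with Turkish makam theory; the helper and its defaults are my assumptions for illustration, not the thesis implementation:

```python
import numpy as np

def pitch_histogram(f0_hz, tonic_hz, bins_per_octave=53):
    """Octave-folded pitch histogram relative to a tonic.

    Turkish makam theory divides the octave into 53 commas, hence
    the default resolution. Returns a normalized histogram that a
    makam recognizer could compare against scale templates.
    """
    f0 = np.asarray(f0_hz, dtype=float)
    f0 = f0[f0 > 0]  # drop unvoiced frames (f0 reported as 0)
    # Interval above the tonic in commas, folded into one octave
    commas = (bins_per_octave * np.log2(f0 / tonic_hz)) % bins_per_octave
    hist, _ = np.histogram(commas, bins=bins_per_octave,
                           range=(0, bins_per_octave))
    total = hist.sum()
    return hist / total if total else hist
```

    Correlating such a histogram with per-makam templates is one simple way to turn a scale theory into a makam detector.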