
    Lyrics-to-Audio Alignment and its Application

    Automatic lyrics-to-audio alignment techniques have been drawing increasing attention in recent years, and various studies have been conducted in this field. The objective of lyrics-to-audio alignment is to estimate the temporal relationship between lyrics and a musical audio signal; such alignment can serve various applications, such as karaoke-style lyrics display. In this contribution, we provide an overview of recent developments in this research topic, with a particular focus on the categorization of the various methods and on their applications.
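
    Many of the systems surveyed here reduce alignment to a dynamic-programming match between a feature sequence derived from the lyrics (e.g., phoneme posteriors or synthesized singing) and a feature sequence extracted from the audio. As a hedged illustration of that core step only (not any particular system from the overview; the function name and feature choice are ours), a minimal pure-NumPy dynamic time warping routine:

    import numpy as np

    def dtw_path(X, Y):
        """Align two feature sequences (frames x dims) with plain DTW.
        Returns the (i, j) index pairs of the optimal warping path."""
        n, m = len(X), len(Y)
        # Pairwise Euclidean cost between every frame of X and every frame of Y.
        cost = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
        acc = np.full((n + 1, m + 1), np.inf)
        acc[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                acc[i, j] = cost[i - 1, j - 1] + min(
                    acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
        # Backtrack from the end to recover the warping path.
        path, i, j = [], n, m
        while i > 0 and j > 0:
            path.append((i - 1, j - 1))
            step = int(np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]]))
            if step == 0:
                i, j = i - 1, j - 1
            elif step == 1:
                i -= 1
            else:
                j -= 1
        return path[::-1]

    Mapping each lyric unit's frames through the returned path yields the display timestamps needed for karaoke-style applications.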

    Automatic lyric alignment in Cantonese popular music.

    Wong Chi Hang. Thesis submitted in October 2005. Thesis (M.Phil.), Chinese University of Hong Kong, 2006. Includes bibliographical references (leaves 89-94). Abstracts in English and Chinese. Contents:
    Abstract (English and Chinese)
    Acknowledgement
    1 Introduction
    2 Literature Review
        2.1 LyricAlly
        2.2 Singing Voice Detection
        2.3 Singing Transcription System
    3 Background and System Overview
        3.1 Background
            3.1.1 Audio Mixing Practices of the Popular Music Industry
            3.1.2 Cantonese Lyric Writer Practice
        3.2 System Overview
    4 Vocal Signal Enhancement
        4.1 Method
            4.1.1 Non-center Signal Estimation
            4.1.2 Center Signal Estimation
            4.1.3 Bass and Drum Reduction
        4.2 Experimental Results
            4.2.1 Experimental Setup
            4.2.2 Results and Discussion
    5 Onset Detection
        5.1 Method
            5.1.1 Envelope Extraction
            5.1.2 Relative Difference Function
            5.1.3 Post-Processing
        5.2 Experimental Results
            5.2.1 Experimental Setup
            5.2.2 Results and Discussion
    6 Non-vocal Pruning
        6.1 Method
            6.1.1 Vocal Feature Selection
            6.1.2 Feed-forward Neural Network
        6.2 Experimental Results
            6.2.1 Experimental Setup
            6.2.2 Results and Discussion
    7 Lyric Feature Extraction
        7.1 Features
            7.1.1 Relative Pitch Feature
            7.1.2 Time Distance Feature
        7.2 Pitch Extraction
            7.2.1 f0 Detection Algorithms
            7.2.2 Post-Processing
            7.2.3 Experimental Results
    8 Lyrics Alignment
        8.1 Dynamic Time Warping
        8.2 Experimental Results
            8.2.1 Experimental Setup
            8.2.2 Results and Discussion
    9 Conclusion and Future Work
        9.1 Conclusion
        9.2 Future Work
    A Publications
    B Symbol Table
    Bibliography
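
    The vocal signal enhancement stage (Chapter 4) builds on the mixing practice, noted in the thesis, of panning lead vocals to the center of the stereo image. As a rough, hedged stand-in for that idea (not the thesis's actual estimator; the function and its parameters are ours), a mid/side decomposition with spectral magnitude subtraction:

    import numpy as np
    from scipy.signal import stft, istft

    def enhance_center(left, right, fs, nperseg=2048):
        """Crudely emphasize the center-panned (vocal) signal by
        subtracting the side signal's magnitude spectrum from the mid's."""
        mid = 0.5 * (left + right)    # contains the center-panned vocals
        side = 0.5 * (left - right)   # contains only non-center content
        f, t, M = stft(mid, fs, nperseg=nperseg)
        _, _, S = stft(side, fs, nperseg=nperseg)
        mag = np.maximum(np.abs(M) - np.abs(S), 0.0)
        _, out = istft(mag * np.exp(1j * np.angle(M)), fs, nperseg=nperseg)
        return out

    Center-panned bass and drums survive this step, which is why the thesis follows it with a dedicated bass and drum reduction stage.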

    A Lyrics-matching QBH System for Interactive Environments

    (Abstract to follow)

    Perceptual and automated estimates of infringement in 40 music copyright cases

    Music copyright infringement lawsuits implicate millions of dollars in damages and costs of litigation. There are, however, few objective measures by which to evaluate these claims. Recent music information retrieval research has proposed objective algorithms to automatically detect musical similarity, which might reduce subjectivity in music copyright infringement decisions, but there remains minimal relevant perceptual data despite its crucial role in copyright law. We collected perceptual data from 51 participants for 40 adjudicated copyright cases from 1915–2018 in 7 legal jurisdictions (USA, UK, Australia, New Zealand, Japan, People’s Republic of China, and Taiwan). Each case was represented by three different versions: full audio, melody only (MIDI), or lyrics only (text). Due to the historical emphasis in legal opinions on melody as the key criterion for deciding infringement, we originally predicted that listening to melody-only versions would result in perceptual judgments that more closely matched actual past legal decisions. However, as in our preliminary study of 17 court decisions (Yuan et al., 2020), our results did not match these predictions. Participants listening to full audio outperformed not only the melody-only condition but also automated algorithms designed to calculate musical similarity (maximal accuracy of 83% vs. 75%, respectively). Meanwhile, the lyrics-only condition performed at chance levels. Analysis of outlier cases suggests that music, lyrics, and contextual factors can interact in complex ways that are difficult to capture using quantitative metrics. We propose directions for further investigation, including using larger and more diverse samples of cases, enhancing the methods, and adapting our perceptual experiment to avoid relying on ground-truth data only from court decisions (which may be subject to errors and selection bias). Our results contribute data and methods to inform practical debates relevant to music copyright law throughout the world, such as the question of whether, and to what extent, judges and jurors should be allowed to hear published sound recordings of the disputed works in determining musical similarity. Our results ultimately suggest that while automated algorithms are unlikely to replace human judgments, they may help to supplement them.
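
    The automated baselines in studies like this compute pairwise similarity between the disputed works. As a hedged example of one common family of measures only (not the specific algorithms evaluated in this paper), a transposition-invariant melodic similarity based on edit distance over pitch-interval sequences:

    def interval_similarity(melody_a, melody_b):
        """Similarity in [0, 1] between two melodies given as lists of
        MIDI pitches, via edit distance over their interval sequences
        (using intervals makes the measure transposition-invariant)."""
        a = [q - p for p, q in zip(melody_a, melody_a[1:])]
        b = [q - p for p, q in zip(melody_b, melody_b[1:])]
        d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
        for i in range(len(a) + 1):
            d[i][0] = i
        for j in range(len(b) + 1):
            d[0][j] = j
        for i in range(1, len(a) + 1):
            for j in range(1, len(b) + 1):
                d[i][j] = min(d[i - 1][j] + 1,           # deletion
                              d[i][j - 1] + 1,           # insertion
                              d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
        return 1.0 - d[-1][-1] / max(len(a), len(b), 1)

    That a measure like this ignores rhythm, lyrics, and production context is consistent with the paper's finding that listeners given full audio outperform melody-only algorithms.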

    Singing Voice Recognition for Music Information Retrieval

    This thesis proposes signal processing methods for the analysis of singing voice audio signals, with the objective of obtaining information about the identity of the singer and the lyrics content of the singing. Two main topics are presented: singer identification in monophonic and polyphonic music, and lyrics transcription and alignment. The information automatically extracted from the singing voice is meant to be used for applications such as music classification, sorting and organizing music databases, and music information retrieval. For singer identification, the thesis introduces methods from general audio classification as well as specific methods for dealing with the presence of accompaniment. The emphasis is on singer identification in polyphonic audio, where the singing voice is present along with musical accompaniment. The presence of instruments is detrimental to voice identification performance, and eliminating the effect of instrumental accompaniment is an important aspect of the problem. The study of singer identification centers on the degradation of classification performance in the presence of instruments and on separation of the vocal line for improving performance. For the study, monophonic singing was mixed with instrumental accompaniment at different signal-to-noise (singing-to-accompaniment) ratios, and classification was performed both on the polyphonic mixture and on the vocal line separated from it. Classification that includes the vocal separation step improves performance significantly compared to classifying the polyphonic mixtures directly, though it does not reach the performance obtained on the monophonic singing itself. Nevertheless, the results show that singing voices can be classified robustly in polyphonic music when source separation is used. For lyrics transcription, the thesis introduces the general speech recognition framework and the various adjustments that can be made before applying these methods to the singing voice. The variability of phonation in singing poses a significant challenge to the speech recognition approach. The thesis proposes using phoneme models trained on speech data and adapted to singing voice characteristics for the recognition of phonemes and words from a singing voice signal. Language models and adaptation techniques are an important aspect of the recognition process. There are two different ways of recognizing the phonemes in the audio: one is alignment, where the true transcription is known and the phonemes only need to be located in time; the other is recognition, where both the transcription and the locations of the phonemes have to be found. Alignment is thus a simplified form of the recognition task. Alignment of textual lyrics to music audio is performed by aligning the phonetic transcription of the lyrics with the vocal line separated from the polyphonic mixture, using a collection of commercial songs. Word recognition is tested for transcription of lyrics from monophonic singing. The performance of the proposed system for automatic alignment of lyrics and audio is sufficient for facilitating applications such as automatic karaoke annotation or song browsing. The word recognition accuracy of lyrics transcription from singing is quite low, but it is shown to be useful in a query-by-singing application for performing a textual search based on the words recognized from the query. When some keywords in the query are recognized, the song can be reliably identified.
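
    The singer identification experiments rely on constructing mixtures at controlled singing-to-accompaniment ratios. A small utility along these lines (our naming and simplifications, not the thesis code; both inputs are assumed to be mono, equal-length, and non-silent) could look like:

    import numpy as np

    def mix_at_ratio(vocals, accomp, ratio_db):
        """Mix a vocal track with accompaniment so that the
        singing-to-accompaniment power ratio equals ratio_db."""
        p_voc = np.mean(vocals ** 2)
        p_acc = np.mean(accomp ** 2)
        # Scale accompaniment so 10*log10(p_voc / (g**2 * p_acc)) == ratio_db.
        g = np.sqrt(p_voc / (p_acc * 10 ** (ratio_db / 10)))
        return vocals + g * accomp

    Sweeping ratio_db then traces the degradation curve described above, with and without the source separation step.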

    Proceedings of the 6th International Workshop on Folk Music Analysis, 15-17 June, 2016

    The Folk Music Analysis Workshop brings together computational music analysis and ethnomusicology. Both symbolic and audio representations of music are considered, with a broad range of scientific approaches being applied (signal processing, graph theory, deep learning). The workshop featured talks from international researchers on areas such as Indian classical music, Iranian singing, Ottoman-Turkish Makam music scores, Flamenco singing, Irish traditional music, Georgian traditional music, and Dutch folk songs. Invited guest speakers were Anja Volk (Utrecht University) and Peter Browne (Technological University Dublin).

    Application of automatic speech recognition technologies to singing

    The research field of Music Information Retrieval is concerned with the automatic analysis of musical characteristics. One aspect that has not received much attention so far is the automatic analysis of sung lyrics. The field of Automatic Speech Recognition, on the other hand, has produced many methods for the automatic analysis of speech, but these have rarely been applied to singing. This thesis analyzes the feasibility of applying various speech recognition methods to singing and suggests adaptations. In addition, routes to practical applications of these systems are described. Five tasks are considered: phoneme recognition, language identification, keyword spotting, lyrics-to-audio alignment, and retrieval of lyrics from sung queries. The main bottleneck in almost all of these tasks lies in the recognition of phonemes from sung audio. Conventional models trained on speech do not perform well when applied to singing, and training models on singing is difficult due to a lack of annotated data. This thesis offers two approaches for generating such data sets: in the first, speech recordings are made more “song-like”; in the second, textual lyrics are automatically aligned to an existing singing data set. In both cases, the new data sets are then used for training new acoustic models, offering considerable improvements over models trained on speech. Building on these improved acoustic models, speech recognition algorithms for the individual tasks were adapted to singing, either by improving their robustness to the differing characteristics of singing or by exploiting specific features of singing performances. Examples of improved robustness include the use of keyword-filler HMMs for keyword spotting, an i-vector approach for language identification, and a method for alignment and lyrics retrieval that allows highly varying durations. Features of singing are utilized in various ways: in an approach for language identification that is well-suited for long recordings; in a method for keyword spotting based on phoneme durations in singing; and in an algorithm for alignment and retrieval that exploits known phoneme confusions in singing.
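
    The keyword-filler construction mentioned above surrounds the keyword's phoneme sequence with "filler" states that absorb all other audio, so spotting reduces to asking whether a path through the keyword beats an all-filler path. A toy dynamic-programming sketch of that idea (not the thesis's HMM implementation; it assumes frame-wise phoneme log-likelihoods from some acoustic model, and the penalty term is ours):

    import numpy as np

    def keyword_filler_score(logp, keyword, filler_penalty=1.0):
        """logp: (frames, phones) log-likelihoods; keyword: phone indices.
        Returns the score of the best filler-keyword-filler path relative
        to the all-filler path (positive suggests a detection)."""
        T, K = len(logp), len(keyword)
        filler = logp.max(axis=1) - filler_penalty   # filler matches any phone
        pre = np.concatenate(([0.0], np.cumsum(filler)))  # filler prefix scores
        best = np.full((T, K), -np.inf)
        for t in range(T):
            # Enter the first keyword state after a filler prefix, or stay in it.
            best[t, 0] = pre[t] + logp[t, keyword[0]]
            if t > 0:
                best[t, 0] = max(best[t, 0], best[t - 1, 0] + logp[t, keyword[0]])
                for s in range(1, K):
                    best[t, s] = max(best[t - 1, s],
                                     best[t - 1, s - 1]) + logp[t, keyword[s]]
        # Close each candidate path with a filler suffix and compare.
        ends = best[:, K - 1] + (pre[T] - pre[1:])
        return ends.max() - pre[T]

    The filler_penalty term trades misses against false alarms: a larger penalty handicaps the all-filler baseline, yielding more detections but also more false alarms.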

    Exploring the intersection of translation and music: an analysis of how foreign songs reach Chinese audiences

    The thesis looks into the practice of song translation, which occupies a peripheral position in translation studies (TS) despite its commonplace occurrence and its significant impact on the global spread of songs. Foreign songs enjoy enormous appeal in China, where different methods have been adopted to translate them with the aim of enhancing their reception by listeners. In particular, the practice of writing Chinese lyrics anew and setting them to the foreign tunes, regardless of the semantic relationship between the source text (ST) and the target text (TT), has proliferated over the past decades. Some translated songs capture the gist of the original lyrics while omitting minor details, whereas others sever their relation to the original altogether; this blurs the boundaries between translation, adaptation, and the rewriting of lyrics. Another noticeable phenomenon is the emergence of self-organising communities whose involvement in translating song lyrics and circulating subtitled music videos (MVs) cannot be overlooked in today’s digital landscape. From a Bourdieusian perspective, song translation can be understood as a field with its own “rules of the game” and its own exchanges of different forms of capital. Adopting a case study methodology, the thesis investigates the field of song translation with special reference to the translation practices of the veteran song translator Xue Fan 薛范, online amateur translators, and the professional Hong Kong lyricist Albert Leung 林夕. These case studies provide an in-depth analysis of China’s song translation activities over time and of the dynamics of power relations in the field. Translating a song from one language and culture into another invariably involves losses and gains of certain elements, given the song’s semiotic richness. Against this backdrop, the thesis examines, through close comparison of STs and TTs, how different agents have dealt with the interplay of a song’s different meaning-making modes under various circumstances. This allows a better understanding of the production, circulation and reception of song translations in their respective historical, ideological and social contexts. It is hoped that the thesis can provide new insights into our understanding of ‘translation’ in relation to music, and further shed light on how translation evolves at the convergence of music and technology in the era of globalisation.