3 research outputs found

    Searching lyrical phrases in a-capella Turkish Makam recordings

    Paper presented at the 16th International Society for Music Information Retrieval Conference (ISMIR 2015), held October 26-30, 2015, in Málaga, Spain.

    Search by lyrics, the problem of locating the exact occurrences of a phrase from the lyrics in musical audio, is a recently emerging research topic. Unlike key-phrases in speech, lyrical key-phrases have durations that bear an important relation to other musical aspects, such as the structure of a composition. In this work we propose an approach that addresses the differences in syllable durations specific to singing. First, a phrase is expanded into MFCC-based phoneme models trained on speech. Then, we apply dynamic time warping between the phrase and the audio to estimate candidate audio segments in the given recording. Next, the retrieved audio segments are ranked by means of a novel score-informed hidden Markov model, in which the durations of the syllables within a phrase are explicitly modeled. The proposed approach is evaluated on 12 a-capella audio recordings of Turkish Makam music. Relying on standard speech phonetic models, we arrive at promising results that outperform a baseline approach unaware of lyrics durations. To the best of our knowledge, this is the first work tackling the problem of search by lyrical key-phrases. We expect that it can serve as a baseline for further research on singing material with similar musical characteristics.

    This work is partly supported by the European Research Council under the European Union's Seventh Framework Programme, as part of the CompMusic project (ERC grant agreement 267583), and partly by the AGAUR research grant.
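
    To make the retrieval step concrete, below is a minimal sketch of candidate-segment retrieval via subsequence DTW over MFCC features. The use of librosa and all function and variable names are assumptions for illustration; the paper's full pipeline additionally expands the query from speech-trained phoneme models and rescores the retrieved candidates with the score-informed, duration-explicit HMM.

```python
# Minimal sketch of the candidate-retrieval step: subsequence DTW over
# MFCC features. librosa and these names are illustrative assumptions;
# the paper also rescores candidates with a duration-explicit HMM.
import numpy as np
import librosa

def best_candidate_segment(query, recording, sr=22050, n_mfcc=13, hop=512):
    """Find the span of `recording` best matching `query` (both 1-D audio
    arrays at sampling rate `sr`) under subsequence DTW on MFCCs."""
    Q = librosa.feature.mfcc(y=query, sr=sr, n_mfcc=n_mfcc, hop_length=hop)
    R = librosa.feature.mfcc(y=recording, sr=sr, n_mfcc=n_mfcc, hop_length=hop)
    # subseq=True lets the query align to any contiguous span of R
    D, wp = librosa.sequence.dtw(X=Q, Y=R, subseq=True)
    # the warping path is returned end-to-start; column 1 indexes the recording
    start_frame, end_frame = int(wp[-1, 1]), int(wp[0, 1])
    return start_frame * hop / sr, end_frame * hop / sr  # segment in seconds
```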

    Application of automatic speech recognition technologies to singing

    The research field of Music Information Retrieval is concerned with the automatic analysis of musical characteristics. One aspect that has not received much attention so far is the automatic analysis of sung lyrics. On the other hand, the field of Automatic Speech Recognition has produced many methods for the automatic analysis of speech, but those have rarely been employed for singing. This thesis analyzes the feasibility of applying various speech recognition methods to singing and suggests adaptations. In addition, routes to practical applications for these systems are described. Five tasks are considered: phoneme recognition, language identification, keyword spotting, lyrics-to-audio alignment, and retrieval of lyrics from sung queries.

    The main bottleneck in almost all of these tasks lies in the recognition of phonemes from sung audio. Conventional models trained on speech do not perform well when applied to singing, and training models on singing is difficult due to a lack of annotated data. This thesis offers two approaches for generating such data sets. In the first, speech recordings are made more "song-like". In the second, textual lyrics are automatically aligned to an existing singing data set. In both cases, the new data sets are then used for training new acoustic models, offering considerable improvements over models trained on speech.

    Building on these improved acoustic models, speech recognition algorithms for the individual tasks were adapted to singing, either by improving their robustness to the differing characteristics of singing or by exploiting specific features of singing performances. Examples of improved robustness include the use of keyword-filler HMMs for keyword spotting, an i-vector approach for language identification, and a method for alignment and lyrics retrieval that allows highly varying durations. Features of singing are utilized in various ways: in an approach for language identification that is well suited for long recordings; in a method for keyword spotting based on phoneme durations in singing; and in an algorithm for alignment and retrieval that exploits known phoneme confusions in singing.
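    As one concrete illustration of the first data-generation idea (making speech more "song-like"), the sketch below stretches phone durations and shifts pitch in a speech recording. The transformations and parameter values are assumptions for illustration; the thesis's actual recipe is not reduced to these two operations.

```python
# Hedged sketch of "song-ifying" speech: stretch durations and shift
# pitch so speech recordings resemble singing more closely. The stretch
# factor and pitch step here are illustrative, not the thesis's values.
import librosa
import soundfile as sf

def songify(path_in, path_out, stretch_rate=0.5, semitones=3):
    y, sr = librosa.load(path_in, sr=None)
    y = librosa.effects.time_stretch(y, rate=stretch_rate)   # rate < 1 slows speech down
    y = librosa.effects.pitch_shift(y, sr=sr, n_steps=semitones)
    sf.write(path_out, y, sr)

# e.g. songify("speech.wav", "song_like.wav")   # file names are hypothetical
```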

    Turkish Makam Acapella Sections Dataset

    The Turkish makam a-capella sections dataset is a collection of recordings of compositions from the vocal form şarkı, sung by professional singers. The recordings are selected to match those in version two of http://compmusic.upf.edu/turkish-sarki. The main intention is to provide an a-capella counterpart to the polyphonic recordings.

    THE DATASET

    Audio music content

    The collection has annotations of sections, lyrics phrases, and lyrics words. Each section, lyrics word, and lyrical phrase is aligned to its corresponding segment in the audio. Annotations of sections (aranağme, zemin, etc.) are taken from https://github.com/MTG/turkish_makam_section_dataset

    FORMAT: All annotations are in TextGrid (used in Praat).

    Please cite one of the following publications if you use the dataset in your work:

    Dzhambazov, G., & Serra, X. (2015). Modeling of Phoneme Durations for Alignment between Polyphonic Audio and Lyrics. Sound and Music Computing Conference 2015. http://www.mtg.upf.edu/node/3266

    or

    Dzhambazov, G., Şentürk, S., & Serra, X. (2015). Searching Lyrical Phrases in A-Capella Turkish Makam Recordings. 16th International Society for Music Information Retrieval (ISMIR) Conference.

    turkish-makam-acapella-sections-dataset-2.0.zip is organised by artist, and makam_acapella-master_1.0.zip is organised by MusicBrainz ID.

    Contact

    If you have any questions or comments about the dataset, please feel free to write to us.

    Georgi Dzhambazov
    Music Technology Group,
    Universitat Pompeu Fabra,
    Barcelona, Spain
    georgi dzhambazov upf edu

    http://compmusic.upf.edu/turkish-makam-acapella-sections-dataset
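
    Since the annotations ship as Praat TextGrids, a small sketch of reading one follows. It uses the third-party `textgrid` Python package (pip install textgrid), which is an assumption about tooling; `praatio` or Praat itself would work as well, and the file name is hypothetical.

```python
# Hedged sketch: reading section/lyrics annotations from a Praat TextGrid
# with the third-party `textgrid` package. File name and tier contents
# are illustrative assumptions.
import textgrid

tg = textgrid.TextGrid.fromFile("recording.TextGrid")
for tier in tg:                 # e.g. section, lyrics-phrase, and word tiers
    print(tier.name)
    for interval in tier:
        if interval.mark:       # skip unlabeled gaps
            print(f"  {interval.minTime:.2f}-{interval.maxTime:.2f}  {interval.mark}")
```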