Computer-aided Melody Note Transcription Using the Tony Software: Accuracy and Efficiency
We present Tony, a software tool for the interactive annotation of melodies from monophonic audio recordings, and evaluate its usability and the accuracy of its note extraction method. The scientific study of acoustic performances of melodies, whether sung or played, requires the accurate transcription of notes and pitches. To achieve the desired transcription accuracy for a particular application, researchers manually correct results obtained by automatic methods. Tony is an interactive tool aimed directly at making this correction task efficient. It provides (a) state-of-the-art algorithms for pitch and note estimation, (b) visual and auditory feedback for easy error-spotting, (c) an intelligent graphical user interface through which the user can rapidly correct estimation errors, and (d) extensive export functions enabling further processing in other applications. We show that Tony's built-in automatic note transcription method compares favourably with existing tools. We report annotation times for a set of 96 solo vocal recordings and study the effects of the piece, the number of edits made, and the annotator's increasing mastery of the software. Tony is open source software, with source code and compiled binaries for Windows, Mac OS X and Linux available from https://code.soundsoftware.ac.uk/projects/tony/
Music Information Retrieval: An Inspirational Guide to Transfer from Related Disciplines
The emerging field of Music Information Retrieval (MIR) has been influenced by neighboring domains in signal processing and machine learning, including automatic speech recognition, image processing and text information retrieval. In this contribution, we start with concrete examples of methodology transfer between speech and music processing, oriented on the building blocks of pattern recognition: preprocessing, feature extraction, and classification/decoding. We then take a higher-level viewpoint when describing sources of mutual inspiration derived from text and image information retrieval. We conclude that dealing with the peculiarities of music in MIR research has contributed to advancing the state of the art in other fields, and that many future challenges in MIR are strikingly similar to those that other research areas have been facing.
Application of automatic speech recognition technologies to singing
The research field of Music Information Retrieval is concerned with the automatic analysis of musical characteristics. One aspect that has not received much attention so far is the automatic analysis of sung lyrics. On the other hand, the field of Automatic Speech Recognition has produced many methods for the automatic analysis of speech, but those have rarely been employed for singing. This thesis analyzes the feasibility of applying various speech recognition methods to singing, and suggests adaptations. In addition, the routes to practical applications for these systems are described. Five tasks are considered: phoneme recognition, language identification, keyword spotting, lyrics-to-audio alignment, and retrieval of lyrics from sung queries. The main bottleneck in almost all of these tasks lies in the recognition of phonemes from sung audio. Conventional models trained on speech do not perform well when applied to singing. Training models on singing is difficult due to a lack of annotated data. This thesis offers two approaches for generating such data sets. In the first, speech recordings are made more "song-like". In the second, textual lyrics are automatically aligned to an existing singing data set. In both cases, these new data sets are then used for training new acoustic models, offering considerable improvements over models trained on speech. Building on these improved acoustic models, speech recognition algorithms for the individual tasks were adapted to singing, either by improving their robustness to the differing characteristics of singing or by exploiting the specific features of singing performances. Examples of improving robustness include the use of keyword-filler HMMs for keyword spotting, an i-vector approach for language identification, and a method for alignment and lyrics retrieval that allows highly varying durations.
Features of singing are utilized in various ways: in an approach for language identification that is well suited to long recordings; in a method for keyword spotting based on phoneme durations in singing; and in an algorithm for alignment and retrieval that exploits known phoneme confusions in singing.
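The keyword-filler idea mentioned in the abstract can be illustrated with a minimal numpy sketch. This is not the thesis's implementation: it assumes frame-level phoneme log-posteriors are already available, models filler with a single constant log-probability, and scores the best left-to-right path that passes through all keyword phonemes against an all-filler path.

```python
import numpy as np

def keyword_filler_score(log_post, keyword, filler_logp):
    """Score a keyword-filler decode.
    log_post: (T, P) frame-level phoneme log-posteriors.
    keyword:  sequence of phoneme indices forming the keyword.
    filler_logp: constant log-probability emitted by the filler states.
    Returns a per-frame score; positive values suggest the keyword occurs."""
    T = log_post.shape[0]
    K = len(keyword)
    NEG = -1e18
    # States: 0 = leading filler, 1..K = keyword phonemes, K+1 = trailing filler.
    emit = np.full((T, K + 2), filler_logp)
    for i, p in enumerate(keyword):
        emit[:, i + 1] = log_post[:, p]
    dp = np.full(K + 2, NEG)
    dp[0] = emit[0, 0]          # start in filler ...
    dp[1] = emit[0, 1]          # ... or directly on the first keyword phoneme
    for t in range(1, T):
        prev, dp = dp, np.full(K + 2, NEG)
        for s in range(K + 2):
            best = prev[s]                        # stay in state s
            if s > 0 and prev[s - 1] > best:      # or advance from state s-1
                best = prev[s - 1]
            dp[s] = best + emit[t, s]
    detected = max(dp[K], dp[K + 1])  # only paths that consumed the whole keyword
    return (detected - filler_logp * T) / T
```

Because reaching state K requires advancing through every keyword state in order, the detection score only rewards frames where the full keyword phoneme sequence is matched, while filler states absorb everything else.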
Classification of musical genres using hidden Markov models
The amount of music content online is expanding fast, and music streaming services need algorithms that sort new music. Sorting music by its characteristics often comes down to considering the genre of the music. Numerous studies have been made on automatic classification of audio files using spectral analysis and machine learning methods. However, many of the completed studies have been unrealistic in terms of usefulness in real settings, choosing genres that are very dissimilar. The aim of this master's thesis is to try a more realistic scenario, with genres whose mutual borders are uncertain, such as Pop and R&B. Mel-frequency cepstral coefficients (MFCCs) were extracted from audio files and used as multidimensional Gaussian input to a hidden Markov model (HMM) to classify the four genres Pop, Jazz, Classical and R&B. An alternative method is tested, using a more theoretical approach to music characteristics to improve classification. The maximum total accuracy obtained when tested on an external test set was 0.742 for audio data, and 0.540 for theoretical data, implying that a combination of the two methods will not result in an increase of accuracy. Different methods of evaluation and possible alternative approaches are discussed.
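The classification pipeline described here — feature vectors scored by per-genre Gaussian HMMs, with the highest-likelihood model winning — can be sketched in plain numpy. This is a generic sketch, not the thesis's code: the genre names, state counts, and parameters below are invented for illustration, and real use would fit the models to MFCC frames rather than hand-set them.

```python
import numpy as np

def diag_gauss_loglik(x, means, var):
    # Per-state log-density of frame x under diagonal-covariance Gaussians.
    d = x[None, :] - means                         # (S, D)
    return -0.5 * np.sum(d * d / var + np.log(2 * np.pi * var), axis=1)

def forward_loglik(X, pi, A, means, var):
    """log P(X | model) via the forward algorithm in log space.
    X: (T, D) feature frames (e.g. MFCC vectors); pi: (S,) initial
    probabilities; A: (S, S) transition matrix; means/var: (S, D) emissions."""
    logA = np.log(A)
    alpha = np.log(pi) + diag_gauss_loglik(X[0], means, var)
    for x in X[1:]:
        step = alpha[:, None] + logA               # (S, S): from-state x to-state
        m = step.max(axis=0)                       # log-sum-exp per column
        alpha = m + np.log(np.exp(step - m).sum(axis=0)) \
                + diag_gauss_loglik(x, means, var)
    m = alpha.max()
    return m + np.log(np.exp(alpha - m).sum())

def classify(X, models):
    # Pick the genre whose HMM assigns the frame sequence the highest likelihood.
    return max(models, key=lambda g: forward_loglik(X, *models[g]))
```

With one trained HMM per genre, classification reduces to evaluating each model's forward log-likelihood on the test clip's frames and taking the argmax.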
A Comprehensive Trainable Error Model for Sung Music Queries
We propose a model for errors in sung queries, a variant of the hidden Markov model (HMM). This is a solution to the problem of identifying the degree of similarity between a (typically error-laden) sung query and a potential target in a database of musical works, an important problem in the field of music information retrieval. Similarity metrics are a critical component of query-by-humming (QBH) applications, which search audio and multimedia databases for strong matches to oral queries. Our model comprehensively expresses the types of error or variation between target and query: cumulative and non-cumulative local errors, transposition, tempo and tempo changes, insertions, deletions and modulation. The model is not only expressive, but automatically trainable, or able to learn and generalize from query examples. We present results of simulations, designed to assess the discriminatory potential of the model, and tests with real sung queries, to demonstrate relevance to real-world applications.
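Two of the error types the model covers, transposition and insertions/deletions, can be illustrated with a much simpler toy: represent each melody by its successive pitch intervals (which makes the comparison transposition-invariant) and align query to target with an edit distance. This is only a didactic sketch of the matching problem, not the trainable HMM proposed in the paper; the flat insertion/deletion penalty below is an arbitrary choice.

```python
def intervals(notes):
    # Successive pitch intervals: a transposition-invariant melody encoding.
    return [b - a for a, b in zip(notes, notes[1:])]

def melody_distance(query, target, indel=2.0):
    """Edit distance over interval sequences: substituting one interval for
    another costs their absolute difference; inserting or deleting an interval
    (a singer adding or dropping a note) costs a flat penalty."""
    q, t = intervals(query), intervals(target)
    n, m = len(q), len(t)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * indel
    for j in range(1, m + 1):
        D[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(D[i - 1][j - 1] + abs(q[i - 1] - t[j - 1]),
                          D[i - 1][j] + indel,      # delete a query interval
                          D[i][j - 1] + indel)      # insert a target interval
    return D[n][m]
```

A query sung a perfect fourth too high still matches its target exactly, because only the interval pattern is compared; what the paper's model adds on top of such a metric is a probabilistic, trainable treatment of tempo and error statistics.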
Is speech technology ready for use now? (Military Communications and Information Systems Conference)
Research and development in speech technology has been going on for almost 30 years now. Starting from experimental systems, a range of products has been developed in this time. From the view of potential users, the main question remains: has the technology reached a state where it can be used meaningfully? This paper discusses this question and gives an overview of the tasks that speech recognition can solve and the state of usability for each task.
- …