BEAT AND METER EXTRACTION USING GAUSSIFIED ONSETS
Rhythm, beat and meter are key concepts in music. Many efforts have been made in recent years to automatically extract beat and meter from a piece of music given in either audio or symbolic representation (see e.g. [11] for an overview). In this paper we propose a new method for extracting beat, meter and phase information from a list of unquantized onset times. The procedure relies on a novel method called "Gaussification" and adopts correlation techniques combined with findings from music psychology for parameter settings.
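The core idea of the abstract, replacing each discrete onset time with a Gaussian bump and reading the beat period off the autocorrelation of the resulting continuous signal, can be sketched as follows. The sampling rate, Gaussian width, and plausible-tempo lag range below are illustrative assumptions, not the paper's actual parameter settings.

```python
import numpy as np

def gaussify(onsets, sigma=0.01, fs=100, duration=None):
    """Replace each onset time (seconds) with a Gaussian bump and
    sum them into one continuous signal sampled at fs Hz."""
    if duration is None:
        duration = max(onsets) + 1.0
    t = np.arange(0.0, duration, 1.0 / fs)
    signal = np.zeros_like(t)
    for onset in onsets:
        signal += np.exp(-0.5 * ((t - onset) / sigma) ** 2)
    return t, signal

def estimate_beat_period(onsets, fs=100, min_lag=0.2, max_lag=2.0):
    """Autocorrelate the Gaussified signal and pick the strongest
    lag within a plausible beat-period range (here 0.2-2.0 s)."""
    _, signal = gaussify(onsets, fs=fs)
    ac = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    lo, hi = int(min_lag * fs), int(max_lag * fs)
    return (lo + np.argmax(ac[lo:hi])) / fs

# Onsets of a steady pulse at 120 BPM (0.5 s apart):
onsets = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
print(estimate_beat_period(onsets))  # close to 0.5
```

Because the Gaussians give each onset a finite width, slightly unquantized (jittered) onset times still produce overlapping mass in the autocorrelation, which is what makes the approach robust compared with matching exact onset times.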
Automated Extraction of Rhythmic Features for Use in Music Information Retrieval Systems
This thesis describes the automated extraction of features for
describing the rhythmic content of musical audio signals. The
features are selected with respect to their applicability in music
information retrieval (MIR) systems.
While research on the automatic extraction of rhythmic features such
as tempo and time signature has been in progress for some time, current
algorithms still seem to be a long way from matching human recognition
performance. Among the reasons for this gap between a machine listening
system and a trained listener are the human use of information on
different levels of abstraction and of musical knowledge. The approach
described here is guided by these two principles of cognition.
In order to identify appropriate features and relevant aspects of human
processing of audio signals, the necessary background from musicology,
psychoacoustics and cognitive science is described. Subsequently, the
description of the state of the art covers known methods for the
extraction of rhythmic features from musical audio signals. The main part
of the thesis contains a collection of machine-listening methods that
evaluate information on different levels of abstraction. A compact
representation of the metrical structure of musical audio signals is
proposed. The evaluation of low-level features allows musical knowledge
to be applied only to a minimal degree. On the other hand, it becomes
apparent that the processing of high-level features is prone to errors
due to error propagation in the extraction of this information. This
motivates the joint evaluation of low- and high-level information
depending on their reliability.
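The reliability-dependent combination described above can be illustrated with a small sketch: tempo candidates produced by analyses on different abstraction levels are pooled when they agree, and each pool is weighted by the confidence of its contributors. The pooling rule and the 4% agreement tolerance are assumptions chosen for illustration, not the fusion scheme actually used in the thesis.

```python
def fuse_estimates(estimates):
    """Combine candidate tempo estimates (value, confidence) by
    confidence-weighted voting: candidates within a tolerance of a
    pool's running mean join that pool, and the pool with the
    largest total confidence determines the final estimate."""
    pools = []  # each pool: [confidence-weighted sum, total confidence]
    for value, conf in estimates:
        for pool in pools:
            mean = pool[0] / pool[1]
            if abs(value - mean) / mean < 0.04:  # 4 % tolerance
                pool[0] += value * conf
                pool[1] += conf
                break
        else:
            pools.append([value * conf, conf])
    best = max(pools, key=lambda p: p[1])
    return best[0] / best[1]

# Hypothetical low-level (onset periodicity) and high-level
# (detected drum pattern) estimates with confidences:
estimates = [(120.2, 0.6), (119.8, 0.8), (60.1, 0.3)]
print(fuse_estimates(estimates))  # ~120 BPM
```

The point of such a scheme is that an unreliable high-level estimate (here the half-tempo candidate at 60 BPM with low confidence) cannot override two agreeing estimates whose combined confidence is higher.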
The extraction of rhythmic features from automatically detected
percussive instruments represents an advance over the state of the art.
The segmentation of audio signals into characteristic, mutually similar
regions, representing for example verse or chorus, is introduced as a
valuable pre-processing step. The resulting significant improvements of
the recognition rate are demonstrated on real-world test data.
The performance of the developed methods is evaluated using a large
corpus of test data, and the applicability of the extracted features for the
use in an exemplary MIR system is examined.
Content-based retrieval of melodies using artificial neural networks
Human listeners are capable of spontaneously organizing and remembering a continuous stream of musical notes. A listener automatically segments a melody into phrases, from which an entire melody may be learnt and later recognized. This ability makes human listeners ideal for the task of retrieving melodies by content. This research introduces two neural networks, known as SONNET-MAP and ReTREEve, which attempt to model this behaviour. SONNET-MAP functions as a melody segmenter, whereas ReTREEve is specialized towards content-based retrieval (CBR).
Typically, CBR systems represent melodies as strings of symbols drawn from a finite alphabet, thereby reducing the retrieval process to the task of approximate string matching. SONNET-MAP and ReTREEve, which are derived from Nigrin's SONNET architecture, offer a novel approach to these traditional systems, and indeed to CBR in general. Based on melodic grouping cues, SONNET-MAP segments a melody into phrases. Parallel SONNET modules form independent, sub-symbolic representations of the pitch and rhythm dimensions of each phrase. These representations are then bound using associative maps, forming a two-dimensional representation of each phrase. This organizational scheme enables SONNET-MAP to segment melodies into phrases using both the pitch and rhythm features of each melody. The boundary points formed by these melodic phrase segments are then utilized to populate the ReTREEve network.
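The traditional string-based approach that SONNET-MAP and ReTREEve depart from can be sketched briefly: encode each melody as a string of pitch intervals (which makes matching transposition-invariant) and rank candidates by edit distance to the query. The interval encoding and the example pitch numbers are illustrative assumptions, not details from the thesis.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance via a rolling dynamic-programming row."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            # prev holds the diagonal cell D[i-1][j-1]
            prev, dp[j] = dp[j], min(dp[j] + 1,        # deletion
                                     dp[j - 1] + 1,    # insertion
                                     prev + (ca != cb))  # substitution
    return dp[-1]

def intervals(pitches):
    """Encode a melody (MIDI pitch numbers) as successive intervals,
    so that transposed queries still match exactly."""
    return tuple(b - a for a, b in zip(pitches, pitches[1:]))

melody = [60, 62, 64, 65, 67]   # C D E F G
query  = [62, 64, 66, 67, 69]   # the same contour transposed up a tone
print(edit_distance(intervals(melody), intervals(query)))  # 0
```

Approximate matching of this kind tolerates imperfect queries (e.g. hummed fragments with wrong notes) at the cost of treating the melody as a flat symbol string; the networks described above instead learn sub-symbolic, phrase-structured representations.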
ReTREEve is organized in the same parallel fashion as SONNET-MAP. In addition, however, melodic phrases are aggregated by an additional layer, thus forming a two-dimensional, hierarchical memory structure of each entire melody. Melody retrieval is accomplished by matching input queries, whether perfect (for example, a fragment from the original melody) or imperfect (for example, a fragment derived from humming), against learned phrases and phrase-sequence templates. Using a sample of fifty melodies composed by The Beatles, results show that the use of both pitch and rhythm during the retrieval process significantly improves retrieval results over networks that use only pitch or only rhythm. Additionally, queries that are aligned along phrase boundaries are retrieved using significantly fewer notes than those that are not, thus indicating the importance of a human-based approach to melody segmentation. Moreover, depending on query degradation, different melodic features prove more adept at retrieval than others.
The experiments presented in this thesis represent the largest empirical test of SONNET-based networks ever performed. As far as we are aware, the combined SONNET-MAP and ReTREEve networks constitute the first self-organizing CBR system capable of automatic segmentation and retrieval of melodies using various features of pitch and rhythm.