Automatic transcription of polyphonic music exploiting temporal evolution
PhD
Automatic music transcription is the process of converting an audio recording
into a symbolic representation using musical notation. It has numerous applications
in music information retrieval, computational musicology, and the
creation of interactive systems. Even for expert musicians, transcribing polyphonic
pieces of music is not a trivial task, and while the problem of automatic
pitch estimation for monophonic signals is considered to be solved, the creation
of an automated system able to transcribe polyphonic music without setting
restrictions on the degree of polyphony and the instrument type still remains
open.
In this thesis, research on automatic transcription is performed by explicitly
incorporating information on the temporal evolution of sounds. First efforts address
the problem by focusing on signal processing techniques and by proposing
audio features utilising temporal characteristics. Techniques for note onset and
offset detection are also utilised for improving transcription performance. Subsequent
approaches propose transcription models based on shift-invariant probabilistic
latent component analysis (SI-PLCA), modeling the temporal evolution
of notes in a multiple-instrument case and supporting frequency modulations in
produced notes. Datasets and annotations for transcription research have also
been created during this work. The proposed systems have been evaluated both
privately and publicly within the Music Information Retrieval Evaluation
eXchange (MIREX) framework, and have been shown to outperform several
state-of-the-art transcription approaches.
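The shift-invariance idea behind these SI-PLCA models can be illustrated with a minimal sketch (an assumed simplification for illustration, not the thesis model): in a log-frequency representation, a pitch change corresponds to a pure shift of a fixed spectral template.

```python
import numpy as np

# Minimal sketch of the shift-invariance idea behind SI-PLCA (an assumed
# simplification, not the thesis model): in a log-frequency representation,
# a single spectral template can cover every pitch of an instrument,
# because a pitch change is a pure shift along the frequency axis.

template = np.zeros(48)  # harmonic template on a 12-bins-per-octave axis
for offset, amp in zip([0, 12, 19, 24], [1.0, 0.5, 0.3, 0.2]):
    template[offset] = amp  # harmonics 1, 2, 3, 4 in log-frequency

def shifted(template, shift, n_bins=96):
    """Render the template at a given log-frequency shift (i.e. pitch)."""
    out = np.zeros(n_bins)
    out[shift:shift + len(template)] = template
    return out

# The same template rendered at two pitches a fifth (7 semitones) apart:
lower = shifted(template, 10)
upper = shifted(template, 17)
assert np.allclose(lower[10:58], upper[17:65])  # identical up to a shift
```

This shift-invariance is what lets one template per instrument account for all of its pitches, rather than learning a separate spectrum for each note.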
The developed techniques have also been employed for other tasks in music
technology, such as key modulation detection, temperament estimation, and
automatic piano tutoring. Finally, the proposed music transcription models
have also been utilised in a wider context, namely for modelling acoustic
scenes.
A User-assisted Approach to Multiple Instrument Music Transcription
PhD
The task of automatic music transcription has been studied for several decades
and is regarded as an enabling technology for a multitude of applications such
as music retrieval and discovery, intelligent music processing and large-scale
musicological analyses. It refers to the process of identifying the musical content
of a performance and representing it in a symbolic format. Despite its long
research history, fully automatic music transcription systems are still error prone
and often fail when more complex polyphonic music is analysed. This gives
rise to the question of how human knowledge can be incorporated into the
transcription process.
This thesis investigates ways to involve a human user in the transcription
process. More specifically, it investigates how user input can be employed
to derive timbre models for the instruments in a music recording, which are
then used to obtain instrument-specific (parts-based) transcriptions.
A first investigation studies different types of user input in order to derive
instrument models by means of a non-negative matrix factorisation framework.
The transcription accuracy of the different models is evaluated and a method is
proposed that refines the models by allowing each pitch of each instrument to
be represented by multiple basis functions.
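The core of this non-negative matrix factorisation setup can be sketched as follows (an illustrative toy example, not the thesis implementation; the dimensions, the KL multiplicative-update rule with the bases held fixed, and the "two bases per pitch" layout are assumptions): the magnitude spectrogram V is approximated as W @ H, where the columns of W are instrument-specific basis spectra and H holds their activations over time.

```python
import numpy as np

# Hedged sketch of NMF-based transcription with user-derived bases
# (illustrative only): W holds fixed spectral basis functions, here two
# per pitch so that varying timbre can be captured; only the activations
# H are estimated, via multiplicative updates for the generalised
# Kullback-Leibler divergence.

rng = np.random.default_rng(0)

def estimate_activations(V, W, n_iter=500, eps=1e-9):
    """Multiplicative KL updates for H with W held fixed."""
    H = rng.random((W.shape[1], V.shape[1]))
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
    return H

# Toy data: two "pitches", each represented by two basis functions
# (e.g. attack and sustain spectra), 20 frequency bins, 30 frames.
W = rng.random((20, 4))
H_true = rng.random((4, 30))
V = W @ H_true                      # noiseless synthetic mixture

H = estimate_activations(V, W)
rel_err = np.abs(V - W @ H).sum() / V.sum()
assert rel_err < 0.05               # the activations explain the mixture
```

Reading off which columns of H are active per frame then yields a parts-based transcription, since every basis function is tied to a known instrument and pitch.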
A second study aims at limiting the amount of user input to make the
method more applicable in practice. Different methods are considered for
estimating missing non-negative basis functions when only a subset of basis
functions can be extracted from the user-provided information.
A method is proposed to track the pitches of individual instruments over time
by means of a Viterbi framework in which the states at each time frame contain
several candidate instrument-pitch combinations. A transition probability is
employed that combines three different criteria: the frame-wise reconstruction
error of each combination, a pitch continuity measure that favours similar pitches
in consecutive frames, and an explicit activity model for each instrument. The
method is shown to outperform other state-of-the-art multi-instrument tracking
methods.
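The described Viterbi tracking can be sketched in a few lines (an illustrative simplification: the weights, score definitions, and toy data below are assumptions, not the thesis parameters). Each frame offers several candidate instrument-pitch combinations; the path score combines the candidate's reconstruction error, a pitch-continuity penalty between consecutive frames, and a per-instrument activity term.

```python
# Hedged sketch of Viterbi tracking over candidate (instrument, pitch)
# combinations (weights and scores are illustrative assumptions).

def viterbi(candidates, recon_err, activity, w=(1.0, 0.1, 0.5)):
    """candidates[t][j] = (instrument, pitch); recon_err[t][j] aligned.
    Returns the index of the chosen candidate in each frame."""
    T = len(candidates)

    def emit(t, j):  # low reconstruction error, active instrument preferred
        inst, _ = candidates[t][j]
        return -w[0] * recon_err[t][j] + w[2] * activity[inst]

    def trans(t, i, j):  # pitch continuity: penalise large pitch jumps
        return -w[1] * abs(candidates[t - 1][i][1] - candidates[t][j][1])

    delta = [emit(0, j) for j in range(len(candidates[0]))]
    backptrs = []
    for t in range(1, T):
        new_delta, ptr = [], []
        for j in range(len(candidates[t])):
            i_best = max(range(len(candidates[t - 1])),
                         key=lambda i: delta[i] + trans(t, i, j))
            ptr.append(i_best)
            new_delta.append(delta[i_best] + trans(t, i_best, j) + emit(t, j))
        delta, backptrs = new_delta, backptrs + [ptr]
    path = [max(range(len(delta)), key=delta.__getitem__)]
    for ptr in reversed(backptrs):  # backtrack along the stored pointers
        path.append(ptr[path[-1]])
    return path[::-1]

# Toy example: a well-explained, active saxophone note vs a poorly
# explained, inactive piano candidate in each of three frames.
candidates = [[("sax", 60), ("piano", 72)]] * 3
recon_err = [[0.1, 2.0]] * 3
activity = {"sax": 1.0, "piano": 0.0}
assert viterbi(candidates, recon_err, activity) == [0, 0, 0]
```

The combined transition score is what distinguishes this from plain per-frame selection: a candidate with slightly worse reconstruction error can still win if it continues an existing pitch trajectory of an active instrument.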
Finally, the extraction of instrument models that include phase information
is investigated as a step towards complex matrix decomposition. The phase
relations between the partials of harmonic sounds are explored as a time-invariant
property that can be employed to form complex-valued basis functions. The
application of the model for a user-assisted transcription task is illustrated with a saxophone example.
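The time-invariant phase property mentioned above can be verified on a synthetic signal (an illustrative idealisation: a perfectly harmonic, stationary sound with assumed parameters): the k-th partial's phase advances k times as fast as the fundamental's, so the relative phase phi_k(t) - k * phi_1(t) stays constant over time.

```python
import numpy as np

# Sketch of the time-invariant phase relation between harmonic partials
# (illustrative idealisation, not the thesis method): phi_k(t) - k*phi_1(t)
# is constant over time and could contribute to a complex-valued basis.

sr, f0, n_fft = 8000, 200.0, 400   # f0 falls exactly on DFT bin 10
t = np.arange(sr) / sr
offsets = [0.3, 1.1, -0.7]         # arbitrary initial phases, partials 1-3
x = sum(np.cos(2 * np.pi * k * f0 * t + p)
        for k, p in zip([1, 2, 3], offsets))

def partial_phases(x, n0):
    """Measured phases of partials 1-3 in the frame starting at sample n0."""
    X = np.fft.rfft(x[n0:n0 + n_fft])
    return [np.angle(X[10 * k]) for k in (1, 2, 3)]

a = partial_phases(x, 0)
b = partial_phases(x, 50)          # a frame starting 50 samples later
for k in (2, 3):
    diff = (a[k - 1] - k * a[0]) - (b[k - 1] - k * b[0])
    # invariant up to phase wrapping (multiples of 2*pi):
    assert abs((diff + np.pi) % (2 * np.pi) - np.pi) < 1e-6
```

Because this relative phase does not depend on where the analysis frame starts, it is a per-sound constant, which is what makes it usable inside a time-invariant complex-valued basis function.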
Automatic transcription of the melody from polyphonic music
This dissertation addresses the problem of melody detection in polyphonic musical audio. The proposed algorithm uses a bottom-up design in which each module leads to a more abstract representation of the audio data, allowing a very efficient computation of the melody. Nonetheless, the data flow is not strictly unidirectional: on several occasions, feedback from higher processing modules controls the processing of lower-level modules. The spectral analysis is based on a technique for the efficient computation of short-time Fourier spectra at different time-frequency resolutions. The pitch determination algorithm (PDA) is based on the pair-wise analysis of spectral peaks.
Although melody detection implies a strong focus on the predominant voice, the proposed tone processing module aims at extracting multiple fundamental frequencies (F0). In order to identify the melody, the best succession of tones has to be chosen. This thesis describes an efficient computational method for auditory stream segregation that processes a variable number of simultaneous voices. The presented melody extraction algorithm has been evaluated in the MIREX audio melody extraction task. The MIREX results show that the proposed algorithm belongs to the state-of-the-art algorithms, reaching the best overall accuracy in MIREX 2014.
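The multi-resolution spectral analysis mentioned above can be sketched as follows (a minimal illustration under assumed window sizes and a synthetic test signal, not the dissertation's implementation): short windows give good time resolution, while long windows are needed to resolve closely spaced low-frequency partials.

```python
import numpy as np

# Minimal sketch of multi-resolution short-time Fourier analysis (window
# sizes and the test signal are assumptions for illustration): the same
# signal is analysed with a short window (good time resolution) and a
# long window (good frequency resolution).

def stft_mag(x, win_len, hop):
    win = np.hanning(win_len)
    frames = [x[i:i + win_len] * win
              for i in range(0, len(x) - win_len + 1, hop)]
    return np.abs(np.fft.rfft(frames, axis=1))

sr = 8000
t = np.arange(sr) / sr
# Two partials only 13 Hz apart:
x = np.sin(2 * np.pi * 220 * t) + np.sin(2 * np.pi * 233 * t)

coarse = stft_mag(x, 256, 128)    # ~31 Hz bins: the two tones merge
fine = stft_mag(x, 4096, 1024)    # ~2 Hz bins: the two tones are resolved

assert int(220 / (sr / 256)) == int(233 / (sr / 256))   # same coarse bin
f = fine.mean(axis=0)
assert f[113] > f[116] and f[119] > f[116]              # two distinct peaks
```

Combining both views lets an analysis front end localise onsets with the short windows while still separating close partials with the long ones, which is the motivation for computing spectra at several resolutions.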