A User-assisted Approach to Multiple Instrument Music Transcription
PhD
The task of automatic music transcription has been studied for several decades
and is regarded as an enabling technology for a multitude of applications such
as music retrieval and discovery, intelligent music processing and large-scale
musicological analyses. It refers to the process of identifying the musical content
of a performance and representing it in a symbolic format. Despite its long
research history, fully automatic music transcription systems are still error-prone
and often fail when more complex polyphonic music is analysed. This raises
the question of how human knowledge can be incorporated into the
transcription process.
This thesis investigates ways to involve a human user in the transcription
process. More specifically, it examines how user input can be used
to derive timbre models for the instruments in a music recording, which are
then used to obtain instrument-specific (parts-based) transcriptions.
A first investigation studies different types of user input in order to derive
instrument models by means of a non-negative matrix factorisation framework.
The transcription accuracy of the different models is evaluated and a method is
proposed that refines the models by allowing each pitch of each instrument to
be represented by multiple basis functions.
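As a rough illustration of how instrument models can drive a parts-based
transcription in a non-negative matrix factorisation setting, the sketch below
keeps a set of spectral basis functions fixed and estimates only their gains
over time. It is a minimal example under stated assumptions, not the method of
the thesis: the function name `estimate_gains`, the toy templates in `W`, the
Euclidean cost and the multiplicative update rule are all chosen here for
illustration.

```python
import numpy as np

def estimate_gains(V, W, n_iter=500, eps=1e-12):
    """Estimate non-negative gains H so that V is approximately W @ H.

    W (frequency bins x basis functions) is held fixed; only H is updated,
    using multiplicative updates for the Euclidean cost.
    """
    rng = np.random.default_rng(0)
    H = rng.random((W.shape[1], V.shape[1])) + eps  # positive initialisation
    WtV = W.T @ V
    WtW = W.T @ W
    for _ in range(n_iter):
        H *= WtV / (WtW @ H + eps)  # update keeps H non-negative
    return H

# Hypothetical toy templates: columns 0-1 belong to instrument A,
# columns 2-3 to instrument B (one basis function per pitch).
W = np.array([
    [1.0, 0.0, 0.2, 0.0],
    [0.5, 0.0, 0.0, 0.1],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.3],
    [0.1, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])
H_true = np.array([
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 0.0],
    [0.0, 0.8, 0.5],
    [0.6, 0.0, 0.0],
])
V = W @ H_true  # synthetic magnitude spectrogram (bins x frames)

H = estimate_gains(V, W)
part_A = W[:, :2] @ H[:2]  # contribution attributed to instrument A
part_B = W[:, 2:] @ H[2:]  # contribution attributed to instrument B
```

Grouping the rows of `H` by instrument is what yields one transcription per
part; allowing several basis functions per pitch would simply add more columns
to each instrument's block of `W`.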
A second study aims at limiting the amount of user input to make the
method more applicable in practice. Different methods are considered to estimate
missing non-negative basis functions when only a subset of basis functions can
be extracted based on the user information.
A method is proposed to track the pitches of individual instruments over time
by means of a Viterbi framework in which the states at each time frame contain
several candidate instrument-pitch combinations. A transition probability is
employed that combines three different criteria: the frame-wise reconstruction
error of each combination, a pitch continuity measure that favours similar pitches
in consecutive frames, and an explicit activity model for each instrument. The
method is shown to outperform other state-of-the-art multi-instrument tracking
methods.
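The idea of choosing among per-frame candidates with a Viterbi search can be
sketched as follows. This is a simplified illustration, not the tracker
described above: it follows a single instrument, works with additive costs
rather than probabilities, combines only a reconstruction error and a pitch
continuity penalty, and omits the instrument activity model. The function name
`track_pitches`, the weight `jump_weight` and the toy numbers are assumptions
made for the example.

```python
import numpy as np

def track_pitches(cand_pitch, cand_error, jump_weight=0.1):
    """Pick one pitch candidate per frame by minimum-cost Viterbi search.

    cand_pitch, cand_error: arrays of shape (frames, candidates) holding the
    candidate pitches and their frame-wise reconstruction errors.
    """
    T, K = cand_pitch.shape
    cost = cand_error[0].astype(float).copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        # jump[i, j]: pitch distance from candidate i at t-1 to candidate j at t
        jump = np.abs(cand_pitch[t - 1][:, None] - cand_pitch[t][None, :])
        total = cost[:, None] + jump_weight * jump
        back[t] = np.argmin(total, axis=0)  # best predecessor per candidate
        cost = total[back[t], np.arange(K)] + cand_error[t]
    # Backtrack from the cheapest final candidate.
    path = np.empty(T, dtype=int)
    path[-1] = int(np.argmin(cost))
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return cand_pitch[np.arange(T), path]

# Toy input: MIDI pitches 60 and 72 are candidates in every frame.
cand_pitch = np.array([[60, 72], [60, 72], [60, 72]])
cand_error = np.array([[0.1, 0.5], [0.45, 0.4], [0.1, 0.5]])

smooth = track_pitches(cand_pitch, cand_error, jump_weight=0.1)
framewise = track_pitches(cand_pitch, cand_error, jump_weight=0.0)
```

With the continuity penalty switched off, the search reduces to picking the
lowest-error candidate in each frame, which lets the middle frame jump an
octave; with the penalty enabled, the path stays on the same pitch throughout.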
Finally, the extraction of instrument models that include phase information
is investigated as a step towards complex matrix decomposition. The phase
relations between the partials of harmonic sounds are explored as a time-invariant
property that can be employed to form complex-valued basis functions. The
application of the model to a user-assisted transcription task is illustrated with a saxophone example.
QMU