Computational Methods for the Alignment and Score-Informed Transcription of Piano Music
PhD
This thesis is concerned with computational methods for alignment and score-informed
transcription of piano music. Firstly, several methods are proposed to improve the alignment
robustness and accuracy when various versions of one piece of music show complex
differences with respect to acoustic conditions or musical interpretation. Secondly, score
to performance alignment is applied to enable score-informed transcription.
Although music alignment methods have considerably improved in accuracy in recent
years, the task remains challenging. The research in this thesis aims to improve the
robustness for some cases where there are substantial differences between versions and
state-of-the-art methods may fail in identifying a correct alignment. This thesis first exploits
the availability of multiple versions of the piece to be aligned. By processing these
jointly, the alignment process can be stabilised by exploiting additional examples of how
a section might be interpreted or which acoustic conditions may arise. Two methods are
proposed, progressive alignment and profile HMM, both adapted from the multiple biological
sequence alignment task. Experiments demonstrate that these methods can indeed
improve the alignment accuracy and robustness over comparable pairwise methods.
Secondly, this thesis presents a score to performance alignment method that can improve
the robustness in cases where some musical voices, such as the melody, are played asynchronously
to others – a stylistic device used in musical expression. The asynchronies between
the melody and the accompaniment are handled by treating the voices as separate
timelines in a multi-dimensional variant of dynamic time warping (DTW). The method
measurably improves the alignment accuracy for pieces with asynchronous voices and
preserves the accuracy otherwise.
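For orientation, the pairwise DTW that such a multi-dimensional variant builds on can be sketched as follows. This is the textbook two-sequence algorithm, not the thesis's multi-timeline method; the function name `dtw`, the absolute-difference cost, and the toy sequences are assumptions made purely for illustration.

```python
def dtw(x, y, dist=lambda a, b: abs(a - b)):
    """Classic DTW between two sequences; returns (total_cost, path).

    path is a list of (i, j) index pairs aligning x[i] with y[j].
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # acc[i][j] = minimal accumulated cost of aligning x[:i] with y[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(x[i - 1], y[j - 1])
            acc[i][j] = cost + min(
                acc[i - 1][j - 1],  # advance in both sequences
                acc[i - 1][j],      # advance in x only
                acc[i][j - 1],      # advance in y only
            )
    # Backtrack from the end of both sequences to recover the warping path.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (acc[i - 1][j - 1], i - 1, j - 1),
            (acc[i - 1][j], i - 1, j),
            (acc[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
    path.reverse()
    return acc[n][m], path


# Toy example (hypothetical data): the second sequence repeats one value,
# as if a note were held longer in the performance than in the score.
score_seq = [0, 1, 2, 3]
perf_seq = [0, 1, 1, 2, 3]
total_cost, warping_path = dtw(score_seq, perf_seq)
```

In practice the scalar values would be replaced by feature vectors such as chroma frames, with a suitable distance passed as `dist`. A single warping path of this kind forces all voices onto one timeline, which is the limitation the separate-timeline formulation is meant to address.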
Once an accurate alignment between a score and an audio recording is available, the
score information can be exploited as prior knowledge in automatic music transcription
(AMT), for scenarios where a score is available, such as music tutoring. Score-informed dictionary
learning is used to learn the spectral pattern of each pitch that describes the energy
distribution of the associated notes in the recording. More precisely, the dictionary learning
process in non-negative matrix factorization (NMF) is constrained using the aligned
score. This way, by adapting the dictionary to a given recording, the proposed method
improves the accuracy over the state of the art.
China Scholarship Council
Multimodal music information processing and retrieval: survey and future challenges
Towards improving the performance in various music information processing
tasks, recent studies exploit different modalities able to capture diverse
aspects of music. Such modalities include audio recordings, symbolic music
scores, mid-level representations, motion, and gestural data, video recordings,
editorial or cultural tags, lyrics and album cover arts. This paper critically
reviews the various approaches adopted in Music Information Processing and
Retrieval and highlights how multimodal algorithms can help Music Computing
applications. First, we categorize the related literature based on the
application they address. Subsequently, we analyze existing information fusion
approaches, and we conclude with the set of challenges that the Music Information
Retrieval and Sound and Music Computing research communities should focus on in
the coming years.
Automatic music transcription: challenges and future directions
Automatic music transcription is considered by many to be a key enabling technology in music signal processing. However, the performance of transcription systems is still significantly below that of a human expert, and accuracies reported in recent years seem to have reached a limit, although the field is still very active. In this paper we analyse limitations of current methods and identify promising directions for future research. Current transcription methods use general-purpose models which are unable to capture the rich diversity found in music signals. One way to overcome the limited performance of transcription systems is to tailor algorithms to specific use-cases. Semi-automatic approaches are another way of achieving a more reliable transcription. Also, the wealth of musical scores and corresponding audio data now available is a rich potential source of training data, via forced alignment of audio to scores, but large-scale utilisation of such data has yet to be attempted. Other promising approaches include the integration of information from multiple algorithms and different musical aspects.
Automatic transcription of polyphonic music exploiting temporal evolution
PhD
Automatic music transcription is the process of converting an audio recording
into a symbolic representation using musical notation. It has numerous applications
in music information retrieval, computational musicology, and the
creation of interactive systems. Even for expert musicians, transcribing polyphonic
pieces of music is not a trivial task, and while the problem of automatic
pitch estimation for monophonic signals is considered to be solved, the creation
of an automated system able to transcribe polyphonic music without setting
restrictions on the degree of polyphony and the instrument type still remains
open.
In this thesis, research on automatic transcription is performed by explicitly
incorporating information on the temporal evolution of sounds. First efforts address
the problem by focusing on signal processing techniques and by proposing
audio features utilising temporal characteristics. Techniques for note onset and
offset detection are also utilised for improving transcription performance. Subsequent
approaches propose transcription models based on shift-invariant probabilistic
latent component analysis (SI-PLCA), modeling the temporal evolution
of notes in a multiple-instrument case and supporting frequency modulations in
produced notes. Datasets and annotations for transcription research have also
been created during this work. Proposed systems have been privately as well as
publicly evaluated within the Music Information Retrieval Evaluation eXchange
(MIREX) framework. Proposed systems have been shown to outperform several
state-of-the-art transcription approaches.
Developed techniques have also been employed for other tasks related to music
technology, such as for key modulation detection, temperament estimation,
and automatic piano tutoring. Finally, proposed music transcription models
have also been utilized in a wider context, namely for modeling acoustic scenes.
Score-Informed Source Separation for Music Signals
In recent years, the processing of audio recordings by exploiting additional musical knowledge has turned out to be a promising research direction. In particular, additional note information as specified by a musical score or a MIDI file has been employed to support various audio processing tasks such as source separation, audio parameterization, performance analysis, or instrument equalization. In this contribution, we provide an overview of approaches for score-informed source separation and illustrate their potential by discussing innovative applications and interfaces. Additionally, to illustrate some basic principles behind these approaches, we demonstrate how score information can be integrated into the well-known non-negative matrix factorization (NMF) framework. Finally, we compare this approach to advanced methods based on parametric models.
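One common way to integrate score information into NMF can be sketched as follows, under stated assumptions. The activation matrix is initialised to zero wherever the aligned score says a pitch is silent, and because multiplicative updates only rescale entries, those zeros persist throughout the factorization. The sketch uses the standard Euclidean multiplicative updates in plain Python; the function names, the binary `mask` layout (pitches by time frames), and the toy spectrogram are illustrative assumptions, not the specific formulation of the paper.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(row) for row in zip(*a)]


def frobenius_error(V, W, H):
    """Squared reconstruction error ||V - WH||^2."""
    WH = matmul(W, H)
    return sum((v - r) ** 2 for rv, rr in zip(V, WH) for v, r in zip(rv, rr))


def score_informed_nmf(V, mask, n_iter=200, eps=1e-9, seed=0):
    """Factorise V (freq x time) as W (freq x pitch) times H (pitch x time).

    mask[k][t] is 1 where the score allows pitch k to sound in frame t.
    Entries of H outside the mask start at zero and therefore stay zero.
    """
    rng = random.Random(seed)
    n_freq, n_time, n_pitch = len(V), len(V[0]), len(mask)
    W = [[rng.random() + eps for _ in range(n_pitch)] for _ in range(n_freq)]
    H = [[(rng.random() + eps) if mask[k][t] else 0.0 for t in range(n_time)]
         for k in range(n_pitch)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[k][t] * num[k][t] / (den[k][t] + eps) for t in range(n_time)]
             for k in range(n_pitch)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[f][k] * num[f][k] / (den[f][k] + eps) for k in range(n_pitch)]
             for f in range(n_freq)]
    return W, H


# Toy spectrogram (hypothetical): two pitches, each active in two frames.
V = [[1.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 1.0],
     [0.5, 0.5, 0.5, 0.5]]
mask = [[1, 1, 0, 0],
        [0, 0, 1, 1]]
W, H = score_informed_nmf(V, mask)
```

Each column of `W` can then be read as the spectral template of one pitch, and each row of `H` as its activation over time, restricted to the regions the score permits. Real systems typically widen the mask around note boundaries to tolerate alignment errors.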
From heuristics-based to data-driven audio melody extraction
The identification of the melody from a music recording is a relatively easy task for humans, but very challenging for computational systems. This task is known as "audio melody extraction", more formally defined as the automatic estimation of the pitch sequence of the melody directly from the audio signal of a polyphonic music recording. This thesis investigates the benefits of exploiting knowledge automatically derived from data for audio melody extraction, by combining digital signal processing and machine learning methods. We extend the scope of melody extraction research by working with a varied dataset and multiple definitions of melody. We first present an overview of the state of the art, and perform an evaluation focused on a novel symphonic music dataset. We then propose melody extraction methods based on a source-filter model and pitch contour characterisation and evaluate them on a wide range of music genres. Finally, we explore novel timbre, tonal and spatial features for contour characterisation, and propose a method for estimating multiple melodic lines. The combination of supervised and unsupervised approaches leads to advancements in melody extraction and shows a promising path for future research and applications.