Automatic music transcription: challenges and future directions
Automatic music transcription is considered by many to be a key enabling technology in music signal processing. However, the performance of transcription systems is still significantly below that of a human expert, and accuracies reported in recent years seem to have reached a limit, although the field is still very active. In this paper we analyse limitations of current methods and identify promising directions for future research. Current transcription methods use general purpose models which are unable to capture the rich diversity found in music signals. One way to overcome the limited performance of transcription systems is to tailor algorithms to specific use-cases. Semi-automatic approaches are another way of achieving a more reliable transcription. Also, the wealth of musical scores and corresponding audio data now available is a rich potential source of training data, via forced alignment of audio to scores, but large-scale utilisation of such data has yet to be attempted. Other promising approaches include the integration of information from multiple algorithms and different musical aspects
Fusion of Multimodal Information in Music Content Analysis
Music is often processed through its acoustic realization. This is restrictive in the sense that music is clearly a highly multimodal concept where various types of heterogeneous information can be associated with a given piece of music (a musical score, musicians' gestures, lyrics, user-generated metadata, etc.). This has recently led researchers to apprehend music through its various facets, giving rise to "multimodal music analysis" studies. This article gives a synthetic overview of methods that have been successfully employed in multimodal signal analysis. In particular, their use in music content processing is discussed in more detail through five case studies that highlight different multimodal integration techniques. The case studies include an example of cross-modal correlation for music video analysis, an audiovisual drum transcription system, a description of the concept of informed source separation, a discussion of multimodal dance-scene analysis, and an example of user-interactive music analysis. In the light of these case studies, some perspectives of multimodality in music processing are finally suggested
Score-Informed Source Separation for Music Signals
In recent years, the processing of audio recordings by exploiting additional musical knowledge has turned out to be a promising research direction. In particular, additional note information as specified by a musical score or a MIDI file has been employed to support various audio processing tasks such as source separation, audio parameterization, performance analysis, or instrument equalization. In this contribution, we provide an overview of approaches for score-informed source separation and illustrate their potential by discussing innovative applications and interfaces. Additionally, to illustrate some basic principles behind these approaches, we demonstrate how score information can be integrated into the well-known non-negative matrix factorization (NMF) framework. Finally, we compare this approach to advanced methods based on parametric models
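The idea of integrating score information into NMF can be illustrated with a small sketch. It is not the method of the paper above: it assumes the common strategy of initialising the activation matrix H to zero wherever the score says a note is silent, and relies on the fact that multiplicative updates keep zero entries at zero. The toy spectrogram `V`, the binary `score_mask`, and all function names are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def reconstruction_error(V, W, H):
    """Squared Euclidean distance between V and the product W H."""
    WH = matmul(W, H)
    return sum((v - wh) ** 2 for rv, rwh in zip(V, WH) for v, wh in zip(rv, rwh))


def score_informed_nmf(V, score_mask, n_iter=200, eps=1e-9, seed=0):
    """Factorise V (freq x frames) as W H with H constrained by a score mask.

    score_mask[k][t] is 1 if note k may sound in frame t, else 0 (assumed input).
    """
    rng = random.Random(seed)
    n_freq, n_frames, n_notes = len(V), len(V[0]), len(score_mask)
    W = [[rng.random() + 0.1 for _ in range(n_notes)] for _ in range(n_freq)]
    # Entries the score rules out start at zero and stay there under the updates.
    H = [[(rng.random() + 0.1) * score_mask[k][t] for t in range(n_frames)]
         for k in range(n_notes)]
    for _ in range(n_iter):
        # Multiplicative update for H (Euclidean cost).
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * n / (d + eps) for h, n, d in zip(rh, rn, rd)]
             for rh, rn, rd in zip(H, num, den)]
        # Multiplicative update for W (Euclidean cost).
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * n / (d + eps) for w, n, d in zip(rw, rn, rd)]
             for rw, rn, rd in zip(W, num, den)]
    return W, H


# Toy magnitude spectrogram: two notes with overlapping activity in frame 2.
V = [[1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 1.0, 1.0, 1.0],
     [0.5, 0.5, 0.5, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.5, 0.5, 0.5, 0.5]]
score_mask = [[1, 1, 1, 0, 0, 0],
              [0, 0, 1, 1, 1, 1]]

W0, H0 = score_informed_nmf(V, score_mask, n_iter=0)
W, H = score_informed_nmf(V, score_mask)
```

Each row of `H` then plays the role of one note's activation, and the product of column k of `W` with row k of `H` gives a per-note spectrogram estimate that could be used to build a separation mask.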
Visual analysis for drum sequence transcription
A system is presented for analysing drum performance video sequences. A novel ellipse detection algorithm is introduced that automatically locates drum tops. This algorithm fits ellipses to edge clusters, and ranks them according to various fitness criteria. A background/foreground segmentation method is then used to extract the silhouette of the drummer and drum sticks. Coupled with a motion intensity feature, this allows for the detection of "hits" in each of the extracted regions. In order to obtain a transcription of the performance, each of these regions is automatically labeled with the corresponding instrument class. A partial audio transcription and color cues are used to measure the compatibility between a region and its label; the Kuhn-Munkres algorithm is then employed to find the optimal labeling. Experimental results demonstrate the ability of visual analysis to enhance the performance of an audio drum transcription system
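The labeling step can be sketched as an assignment problem. The code below is a standard O(n^3) formulation of the Kuhn-Munkres (Hungarian) algorithm, not the authors' implementation; the compatibility scores and the function names `hungarian` and `label_regions` are made up for illustration, and the sketch assumes there are at least as many labels as regions.

```python
def hungarian(cost):
    """Minimum-cost assignment of each row to a distinct column (rows <= columns)."""
    n, m = len(cost), len(cost[0])
    INF = float("inf")
    u = [0.0] * (n + 1)   # row potentials
    v = [0.0] * (m + 1)   # column potentials
    p = [0] * (m + 1)     # p[j] = row currently matched to column j (1-based, 0 = none)
    way = [0] * (m + 1)   # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the matching along the augmenting path that was found.
        while j0 != 0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [None] * n
    for j in range(1, m + 1):
        if p[j] != 0:
            assignment[p[j] - 1] = j - 1
    return assignment


def label_regions(compatibility):
    """Pick the labeling that maximises total region/label compatibility."""
    cost = [[-c for c in row] for row in compatibility]
    return hungarian(cost)


# Hypothetical compatibility scores: rows are image regions, columns are labels.
labels = ["snare", "hi-hat", "kick"]
compatibility = [[0.9, 0.8, 0.1],
                 [0.9, 0.1, 0.1],
                 [0.1, 0.2, 0.3]]
best = label_regions(compatibility)
region_labels = [labels[j] for j in best]
```

In this toy matrix a greedy row-by-row choice would give region 0 the label "snare" and leave region 1 with a poor match; solving the assignment jointly avoids that.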
Music Information Retrieval Meets Music Education
This paper addresses the use of Music Information Retrieval (MIR) techniques in music education and their integration in learning software. A general overview of systems that are either commercially available or at the research stage is presented. Furthermore, three well-known MIR methods used in music learning systems and their state of the art are described: music transcription, solo and accompaniment track creation, and generation of performance instructions. As a representative example of a music learning system developed within the MIR community, the Songs2See software is outlined. Finally, challenges and directions for future research are described
Towards Automated Processing of Folk Song Recordings
Folk music is closely related to the musical culture of a specific nation or region. Even though folk songs have been passed down mainly by oral tradition, most musicologists study the relation between folk songs on the basis of symbolic music descriptions, which are obtained by transcribing recorded tunes into a score-like representation. Due to the complexity of audio recordings, once transcriptions are available, the original recorded tunes are often no longer used in the actual folk song research even though they still may contain valuable information. In this paper, we present various techniques for making audio recordings more easily accessible for music researchers. In particular, we show how one can use synchronization techniques to automatically segment and annotate the recorded songs. The processed audio recordings can then be made accessible along with a symbolic transcript by means of suitable visualization, searching, and navigation interfaces to assist folk song researchers to conduct large-scale investigations comprising the audio material
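A minimal sketch of how synchronization could drive segmentation follows. The abstract does not say which technique is used; dynamic time warping (DTW) is assumed here as a common choice, and the per-frame pitch values and function names are hypothetical. The transcript is aligned to the audio, and the first frame matched to each note is read off as its position in the recording.

```python
def dtw_path(x, y):
    """Align sequences x and y with dynamic time warping.

    Returns the total alignment cost and the warping path as (i, j) index pairs.
    """
    n, m = len(x), len(y)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            D[i][j] = local + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    # Backtrack from the end of both sequences to their start.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        _, i, j = min((D[i - 1][j - 1], i - 1, j - 1),
                      (D[i - 1][j], i - 1, j),
                      (D[i][j - 1], i, j - 1))
        path.append((i - 1, j - 1))
    path.reverse()
    return D[n][m], path


def first_frame_per_note(path, n_notes):
    """For each transcript note, the first audio frame aligned to it."""
    onsets = [None] * n_notes
    for note, frame in path:
        if onsets[note] is None:
            onsets[note] = frame
    return onsets


# Toy data: MIDI pitches from a transcript and a per-frame pitch estimate from audio.
transcript = [60, 62, 64]
audio_frames = [60, 60, 62, 62, 62, 64]
cost, path = dtw_path(transcript, audio_frames)
onsets = first_frame_per_note(path, len(transcript))
```

Real recordings would need more robust features than a single pitch value per frame (chroma vectors are a frequent choice), but the alignment and read-out steps keep the same shape.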
A supervised classification approach for note tracking in polyphonic piano transcription
In the field of Automatic Music Transcription, note tracking systems constitute a key process in the overall success of the task as they compute the expected note-level abstraction out of a frame-based pitch activation representation. Despite its relevance, note tracking is most commonly performed using a set of hand-crafted rules adjusted in a manual fashion for the data at issue. In this regard, the present work introduces an approach based on machine learning, and more precisely supervised classification, that aims at automatically inferring such policies for the case of piano music. The idea is to segment each pitch band of a frame-based pitch activation into single instances which are subsequently classified as active or non-active note events. Results using a comprehensive set of supervised classification strategies on the MAPS piano data-set report its competitiveness against other commonly considered strategies for note tracking as well as an improvement of more than +10% in terms of F-measure when compared to the baseline considered for both frame-level and note-level evaluations. This research work is partially supported by Universidad de Alicante through the FPU program [UAFPU2014-5883] and the Spanish Ministerio de Economía y Competitividad through project TIMuL [No. TIN2013-48152-C2-1-R, supported by EU FEDER funds]. EB is supported by a UK RAEng Research Fellowship [grant number RF/128]
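The segment-then-classify idea can be sketched as follows. This is only an illustration under stated assumptions: the threshold-based segmentation, the two features (duration and mean activation), the 1-nearest-neighbour classifier, and the tiny training set are all stand-ins chosen for brevity, not the segmentation, features, or classifiers evaluated in the paper.

```python
def segment_band(activations, threshold=0.1):
    """Split one pitch band into (start, end) frame runs above the threshold."""
    segments, start = [], None
    for t, a in enumerate(activations):
        if a > threshold and start is None:
            start = t
        elif a <= threshold and start is not None:
            segments.append((start, t))
            start = None
    if start is not None:
        segments.append((start, len(activations)))
    return segments


def features(activations, segment):
    """Describe an instance by its duration in frames and its mean activation."""
    start, end = segment
    values = activations[start:end]
    return (end - start, sum(values) / len(values))


def nearest_neighbour(train, query):
    """Return the label of the training instance closest to the query features."""
    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(train, key=lambda item: dist(item[0], query))[1]


def track_notes(activations, train, threshold=0.1):
    """Keep only the segments that the classifier labels as active notes."""
    return [seg for seg in segment_band(activations, threshold)
            if nearest_neighbour(train, features(activations, seg))]


# Hypothetical labelled instances: ((duration, mean activation), is_active_note).
train = [((4, 0.8), True), ((1, 0.15), False), ((2, 0.7), True), ((1, 0.3), False)]

# Frame-wise activation of a single pitch band (toy values).
band = [0.0, 0.8, 0.9, 0.7, 0.0, 0.2, 0.0, 0.6, 0.7, 0.0]
segments = segment_band(band)
notes = track_notes(band, train)
```

The short, weak blip around frame 5 is segmented as an instance but rejected by the classifier, which is the kind of decision that hand-crafted rules such as a minimum-duration filter would otherwise encode.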