Automatic music transcription: challenges and future directions
Automatic music transcription is considered by many to be a key enabling technology in music signal processing. However, the performance of transcription systems is still significantly below that of a human expert, and accuracies reported in recent years seem to have reached a limit, although the field is still very active. In this paper we analyse limitations of current methods and identify promising directions for future research. Current transcription methods use general purpose models which are unable to capture the rich diversity found in music signals. One way to overcome the limited performance of transcription systems is to tailor algorithms to specific use-cases. Semi-automatic approaches are another way of achieving a more reliable transcription. Also, the wealth of musical scores and corresponding audio data now available are a rich potential source of training data, via forced alignment of audio to scores, but large scale utilisation of such data has yet to be attempted. Other promising approaches include the integration of information from multiple algorithms and different musical aspects
Automatic Music Transcription: Breaking the Glass Ceiling
Automatic music transcription is considered by many to be the Holy Grail in the field of music signal analysis. However, the performance of transcription systems is still significantly below that of a human expert, and accuracies reported in recent years seem to have reached a limit, although the field is still very active. In this paper we analyse limitations of current methods and identify promising directions for future research. Current transcription methods use general purpose models which are unable to capture the rich diversity found in music signals. In order to overcome the limited performance of transcription systems, algorithms have to be tailored to specific use-cases. Semi-automatic approaches are another way of achieving a more reliable transcription. Also, the wealth of musical scores and corresponding audio data now available are a rich potential source of training data, via forced alignment of audio to scores, but large scale utilisation of such data has yet to be attempted. Other promising approaches include the integration of information across different methods and musical aspects
A mechanical source of Turkish music from 18th-century London
A late eighteenth-century mechanical organ, built into a clock made by Henry Borrell of London, plays melodies that sound completely unlike those normally found on English domestic musical clocks. This article draws on the disciplines of historical musicology, ethnomusicology and horology to argue that these tunes derive directly from the repertory of the eighteenth-century Ottoman court. The melodies are analysed firstly in the context of the Ottoman repertory and then alongside contemporary European transcriptions of ‘exotic’ music, notably Edward Jones’s Lyric Airs. The distortion of ‘Turkish’ melodies in European representations is set within the wider context of orientalism and musical transculturation; this evidence is then brought to bear on interpreting music that may also be subject to mechanical distortion. The article ends with a consideration of how these earliest known sounding examples of Ottoman music may have arrived in London and of their reception there
Zgodovina in izzivi digitalne etno/muzikologije v Sloveniji (History and Challenges of Digital Ethno/Musicology in Slovenia)
We sketch the processes of understanding music as a set of phenomena closely tied to the IT practices of music retrieval in the Slovenian research community from three basic perspectives: the ethnomusicological, the library-science (essential when music is approached computationally), and the IT perspective. The article assesses the contribution of these perspectives to the understanding of music, and proposes that the three perspectives considered are not arbitrary
Probabilistic Segmentation of Folk Music Recordings
The paper presents a novel method for automatic segmentation of folk music field recordings. The method is based on a distance measure that uses dynamic time warping to cope with tempo variations and a dynamic programming approach to handle pitch drifting, both for finding similarities and for estimating the length of the repeating segment. A probabilistic framework based on a hidden Markov model (HMM) is used to find segment boundaries, searching for an optimal match between the expected segment length, between-segment similarities, and likely locations of segment beginnings. Several current state-of-the-art approaches for segmentation of commercial music are evaluated and their weaknesses when dealing with folk music are exposed, such as intolerance to pitch drift and variable tempo. The proposed method is evaluated and its performance analyzed on a collection of 206 folk songs of different ensemble types: solo, two- and three-voiced, choir, instrumental, and instrumental with singing. It outperforms current commercial music segmentation methods for noninstrumental music and is on a par with the best for instrumental recordings. The method is also comparable to a more specialized method for segmentation of solo singing folk music recordings
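As a rough illustration of the dynamic time warping idea mentioned in this abstract, the sketch below computes a textbook DTW cost between two pitch sequences. It is not the paper's distance measure (which also handles pitch drift); the function name `dtw_distance` and the MIDI-like example sequences are assumptions made purely for illustration.

```python
def dtw_distance(a, b):
    """Textbook dynamic time warping cost between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment cost of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local distance between two frames
            # extend the cheapest of: step in a, step in b, or step in both
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Hypothetical pitch contours (MIDI note numbers), one value per frame.
melody = [60, 62, 64, 65, 67]
slower = [60, 60, 62, 62, 64, 64, 65, 65, 67, 67]  # same tune at half tempo
different = [60, 59, 57, 55, 53]

print(dtw_distance(melody, slower))     # tempo change alone costs nothing
print(dtw_distance(melody, different))  # a different contour costs more
```

Because the warping path may repeat frames, the half-tempo rendition aligns with the original at zero cost, which is the property that makes DTW tolerant of tempo variation.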
Automatic transcription of polyphonic music exploiting temporal evolution
Automatic music transcription is the process of converting an audio recording
into a symbolic representation using musical notation. It has numerous applications
in music information retrieval, computational musicology, and the
creation of interactive systems. Even for expert musicians, transcribing polyphonic
pieces of music is not a trivial task, and while the problem of automatic
pitch estimation for monophonic signals is considered to be solved, the creation
of an automated system able to transcribe polyphonic music without setting
restrictions on the degree of polyphony and the instrument type still remains
open.
In this thesis, research on automatic transcription is performed by explicitly
incorporating information on the temporal evolution of sounds. First efforts address
the problem by focusing on signal processing techniques and by proposing
audio features utilising temporal characteristics. Techniques for note onset and
offset detection are also utilised for improving transcription performance. Subsequent
approaches propose transcription models based on shift-invariant probabilistic
latent component analysis (SI-PLCA), modeling the temporal evolution
of notes in a multiple-instrument case and supporting frequency modulations in
produced notes. Datasets and annotations for transcription research have also
been created during this work. Proposed systems have been privately as well as
publicly evaluated within the Music Information Retrieval Evaluation eXchange
(MIREX) framework. Proposed systems have been shown to outperform several
state-of-the-art transcription approaches.
Developed techniques have also been employed for other tasks related to music
technology, such as for key modulation detection, temperament estimation,
and automatic piano tutoring. Finally, proposed music transcription models
have also been utilized in a wider context, namely for modeling acoustic scenes
The Macropolitics of Microsound: Gender and sexual identities in Barry Truax’s "Song of Songs".
This analysis explores how Barry Truax’s Song of Songs
(1992) for oboe d’amore, English horn and two digital
soundtracks reorients prevailing norms of sexuality by playing
with musical associations and aural conventions of how gender
sounds. The work sets the erotic dialogue between King
Solomon and Shulamite from the biblical Song of Solomon
text. On the soundtracks we hear a Christian monk’s song,
environmental sounds (birds, cicadas and bells), and two
speakers who recite the biblical text in its entirety preserving
the gendered pronouns of the original. By attending to
established gender norms, Truax confirms the identity of each
speaker, such that the speakers seemingly address one another
as a duet, but the woman also addresses a female lover and the
man a male. These gender categories are then progressively
blurred with granular time-stretching and harmonisation
(which transform the timbre of the voices), techniques that,
together, resituate the presumed heteronormative text within
a diverse constellation of possible sexual orientations
Real-time detection of overlapping sound events with non-negative matrix factorization
In this paper, we investigate the problem of real-time detection of overlapping sound events by employing non-negative matrix factorization techniques. We consider a setup where audio streams arrive in real-time to the system and are decomposed onto a dictionary of event templates learned off-line prior to the decomposition. An important drawback of existing approaches in this context is the lack of controls on the decomposition. We propose and compare two provably convergent algorithms that address this issue, by controlling respectively the sparsity of the decomposition and the trade-off of the decomposition between the different frequency components. Sparsity regularization is considered in the framework of convex quadratic programming, while frequency compromise is introduced by employing the beta-divergence as a cost function. The two algorithms are evaluated on the multi-source detection tasks of polyphonic music transcription, drum transcription and environmental sound recognition. The obtained results show how the proposed approaches can improve detection in such applications, while maintaining low computational costs that are suitable for real-time
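The general setup described in this abstract (a fixed, pre-learned dictionary and a per-frame decomposition under the beta-divergence) can be sketched with the standard multiplicative update for the activations. This is a minimal sketch and not the authors' algorithms: the function name `decompose_frame`, the toy dictionary `W`, and the test frame `v` are all hypothetical, and no sparsity control is included.

```python
def decompose_frame(v, W, beta=1.0, n_iter=500, eps=1e-12):
    """Estimate non-negative activations h such that v is approximately W h.

    v: spectrum of one frame (list of F non-negative floats)
    W: fixed dictionary, F rows by K columns (one column per event template)
    beta: beta-divergence parameter (1.0 gives Kullback-Leibler, 2.0 Euclidean)
    """
    n_bins, n_templates = len(W), len(W[0])
    h = [1.0] * n_templates  # positive initialisation
    for _ in range(n_iter):
        # current reconstruction, floored to avoid division by zero
        v_hat = [max(sum(W[f][k] * h[k] for k in range(n_templates)), eps)
                 for f in range(n_bins)]
        new_h = []
        for k in range(n_templates):
            num = sum(W[f][k] * v[f] * v_hat[f] ** (beta - 2) for f in range(n_bins))
            den = sum(W[f][k] * v_hat[f] ** (beta - 1) for f in range(n_bins))
            new_h.append(h[k] * num / max(den, eps))
        h = new_h  # all activations are updated simultaneously
    return h


# Toy dictionary: 4 frequency bins, 3 hypothetical event templates (columns).
W = [
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 1.0, 0.2],
    [0.0, 0.0, 1.0],
]
# Frame containing template 0 at gain 2 and template 1 at gain 1, overlapping.
v = [2.0, 1.5, 1.0, 0.0]

h = decompose_frame(v, W)
print([round(x, 3) for x in h])
```

Since the dictionary stays fixed, only the small activation vector is updated for each incoming frame, which is what keeps this kind of decomposition cheap enough for streaming use; detection would then amount to thresholding the activations.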