3,654 research outputs found

Investigating the Perceptual Validity of Evaluation Metrics for Automatic Piano Music Transcription

Author: Benetos E
Liu L
Pearce M
Ycart A
Publication venue: 'Ubiquity Press, Ltd.'
Publication date: 01/01/2020

Automatic Music Transcription (AMT) is usually evaluated using low-level criteria, typically by counting the numbers of errors, with equal weighting. Yet, some errors (e.g. out-of-key notes) are more salient than others. In this study, we design an online listening test to gather judgements about AMT quality. These judgements take the form of pairwise comparisons of transcriptions of the same music by pairs of different AMT systems. We investigate how these judgements correlate with benchmark metrics, and find that although they match in many cases, agreement drops when comparing pairs with similar scores, or pairs of poor transcriptions. We show that onset-only notewise F-measure is the benchmark metric that correlates best with human judgement, all the more so with higher onset tolerance thresholds. We define a set of features related to various musical attributes, and use them to design a new metric that correlates significantly better with listeners' quality judgements. We examine which musical aspects were important to raters by conducting an ablation study on the defined metric, highlighting the importance of the rhythmic dimension (tempo, meter). We make the collected data entirely available for further study, in particular to evaluate the perceptual relevance of new AMT metrics.
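
The abstract above refers to the onset-only notewise F-measure and its onset tolerance threshold. The following is a minimal sketch of that kind of metric, not the authors' evaluation code. It assumes notes are given as (onset in seconds, MIDI pitch) pairs, that a match requires identical pitch and an onset difference within the tolerance, and that a simple greedy matching is acceptable (benchmark toolkits typically use an optimal bipartite matching instead). The function name `onset_f_measure` and the 50 ms default tolerance are illustrative assumptions.

```python
def onset_f_measure(reference, estimated, onset_tolerance=0.05):
    """Onset-only notewise F-measure (illustrative sketch).

    reference, estimated: lists of (onset_seconds, midi_pitch) tuples.
    onset_tolerance: maximum allowed onset difference in seconds (assumed default).
    """
    if not reference or not estimated:
        return 0.0

    unmatched = list(reference)  # reference notes not yet matched
    true_positives = 0

    for onset, pitch in sorted(estimated):
        # Find the closest unmatched reference note with the same pitch
        # whose onset lies within the tolerance window.
        best = None
        for ref in unmatched:
            if ref[1] == pitch and abs(ref[0] - onset) <= onset_tolerance:
                if best is None or abs(ref[0] - onset) < abs(best[0] - onset):
                    best = ref
        if best is not None:
            unmatched.remove(best)  # each reference note is matched at most once
            true_positives += 1

    if true_positives == 0:
        return 0.0
    precision = true_positives / len(estimated)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example data: one estimated onset is 80 ms late, one note is spurious.
reference_notes = [(0.0, 60), (0.5, 64), (1.0, 67)]
estimated_notes = [(0.02, 60), (0.58, 64), (1.0, 67), (1.5, 72)]

strict_score = onset_f_measure(reference_notes, estimated_notes, onset_tolerance=0.05)
lenient_score = onset_f_measure(reference_notes, estimated_notes, onset_tolerance=0.1)
```

In this toy example, widening the tolerance lets the late note count as correct, so the score rises; this only illustrates how the threshold affects the metric and says nothing about its perceptual validity.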

Queen Mary Research Online

Agreement among human and automated transcriptions of global songs

Author: Benetos E
Fujii S
Fukatsu H
Kondo H
McBride J
Ozaki Y
Pfordresher PQ
Proutskova P
Sakai E
Savage PE
Six J
Tierney AT
Publication venue: International Society for Music Information Retrieval (22nd International Society for Music Information Retrieval Conference, ISMIR)
Publication date: 09/11/2021

Cross-cultural musical analysis requires standardized symbolic representation of sounds such as score notation. However, transcription into notation is usually conducted manually by ear, which is time-consuming and subjective. Our aim is to evaluate the reliability of existing methods for transcribing songs from diverse societies. We had 3 experts independently transcribe a sample of 32 excerpts of traditional monophonic songs from around the world (half a cappella, half with instrumental accompaniment). 16 songs also had pre-existing transcriptions created by 3 different experts. We compared these human transcriptions against one another and against 10 automatic music transcription algorithms. We found that human transcriptions can be sufficiently reliable (~90% agreement, κ ~.7), but current automated methods are not (<60% agreement, κ <.4). No automated method clearly outperformed others, in contrast to our predictions. These results suggest that improving automated methods for cross-cultural music transcription is critical for diversifying MIR.
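
The abstract above reports both percent agreement and κ. The following is a minimal sketch of how the two can differ, assuming κ means Cohen's kappa between two transcribers and that the transcriptions have already been aligned into equal-length sequences of note labels; the abstract does not specify either, so both are assumptions. The function name `cohen_kappa` and the example labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length label sequences (illustrative sketch)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("sequences must be non-empty and of equal length")

    n = len(labels_a)
    # Observed agreement: fraction of positions where both raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(
        counts_a[label] * counts_b[label]
        for label in counts_a.keys() | counts_b.keys()
    ) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical aligned transcriptions of four notes by two transcribers.
transcriber_1 = ["C", "C", "G", "G"]
transcriber_2 = ["C", "G", "G", "G"]
kappa = cohen_kappa(transcriber_1, transcriber_2)
```

Here the raw agreement is 75%, while kappa is lower because part of that agreement is expected by chance given how often each label is used.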

Queen Mary Research Online

Annotator subjectivity in harmony annotations of popular music

Author: Bransen J.
Burgoyne J.A.
de Haas W.B.
Kent-Muller A.
Koops H.V.
Volk A.
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2019

International Migration, Integration and Social Cohesion online publications

Leveraging Contextual Cues for Generating Basketball Highlights

Author: R.
Shi J.
Smith R.
Xiong Z.
Publication date: 29/06/2016

The massive growth of sports videos has resulted in a need for automatic generation of sports highlights that are comparable in quality to the hand-edited highlights produced by broadcasters such as ESPN. Unlike previous works that mostly use audio-visual cues derived from the video, we propose an approach that additionally leverages contextual cues derived from the environment that the game is being played in. The contextual cues provide information about the excitement levels in the game, which can be ranked and selected to automatically produce high-quality basketball highlights. We introduce a new dataset of 25 NCAA games along with their play-by-play stats and the ground-truth excitement data for each basket. We explore the informativeness of five different cues derived from the video and from the environment through user studies. Our experiments show that for our study participants, the highlights produced by our system are comparable to the ones produced by ESPN for the same games. Comment: Proceedings of ACM Multimedia 201

arXiv.org e-Print Archive

Crossref

Identifying Listener-informed Features for Modeling Time-varying Emotion Perception

Author: Barthet M
Chew E
Yang S
Publication venue: International Symposium on Computer Music Multidisciplinary Research
Publication date: 13/10/2019

Queen Mary Research Online

Sketching sounds: an exploratory study on sound-shape associations

Author: Barthet M
Fazekas G
Löbbers S
Publication venue: International Computer Music Conference
Publication date: 01/07/2021

Sound synthesiser controls typically correspond to technical parameters of signal processing algorithms rather than intuitive sound descriptors that relate to human perception of sound. This makes it difficult to realise sound ideas in a straightforward way. Cross-modal mappings, for example between gestures and sound, have been suggested as a more intuitive control mechanism. A large body of research shows consistency in human associations between sounds and shapes. However, the use of drawings to drive sound synthesis has not been explored to its full extent. This paper presents an exploratory study that asked participants to sketch visual imagery of sounds with a monochromatic digital drawing interface, with the aim of identifying different representational approaches and determining whether timbral sound characteristics can be communicated reliably through visual sketches. Results imply that the development of a synthesiser exploiting sound-shape associations is feasible, but a larger and more focused dataset is needed in follow-up studies.

Queen Mary Research Online