Understanding Optical Music Recognition
For over 50 years, researchers have been trying to teach computers to read music notation, a task referred to as Optical Music Recognition (OMR). However, the field is still difficult to access for new researchers, especially those without a significant musical background: few introductory materials are available, and, furthermore, the field has struggled with defining itself and building a shared terminology. In this work, we address these shortcomings by (1) providing a robust definition of OMR and its relationship to related fields, (2) analyzing how OMR inverts the music encoding process to recover the musical notation and the musical semantics from documents, and (3) proposing a taxonomy of OMR, most notably a novel taxonomy of applications. Additionally, we discuss how deep learning affects modern OMR research, as opposed to the traditional pipeline. Based on this work, the reader should be able to attain a basic understanding of OMR: its objectives, its inherent structure, its relationship to other fields, the state of the art, and the research opportunities it affords.
Linking Sheet Music and Audio - Challenges and New Approaches
Score and audio files are the two most important ways to represent, convey, record, store, and experience music. While a score describes a piece of music on an abstract level using symbols such as notes, keys, and measures, audio files allow for reproducing a specific acoustic realization of the piece. Each of these representations reflects different facets of the music, yielding insights into aspects ranging from structural elements (e.g., motives, themes, musical form) to specific performance aspects (e.g., artistic shaping, sound). Simultaneous access to score and audio representations is therefore of great importance.
In this paper, we address the problem of automatically generating musically relevant linking structures between the various data sources that are available for a given piece of music. In particular, we discuss the task of sheet music-audio synchronization, with the aim of linking regions in images of scanned scores to musically corresponding sections in an audio recording of the same piece. Such linking structures form the basis for novel interfaces that allow users to access and explore multimodal sources of music within a single framework.
As our main contributions, we give an overview of the state of the art for this kind of synchronization task, present some novel approaches, and indicate future research directions. In particular, we address problems that arise in the presence of structural differences and discuss challenges when applying optical music recognition to complex orchestral scores. Finally, potential applications of the synchronization results are presented.
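Sheet music-audio synchronization of this kind is commonly computed by extracting comparable feature sequences (e.g., chroma vectors) from both modalities and aligning them with dynamic time warping (DTW). The sketch below is a minimal, illustrative DTW alignment under that assumption; the function name and feature choice are hypothetical and not the authors' exact pipeline:

```python
import numpy as np

def dtw_align(X, Y):
    """Align two feature sequences with classic dynamic time warping.

    X: (n, d) features from one modality (e.g., a rendered score),
    Y: (m, d) features from the other (e.g., an audio recording).
    Returns the accumulated cost matrix and the optimal warping path.
    """
    n, m = len(X), len(Y)
    # Cosine distance between all frame pairs as the local cost measure.
    Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-9)
    Yn = Y / (np.linalg.norm(Y, axis=1, keepdims=True) + 1e-9)
    C = 1.0 - Xn @ Yn.T
    # Accumulated cost with the standard step set (diagonal, up, left).
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = C[i - 1, j - 1] + min(D[i - 1, j - 1],
                                            D[i - 1, j], D[i, j - 1])
    # Backtrack the optimal path from the end to the start.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return D[1:, 1:], path[::-1]
```

In a real system the path would then be mapped back from feature frames to score regions and audio timestamps; handling structural differences (repeats, omissions) requires more than this plain DTW.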
Proceedings of the 4th International Workshop on Reading Music Systems
The International Workshop on Reading Music Systems (WoRMS) is a workshop that aims to connect researchers who develop systems for reading music, such as in the field of Optical Music Recognition, with other researchers and practitioners who could benefit from such systems, such as librarians or musicologists.
The relevant topics of interest for the workshop include, but are not limited
to: Music reading systems; Optical music recognition; Datasets and performance
evaluation; Image processing on music scores; Writer identification; Authoring,
editing, storing and presentation systems for music scores; Multi-modal
systems; Novel input-methods for music to produce written music; Web-based
Music Information Retrieval services; Applications and projects; Use-cases
related to written music.
These are the proceedings of the 4th International Workshop on Reading Music Systems, held online on Nov. 18th, 2022.
Comment: Proceedings edited by Jorge Calvo-Zaragoza, Alexander Pacha and Elona Shatr
Recognition of handwritten music scores
The recognition of handwritten music scores remains an open problem. Existing approaches can only deal with very simple handwritten scores, mainly because of the variability in handwriting style and the variability in the composition of groups of music notes (i.e., compound music notes). In this work I study, on the one hand, isolated symbols (e.g., half-notes, quarter-notes, clefs, sharps) and, on the other hand, compound music notes. First, I separate the isolated symbols from the compound ones and study each group separately: isolated symbols are recognized with symbol recognition methods, and compound notes with a primitive hierarchy and syntactic rules. The method has been tested on several handwritten music scores from the CVC-MUSCIMA database and compared with a commercial Optical Music Recognition software package. Given that my method is learning-free, the obtained results are promising.
End-to-end optical music recognition for pianoform sheet music
End-to-end solutions have brought about significant advances in the field of Optical Music Recognition. These approaches directly provide the symbolic representation of a given image of a musical score. Despite this, several types of documents, such as pianoform musical scores, cannot yet benefit from these solutions, since their structural complexity does not allow their effective transcription. This paper presents a neural method whose objective is to transcribe these musical scores in an end-to-end fashion. We also introduce the GrandStaff dataset, which contains 53,882 single-system piano scores in common western modern notation. The sources are encoded in both a standard digital music representation and its adaptation for current transcription technologies. The method proposed in this paper is trained and evaluated using this dataset. The results show that the approach presented is, for the first time, able to effectively transcribe pianoform notation in an end-to-end manner.
Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. This paper is part of the MultiScore project (PID2020-118447RA-I00), funded by MCIN/AEI/10.13039/501100011033. The first author is supported by Grant ACIF/2021/356 from the “Programa I+D+i de la Generalitat Valenciana”.
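End-to-end transcription models of this kind typically emit per-frame symbol probabilities that are collapsed into a final symbol sequence, most commonly with CTC-style greedy decoding. The sketch below illustrates only that generic final decoding step (collapse repeats, drop blanks); the model architecture and the GrandStaff label vocabulary are out of scope, and the function name is hypothetical:

```python
def ctc_greedy_decode(frame_logits, blank=0):
    """Greedy CTC decoding: take the best label per frame,
    collapse consecutive repeats, and drop the blank symbol.

    frame_logits: list of per-frame score vectors (one entry per label).
    Returns the decoded label-index sequence.
    """
    # Best label per frame (argmax over each score vector).
    best = [max(range(len(f)), key=f.__getitem__) for f in frame_logits]
    out, prev = [], None
    for label in best:
        # Emit a label only when it differs from the previous frame
        # and is not the blank; this undoes CTC's frame duplication.
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out
```

In practice the decoded indices are then mapped back to music-notation tokens in the dataset's encoding.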
Clustering by compression
We present a new method for clustering based on compression. The method does not use subject-specific features or background knowledge, and works as
follows: First, we determine a universal similarity distance, the normalized
compression distance or NCD, computed from the lengths of compressed data files
(singly and in pairwise concatenation). Second, we apply a hierarchical
clustering method. The NCD is universal in that it is not restricted to a
specific application area, and works across application area boundaries. A
theoretical precursor, the normalized information distance, co-developed by one
of the authors, is provably optimal but uses the non-computable notion of
Kolmogorov complexity. We propose precise notions of a similarity metric and a
normal compressor, and show that the NCD based on a normal compressor is a
similarity metric that approximates universality. To extract a hierarchy of clusters from
the distance matrix, we determine a dendrogram (binary tree) by a new quartet
method and a fast heuristic to implement it. The method is implemented and
available as public software, and is robust under choice of different
compressors. To substantiate our claims of universality and robustness, we
report evidence of successful application in areas as diverse as genomics,
virology, languages, literature, music, handwritten digits, astronomy, and
combinations of objects from completely different domains, using statistical,
dictionary, and block sorting compressors. In genomics we presented new
evidence for major questions in Mammalian evolution, based on
whole-mitochondrial genomic analysis: the Eutherian orders and the Marsupionta
hypothesis against the Theria hypothesis.
Comment: LaTeX, 27 pages, 20 figures
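The NCD itself is straightforward to compute from compressed lengths: NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(z) is the compressed size of z. A minimal sketch, using zlib as a stand-in for the real-world compressors (statistical, dictionary, block sorting) discussed in the abstract:

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Approximates the (non-computable) normalized information distance by
    replacing Kolmogorov complexity with the length of zlib's output.
    Values near 0 mean "very similar"; values near 1 mean "unrelated".
    """
    cx = len(zlib.compress(x, 9))          # C(x)
    cy = len(zlib.compress(y, 9))          # C(y)
    cxy = len(zlib.compress(x + y, 9))     # C(xy): x and y concatenated
    return (cxy - min(cx, cy)) / max(cx, cy)
```

Clustering then proceeds by filling a pairwise NCD distance matrix over all objects and feeding it to a hierarchical method (the paper's own quartet-tree heuristic, or any off-the-shelf linkage).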
Late multimodal fusion for image and audio music transcription
Music transcription, which deals with the conversion of music sources into a structured digital format, is a key problem in Music Information Retrieval (MIR). When addressing this challenge in computational terms, the MIR community follows two lines of research depending on the input: music documents, which is the case of Optical Music Recognition (OMR), or audio recordings, which is the case of Automatic Music Transcription (AMT). The different nature of these input data has led the two fields to develop modality-specific frameworks. However, their recent formulation as sequence labeling tasks leads to a common output representation, which enables research on a combined paradigm. In this respect, multimodal image and audio music transcription comprises the challenge of effectively combining the information conveyed by the image and audio modalities. In this work, we explore this question at the late-fusion level: we study four combination approaches in order to merge, for the first time, the hypotheses of end-to-end OMR and AMT systems in a lattice-based search space. The results obtained for a series of performance scenarios (in which the corresponding single-modality models yield different error rates) show interesting benefits of these approaches. In addition, two of the four strategies considered significantly improve on the corresponding unimodal standard recognition frameworks.
This paper is part of the I+D+i PID2020-118447RA-I00 (MultiScore) project, funded by MCIN/AEI/10.13039/501100011033. Some of the computing resources were provided by the Generalitat Valenciana and the European Union through the FEDER funding programme (IDIFEDER/2020/003). The first and second authors are supported by grants FPU19/04957 from the Spanish Ministerio de Universidades and APOSTD/2020/256 from the Generalitat Valenciana, respectively.
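At its simplest, late fusion of two unimodal transcription systems can be illustrated as a log-linear combination of the scores each system assigns to candidate symbol sequences. The sketch below is a toy average-combination baseline under that assumption, not the paper's lattice-based search; all names and data structures are hypothetical:

```python
def late_fuse(omr_scores, amt_scores, alpha=0.5):
    """Toy late fusion of two hypothesis lists.

    omr_scores, amt_scores: dicts mapping a candidate symbol sequence
    (a tuple of tokens) to that system's log-probability for it.
    alpha: interpolation weight, strictly between 0 and 1 (an endpoint
    would multiply 0 by -inf for unseen hypotheses, yielding NaN).
    Returns the candidate with the best combined score.
    """
    fused = {}
    for hyp in set(omr_scores) | set(amt_scores):
        # A hypothesis missing from one list gets log-probability -inf,
        # i.e., it can only win if both systems propose it.
        lo = omr_scores.get(hyp, float("-inf"))
        la = amt_scores.get(hyp, float("-inf"))
        fused[hyp] = alpha * lo + (1 - alpha) * la
    return max(fused, key=fused.get)
```

A lattice-based search, as studied in the paper, generalizes this by combining scores over a shared graph of partial hypotheses rather than over whole-sequence n-best lists.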