2,473 research outputs found

Adaptive Resonance: An Emerging Neural Theory of Cognition

Author: Carpenter Gail
Publication venue: Boston University Center for Adaptive Systems and Department of Cognitive and Neural Systems
Publication date: 01/09/2000

Adaptive resonance is a theory of cognitive information processing which has been realized as a family of neural network models. In recent years, these models have evolved to incorporate new capabilities in the cognitive, neural, computational, and technological domains. Minimal models provide a conceptual framework, for formulating questions about the nature of cognition; an architectural framework, for mapping cognitive functions to cortical regions; a semantic framework, for precisely defining terms; and a computational framework, for testing hypotheses. These systems are here exemplified by the distributed ART (dART) model, which generalizes localist ART systems to allow arbitrarily distributed code representations, while retaining basic capabilities such as stable fast learning and scalability. Since each component is placed in the context of a unified real-time system, analysis can move from the level of neural processes, including learning laws and rules of synaptic transmission, to cognitive processes, including attention and consciousness. Local design is driven by global functional constraints, with each network synthesizing a dynamic balance of opposing tendencies. The self-contained working ART and dART models can also be transferred to technology, in areas that include remote sensing, sensor fusion, and content-addressable information retrieval from large databases. Office of Naval Research and the Defense Advanced Research Projects Agency (N00014-95-1-0409, N00014-1-95-0657); National Institutes of Health (20-316-4304-5

Boston University Institutional Repository (OpenBU)

Understanding Optical Music Recognition

Author: Calvo-Zaragoza Jorge
Hajič Jr. Jan
Pacha Alexander
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/07/2020

For over 50 years, researchers have been trying to teach computers to read music notation, referred to as Optical Music Recognition (OMR). However, this field is still difficult to access for new researchers, especially those without a significant musical background: Few introductory materials are available, and, furthermore, the field has struggled with defining itself and building a shared terminology. In this work, we address these shortcomings by (1) providing a robust definition of OMR and its relationship to related fields, (2) analyzing how OMR inverts the music encoding process to recover the musical notation and the musical semantics from documents, and (3) proposing a taxonomy of OMR, with most notably a novel taxonomy of applications. Additionally, we discuss how deep learning affects modern OMR research, as opposed to the traditional pipeline. Based on this work, the reader should be able to attain a basic understanding of OMR: its objectives, its inherent structure, its relationship to other fields, the state of the art, and the research opportunities it affords

Repositorio Institucional de la Universidad de Alicante

arXiv.org e-Print Archive

End-to-End Neural Optical Music Recognition of Monophonic Scores

Author: Calvo-Zaragoza Jorge
Rizo David
Publication venue: 'MDPI AG'
Publication date: 01/01/2018

Optical Music Recognition is a field of research that investigates how to computationally decode music notation from images. Despite the efforts made so far, there are hardly any complete solutions to the problem. In this work, we study the use of neural networks that work in an end-to-end manner. This is achieved by using a neural model that combines the capabilities of convolutional neural networks, which work on the input image, and recurrent neural networks, which deal with the sequential nature of the problem. Thanks to the use of the so-called Connectionist Temporal Classification loss function, these models can be directly trained from input images accompanied by their corresponding transcripts into music symbol sequences. We also present the Printed Images of Music Staves (PrIMuS) dataset, containing more than 80,000 monodic single-staff real scores in common western notation, that is used to train and evaluate the neural approach. In our experiments, it is demonstrated that this formulation can be carried out successfully. Additionally, we study several considerations about the codification of the output musical sequences, the convergence and scalability of the neural models, as well as the ability of this approach to locate symbols in the input score. This work was supported by the Social Sciences and Humanities Research Council of Canada, and the Spanish Ministerio de Economia y Competitividad through Project HISPAMUS Ref. No. TIN2017-86576-R (supported by UE FEDER funds). Calvo-Zaragoza, J.; Rizo, D. (2018). End-to-End Neural Optical Music Recognition of Monophonic Scores. Applied Sciences. 8(4). https://doi.org/10.3390/app8040606

Repositorio Institucional de la Universidad de Alicante

Multidisciplinary Digital Publishing Institute

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Directory of Open Access Journals

RiuNet

Retrieving Music Semantics from Optical Music Recognition by Machine Translation

Author: Calvo-Zaragoza Jorge
Iñesta José M.
Rizo David
Ríos-Vila Antonio
Thomae Martha E.
Publication venue: 'Modern Language Association'
Publication date: 01/01/2020

In this paper, we apply machine translation techniques to solve one of the central problems in the field of optical music recognition: extracting the semantics of a sequence of music characters. So far, this problem has been approached through heuristics and grammars, which are not generalizable solutions. We borrowed the seq2seq model and the attention mechanism from machine translation to address this issue. Given its example-based learning, the model proposed is meant to apply to different notations provided there is enough training data. The model was tested on the PrIMuS dataset of common Western music notation incipits. Its performance was satisfactory for the vast majority of examples, flawlessly extracting the musical meaning of 85% of the incipits in the test set: correctly mapping series of accidentals into key signatures, pairs of digits into time signatures, combinations of digits and rests into multi-measure rests, detecting implicit accidentals, etc. This work is supported by the Spanish Ministry HISPAMUS project TIN2017-86576-R, partially funded by the EU, and by CIRMMT’s Inter-Centre Research Exchange Funding and McGill’s Graduate Mobility Award.

Repositorio Institucional de la Universidad de Alicante

A global method for music symbol recognition in typeset music sheets

Author: Rossant Florence
Publication venue: 'Elsevier BV'
Publication date: 01/01/2002

This paper presents an optical music recognition (OMR) system that can automatically recognize the main musical symbols of a scanned paper-based music score. Two major stages are distinguished: the first one, using low-level pre-processing, detects the isolated objects and outputs some hypotheses about them; the second one has to take the final correct decision, through high-level processing including contextual information and music writing rules. This article presents both stages of the method: after explaining in detail the first one, the symbol analysis process, it shows through first experiments that its outputs can efficiently be used as inputs for a high-level decision process.

AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions

Author: Gu Chunhui
Li Yeqing
Malik Jitendra
Pantofaru Caroline
Ricco Susanna
Ross David A.
Schmid Cordelia
Sukthankar Rahul
Sun Chen
Toderici George
Vijayanarasimhan Sudheendra
Vondrick Carl
Publication venue
Publication date: 30/04/2018

This paper introduces a video dataset of spatio-temporally localized Atomic Visual Actions (AVA). The AVA dataset densely annotates 80 atomic visual actions in 430 15-minute video clips, where actions are localized in space and time, resulting in 1.58M action labels with multiple labels per person occurring frequently. The key characteristics of our dataset are: (1) the definition of atomic visual actions, rather than composite actions; (2) precise spatio-temporal annotations with possibly multiple annotations for each person; (3) exhaustive annotation of these atomic actions over 15-minute video clips; (4) people temporally linked across consecutive segments; and (5) using movies to gather a varied set of action representations. This departs from existing datasets for spatio-temporal action recognition, which typically provide sparse annotations for composite actions in short video clips. We will release the dataset publicly. AVA, with its realistic scene and action complexity, exposes the intrinsic difficulty of action recognition. To benchmark this, we present a novel approach for action localization that builds upon the current state-of-the-art methods, and demonstrates better performance on JHMDB and UCF101-24 categories. While setting a new state of the art on existing datasets, the overall results on AVA are low at 15.6% mAP, underscoring the need for developing new approaches for video understanding. Comment: To appear in CVPR 2018. Check dataset page https://research.google.com/ava/ for details.

arXiv.org e-Print Archive

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Deep watershed detector for music object recognition

Author: Elezi Ismail
Schmidhuber Jürgen
Stadelmann Thilo
Tuggener Lukas
Publication venue: Society for Music Information Retrieval
Publication date: 01/01/2018

Optical Music Recognition (OMR) is an important and challenging area within music information retrieval; the accurate detection of music symbols in digital images is a core functionality of any OMR pipeline. In this paper, we introduce a novel object detection method, based on synthetic energy maps and the watershed transform, called Deep Watershed Detector (DWD). Our method is specifically tailored to deal with high resolution images that contain a large number of very small objects and is therefore able to process full pages of written music. We present state-of-the-art detection results of common music symbols and show DWD’s ability to work with synthetic scores as well as with handwritten music.

ZHAW digitalcollection