249 research outputs found

    Recognition of pen-based music notation with finite-state machines

    Get PDF
    This work presents a statistical model to recognize pen-based music compositions using stroke recognition algorithms and finite-state machines. The series of strokes received as input is mapped onto a stochastic representation, which is combined with a formal language that describes musical symbols in terms of stroke primitives. Then, a Probabilistic Finite-State Automaton is obtained, which defines probabilities over the set of musical sequences. This model is eventually crossed with a semantic language to avoid sequences that does not make musical sense. Finally, a decoding strategy is applied in order to output a hypothesis about the musical sequence actually written. Comprehensive experimentation with several decoding algorithms, stroke similarity measures and probability density estimators are tested and evaluated following different metrics of interest. Results found have shown the goodness of the proposed model, obtaining competitive performances in all metrics and scenarios considered.This work was supported by the Spanish Ministerio de Educación, Cultura y Deporte through a FPU Fellowship (Ref. AP2012–0939) and the Spanish Ministerio de Economía y Competitividad through the TIMuL Project (No. TIN2013-48152-C2-1-R, supported by UE FEDER funds)

    Handwritten Music Recognition for Mensural notation with convolutional recurrent neural networks

    Get PDF
    [EN] Optical Music Recognition is the technology that allows computers to read music notation, which is also referred to as Handwritten Music Recognition when it is applied over handwritten notation. This technology aims at efficiently transcribing written music into a representation that can be further processed by a computer. This is of special interest to transcribe the large amount of music written in early notations, such as the Mensural notation, since they represent largely unexplored heritage for the musicological community. Traditional approaches to this problem are based on complex strategies with many explicit rules that only work for one particular type of manuscript. Machine learning approaches offer the promise of generalizable solutions, based on learning from just labelled examples. However, previous research has not achieved sufficiently acceptable results for handwritten Mensural notation. In this work we propose the use of deep neural networks, namely convolutional recurrent neural networks, which have proved effective in other similar domains such as handwritten text recognition. Our experimental results achieve, for the first time, recognition results that can be considered effective for transcribing handwritten Mensural notation, decreasing the symbol-level error rate of previous approaches from 25.7% to 7.0%. (C) 2019 Elsevier B.V. All rights reserved.First author thanks the support from the Spanish Ministry "HISPAMUS" project (TIN2017-86576-R), partially funded by the EU. The other authors were supported by the European Union's H2020 grant "Recognition and Enrichment of Archival Documents" (Ref. 674943), by the BBVA Foundacion through the 2017-2018 and 2018-2019 Digital Humanities research grants "Carabela" and "HistWeather - Dos Siglos de Datos Cilmaticos", and by EU JPICH project "HOME - History Of Medieval Europe"(Spanish PEICTI Ref. PCI2018-093122).Calvo-Zaragoza, J.; Toselli, AH.; Vidal, E. (2019). Handwritten Music Recognition for Mensural notation with convolutional recurrent neural networks. Pattern Recognition Letters. 128:115-121. https://doi.org/10.1016/j.patrec.2019.08.021S11512112

    A selectional auto-encoder approach for document image binarization

    Get PDF
    Binarization plays a key role in the automatic information retrieval from document images. This process is usually performed in the first stages of document analysis systems, and serves as a basis for subsequent steps. Hence it has to be robust in order to allow the full analysis workflow to be successful. Several methods for document image binarization have been proposed so far, most of which are based on hand-crafted image processing strategies. Recently, Convolutional Neural Networks have shown an amazing performance in many disparate duties related to computer vision. In this paper we discuss the use of convolutional auto-encoders devoted to learning an end-to-end map from an input image to its selectional output, in which activations indicate the likelihood of pixels to be either foreground or background. Once trained, documents can therefore be binarized by parsing them through the model and applying a global threshold. This approach has proven to outperform existing binarization strategies in a number of document types.This work was partially supported by the Social Sciences and Humanities Research Council of Canada, the Spanish Ministerio de Ciencia, Innovación y Universidades through Juan de la Cierva - Formación grant (Ref. FJCI-2016-27873), and the Universidad de Alicante through grant GRE-16-04

    Proceedings of the 4th International Workshop on Reading Music Systems

    Full text link
    The International Workshop on Reading Music Systems (WoRMS) is a workshop that tries to connect researchers who develop systems for reading music, such as in the field of Optical Music Recognition, with other researchers and practitioners that could benefit from such systems, like librarians or musicologists. The relevant topics of interest for the workshop include, but are not limited to: Music reading systems; Optical music recognition; Datasets and performance evaluation; Image processing on music scores; Writer identification; Authoring, editing, storing and presentation systems for music scores; Multi-modal systems; Novel input-methods for music to produce written music; Web-based Music Information Retrieval services; Applications and projects; Use-cases related to written music. These are the proceedings of the 4th International Workshop on Reading Music Systems, held online on Nov. 18th 2022.Comment: Proceedings edited by Jorge Calvo-Zaragoza, Alexander Pacha and Elona Shatr

    Improving classification using a Confidence Matrix based on weak classifiers applied to OCR

    Get PDF
    This paper proposes a new feature representation method based on the construction of a Confidence Matrix (CM). This representation consists of posterior probability values provided by several weak classifiers, each one trained and used in different sets of features from the original sample. The CM allows the final classifier to abstract itself from discovering underlying groups of features. In this work the CM is applied to isolated character image recognition, for which several set of features can be extracted from each sample. Experimentation has shown that the use of CM permits a significant improvement in accuracy in most cases, while the others remain the same. The results were obtained after experimenting with four well-known corpora, using evolved meta-classifiers with the k-Nearest Neighbor rule as a weak classifier and by applying statistical significance tests.This work was partially supported by the Spanish CICyT through the project TIN2013-48152-C2-1-R, the Consejería de Educación de la Comunidad Valenciana through Project PROMETEO/2012/017 and a FPU fellowship (AP2012-0939) from the Spanish Ministerio de Educación Cultura y Deporte

    Recognition of online handwritten music symbols

    Get PDF
    Paper submitted to MML 2013, 6th International Workshop on Machine Learning and Music, Prague, September 23, 2013.An effective way of digitizing a new musical composition is to use an e-pen and tablet application in which the user's pen strokes are recognized online and the digital score is created with the sole effort of the composition itself. This work aims to be a starting point for research on the recognition of online handwritten music notation. To this end, different alternatives within the two modalities of recognition resulting from this data are presented: online recognition, which uses the strokes marked by a pen, and offline recognition, which uses the image generated after drawing the symbol. A comparative experiment with common machine learning algorithms over a dataset of 3800 samples and 32 different music symbols is presented. Results show that samples of the actual user are needed if good classification rates are pursued. Moreover, algorithms using the online data, on average, achieve better classification results than the others

    Understanding Optical Music Recognition

    Get PDF
    For over 50 years, researchers have been trying to teach computers to read music notation, referred to as Optical Music Recognition (OMR). However, this field is still difficult to access for new researchers, especially those without a significant musical background: Few introductory materials are available, and, furthermore, the field has struggled with defining itself and building a shared terminology. In this work, we address these shortcomings by (1) providing a robust definition of OMR and its relationship to related fields, (2) analyzing how OMR inverts the music encoding process to recover the musical notation and the musical semantics from documents, and (3) proposing a taxonomy of OMR, with most notably a novel taxonomy of applications. Additionally, we discuss how deep learning affects modern OMR research, as opposed to the traditional pipeline. Based on this work, the reader should be able to attain a basic understanding of OMR: its objectives, its inherent structure, its relationship to other fields, the state of the art, and the research opportunities it affords

    Incremental Unsupervised Domain-Adversarial Training of Neural Networks

    Get PDF
    In the context of supervised statistical learning, it is typically assumed that the training set comes from the same distribution that draws the test samples. When this is not the case, the behavior of the learned model is unpredictable and becomes dependent upon the degree of similarity between the distribution of the training set and the distribution of the test set. One of the research topics that investigates this scenario is referred to as domain adaptation. Deep neural networks brought dramatic advances in pattern recognition and that is why there have been many attempts to provide good domain adaptation algorithms for these models. Here we take a different avenue and approach the problem from an incremental point of view, where the model is adapted to the new domain iteratively. We make use of an existing unsupervised domain-adaptation algorithm to identify the target samples on which there is greater confidence about their true label. The output of the model is analyzed in different ways to determine the candidate samples. The selected set is then added to the source training set by considering the labels provided by the network as ground truth, and the process is repeated until all target samples are labelled. Our results report a clear improvement with respect to the non-incremental case in several datasets, also outperforming other state-of-the-art domain adaptation algorithms.Comment: 26 pages, 7 figure

    Deep Neural Networks for Document Processing of Music Score Images

    Get PDF
    [EN] There is an increasing interest in the automatic digitization of medieval music documents. Despite efforts in this field, the detection of the different layers of information on these documents still poses difficulties. The use of Deep Neural Networks techniques has reported outstanding results in many areas related to computer vision. Consequently, in this paper, we study the so-called Convolutional Neural Networks (CNN) for performing the automatic document processing of music score images. This process is focused on layering the image into its constituent parts (namely, background, staff lines, music notes, and text) by training a classifier with examples of these parts. A comprehensive experimentation in terms of the configuration of the networks was carried out, which illustrates interesting results as regards to both the efficiency and effectiveness of these models. In addition, a cross-manuscript adaptation experiment was presented in which the networks are evaluated on a different manuscript from the one they were trained. The results suggest that the CNN is capable of adapting its knowledge, and so starting from a pre-trained CNN reduces (or eliminates) the need for new labeled data.This work was supported by the Social Sciences and Humanities Research Council of Canada, and Universidad de Alicante through grant GRE-16-04.Calvo-Zaragoza, J.; Castellanos, F.; Vigliensoni, G.; Fujinaga, I. (2018). Deep Neural Networks for Document Processing of Music Score Images. Applied Sciences. 8(5). https://doi.org/10.3390/app8050654S85Bainbridge, D., & Bell, T. (2001). Computers and the Humanities, 35(2), 95-121. doi:10.1023/a:1002485918032Byrd, D., & Simonsen, J. G. (2015). Towards a Standard Testbed for Optical Music Recognition: Definitions, Metrics, and Page Images. Journal of New Music Research, 44(3), 169-195. doi:10.1080/09298215.2015.1045424LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444. doi:10.1038/nature14539Rebelo, A., Fujinaga, I., Paszkiewicz, F., Marcal, A. R. S., Guedes, C., & Cardoso, J. S. (2012). Optical music recognition: state-of-the-art and open issues. International Journal of Multimedia Information Retrieval, 1(3), 173-190. doi:10.1007/s13735-012-0004-6Louloudis, G., Gatos, B., Pratikakis, I., & Halatsis, C. (2008). Text line detection in handwritten documents. Pattern Recognition, 41(12), 3758-3772. doi:10.1016/j.patcog.2008.05.011Montagner, I. S., Hirata, N. S. T., & Hirata, R. (2017). Staff removal using image operator learning. Pattern Recognition, 63, 310-320. doi:10.1016/j.patcog.2016.10.002Calvo-Zaragoza, J., Micó, L., & Oncina, J. (2016). Music staff removal with supervised pixel classification. International Journal on Document Analysis and Recognition (IJDAR), 19(3), 211-219. doi:10.1007/s10032-016-0266-2Calvo-Zaragoza, J., Pertusa, A., & Oncina, J. (2017). Staff-line detection and removal using a convolutional neural network. Machine Vision and Applications, 28(5-6), 665-674. doi:10.1007/s00138-017-0844-4Shelhamer, E., Long, J., & Darrell, T. (2017). Fully Convolutional Networks for Semantic Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(4), 640-651. doi:10.1109/tpami.2016.2572683Kato, Z. (2011). Markov Random Fields in Image Segmentation. Foundations and Trends® in Signal Processing, 5(1-2), 1-155. doi:10.1561/2000000035Lecun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324. doi:10.1109/5.72679
    corecore