32 research outputs found

    A selectional auto-encoder approach for document image binarization

    Get PDF
    Binarization plays a key role in the automatic information retrieval from document images. This process is usually performed in the first stages of document analysis systems, and serves as a basis for subsequent steps. Hence it has to be robust in order to allow the full analysis workflow to be successful. Several methods for document image binarization have been proposed so far, most of which are based on hand-crafted image processing strategies. Recently, Convolutional Neural Networks have shown an amazing performance in many disparate duties related to computer vision. In this paper we discuss the use of convolutional auto-encoders devoted to learning an end-to-end map from an input image to its selectional output, in which activations indicate the likelihood of pixels to be either foreground or background. Once trained, documents can therefore be binarized by parsing them through the model and applying a global threshold. This approach has proven to outperform existing binarization strategies in a number of document types.This work was partially supported by the Social Sciences and Humanities Research Council of Canada, the Spanish Ministerio de Ciencia, Innovación y Universidades through Juan de la Cierva - Formación grant (Ref. FJCI-2016-27873), and the Universidad de Alicante through grant GRE-16-04

    Staff-line removal with selectional auto-encoders

    Get PDF
    Staff-line removal is an important preprocessing stage as regards most Optical Music Recognition systems. The common procedures employed to carry out this task involve image processing techniques. In contrast to these traditional methods, which are based on hand-engineered transformations, the problem can also be approached from a machine learning point of view if representative examples of the task are provided. We propose doing this through the use of a new approach involving auto-encoders, which select the appropriate features of an input feature set (Selectional Auto-Encoders). Within the context of the problem at hand, the model is trained to select those pixels of a given image that belong to a musical symbol, thus removing the lines of the staves. Our results show that the proposed technique is quite competitive and significantly outperforms the other state-of-art strategies considered, particularly when dealing with grayscale input images.This work was partially supported by the Spanish Ministerio de Educación, Cultura y Deporte through a FPU fellowship (AP2012- 0939) and the Spanish Ministerio de Economía y Competitividad through Project TIMuL (No. TIN2013-48152-C2-1-R, supported by UE FEDER funds)

    MureTools. An Optical Music Recognition Supporting System

    Get PDF
    This project has the intention of providing supporting tools in the OMR (Optical Music Recognition) field by easing training and evaluation tasks. Through this whole work, the OMR field along with the relevant processes being carried out with the objective of digitizing and preserving musical pieces and tradition will be presented. Additionally, not only formats and representations will be introduced, but also neural networks models proven to be effective in this field, as well as its relevant concepts and metrics will be included throughout, unveiling also the procedures carried out in order to achieve the proposed objectives, as well as the solutions for issues that arise. Furthermore it will be important to showcase the possibilities of integrating all of these kind of tasks through web microservices and synchronous and asynchronous protocols, using task queuing so processes can take place over time steadily and effectively

    Deep Neural Networks for Document Processing of Music Score Images

    Get PDF
    [EN] There is an increasing interest in the automatic digitization of medieval music documents. Despite efforts in this field, the detection of the different layers of information on these documents still poses difficulties. The use of Deep Neural Networks techniques has reported outstanding results in many areas related to computer vision. Consequently, in this paper, we study the so-called Convolutional Neural Networks (CNN) for performing the automatic document processing of music score images. This process is focused on layering the image into its constituent parts (namely, background, staff lines, music notes, and text) by training a classifier with examples of these parts. A comprehensive experimentation in terms of the configuration of the networks was carried out, which illustrates interesting results as regards to both the efficiency and effectiveness of these models. In addition, a cross-manuscript adaptation experiment was presented in which the networks are evaluated on a different manuscript from the one they were trained. The results suggest that the CNN is capable of adapting its knowledge, and so starting from a pre-trained CNN reduces (or eliminates) the need for new labeled data.This work was supported by the Social Sciences and Humanities Research Council of Canada, and Universidad de Alicante through grant GRE-16-04.Calvo-Zaragoza, J.; Castellanos, F.; Vigliensoni, G.; Fujinaga, I. (2018). Deep Neural Networks for Document Processing of Music Score Images. Applied Sciences. 8(5). https://doi.org/10.3390/app8050654S85Bainbridge, D., & Bell, T. (2001). Computers and the Humanities, 35(2), 95-121. doi:10.1023/a:1002485918032Byrd, D., & Simonsen, J. G. (2015). Towards a Standard Testbed for Optical Music Recognition: Definitions, Metrics, and Page Images. Journal of New Music Research, 44(3), 169-195. doi:10.1080/09298215.2015.1045424LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444. doi:10.1038/nature14539Rebelo, A., Fujinaga, I., Paszkiewicz, F., Marcal, A. R. S., Guedes, C., & Cardoso, J. S. (2012). Optical music recognition: state-of-the-art and open issues. International Journal of Multimedia Information Retrieval, 1(3), 173-190. doi:10.1007/s13735-012-0004-6Louloudis, G., Gatos, B., Pratikakis, I., & Halatsis, C. (2008). Text line detection in handwritten documents. Pattern Recognition, 41(12), 3758-3772. doi:10.1016/j.patcog.2008.05.011Montagner, I. S., Hirata, N. S. T., & Hirata, R. (2017). Staff removal using image operator learning. Pattern Recognition, 63, 310-320. doi:10.1016/j.patcog.2016.10.002Calvo-Zaragoza, J., Micó, L., & Oncina, J. (2016). Music staff removal with supervised pixel classification. International Journal on Document Analysis and Recognition (IJDAR), 19(3), 211-219. doi:10.1007/s10032-016-0266-2Calvo-Zaragoza, J., Pertusa, A., & Oncina, J. (2017). Staff-line detection and removal using a convolutional neural network. Machine Vision and Applications, 28(5-6), 665-674. doi:10.1007/s00138-017-0844-4Shelhamer, E., Long, J., & Darrell, T. (2017). Fully Convolutional Networks for Semantic Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(4), 640-651. doi:10.1109/tpami.2016.2572683Kato, Z. (2011). Markov Random Fields in Image Segmentation. Foundations and Trends® in Signal Processing, 5(1-2), 1-155. doi:10.1561/2000000035Lecun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324. doi:10.1109/5.72679

    Optical Music Recognition: State of the Art and Major Challenges

    Get PDF
    Optical Music Recognition (OMR) is concerned with transcribing sheet music into a machine-readable format. The transcribed copy should allow musicians to compose, play and edit music by taking a picture of a music sheet. Complete transcription of sheet music would also enable more efficient archival. OMR facilitates examining sheet music statistically or searching for patterns of notations, thus helping use cases in digital musicology too. Recently, there has been a shift in OMR from using conventional computer vision techniques towards a deep learning approach. In this paper, we review relevant works in OMR, including fundamental methods and significant outcomes, and highlight different stages of the OMR pipeline. These stages often lack standard input and output representation and standardised evaluation. Therefore, comparing different approaches and evaluating the impact of different processing methods can become rather complex. This paper provides recommendations for future work, addressing some of the highlighted issues and represents a position in furthering this important field of research

    Multi-task Layout Analysis of Handwritten Musical Scores

    Get PDF
    [EN] Document Layout Analysis (DLA) is a process that must be performed before attempting to recognize the content of handwritten musical scores by a modern automatic or semiautomatic system. DLA should provide the segmentation of the document image into semantically useful region types such as staff, lyrics, etc. In this paper we extend our previous work for DLA of handwritten text documents to also address complex handwritten music scores. This system is able to perform region segmentation, region classification and baseline detection in an integrated manner. Several experiments were performed in two different datasets in order to validate this approach and assess it in different scenarios. Results show high accuracy in such complex manuscripts and very competent computational time, which is a good indicator of the scalability of the method for very large collections.This work was partially supported by the Universitat Politecnica de Valencia under grant FPI-420II/899, a 2017-2018 Digital Humanities research grant of the BBVA Foundation for the project Carabela, the History Of Medieval Europe (HOME) project (Ref.: PCI2018-093122) and through the EU project READ (Horizon-2020 program, grant Ref. 674943). NVIDIA Corporation kindly donated the Titan X GPU used for this research.Quirós, L.; Toselli, AH.; Vidal, E. (2019). Multi-task Layout Analysis of Handwritten Musical Scores. Springer. 123-134. https://doi.org/10.1007/978-3-030-31321-0_11S123134Burgoyne, J.A., Ouyang, Y., Himmelman, T., Devaney, J., Pugin, L., Fujinaga, I.: Lyric extraction and recognition on digital images of early music sources. In: Proceedings of the 10th International Society for Music Information Retrieval Conference, vol. 10, pp. 723–727 (2009)Calvo-Zaragoza, J., Toselli, A.H., Vidal, E.: Probabilistic music-symbol spotting in handwritten scores. In: 16th International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 558–563, August 2018Calvo-Zaragoza, J., Zhang, K., Saleh, Z., Vigliensoni, G., Fujinaga, I.: Music document layout analysis through machine learning and human feedback. In: 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 02, pp. 23–24, November 2017Calvo-Zaragoza, J., Castellanos, F.J., Vigliensoni, G., Fujinaga, I.: Deep neural networks for document processing of music score images. Appl. Sci. 8(5), 654 (2018). (2076-3417)Calvo-Zaragoza, J., Toselli, A.H., Vidal, E.: Handwritten music recognition for mensural notation: formulation, data and baseline results. In: 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 1081–1086. IEEE (2017)Campos, V.B., Calvo-Zaragoza, J., Toselli, A.H., Ruiz, E.V.: Sheet music statistical layout analysis. In: 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 313–318. IEEE (2016)Castellanos, F.J., Calvo-Zaragoza, J., Vigliensoni, G., Fujinaga, I.: Document analysis of music score images with selectional auto-encoders. In: 19th International Society for Music Information Retrieval Conference, pp. 256–263 (2018)Grüning, T., Labahn, R., Diem, M., Kleber, F., Fiel, S.: READ-BAD: a new dataset and evaluation scheme for baseline detection in archival documents. CoRR abs/1705.03311 (2017). http://arxiv.org/abs/1705.03311Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: 3rd International Conference on Learning Representations (ICLR) (2015)Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015)Quirós, L.: Multi-task handwritten document layout analysis. ArXiv e-prints, 1806.08852 (2018). https://arxiv.org/abs/1806.08852Quirós, L., Bosch, V., Serrano, L., Toselli, A.H., Vidal, E.: From HMMs to RNNs: computer-assisted transcription of a handwritten notarial records collection. In: 16th International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 116–121. IEEE, August 2018Rebelo, A., Fujinaga, I., Paszkiewicz, F., Marcal, A.R., Guedes, C., Cardoso, J.S.: Optical music recognition: state-of-the-art and open issues. Int. J. Multimed. Inf. Retrieval 1(3), 173–190 (2012)Sánchez, J.A., Romero, V., Toselli, A.H., Villegas, M., Vidal, E.: ICDAR2017 competition on handwritten text recognition on the READ dataset. In: 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 1383–1388. IEEE (2017)Suzuki, S., et al.: Topological structural analysis of digitized binary images by border following. Comput. Vis. Graph. Image Process. 30(1), 32–46 (1985

    Deep watershed detector for music object recognition

    Get PDF
    Optical Music Recognition (OMR) is an important and challenging area within music information retrieval, the accurate detection of music symbols in digital images is a core functionality of any OMR pipeline. In this paper, we introduce a novel object detection method, based on synthetic energy maps and the watershed transform, called Deep Watershed Detector (DWD). Our method is specifically tailored to deal with high resolution images that contain a large number of very small objects and is therefore able to process full pages of written music. We present state-of-the-art detection results of common music symbols and show DWD’s ability to work with synthetic scores equally well as on handwritten music

    Deep watershed detector for music object recognition

    Get PDF
    Optical Music Recognition (OMR) is an important and challenging area within music information retrieval, the accurate detection of music symbols in digital images is a core functionality of any OMR pipeline. In this paper, we introduce a novel object detection method, based on synthetic energy maps and the watershed transform, called Deep Watershed Detector (DWD). Our method is specifically tailored to deal with high resolution images that contain a large number of very small objects and is therefore able to process full pages of written music. We present state-of-the-art detection results of common music symbols and show DWD’s ability to work with synthetic scores equally well as on handwritten music

    Restoration of deteriorated text sections in ancient document images using atri-level semi-adaptive thresholding technique

    Get PDF
    The proposed research aims to restore deteriorated text sections that are affected by stain markings, ink seepages and document ageing in ancient document photographs, as these challenges confront document enhancement. A tri-level semi-adaptive thresholding technique is developed in this paper to overcome the issues. The primary focus, however, is on removing deteriorations that obscure text sections. The proposed algorithm includes three levels of degradation removal as well as pre- and post-enhancement processes. In level-wise degradation removal, a global thresholding approach is used, whereas, pseudo-colouring uses local thresholding procedures. Experiments on palm leaf and DIBCO document photos reveal a decent performance in removing ink/oil stains whilst retaining obscured text sections. In DIBCO and palm leaf datasets, our system also showed its efficacy in removing common deteriorations such as uneven illumination, show throughs, discolouration and writing marks. The proposed technique directly correlates to other thresholding-based benchmark techniques producing average F-measure and precision of 65.73 and 93% towards DIBCO datasets and 55.24 and 94% towards palm leaf datasets. Subjective analysis shows the robustness of proposed model towards the removal of stains degradations with a qualitative score of 3 towards 45% of samples indicating degradation removal with fairly readable text
    corecore