    Graphic Symbol Recognition using Graph Based Signature and Bayesian Network Classifier

    We present a new approach for recognizing complex graphic symbols in technical documents. Graphic symbol recognition is a well-known challenge in document image analysis and is at the heart of most graphics recognition systems. Our method combines a structural approach to symbol representation with a statistical classifier for symbol recognition. We represent symbols by their graph-based signatures: a graphic symbol is vectorized and converted to an attributed relational graph, which is then used to compute a feature vector for the symbol. This signature captures the geometry and topology of the symbol. We learn a Bayesian network to encode the joint probability distribution of symbol signatures and use it in a supervised learning scenario for graphic symbol recognition. We evaluated our method on synthetically deformed and degraded images of pre-segmented 2D architectural and electronic symbols from GREC databases and obtained encouraging recognition rates.
    Comment: 5 pages, 8 figures, Tenth International Conference on Document Analysis and Recognition (ICDAR), IEEE Computer Society, 2009, volume 10, 1325-132
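
The signature step described in this abstract (vectorized symbol, then attributed relational graph, then feature vector) can be illustrated with a minimal sketch. Everything concrete below is an assumption made for illustration and is not taken from the paper: the node attribute (relative segment length), the edge attribute (angle between touching segments), the choice of features, and the function names `build_arg` and `signature`. The Bayesian network classifier is not shown.

```python
import math
from itertools import combinations


def angle_between(s, t):
    """Unsigned angle, in degrees within [0, 90], between the lines carrying two segments."""
    (x1, y1), (x2, y2) = s
    (x3, y3), (x4, y4) = t
    a = math.atan2(y2 - y1, x2 - x1)
    b = math.atan2(y4 - y3, x4 - x3)
    d = abs(a - b) % math.pi  # fold the difference into [0, pi)
    return math.degrees(min(d, math.pi - d))


def build_arg(segments, tol=1e-6):
    """Build a toy attributed relational graph (hypothetical attribute choice).

    Nodes are segments, attributed with length relative to the longest segment.
    Edges join segments that share an endpoint, attributed with their angle.
    """
    lengths = [math.dist(p, q) for p, q in segments]
    longest = max(lengths)
    nodes = [length / longest for length in lengths]
    edges = {}
    for i, j in combinations(range(len(segments)), 2):
        touching = any(
            math.dist(a, b) <= tol for a in segments[i] for b in segments[j]
        )
        if touching:
            edges[(i, j)] = angle_between(segments[i], segments[j])
    return nodes, edges


def signature(segments, angle_bins=3):
    """Fixed-length feature vector mixing topology and geometry of the graph."""
    nodes, edges = build_arg(segments)
    degree = [0] * len(nodes)
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    # Histogram of edge angles over [0, 90] degrees.
    hist = [0] * angle_bins
    width = 90.0 / angle_bins
    for ang in edges.values():
        hist[min(int(ang // width), angle_bins - 1)] += 1
    mean_rel_length = sum(nodes) / len(nodes)
    return [len(nodes), len(edges), max(degree), mean_rel_length] + hist


# A unit square given as four line segments (illustrative input).
square = [((0, 0), (1, 0)), ((1, 0), (1, 1)), ((1, 1), (0, 1)), ((0, 1), (0, 0))]
print(signature(square))
```

A vector of this kind has the same length for every symbol, which is what lets a statistical classifier such as a Bayesian network consume graphs of varying size.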

    Foothill: A Quasiconvex Regularization for Edge Computing of Deep Neural Networks

    Deep neural networks (DNNs) have demonstrated success on many supervised learning tasks, ranging from voice recognition and object detection to image classification. However, their increasing complexity might yield poor generalization error and make them hard to deploy on edge devices. Quantization is an effective approach to compressing DNNs in order to meet these constraints. Using a quasiconvex base function to construct a binary quantizer helps in training binary neural networks (BNNs), and adding noise to the input data or using a concrete regularization function helps to improve generalization error. Here we introduce the foothill function, an infinitely differentiable quasiconvex function. This regularizer is flexible enough to deform towards $L_1$ and $L_2$ penalties. Foothill can be used as a binary quantizer, as a regularizer, or as a loss. In particular, we show that this regularizer reduces the accuracy gap between BNNs and their full-precision counterparts for image classification on ImageNet.
    Comment: Accepted in 16th International Conference of Image Analysis and Recognition (ICIAR 2019)
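
The abstract does not state the formula of the foothill function. The sketch below assumes the form h(x) = alpha * x * tanh(beta * x / 2), a smooth function that shows the described deformation towards L1-like and L2-like penalties. The parameter names `alpha` and `beta` and the function names are assumptions for illustration, not a confirmed reproduction of the authors' definition.

```python
import math


def foothill(x, alpha=1.0, beta=1.0):
    """Assumed foothill form: alpha * x * tanh(beta * x / 2)."""
    return alpha * x * math.tanh(beta * x / 2.0)


def foothill_grad(x, alpha=1.0, beta=1.0):
    """Analytic derivative of the assumed form with respect to x."""
    t = math.tanh(beta * x / 2.0)
    return alpha * t + 0.5 * alpha * beta * x * (1.0 - t * t)


def foothill_penalty(weights, alpha=1.0, beta=1.0):
    """Regularization term: sum of the function over all weights."""
    return sum(foothill(w, alpha, beta) for w in weights)


weights = [-1.5, -0.2, 0.0, 0.7, 2.0]
# Large beta: the penalty is close to alpha * sum(|w|), an L1-like penalty.
print(foothill_penalty(weights, alpha=1.0, beta=50.0), sum(abs(w) for w in weights))
# Small beta: the penalty is close to (alpha * beta / 2) * sum(w^2), an L2-like penalty.
print(foothill_penalty(weights, alpha=1.0, beta=0.01),
      0.005 * sum(w * w for w in weights))
```

Under this assumed form, the derivative approaches alpha * sign(x) as beta grows, which is one way a smooth function can stand in for a binary quantizer during training.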

    Multi-task Layout Analysis of Handwritten Musical Scores

    [EN] Document Layout Analysis (DLA) is a process that must be performed before a modern automatic or semiautomatic system attempts to recognize the content of handwritten musical scores. DLA should segment the document image into semantically useful region types such as staff, lyrics, etc. In this paper we extend our previous work on DLA of handwritten text documents to also address complex handwritten music scores. The system performs region segmentation, region classification and baseline detection in an integrated manner. Several experiments were performed on two different datasets to validate this approach and assess it in different scenarios. Results show high accuracy on such complex manuscripts and very competitive computational time, which is a good indicator of the scalability of the method to very large collections.
    This work was partially supported by the Universitat Politecnica de Valencia under grant FPI-420II/899, a 2017-2018 Digital Humanities research grant of the BBVA Foundation for the project Carabela, the History Of Medieval Europe (HOME) project (Ref.: PCI2018-093122) and through the EU project READ (Horizon-2020 program, grant Ref. 674943). NVIDIA Corporation kindly donated the Titan X GPU used for this research.
    Quirós, L.; Toselli, AH.; Vidal, E. (2019). Multi-task Layout Analysis of Handwritten Musical Scores. Springer. 123-134. https://doi.org/10.1007/978-3-030-31321-0_11

    Handwriting recognition by using deep learning to extract meaningful features

    [EN] Recent improvements in deep learning techniques show that deep models can extract more meaningful data directly from raw signals than conventional parametrization techniques, making it possible to avoid task-specific feature extraction in pattern recognition, especially for Computer Vision or Speech tasks. In this work, we feed raw text line images directly to Convolutional Neural Networks and deep Multilayer Perceptrons for feature extraction in a Handwriting Recognition system. The proposed recognition system, based on Hidden Markov Models hybridized with Neural Networks, has been tested on the IAM Database, achieving a considerable improvement.
    Work partially supported by the Spanish MINECO and FEDER funds under project TIN2017-85854-C4-2-R.
    Pastor Pellicer, J.; Castro-Bleda, MJ.; España Boquera, S.; Zamora-Martinez, FJ. (2019). Handwriting recognition by using deep learning to extract meaningful features. AI Communications. 32(2):101-112. https://doi.org/10.3233/AIC-170562
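
The handwriting recognition abstract above mentions Hidden Markov Models hybridized with neural networks but does not say how the two are coupled. A common way to do this, sketched below as an assumption and not as the authors' exact configuration, is to divide the network's per-frame state posteriors by the state priors to obtain scaled likelihoods, and then decode with Viterbi. All probabilities, the two-state left-to-right topology, and the function names are invented for illustration.

```python
import math

NEG_INF = float("-inf")


def safe_log(v):
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(v) if v > 0 else NEG_INF


def scaled_log_likelihoods(posteriors, priors):
    """Turn posteriors P(s | x_t) into log(P(s | x_t) / P(s)) for each frame."""
    return [
        [safe_log(p) - safe_log(q) for p, q in zip(frame, priors)]
        for frame in posteriors
    ]


def viterbi(log_emis, log_trans, log_init):
    """Most likely state sequence given log-domain emission scores."""
    n_states = len(log_init)
    score = [log_init[s] + log_emis[0][s] for s in range(n_states)]
    back = []
    for frame in log_emis[1:]:
        prev = score
        pointers, score = [], []
        for s in range(n_states):
            # Best predecessor state for reaching s at this frame.
            best = max(range(n_states), key=lambda r: prev[r] + log_trans[r][s])
            pointers.append(best)
            score.append(prev[best] + log_trans[best][s] + frame[s])
        back.append(pointers)
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Illustrative numbers: network outputs for 4 frames over 2 HMM states.
posteriors = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9]]
priors = [0.5, 0.5]
log_trans = [[safe_log(0.6), safe_log(0.4)], [safe_log(0.0), safe_log(1.0)]]
log_init = [safe_log(1.0), safe_log(0.0)]

emissions = scaled_log_likelihoods(posteriors, priors)
print(viterbi(emissions, log_trans, log_init))  # [0, 0, 1, 1]
```

Dividing by the priors matters when states are unevenly represented in the training data; with the uniform priors used here it only shifts every score by a constant.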