
    An integrated grammar-based approach for mathematical expression recognition

    This is the author's version of a work that was accepted for publication in Pattern Recognition. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms, may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Pattern Recognition 51 (2016) 135–147. DOI 10.1016/j.patcog.2015.09.013.
    Automatic recognition of mathematical expressions is a challenging pattern recognition problem because there are many ambiguities at different levels. On the one hand, the symbols of the mathematical expression must be recognized. On the other hand, the two-dimensional structure that relates the symbols and represents the expression must be detected. These problems are closely related, since symbol recognition is influenced by the structure of the expression, while the structure strongly depends on which symbols are recognized. For these reasons, we present an integrated approach that combines several stochastic sources of information and globally determines the most likely expression. In this way, symbol segmentation, symbol recognition, and structural analysis are optimized simultaneously. In this paper we define the statistical framework of a model based on two-dimensional grammars and its associated parsing algorithm. Since the search space is too large, restrictions are introduced to make the search feasible. We have developed a system that implements this approach, and we report results on the large public dataset of the CROHME international competition. This approach significantly outperforms other proposals and was awarded best system using only the training dataset of the competition. © 2015 Elsevier Ltd. All rights reserved.
    This work was partially supported by the Spanish MINECO under the STraDA research project (TIN2012-37475-C02-01) and the FPU Grant (AP2009-4363).
    Álvaro Muñoz, F.; Sánchez Peiró, JA.; Benedí Ruiz, JM. (2016). An integrated grammar-based approach for mathematical expression recognition. Pattern Recognition. 51:135-147. https://doi.org/10.1016/j.patcog.2015.09.013
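    The abstract describes jointly optimizing segmentation, recognition, and structure with a grammar-driven parser. As a rough illustration only, here is a minimal Viterbi CYK sketch under a strong simplifying assumption that the paper does not make: each symbol and subexpression covers consecutively written strokes, so hypotheses are 1D spans. The score tables, the `rel_prob` function, and the relation names "sup" and "right" are hypothetical; the paper's 2D parser and its search restrictions are not reproduced here.

```python
import math
from collections import defaultdict


def cyk_parse(n, symbol_scores, term_rules, bin_rules, rel_prob, start):
    """Viterbi CYK over spans [i, j) of consecutively written strokes.

    ASSUMPTION: 1D spans stand in for the paper's 2D stroke sets.
    symbol_scores: {(i, j): {symbol: P(symbol | strokes i..j-1)}}
    term_rules:    {symbol: [(nonterminal, P(nonterminal -> symbol))]}
    bin_rules:     [(parent, left, right, relation, P(rule))]
    rel_prob:      hypothetical (relation, left_span, right_span) -> probability
    Returns (log-probability, tree) of `start` over all n strokes, or None.
    """
    best = defaultdict(dict)  # best[(i, j)][nonterminal] = (logp, tree)

    def relax(span, nt, logp, tree):
        # Keep only the most probable derivation per (span, nonterminal).
        if logp > best[span].get(nt, (-math.inf, None))[0]:
            best[span][nt] = (logp, tree)

    # Terminal step: every hypothesised stroke group may be a symbol
    # (this is where segmentation and symbol recognition enter the search).
    for span, scores in symbol_scores.items():
        for sym, p_sym in scores.items():
            for nt, p_rule in term_rules.get(sym, []):
                relax(span, nt, math.log(p_rule) + math.log(p_sym), sym)

    # Binary step: combine adjacent spans, shortest spans first, scoring
    # both the grammar rule and the spatial relation between the parts.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                left, right = best.get((i, k)), best.get((k, j))
                if not left or not right:
                    continue
                for parent, lnt, rnt, rel, p_rule in bin_rules:
                    if lnt not in left or rnt not in right:
                        continue
                    p_rel = rel_prob(rel, (i, k), (k, j))
                    if p_rel <= 0:
                        continue
                    logp = (math.log(p_rule) + math.log(p_rel)
                            + left[lnt][0] + right[rnt][0])
                    relax((i, j), parent, logp,
                          (parent, rel, left[lnt][1], right[rnt][1]))

    return best.get((0, n), {}).get(start)


if __name__ == "__main__":
    # Three strokes: strokes 0-1 may be ")" "(" or a single "x"; stroke 2 is "2" or "z".
    scores = {(0, 1): {")": 0.6}, (1, 2): {"(": 0.5},
              (0, 2): {"x": 0.9}, (2, 3): {"2": 0.8, "z": 0.2}}
    terms = {s: [("Sym", 1.0)] for s in [")", "(", "x", "2", "z"]}
    rules = [("E", "Sym", "Sym", "sup", 0.5), ("E", "Sym", "Sym", "right", 0.5)]
    table = {("sup", (0, 2), (2, 3)): 0.9, ("right", (0, 2), (2, 3)): 0.1}
    logp, tree = cyk_parse(3, scores, terms, rules,
                           lambda r, a, b: table.get((r, a, b), 0.5), "E")
    print(tree, math.exp(logp))
```

    In the usage example, the joint search prefers the segmentation "x" over ")(" because the symbol, rule, and relation scores are multiplied along the whole derivation, giving the tree `("E", "sup", "x", "2")`.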

    Stroke order normalization for improving recognition of online handwritten mathematical expressions

    We present a technique based on stroke order normalization for improving the recognition of online handwritten mathematical expressions (MEs). A stroke-order-dependent system has lower time complexity than a stroke-order-free system, but it must incorporate special grammar rules to cope with stroke order variations. The stroke order normalization technique solves this problem, as well as the problem of unexpected stroke order variations, without increasing the time complexity of ME recognition. To normalize stroke order, the X-Y cut method is modified, since its original form causes problems when structural components in an ME overlap. First, vertically ordered strokes are located by detecting vertical symbols and their upper/lower components, which are treated as MEs and reordered recursively. Second, unordered strokes to the left of the vertical symbols are reordered as horizontally ordered strokes. Third, the remaining strokes are reordered recursively. Horizontally ordered strokes are reordered from left to right, and vertically ordered strokes from top to bottom. Finally, the proposed stroke order normalization is combined with the stroke-order-dependent ME recognition system. Evaluations on the CROHME 2014 database show that the ME recognition system incorporating stroke order normalization outperforms all other systems that use only CROHME 2014 for training, while keeping processing time low.
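    For orientation, a minimal sketch of the plain, unmodified recursive X-Y cut applied to stroke bounding boxes: horizontally separated groups are ordered left to right and vertically stacked groups top to bottom. It is not the paper's modified method; the `Stroke` type and its fields are assumptions, and when no cut exists (overlapping components, the case the abstract says the original X-Y cut handles poorly) the sketch simply falls back to sorting by x.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stroke:
    sid: int      # hypothetical: original (writing-order) index of the stroke
    x0: float     # bounding box; y grows downwards as in image coordinates
    y0: float
    x1: float
    y1: float


def _groups(strokes, lo, hi):
    """Split strokes into groups separated by gaps in their [lo, hi] projections."""
    groups, current, end = [], [], None
    for s in sorted(strokes, key=lo):
        if current and lo(s) > end:   # gap found: close the current group
            groups.append(current)
            current, end = [], None
        current.append(s)
        end = hi(s) if end is None else max(end, hi(s))
    if current:
        groups.append(current)
    return groups


def xy_cut_order(strokes):
    """Return stroke ids in normalized order using a plain recursive X-Y cut."""
    if len(strokes) <= 1:
        return [s.sid for s in strokes]
    # Vertical cuts first: horizontally separated parts are read left to right.
    cols = _groups(strokes, lambda s: s.x0, lambda s: s.x1)
    if len(cols) > 1:
        return [sid for g in cols for sid in xy_cut_order(g)]
    # Then horizontal cuts: vertically stacked parts are read top to bottom.
    rows = _groups(strokes, lambda s: s.y0, lambda s: s.y1)
    if len(rows) > 1:
        return [sid for g in rows for sid in xy_cut_order(g)]
    # No cut exists (overlapping components): fall back to left-to-right order.
    return [s.sid for s in sorted(strokes, key=lambda s: (s.x0, s.y0))]


if __name__ == "__main__":
    # "1/2 +" written as: fraction bar, denominator, numerator, plus sign.
    strokes = [Stroke(0, 0, 15, 20, 16), Stroke(1, 5, 20, 15, 30),
               Stroke(2, 0, 0, 10, 10), Stroke(3, 30, 10, 40, 20)]
    print(xy_cut_order(strokes))  # [2, 0, 1, 3]
```

    The fraction is read top to bottom (numerator, bar, denominator) regardless of the order in which it was written, and the following "+" comes after it; this illustrates why a normalized order lets a stroke-order-dependent recognizer avoid extra grammar rules.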

    Mathematical Expression Recognition based on Probabilistic Grammars

    Mathematical notation is well known and used all over the world. Humankind has evolved from simple methods for representing counts to the current well-defined mathematical notation, which can account for complex problems. Furthermore, mathematical expressions constitute a universal language in scientific fields, and many information resources containing mathematics have been created during the last decades. However, to access all that information efficiently, scientific documents have to be digitized or produced directly in electronic formats. Although most people are able to understand and produce mathematical information, entering math expressions into electronic devices requires learning specific notations or using editors. Automatic recognition of mathematical expressions aims to fill this gap between a person's knowledge and the input accepted by computers. In this way, printed documents containing math expressions could be digitized automatically, and handwriting could be used to enter math notation directly into electronic devices. This thesis is devoted to developing an approach for mathematical expression recognition. In this document we propose an approach, based on probabilistic grammars, for recognizing any type of mathematical expression (printed or handwritten). To do so, we develop a formal statistical framework from which several probability distributions are derived. Throughout the document, we deal with the definition and estimation of all these probabilistic sources of information. Finally, we define the parsing algorithm that globally computes the most probable mathematical expression for a given input according to the statistical framework. An important point in this study is to provide objective performance evaluation and to report results using public data and standard metrics. We examined the problems of automatic evaluation in this field and looked for the best solutions.
    We also report several experiments using public databases, and we participated in several international competitions. Furthermore, we have released most of the software developed in this thesis as open source. We also explore some of the applications of mathematical expression recognition. In addition to the direct applications of transcription and digitization, we report two important proposals. First, we developed mucaptcha, a method to tell humans and computers apart by means of math handwriting input, which represents a novel application of math expression recognition. Second, we tackled the problem of layout analysis of structured documents using the statistical framework developed in this thesis, because both are two-dimensional problems that can be modeled with probabilistic grammars. The approach developed in this thesis for mathematical expression recognition has obtained good results at different levels. It has produced several scientific publications in international conferences and journals, and it has received awards in international competitions.
    Álvaro Muñoz, F. (2015). Mathematical Expression Recognition based on Probabilistic Grammars [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/51665

    Multi-Scale Attention with Dense Encoder for Handwritten Mathematical Expression Recognition

    Handwritten mathematical expression recognition is a challenging problem due to complicated two-dimensional structures, ambiguous handwriting input, and the varying scales of handwritten math symbols. To address this problem, we use an attention-based encoder-decoder model that recognizes mathematical expression images, converting their two-dimensional layouts into one-dimensional LaTeX strings. We improve the encoder by employing densely connected convolutional networks, as they can strengthen feature extraction and facilitate gradient propagation, especially on a small training set. We also present a novel multi-scale attention model that handles the recognition of math symbols at different scales and preserves fine-grained details that would otherwise be dropped by pooling operations. Validated on the CROHME competition task, the proposed method significantly outperforms the state-of-the-art methods, with expression recognition accuracies of 52.8% on CROHME 2014 and 50.1% on CROHME 2016, using only the official training dataset.
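    The abstract does not spell out the attention equations. The following is a minimal pure-Python sketch that assumes generic additive (Bahdanau-style) attention, applied separately to a coarse, pooled feature map and to a finer, higher-resolution one, with the two context vectors concatenated. The weight matrices, dimensions, and the absence of any coverage term are assumptions, not the paper's exact formulation. Each feature map is passed as a flattened list of annotation vectors.

```python
import math


def softmax(scores):
    m = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def attend(query, annotations, w_q, w_a, v):
    """Additive attention: e_i = v . tanh(W_q q + W_a a_i), alpha = softmax(e).

    Returns the attention weights and the context vector sum_i alpha_i * a_i.
    """
    projected_q = matvec(w_q, query)
    scores = []
    for a in annotations:
        hidden = [math.tanh(x + y) for x, y in zip(projected_q, matvec(w_a, a))]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    alpha = softmax(scores)
    dim = len(annotations[0])
    context = [sum(w * a[d] for w, a in zip(alpha, annotations)) for d in range(dim)]
    return alpha, context


def multi_scale_context(query, coarse, fine, params_coarse, params_fine):
    """Attend to a coarse (pooled) and a fine (higher-resolution) annotation grid
    separately, then concatenate the two context vectors (assumed fusion)."""
    _, c_coarse = attend(query, coarse, *params_coarse)
    _, c_fine = attend(query, fine, *params_fine)
    return c_coarse + c_fine


if __name__ == "__main__":
    params = ([[1.0]], [[1.0, 0.0]], [10.0])   # hypothetical W_q, W_a, v
    coarse = [[1.0, 0.0], [0.0, 1.0]]          # 2 positions, 2 channels
    fine = [[2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
    print(multi_scale_context([0.0], coarse, fine, params, params))
```

    Keeping a separate branch for the finer map gives the decoder access to positions that pooling would merge, which is the motivation the abstract gives for small symbols.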

    Query-Driven Global Graph Attention Model for Visual Parsing: Recognizing Handwritten and Typeset Math Formulas

    We present a new visual parsing method based on standard Convolutional Neural Networks (CNNs) for handwritten and typeset mathematical formulas. The Query-Driven Global Graph Attention (QD-GGA) parser employs multi-task learning, using a single feature representation for locating, classifying, and relating symbols. QD-GGA parses formulas by first constructing a Line-Of-Sight (LOS) graph over the input primitives (e.g., handwritten strokes or connected components in images). Second, class distributions for LOS nodes and edges are obtained using query-specific feature filters (i.e., attention) in a single feed-forward pass. This allows end-to-end structure learning using a joint loss over primitive node and edge class distributions. Finally, a Maximum Spanning Tree (MST) is extracted from the weighted graph using Edmonds' arborescence algorithm. The model may be run recurrently over the input graph, updating attention to focus on symbols detected in the previous iteration. QD-GGA does not require additional grammar rules; the language model is learned from the sets of symbols and relationships in the training set and the statistics over them. We benchmark our system against both handwritten and typeset state-of-the-art math recognition systems. Our preliminary results show that this is a promising new approach for visual parsing of math formulas. Using recurrent execution, symbol detection is near perfect for both handwritten and typeset formulas: we obtain a symbol f-measure of over 99.4% on both the CROHME (handwritten) and INFTYMCCDB-2 (typeset formula image) datasets. Our method is also much faster in both training and execution than state-of-the-art RNN-based formula parsers. The unlabeled structure detection of QD-GGA is competitive with encoder-decoder models, but QD-GGA symbol and relationship classification is weaker. We believe this may be addressed through increased use of spatial features and global context.
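    The final step above extracts a maximum spanning tree with Edmonds' arborescence algorithm. Below is a minimal sketch of the Chu-Liu/Edmonds procedure for a maximum-weight spanning arborescence. It assumes a hypothetical fixed root node and a graph in which every other node is reachable from that root; how QD-GGA chooses the root and derives edge weights from its class distributions is not described here.

```python
def _find_cycle(best_in):
    """Return the nodes of a cycle formed by the chosen in-edges, or None."""
    for start in best_in:
        order, pos, v = [], {}, start
        while v in best_in and v not in pos:
            pos[v] = len(order)
            order.append(v)
            v = best_in[v][0]              # follow the chosen parent
        if v in pos:
            return order[pos[v]:]
    return None


def max_arborescence(edges, root):
    """Chu-Liu/Edmonds: maximum-weight spanning arborescence rooted at `root`.

    edges: {(u, v): weight} for directed edges u -> v.
    ASSUMPTION: every non-root node is reachable from `root`.
    Returns the set of chosen (u, v) edges.
    """
    # 1. Greedily pick the heaviest incoming edge of every non-root node.
    best_in = {}
    for (u, v), w in edges.items():
        if v == root or u == v:
            continue
        if v not in best_in or w > edges[best_in[v]]:
            best_in[v] = (u, v)

    # 2. If the chosen edges contain no cycle, they already form the answer.
    cycle = _find_cycle(best_in)
    if cycle is None:
        return set(best_in.values())

    # 3. Contract the cycle into a fresh node; re-weight edges entering it by
    #    the weight of the cycle edge they would replace.
    in_cycle = set(cycle)
    c = object()
    new_edges, origin = {}, {}
    for (u, v), w in edges.items():
        if u in in_cycle and v in in_cycle:
            continue
        if v in in_cycle:
            key, nw = (u, c), w - edges[best_in[v]]
        elif u in in_cycle:
            key, nw = (c, v), w
        else:
            key, nw = (u, v), w
        if key not in new_edges or nw > new_edges[key]:
            new_edges[key], origin[key] = nw, (u, v)

    # 4. Solve the contracted problem, then expand the cycle again,
    #    dropping the one cycle edge replaced by the chosen entering edge.
    result, entered = set(), None
    for key in max_arborescence(new_edges, root):
        result.add(origin[key])
        if key[1] is c:
            entered = origin[key][1]
    result.update(best_in[v] for v in cycle if v != entered)
    return result


if __name__ == "__main__":
    g = {(0, 1): 1, (0, 2): 2, (1, 2): 10, (2, 1): 10,
         (2, 3): 5, (1, 3): 3, (0, 3): 1}
    print(sorted(max_arborescence(g, 0)))  # [(0, 2), (2, 1), (2, 3)]
```

    In the example, the greedy choice creates the cycle 1 ⇄ 2. Contraction shows that entering the cycle at node 2 costs less than entering at node 1, so the result has total weight 17 instead of 16.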

    WordSup: Exploiting Word Annotations for Character based Text Detection

    Text in images is usually organized as a hierarchy of visual elements, i.e., characters, words, text lines, and text blocks. Among these elements, the character is the most basic one for various languages such as Western languages, Chinese, Japanese, and mathematical expressions. It is natural and convenient to construct a common text detection engine based on character detectors. However, training character detectors requires a vast number of location-annotated characters, which are expensive to obtain. In fact, existing real text datasets are mostly annotated at the word or line level. To remedy this dilemma, we propose a weakly supervised framework that can use word annotations, in either tight quadrangles or looser bounding boxes, for character detector training. When applied to scene text detection, this allows us to train a robust character detector by exploiting word annotations in rich, large-scale real scene text datasets, e.g., ICDAR15 and COCO-text. The character detector plays a key role in the pipeline of our text detection engine. It achieves state-of-the-art performance on several challenging scene text detection benchmarks. We also demonstrate the flexibility of our pipeline in various scenarios, including deformed text detection and math expression recognition.
    Comment: 2017 International Conference on Computer Vision

    Discriminative estimation of probabilistic context-free grammars for mathematical expression recognition and retrieval

    We present a discriminative learning algorithm for the probabilistic estimation of two-dimensional probabilistic context-free grammars (2D-PCFGs) for mathematical expression recognition and retrieval. The algorithm uses a generalization of the H-criterion as the objective function and growth transformations as the optimization method. The discriminative estimation algorithm was developed by considering the N-best interpretations provided by the 2D-PCFG. Experimental results are reported on two available datasets: Im2Latex and IBEM. The first experiment compares the proposed discriminative estimation method with the classic Viterbi-based estimation method. The second studies the performance of the estimated models depending on the length of the mathematical expressions and the number of admissible errors in the metric used.
    This research was supported by Grant PID2020-116813RBI00a funded by MCIN/AEI/10.13039/501100011033, FPI grant CIACIF/2021/313 funded by Generalitat Valenciana, and Universitat Politècnica de València Grant No. SP20210263.
    Noya García, E.; Benedí Ruiz, JM.; Sánchez Peiró, JA.; Anitei, D. (2023). Discriminative estimation of probabilistic context-free grammars for mathematical expression recognition and retrieval. Pattern Analysis and Applications. 26:1571-1584. https://doi.org/10.1007/s10044-023-01158-8