68 research outputs found
Generation of Hypergraphs from the N-Best Parsing of 2D-Probabilistic Context-Free Grammars for Mathematical Expression Recognition
[EN] We consider hypergraphs as a tool obtained with two-dimensional Probabilistic Context-Free Grammars (2D-PCFGs) to compactly represent the n-best parse trees for an input image of a mathematical expression. More specifically, in this paper we propose: i) an algorithm to compute the n-best parse trees from a 2D-PCFG, ii) an algorithm to represent the n-best parse trees compactly in the form of hypergraphs, and iii) a formal framework for the development of inference algorithms (inside and outside) and normalization strategies for hypergraphs. This work has been partially supported by the Ministerio de Ciencia y Tecnología under the grant TIN2017-91452-EXP (IBEM) and by the Generalitat Valenciana under the grant PROMETEO/2019/121 (DeepPattern). Noya, E.; Sánchez Peiró, JA.; Benedí Ruiz, JM. (2021). Generation of Hypergraphs from the N-Best Parsing of 2D-Probabilistic Context-Free Grammars for Mathematical Expression Recognition. IEEE. 5696-5703. https://doi.org/10.1109/ICPR48806.2021.9412273
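The inside pass that this abstract mentions can be illustrated with a minimal sketch. It is a generic textbook-style inside computation over an acyclic weighted hypergraph, not the authors' algorithm; the node names, weights, and the `inside` function are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def inside(nodes, edges, leaf_prob):
    """Generic inside pass on an acyclic hypergraph (illustrative only).

    nodes: node ids in topological order (tail nodes before head nodes).
    edges: list of (head, tails, weight) hyperedges.
    leaf_prob: inside value assigned to each leaf node.
    """
    incoming = defaultdict(list)
    for head, tails, weight in edges:
        incoming[head].append((tails, weight))

    beta = {}
    for node in nodes:
        if node in leaf_prob:
            beta[node] = leaf_prob[node]
            continue
        total = 0.0
        # Sum over hyperedges; multiply the weight by the tail inside values.
        for tails, weight in incoming[node]:
            prob = weight
            for tail in tails:
                prob *= beta[tail]
            total += prob
        beta[node] = total
    return beta

# Hypothetical toy hypergraph: two symbol hypotheses and one root node
# that can be derived by two competing hyperedges.
toy_nodes = ["x", "2", "Expr"]
toy_edges = [("Expr", ("x", "2"), 0.5), ("Expr", ("x",), 0.1)]
toy_leaves = {"x": 0.9, "2": 0.8}
beta = inside(toy_nodes, toy_edges, toy_leaves)
```

Under these assumed values the root's inside value is the sum of the two competing derivations, 0.5 · 0.9 · 0.8 + 0.1 · 0.9. Dividing each derivation's score by that total is one simple way to normalize the hypergraph.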
Una propuesta integral para desarrollo de proyectos en un curso de Compiladores con una metodología de aprendizaje basada en proyectos
This paper presents a comprehensive strategy for the development and evaluation of projects in a Compilers course. In this course we have chosen to use an active, project-based methodology. This approach has unquestionable advantages; however, its implementation has two major drawbacks: work overload, for both students and teachers; and fair and realistic evaluation of the project, both individual and group. To mitigate the first problem we have proposed a freely available, portable project development environment. Regarding the evaluation system we have proposed: a user simulation model, to allow a realistic evaluation of the project's performance; an individual practical test, to take into account the individual work of the members of each team; and a set of monitoring activities, to consider continued work on the project. This strategy has been in use in recent years, and the analysis of its results seems to support its implementation.
IMEGE: Image-based Mathematical Expression Global Error
Mathematical expression recognition is an active research field that is related to document image analysis and typesetting. Several approaches have been proposed to tackle this problem, and automatic methods for performance evaluation are required. For evaluation purposes, mathematical expressions are usually represented as a coded string such as LaTeX or MathML. This representation is ambiguous, given that the same expression can be coded in several ways. For that reason, previously proposed approaches either analyzed recognition results manually or reported partial errors such as symbol error rate. In this study, we present a novel global performance evaluation measure for mathematical expressions based on image matching. Using an image representation resolves the representation ambiguity in the same way human beings do. The proposed evaluation method is a global error measure that also provides local information about the recognition result. Álvaro Muñoz, F.; Sánchez Peiró, JA.; Benedí Ruiz, JM. (2011). IMEGE: Image-based Mathematical Expression Global Error. http://hdl.handle.net/10251/1308
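The string-level ambiguity described in this abstract can be seen in a small sketch. The two LaTeX strings below are hypothetical examples that typeset to the same expression, and the `levenshtein` helper is a standard edit-distance routine written for this illustration; it is not the IMEGE measure itself.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

# Two encodings that LaTeX renders identically: x with subscript 1
# and superscript 2, written in either order.
reference = r"x^2_1"
hypothesis = r"x_1^2"

exact_match = reference == hypothesis
distance = levenshtein(reference, hypothesis)
```

Here `exact_match` is false and `distance` is nonzero even though both strings produce the same rendered expression, which is the kind of false error that comparing rendered images is meant to avoid.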
Corpus based learning of stochastic, context-free grammars combined with Hidden Markov Models for tRNA modelling
[EN] In this paper, a new method for modelling tRNA secondary structures is presented. This method is based on the combination of stochastic context-free grammars (SCFG) and Hidden Markov Models (HMM). HMM are used to capture the local relations in the loops of the molecule (nonstructured regions) and SCFG are used to capture the long-term relations between nucleotides of the arms (structured regions). Given annotated public databases, the HMM and SCFG models are learned by means of automatic inductive learning methods. Two SCFG learning methods have been explored. Both of them take advantage of the structural information associated with the training sequences: one of them is based on a stochastic version of the Sakakibara algorithm and the other one is based on a Corpus based algorithm. A final model is then obtained by merging the HMM of the nonstructured regions and the SCFG of the structured regions. Finally, the experiments performed on the tRNA sequence corpus and the non-tRNA sequence corpus give significant results. Comparative experiments with another published method are also presented. We would like to thank Diego Linares and Joan Andreu Sanchez for answering all our questions about SCFG, as well as Satoshi Sekine for his evaluation software. We would also like to thank the Ministerio de Sanidad y Consumo of Spain for the grants to the INBIOMED consortium. García Gómez, JM.; Benedí Ruiz, JM.; Vicente Robledo, J.; Robles Viejo, M. (2005). Corpus based learning of stochastic, context-free grammars combined with Hidden Markov Models for tRNA modelling. International Journal of Bioinformatics Research and Applications. 1(3):305-318. doi:10.1504/IJBRA.2005.007908
An integrated grammar-based approach for mathematical expression recognition
This is the author’s version of a work that was accepted for publication in Pattern Recognition. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Pattern Recognition 51 (2016) 135–147. DOI 10.1016/j.patcog.2015.09.013. Automatic recognition of mathematical expressions is a challenging pattern recognition problem since there are many ambiguities at different levels. On the one hand, the symbols of the mathematical expression must be recognized. On the other hand, the two-dimensional structure that relates the symbols and represents the math expression must be detected. These problems are closely related, since symbol recognition is influenced by the structure of the expression, while the structure strongly depends on the symbols that are recognized. For these reasons, we present an integrated approach that combines several stochastic sources of information and is able to globally determine the most likely expression. This way, symbol segmentation, symbol recognition and structural analysis are simultaneously optimized. In this paper we define the statistical framework of a model based on two-dimensional grammars and its associated parsing algorithm. Since the search space is too large, restrictions are introduced to make the search feasible. We have developed a system that implements this approach and we report results on the large public dataset of the CROHME international competition. This approach significantly outperforms other proposals and was awarded best system using only the training dataset of the competition. (C) 2015 Elsevier Ltd. All rights reserved. This work was partially supported by the Spanish MINECO under the STraDA research project (TIN2012-37475-C02-01) and the FPU Grant (AP2009-4363). Álvaro Muñoz, F.; Sánchez Peiró, JA.; Benedí Ruiz, JM. (2016). An integrated grammar-based approach for mathematical expression recognition. Pattern Recognition. 51:135-147. https://doi.org/10.1016/j.patcog.2015.09.013
Una propuesta para la evaluación de proyectos en un curso de Compiladores con una metodología de aprendizaje basada en proyectos
[EN] In this work, an evaluation system for compilation projects in a course on programming languages and compilers ("Lenguajes de Programación y Procesadores del Lenguaje") is proposed. In this course we have chosen an active learning methodology aimed at the implementation of a compiler project. Our evaluation system meets the following objectives: to allow a global evaluation of the final project performance; to take into account the individual work of the members of each team; and to consider continued work on the project. To achieve these objectives, we have proposed: a user simulation model; an individual practical test; and a set of monitoring activities. Finally, we also present a statistical analysis of the results of the last five years. The results of this analysis seem to support the proposed evaluation system. Benedí Ruiz, JM.; Vivancos Rubio, E. (2019). Una propuesta para la evaluación de proyectos en un curso de Compiladores con una metodología de aprendizaje basada en proyectos. En IN-RED 2019. V Congreso de Innovación Educativa y Docencia en Red. Editorial Universitat Politècnica de València. 906-917. https://doi.org/10.4995/INRED2019.2019.10459
Un entorno para el desarrollo de proyectos en la enseñanza activa de un curso de Compiladores
[EN] This paper describes an experience in a course on programming languages and compilers ("Lenguajes de Programación y Procesadores del Lenguaje"). In this course we have chosen an active learning methodology aimed at the implementation of a compiler project. In order to minimize the huge effort that teachers must invest in the preparation of a new project every year, a development environment for compilation projects is proposed. The goal of this development environment is to greatly simplify the process of developing new projects. Benedí Ruiz, JM.; Vivancos Rubio, E. (2016). Un entorno para el desarrollo de proyectos en la enseñanza activa de un curso de Compiladores. En In-Red 2016. II Congreso nacional de innovación educativa y docencia en red. Editorial Universitat Politècnica de València. https://doi.org/10.4995/INRED2016.2016.4368
The IBEM dataset: A large printed scientific image dataset for indexing and searching mathematical expressions
[EN] Searching for information in printed scientific documents is a challenging problem that has recently received special attention from the Pattern Recognition research community. Mathematical expressions are complex elements that appear in scientific documents, and developing techniques for locating and recognizing them requires the preparation of datasets that can be used as benchmarks. Most current techniques for dealing with mathematical expressions are based on Machine Learning techniques which require a large amount of annotated data. These datasets must be prepared with ground-truth information for automatic training and testing. However, preparing large datasets with ground-truth is a very expensive and time-consuming task. This paper introduces the IBEM dataset, consisting of scientific documents that have been prepared for mathematical expression recognition and searching. This dataset consists of 600 documents, more than 8200 page images with more than 160000 mathematical expressions. It has been automatically generated from the LaTeX version of the documents and can be enlarged easily. The ground-truth includes the position at the page level and the LaTeX transcript for mathematical expressions both embedded in the text and displayed. This paper also reports a baseline classification experiment with mathematical symbols and a baseline experiment of Mathematical Expression Recognition performed on the IBEM dataset. These experiments aim to provide some benchmarks for comparison purposes so that future users of the IBEM dataset can have a baseline framework. This work has been partially supported by MCIN/AEI/10.13039/501100011033 under the grant PID2020-116813RB-I00; the Generalitat Valenciana under the FPI grant CIACIF/2021/313; and by the support of the Valencian Graduate School and Research Network of Artificial Intelligence. Anitei, D.; Sánchez Peiró, JA.; Benedí Ruiz, JM.; Noya García, E. (2023). The IBEM dataset: A large printed scientific image dataset for indexing and searching mathematical expressions. Pattern Recognition Letters. 172:29-36. https://doi.org/10.1016/j.patrec.2023.05.033
Interactive Machine Translation using Hierarchical Translation Model
[EN] Current automatic machine translation systems are not able to generate error-free translations and human intervention is often required to correct their output. Alternatively, an interactive framework that integrates human knowledge into the translation process has been presented in previous works. Here, we describe a new interactive machine translation approach that is able to work with phrase-based and hierarchical translation models, and integrates error-correction all in a unified statistical framework. In our experiments, our approach outperforms previous interactive translation systems, and achieves estimated effort reductions of as much as 48% relative over a traditional post-edition system. Work supported by the European Union 7th Framework Program (FP7/2007-2013) under the CasMaCat project (grant agreement no. 287576), by Spanish MICINN under grant TIN2012-31723, and by the Generalitat Valenciana under grant ALMPR (Prometeo/2009/014). González-Rubio, J.; Ortiz-Martínez, D.; Benedí Ruiz, JM.; Casacuberta Nolla, F. (2013). Interactive Machine Translation using Hierarchical Translation Model. Association for Computational Linguistics. 244-254. http://hdl.handle.net/10251/201999
Discriminative estimation of probabilistic context-free grammars for mathematical expression recognition and retrieval
[EN] We present a discriminative learning algorithm for the probabilistic estimation of two-dimensional probabilistic context-free grammars (2D-PCFG) for mathematical expression recognition and retrieval. This algorithm is based on a generalization of the H-criterion as the objective function and the growth transformations as the optimization method. For the development of the discriminative estimation algorithm, the N-best interpretations provided by the 2D-PCFG have been considered. Experimental results are reported on two available datasets: Im2Latex and IBEM. The first experiment compares the proposed discriminative estimation method with the classic Viterbi-based estimation method. The second one studies the performance of the estimated models depending on the length of the mathematical expressions and the number of admissible errors in the metric used. This research has been developed with the support of Grant PID2020-116813RB-I00 funded by MCIN/AEI/10.13039/501100011033 and FPI grant CIACIF/2021/313 funded by Generalitat Valenciana. Universitat Politècnica de València Grant No. SP20210263. Noya García, E.; Benedí Ruiz, JM.; Sánchez Peiró, JA.; Anitei, D. (2023). Discriminative estimation of probabilistic context-free grammars for mathematical expression recognition and retrieval. Pattern Analysis and Applications. 26:1571-1584. https://doi.org/10.1007/s10044-023-01158-8