9 research outputs found

    Mathematical Formula Recognition and Transformation to a Linear Format Suitable for Vocalization

    Get PDF
    Students with vision impairment encounter barriers in studying mathematics, particularly at higher education levels, yet they should have the same opportunities in mathematics subjects as sighted students. Making mathematics accessible to users with vision impairment is a complicated process. This accessibility can be static or dynamic: static accessibility presents the user with a representation of the entire mathematical expression passively, such as Braille, while dynamic accessibility allows the user to navigate the mathematical content interactively according to its structure, such as an audio format [1]. MATHSPEAK is an application that accepts objects described in LaTeX and converts them to a linear, sequential representation suitable for vocalization, describing functions to people with severe vision impairment. MATHSPEAK provides interactive dynamic access to mathematical expressions by rendering them in audio. This paper describes a method to create plain text from images of mathematical formulae and to convert this text to LaTeX, which is then used as input to the earlier developed algorithm, MATHSPEAK.
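    To illustrate what a "linear representation suitable for vocalization" means, here is a toy linearizer for a tiny LaTeX subset (fractions and superscripts). This is only a sketch of the idea, not MATHSPEAK's actual algorithm, and the spoken phrasing chosen here is an assumption:

```python
import re

def vocalize(latex: str) -> str:
    """Linearize a small LaTeX subset into speakable text.

    Illustrative only: handles \\frac{a}{b} and ^ superscripts,
    nothing close to MATHSPEAK's full coverage.
    """
    # Superscripts first, so fraction arguments contain no braces:
    # x^{2} or x^2 -> "x to the power 2"
    latex = re.sub(r"\^\{([^{}]*)\}", r" to the power \1", latex)
    latex = re.sub(r"\^(\w)", r" to the power \1", latex)
    # Fractions: \frac{x}{y} -> "begin fraction x over y end fraction";
    # loop so nested fractions resolve inside-out
    frac = re.compile(r"\\frac\{([^{}]*)\}\{([^{}]*)\}")
    while frac.search(latex):
        latex = frac.sub(r"begin fraction \1 over \2 end fraction", latex)
    return " ".join(latex.split())

print(vocalize(r"\frac{x^{2}+1}{y}"))
# -> begin fraction x to the power 2+1 over y end fraction
```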

    Statistical Classification of Spatial Relationships among Mathematical Symbols

    Full text link

    The IBEM dataset: A large printed scientific image dataset for indexing and searching mathematical expressions

    Full text link
    Searching for information in printed scientific documents is a challenging problem that has recently received special attention from the Pattern Recognition research community. Mathematical expressions are complex elements that appear in scientific documents, and developing techniques for locating and recognizing them requires the preparation of datasets that can serve as benchmarks. Most current techniques for dealing with mathematical expressions are based on Machine Learning, which requires a large amount of annotated data. These datasets must be prepared with ground-truth information for automatic training and testing; however, preparing large ground-truthed datasets is a very expensive and time-consuming task. This paper introduces the IBEM dataset, consisting of scientific documents prepared for mathematical expression recognition and searching. The dataset comprises 600 documents and more than 8,200 page images containing more than 160,000 mathematical expressions. It has been automatically generated from the LaTeX version of the documents and can be enlarged easily. The ground-truth includes the position at the page level and the LaTeX transcript for mathematical expressions, both embedded in the text and displayed. This paper also reports a baseline classification experiment with mathematical symbols and a baseline Mathematical Expression Recognition experiment performed on the IBEM dataset. These experiments aim to provide benchmarks so that future users of the IBEM dataset have a baseline framework for comparison.
    This work has been partially supported by MCIN/AEI/10.13039/501100011033 under the grant PID2020-116813RB-I00; by the Generalitat Valenciana under the FPI grant CIACIF/2021/313; and by the Valencian Graduate School and Research Network of Artificial Intelligence.
    Anitei, D.; Sánchez Peiró, J.A.; Benedí Ruiz, J.M.; Noya García, E. (2023). The IBEM dataset: A large printed scientific image dataset for indexing and searching mathematical expressions. Pattern Recognition Letters, 172:29-36. https://doi.org/10.1016/j.patrec.2023.05.033
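    The ground-truth fields the abstract describes (page-level position plus LaTeX transcript, for both embedded and displayed expressions) might be modeled as a small record. The field names and the bounding-box convention below are assumptions for illustration, not the dataset's actual schema:

```python
from dataclasses import dataclass

@dataclass
class FormulaGT:
    """Hypothetical record mirroring the ground-truth fields named in
    the abstract; not the IBEM dataset's real annotation format."""
    page: int            # page index within the document
    bbox: tuple          # (x1, y1, x2, y2) position on the page (assumed convention)
    latex: str           # LaTeX transcript of the expression
    embedded: bool       # True for inline formulas, False for displayed ones

gt = FormulaGT(page=3, bbox=(120, 640, 480, 700),
               latex=r"E = mc^2", embedded=False)
print(gt.latex, gt.embedded)
```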

    ICDAR 2021 competition on mathematical formula detection

    Full text link
    This paper introduces the Competition on Mathematical Formula Detection organized for ICDAR 2021. The main goal of this competition was to provide researchers and practitioners with a common framework for research on this topic. A large dataset was prepared for the contest, with ground-truth that was automatically generated and manually reviewed. Fourteen participants submitted their results, and these results show that there is still room for improvement, especially in the detection of embedded mathematical expressions.
    This work has been partially supported by the Ministerio de Ciencia y Tecnología under the grant TIN2017-91452-EXP (IBEM) and by the Generalitat Valenciana under the grant PROMETEO/2019/121 (DeepPattern).
    Anitei, D.; Sánchez Peiró, J.A.; Fuentes-López, J.M.; Paredes Palacios, R.; Benedí Ruiz, J.M. (2021). ICDAR 2021 competition on mathematical formula detection. Springer. 783-795. https://doi.org/10.1007/978-3-030-86337-1_52
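    Detection competitions of this kind typically match predicted boxes against ground-truth boxes by Intersection-over-Union (IoU). The exact ICDAR 2021 matching protocol is not given here, so this is only a generic sketch of the metric:

```python
def iou(a, b):
    """Intersection-over-Union of two axis-aligned boxes (x1, y1, x2, y2).

    A common matching criterion in detection benchmarks; the actual
    competition scoring may use additional rules (thresholds, fusion).
    """
    # Intersection rectangle, clamped to zero if the boxes are disjoint
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    # Union = sum of areas minus the double-counted intersection
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.1429
```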

    Query-Driven Global Graph Attention Model for Visual Parsing: Recognizing Handwritten and Typeset Math Formulas

    Get PDF
    We present a new visual parsing method based on standard Convolutional Neural Networks (CNNs) for handwritten and typeset mathematical formulas. The Query-Driven Global Graph Attention (QD-GGA) parser employs multi-task learning, using a single feature representation for locating, classifying, and relating symbols. QD-GGA parses formulas by first constructing a Line-Of-Sight (LOS) graph over the input primitives (e.g., handwritten strokes or connected components in images). Second, class distributions for LOS nodes and edges are obtained using query-specific feature filters (i.e., attention) in a single feed-forward pass. This allows end-to-end structure learning using a joint loss over primitive node and edge class distributions. Finally, a Maximum Spanning Tree (MST) is extracted from the weighted graph using Edmonds' Arborescence Algorithm. The model may be run recurrently over the input graph, updating attention to focus on symbols detected in the previous iteration. QD-GGA does not require additional grammar rules; the language model is learned from the sets of symbols/relationships and their statistics in the training set. We benchmark our system against both handwritten and typeset state-of-the-art math recognition systems. Our preliminary results show that this is a promising new approach to visual parsing of math formulas. Using recurrent execution, symbol detection is near perfect for both handwritten and typeset formulas: we obtain a symbol f-measure of over 99.4% for both the CROHME (handwritten) and INFTYMCCDB-2 (typeset formula image) datasets. Our method is also much faster in both training and execution than state-of-the-art RNN-based formula parsers. The unlabeled structure detection of QD-GGA is competitive with encoder-decoder models, but QD-GGA symbol and relationship classification is weaker. We believe this may be addressed through increased use of spatial features and global context.
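    The final step of the pipeline described above (extracting a spanning tree from a graph of scored symbol-pair edges) can be miniaturized. QD-GGA uses Edmonds' arborescence algorithm on a directed graph; as a simpler stand-in, this sketch extracts a maximum spanning tree from scored undirected edges with greedy Kruskal union-find. The edge scores are invented for the example:

```python
def max_spanning_tree(n, edges):
    """Greedy Kruskal over edges sorted by descending score.

    A simplified, undirected stand-in for the Edmonds' arborescence
    step in the abstract; edges are (score, u, v) over n nodes.
    """
    parent = list(range(n))  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges, reverse=True):  # best-scoring edges first
        ru, rv = find(u), find(v)
        if ru != rv:          # keep the edge only if it joins two components
            parent[ru] = rv
            tree.append((u, v, w))
    return tree

# Hypothetical relation scores a parser might assign to symbol pairs
edges = [(0.9, 0, 1), (0.2, 0, 2), (0.8, 1, 2), (0.7, 2, 3)]
print(max_spanning_tree(4, edges))
# -> [(0, 1, 0.9), (1, 2, 0.8), (2, 3, 0.7)]
```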

    A ground-truthed mathematical character and symbol image database

    No full text
    This paper describes the specifications of our ground-truthed mathematical character and symbol image database, called InftyCDB-1. The ground-truth of each character is composed of type, font, quality (touched/broken), link (relative position), etc. The database includes all the characters and symbols of 467 pages from 30 articles on mathematics, and is organized so that it can be used as a word image database or as a mathematical formula image database. InftyCDB-1 is a public database that is freely usable for research and development purposes.

    Scientific chart image recognition and interpretation

    Get PDF
    Ph.D. thesis (Doctor of Philosophy)