67 research outputs found

    Features and Algorithms for Visual Parsing of Handwritten Mathematical Expressions

    Get PDF
    Math expressions are an essential part of scientific documents. Handwritten math expressions recognition can benefit human-computer interaction especially in the education domain and is a critical part of document recognition and analysis. Parsing the spatial arrangement of symbols is an essential part of math expression recognition. A variety of parsing techniques have been developed during the past three decades, and fall into two groups. The first group is graph-based parsing. It selects a path or sub-graph which obeys some rule to form a possible interpretation for the given expression. The second group is grammar driven parsing. Grammars and related parameters are defined manually for different tasks. The time complexity of these two groups parsing is high, and they often impose some strict constraints to reduce the computation. The aim of this thesis is working towards building a straightforward and effective parser with as few constraints as possible. First, we propose using a line of sight graph for representing the layout of strokes and symbols in math expressions. It achieves higher F-score than other graph representations and reduces search space for parsing. Second, we modify the shape context feature with Parzen window density estimation. This feature set works well for symbol segmentation, symbol classification and symbol layout analysis. We get a higher symbol segmentation F-score than other systems on CROHME 2014 dataset. Finally, we develop a Maximum Spanning Tree (MST) based parser using Edmonds\u27 algorithm, which extracts an MST from the directed line of sight graph in two passes: first symbols are segmented, and then symbols and spatial relationship are labeled. The time complexity of our MST-based parsing is lower than the time complexity of CYK parsing with context-free grammars. Also, our MST-based parsing obtains higher structure rate and expression rate than CYK parsing when symbol segmentation is accurate. Correct structure means we get the structure of the symbol layout tree correct, even though the label of the edge in the symbol layout tree might be wrong. The performance of our math expression recognition system with MST-based parsing is competitive on CROHME 2012 and 2014 datasets. For future work, how to incorporate symbol classifier result and correct segmentation error in MST-based parsing needs more research

    Query-Driven Global Graph Attention Model for Visual Parsing: Recognizing Handwritten and Typeset Math Formulas

    Get PDF
    We present a new visual parsing method based on standard Convolutional Neural Networks (CNNs) for handwritten and typeset mathematical formulas. The Query-Driven Global Graph Attention (QD-GGA) parser employs multi-task learning, using a single feature representation for locating, classifying, and relating symbols. QD-GGA parses formulas by first constructing a Line-Of-Sight (LOS) graph over the input primitives (e.g handwritten strokes or connected components in images). Second, class distributions for LOS nodes and edges are obtained using query-specific feature filters (i.e., attention) in a single feed-forward pass. This allows end-to-end structure learning using a joint loss over primitive node and edge class distributions. Finally, a Maximum Spanning Tree (MST) is extracted from the weighted graph using Edmonds\u27 Arborescence Algorithm. The model may be run recurrently over the input graph, updating attention to focus on symbols detected in the previous iteration. QD-GGA does not require additional grammar rules and the language model is learned from the sets of symbols/relationships and the statistics over them in the training set. We benchmark our system against both handwritten and typeset state-of-the-art math recognition systems. Our preliminary results show that this is a promising new approach for visual parsing of math formulas. Using recurrent execution, symbol detection is near perfect for both handwritten and typeset formulas: we obtain a symbol f-measure of over 99.4% for both the CROHME (handwritten) and INFTYMCCDB-2 (typeset formula image) datasets. Our method is also much faster in both training and execution than state-of-the-art RNN-based formula parsers. The unlabeled structure detection of QDGGA is competitive with encoder-decoder models, but QD-GGA symbol and relationship classification is weaker. We believe this may be addressed through increased use of spatial features and global context

    Applying Hierarchical Contextual Parsing with Visual Density and Geometric Features to Typeset Formula Recognition

    Get PDF
    We demonstrate that recognition of scanned typeset mathematical expression images can be done by extracting maximum spanning trees from line of sight graphs weighted using geometric and visual density features. The approach used is hierarchical contextual parsing (HCP): Hierarchical in terms of starting with connected components and building to the symbol level using visual, spatial, and contextual features of connected components. Once connected components have been segmented into symbols, a new set of spatial, visual, and contextual features are extracted. One set of visual features is used for symbol classification, and another for parsing. The features are used in parsing to assign classifications and confidences to edges in a line of sight symbol graph. Layout trees describe expression structure in terms of spatial relations between symbols, such as horizontal, subscript, and superscript. From the weighted graph Edmonds\u27 algorithm is used to extract a maximum spanning tree. Segmentation and parsing are done without using symbol classification information, and symbol classification is done independently of expression structure recognition. The commonality between the recognition processes is the type of features they use, the visual densities. These visual densities are used for shape, spatial, and contextual information. The contextual information is shown to help in segmentation, parsing, and symbol recognition. The hierarchical contextual parsing has been implemented in the Python and Graph-based Online/Offline Recognizer for Math (Pythagor^m) system and tested on the InftyMCCDB-2 dataset. We created InftyMCCDB-2 from InftyCDB-2 as a open source dataset for scanned typeset math expression recognition. In building InftyMCCDB-2 modified formula structure representations were used to better capture the spatial positioning of symbols in the expression structures. Namely, baseline punctuation and symbol accents were moved out of horizontal baselines as their positions are not horizontally aligned with symbols on a writing line. With the transformed spatial layouts and HCP, 95.97% of expressions were parsed correctly when given symbols and 93.95% correctly parsed when requiring symbol segmentation from connected components. Overall HCP reached 90.83% expression recognition rate from connected components

    Augmented incremental recognition of online handwritten mathematical expressions

    Get PDF
    This paper presents an augmented incremental recognition method for online handwritten mathematical expressions (MEs). If an ME is recognized after all strokes are written (batch recognition), the waiting time increases significantly when the ME becomes longer. On the other hand, the pure incremental recognition method recognizes an ME whenever a new single stroke is input. It shortens the waiting time but degrades the recognition rate due to the limited context. Thus, we propose an augmented incremental recognition method that not only maintains the advantage of the two methods but also reduces their weaknesses. The proposed method has two main features: one is to process the latest stroke, and the other is to find the erroneous segmentations and recognitions in the recent strokes and correct them. In the first process, the segmentation and the recognition by Cocke-Younger-Kasami (CYK) algorithm are only executed for the latest stroke. In the second process, all the previous segmentations are updated if they are significantly changed after the latest stroke is input, and then, all the symbols related to the updated segmentations are updated with their recognition scores. These changes are reflected in the CYK table. In addition, the waiting time is further reduced by employing multi-thread processes. Experiments on our dataset and the CROHME datasets show the effectiveness of this augmented incremental recognition method, which not only maintains recognition rate even compared with the batch recognition method but also reduces the waiting time to a very small level

    Structured editing of handwritten mathematics

    Get PDF
    Teaching effectively requires a clear presentation of the material being taught and interaction with the students. Studies have shown that Tablet PCs provide a good technological support for teaching. The aim of the work presented in this thesis is to design a structure editor of handwritten mathematics that explores the facilities provided by Tablet PCs. The editor is made available in the form of a class library that can be used to extend existing tools. The central feature of the library is the definition of structure for handwritten mathematical expressions which allows syntactic manipulation of expressions. This makes it possible to accurately select, copy and apply algebraic rules, while avoiding the introduction of errors. To facilitate structured manipulation, gestures are used to apply manipulation rules and animations that demonstrate the use of these rules are introduced. Also, some experimental features that can improve the user’s experience and the usability of the library are presented. Furthermore, it is described how to integrate the library into existing tools. In particular, Classroom Presenter, a system developed to create interactive presentations using a Tablet PC, is extended and used to demonstrate how the library’s features can be used in some teaching scenarios. Although there are limitations in the current system, tests performed with teachers and students indicate that it can help to improve the experience of teaching and learning mathematics, particularly calculational mathematics

    Visual Structure Editing of Math Formulas

    Get PDF
    Math formulas can be large and complex resulting in correspondingly large and complex LaTeX math strings for expressing them. We design operations to visually edit the typeset LaTeX formulas. The operations are invoked via the formula\u27s control points, which are created as a way to specify an operation associated with the point\u27s location relative to a symbol in the formula. At the control points, formulas can be extended in multiple ways, LaTeX can be inserted locally by typing, an existing formula can be inserted, or part of the formula itself can be moved to that point. Parts of formulas can be selected by clicking on a symbol or dragging a rectangle over an area in the formula, and the subtree for the selection can be replaced, deleted, moved to another point in the formula, or lifted out of the formula into a chip floating above the canvas. Formula chips can be used as arguments to operations, including a set of existing formulas provided in a symbol palette. Operations can be performed either by making a selection, selecting a control point operation, and then specifying an argument, or by dragging an argument to one of the control points in the formula. We perform an online formula editing experiment to examine if these visual editing operations can be used to reduce the time and actions spent in order to make edits to formulas. With 35 participants completing 18 formula editing tasks split between 3 input conditions of LaTeX only, Visual only, or LaTeX and Visual, we find that on average participants spend the least amount of time on the editing tasks when both editing capabilities are available

    Change blindness: eradication of gestalt strategies

    Get PDF
    Arrays of eight, texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task where there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation (Landman et al, 2003 Vision Research 43149–164]. Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this we changed the spatial position of the rectangles in the second presentation by shifting them along imaginary spokes (by ±1 degree) emanating from the central fixation point. There was no significant difference seen in performance between this and the standard task [F(1,4)=2.565, p=0.185]. This may suggest two things: (i) Gestalt grouping is not used as a strategy in these tasks, and (ii) it gives further weight to the argument that objects may be stored and retrieved from a pre-attentional store during this task

    Semantic Domains in Akkadian Text

    Get PDF
    The article examines the possibilities offered by language technology for analyzing semantic fields in Akkadian. The corpus of data for our research group is the existing electronic corpora, Open richly annotated cuneiform corpus (ORACC). In addition to more traditional Assyriological methods, the article explores two language technological methods: Pointwise mutual information (PMI) and Word2vec.Peer reviewe

    CyberResearch on the Ancient Near East and Eastern Mediterranean

    Get PDF
    CyberResearch on the Ancient Near East and Neighboring Regions provides case studies on archaeology, objects, cuneiform texts, and online publishing, digital archiving, and preservation. Eleven chapters present a rich array of material, spanning the fifth through the first millennium BCE, from Anatolia, the Levant, Mesopotamia, and Iran. Customized cyber- and general glossaries support readers who lack either a technical background or familiarity with the ancient cultures. Edited by Vanessa Bigot Juloux, Amy Rebecca Gansell, and Alessandro Di Ludovico, this volume is dedicated to broadening the understanding and accessibility of digital humanities tools, methodologies, and results to Ancient Near Eastern Studies. Ultimately, this book provides a model for introducing cyber-studies to the mainstream of humanities research
    • …
    corecore