1,248 research outputs found

    Multi-Scale Attention with Dense Encoder for Handwritten Mathematical Expression Recognition

    Full text link
    Handwritten mathematical expression recognition is a challenging problem due to the complicated two-dimensional structures, ambiguous handwriting input and variant scales of handwritten math symbols. To settle this problem, we utilize the attention based encoder-decoder model that recognizes mathematical expression images from two-dimensional layouts to one-dimensional LaTeX strings. We improve the encoder by employing densely connected convolutional networks as they can strengthen feature extraction and facilitate gradient propagation especially on a small training set. We also present a novel multi-scale attention model which is employed to deal with the recognition of math symbols in different scales and save the fine-grained details that will be dropped by pooling operations. Validated on the CROHME competition task, the proposed method significantly outperforms the state-of-the-art methods with an expression recognition accuracy of 52.8% on CROHME 2014 and 50.1% on CROHME 2016, by only using the official training dataset

    Context-based Multi-stage Offline Handwritten Mathematical Symbol Recognition using Deep Learning

    Get PDF
    We propose a multi-stage machine learning (ML) architecture to improve the accuracy of offline handwritten mathematical symbol recognition. In the first stage, we train and assemble multiple deep convolutional neural networks to classify isolated mathematical symbols. However, certain ambiguous symbols are hard to classify without the context information of the mathematical expressions where the symbols belong. In the second stage, we train a deep convolutional neural network that further classifies the ambiguous symbols based on the context information of the symbols. To further improve the classification accuracy, in the third stage, we develop a set of rules to classify the ambiguity or otherwise the syntax of the mathematical expressions will be violated. We evaluate the proposed method by using the Competition on Recognition of Online Handwritten Mathematical Expressions (CROHME) dataset. The proposed method results the state-of-the-art accuracy of 94.04%, which is 1.62% improvement compared with the previous single-stage approach

    Stroke order normalization for improving recognition of online handwritten mathematical expressions

    Get PDF
    We present a technique based on stroke order normalization for improving recognition of online handwritten mathematical expressions (ME). The stroke order dependent system has less time complexity than the stroke order free system, but it must incorporate special grammar rules to cope with stroke order variations. The stroke order normalization technique solves this problem and also the problem of unexpected stroke order variations without increasing the time complexity of ME recognition. In order to normalize stroke order, the X-Y cut method is modified since its original form causes problems when structural components in ME overlap. First, vertically ordered strokes are located by detecting vertical symbols and their upper/lower components, which are treated as MEs and reordered recursively. Second, unordered strokes on the left side of the vertical symbols are reordered as horizontally ordered strokes. Third, the remaining strokes are reordered recursively. The horizontally ordered strokes are reordered from left to right, and the vertically ordered strokes are reordered from top to bottom. Finally, the proposed stroke order normalization is combined with the stroke order dependent ME recognition system. The evaluations on the CROHME 2014 database show that the ME recognition system incorporating the stroke order normalization outperforms all other systems that use only CROHME 2014 for training while the processing time is kept low

    Query-Driven Global Graph Attention Model for Visual Parsing: Recognizing Handwritten and Typeset Math Formulas

    Get PDF
    We present a new visual parsing method based on standard Convolutional Neural Networks (CNNs) for handwritten and typeset mathematical formulas. The Query-Driven Global Graph Attention (QD-GGA) parser employs multi-task learning, using a single feature representation for locating, classifying, and relating symbols. QD-GGA parses formulas by first constructing a Line-Of-Sight (LOS) graph over the input primitives (e.g handwritten strokes or connected components in images). Second, class distributions for LOS nodes and edges are obtained using query-specific feature filters (i.e., attention) in a single feed-forward pass. This allows end-to-end structure learning using a joint loss over primitive node and edge class distributions. Finally, a Maximum Spanning Tree (MST) is extracted from the weighted graph using Edmonds\u27 Arborescence Algorithm. The model may be run recurrently over the input graph, updating attention to focus on symbols detected in the previous iteration. QD-GGA does not require additional grammar rules and the language model is learned from the sets of symbols/relationships and the statistics over them in the training set. We benchmark our system against both handwritten and typeset state-of-the-art math recognition systems. Our preliminary results show that this is a promising new approach for visual parsing of math formulas. Using recurrent execution, symbol detection is near perfect for both handwritten and typeset formulas: we obtain a symbol f-measure of over 99.4% for both the CROHME (handwritten) and INFTYMCCDB-2 (typeset formula image) datasets. Our method is also much faster in both training and execution than state-of-the-art RNN-based formula parsers. The unlabeled structure detection of QDGGA is competitive with encoder-decoder models, but QD-GGA symbol and relationship classification is weaker. We believe this may be addressed through increased use of spatial features and global context
    corecore