1,183 research outputs found

    Math Search for the Masses: Multimodal Search Interfaces and Appearance-Based Retrieval

    Full text link
    We summarize math search engines and search interfaces produced by the Document and Pattern Recognition Lab in recent years, and in particular the min math search interface and the Tangent search engine. Source code for both systems are publicly available. "The Masses" refers to our emphasis on creating systems for mathematical non-experts, who may be looking to define unfamiliar notation, or browse documents based on the visual appearance of formulae rather than their mathematical semantics.Comment: Paper for Invited Talk at 2015 Conference on Intelligent Computer Mathematics (July, Washington DC

    Symbolic and Visual Retrieval of Mathematical Notation using Formula Graph Symbol Pair Matching and Structural Alignment

    Get PDF
    Large data collections containing millions of math formulae in different formats are available on-line. Retrieving math expressions from these collections is challenging. We propose a framework for retrieval of mathematical notation using symbol pairs extracted from visual and semantic representations of mathematical expressions on the symbolic domain for retrieval of text documents. We further adapt our model for retrieval of mathematical notation on images and lecture videos. Graph-based representations are used on each modality to describe math formulas. For symbolic formula retrieval, where the structure is known, we use symbol layout trees and operator trees. For image-based formula retrieval, since the structure is unknown we use a more general Line of Sight graph representation. Paths of these graphs define symbol pairs tuples that are used as the entries for our inverted index of mathematical notation. Our retrieval framework uses a three-stage approach with a fast selection of candidates as the first layer, a more detailed matching algorithm with similarity metric computation in the second stage, and finally when relevance assessments are available, we use an optional third layer with linear regression for estimation of relevance using multiple similarity scores for final re-ranking. Our model has been evaluated using large collections of documents, and preliminary results are presented for videos and cross-modal search. The proposed framework can be adapted for other domains like chemistry or technical diagrams where two visually similar elements from a collection are usually related to each other

    Off-line Recognition of Printed Mathematical Expressions Using Stochastic Context-Free Grammars

    Full text link
    Off-line recognition of printed mathematical expressions consists of three major steps: segmentation, symbol recognition and structural analysis. In this work we study an approach based on a twodimensional extension of context-free grammars parsing. Finally, some experiments are reported to evaluate the developed system.Álvaro Muñoz, F. (2010). Off-line Recognition of Printed Mathematical Expressions Using Stochastic Context-Free Grammars. http://hdl.handle.net/10251/13732Archivo delegad

    Features and Algorithms for Visual Parsing of Handwritten Mathematical Expressions

    Get PDF
    Math expressions are an essential part of scientific documents. Handwritten math expressions recognition can benefit human-computer interaction especially in the education domain and is a critical part of document recognition and analysis. Parsing the spatial arrangement of symbols is an essential part of math expression recognition. A variety of parsing techniques have been developed during the past three decades, and fall into two groups. The first group is graph-based parsing. It selects a path or sub-graph which obeys some rule to form a possible interpretation for the given expression. The second group is grammar driven parsing. Grammars and related parameters are defined manually for different tasks. The time complexity of these two groups parsing is high, and they often impose some strict constraints to reduce the computation. The aim of this thesis is working towards building a straightforward and effective parser with as few constraints as possible. First, we propose using a line of sight graph for representing the layout of strokes and symbols in math expressions. It achieves higher F-score than other graph representations and reduces search space for parsing. Second, we modify the shape context feature with Parzen window density estimation. This feature set works well for symbol segmentation, symbol classification and symbol layout analysis. We get a higher symbol segmentation F-score than other systems on CROHME 2014 dataset. Finally, we develop a Maximum Spanning Tree (MST) based parser using Edmonds\u27 algorithm, which extracts an MST from the directed line of sight graph in two passes: first symbols are segmented, and then symbols and spatial relationship are labeled. The time complexity of our MST-based parsing is lower than the time complexity of CYK parsing with context-free grammars. Also, our MST-based parsing obtains higher structure rate and expression rate than CYK parsing when symbol segmentation is accurate. Correct structure means we get the structure of the symbol layout tree correct, even though the label of the edge in the symbol layout tree might be wrong. The performance of our math expression recognition system with MST-based parsing is competitive on CROHME 2012 and 2014 datasets. For future work, how to incorporate symbol classifier result and correct segmentation error in MST-based parsing needs more research

    Stroke order normalization for improving recognition of online handwritten mathematical expressions

    Get PDF
    We present a technique based on stroke order normalization for improving recognition of online handwritten mathematical expressions (ME). The stroke order dependent system has less time complexity than the stroke order free system, but it must incorporate special grammar rules to cope with stroke order variations. The stroke order normalization technique solves this problem and also the problem of unexpected stroke order variations without increasing the time complexity of ME recognition. In order to normalize stroke order, the X-Y cut method is modified since its original form causes problems when structural components in ME overlap. First, vertically ordered strokes are located by detecting vertical symbols and their upper/lower components, which are treated as MEs and reordered recursively. Second, unordered strokes on the left side of the vertical symbols are reordered as horizontally ordered strokes. Third, the remaining strokes are reordered recursively. The horizontally ordered strokes are reordered from left to right, and the vertically ordered strokes are reordered from top to bottom. Finally, the proposed stroke order normalization is combined with the stroke order dependent ME recognition system. The evaluations on the CROHME 2014 database show that the ME recognition system incorporating the stroke order normalization outperforms all other systems that use only CROHME 2014 for training while the processing time is kept low

    Query-Driven Global Graph Attention Model for Visual Parsing: Recognizing Handwritten and Typeset Math Formulas

    Get PDF
    We present a new visual parsing method based on standard Convolutional Neural Networks (CNNs) for handwritten and typeset mathematical formulas. The Query-Driven Global Graph Attention (QD-GGA) parser employs multi-task learning, using a single feature representation for locating, classifying, and relating symbols. QD-GGA parses formulas by first constructing a Line-Of-Sight (LOS) graph over the input primitives (e.g handwritten strokes or connected components in images). Second, class distributions for LOS nodes and edges are obtained using query-specific feature filters (i.e., attention) in a single feed-forward pass. This allows end-to-end structure learning using a joint loss over primitive node and edge class distributions. Finally, a Maximum Spanning Tree (MST) is extracted from the weighted graph using Edmonds\u27 Arborescence Algorithm. The model may be run recurrently over the input graph, updating attention to focus on symbols detected in the previous iteration. QD-GGA does not require additional grammar rules and the language model is learned from the sets of symbols/relationships and the statistics over them in the training set. We benchmark our system against both handwritten and typeset state-of-the-art math recognition systems. Our preliminary results show that this is a promising new approach for visual parsing of math formulas. Using recurrent execution, symbol detection is near perfect for both handwritten and typeset formulas: we obtain a symbol f-measure of over 99.4% for both the CROHME (handwritten) and INFTYMCCDB-2 (typeset formula image) datasets. Our method is also much faster in both training and execution than state-of-the-art RNN-based formula parsers. The unlabeled structure detection of QDGGA is competitive with encoder-decoder models, but QD-GGA symbol and relationship classification is weaker. We believe this may be addressed through increased use of spatial features and global context

    Context-based Multi-stage Offline Handwritten Mathematical Symbol Recognition using Deep Learning

    Get PDF
    We propose a multi-stage machine learning (ML) architecture to improve the accuracy of offline handwritten mathematical symbol recognition. In the first stage, we train and assemble multiple deep convolutional neural networks to classify isolated mathematical symbols. However, certain ambiguous symbols are hard to classify without the context information of the mathematical expressions where the symbols belong. In the second stage, we train a deep convolutional neural network that further classifies the ambiguous symbols based on the context information of the symbols. To further improve the classification accuracy, in the third stage, we develop a set of rules to classify the ambiguity or otherwise the syntax of the mathematical expressions will be violated. We evaluate the proposed method by using the Competition on Recognition of Online Handwritten Mathematical Expressions (CROHME) dataset. The proposed method results the state-of-the-art accuracy of 94.04%, which is 1.62% improvement compared with the previous single-stage approach

    iDeLog: Iterative Dual Spatial and Kinematic Extraction of Sigma-Lognormal Parameters

    Full text link
    The Kinematic Theory of rapid movements and its associated Sigma-Lognormal model have been extensively used in a large variety of applications. While the physical and biological meaning of the model have been widely tested and validated for rapid movements, some shortcomings have been detected when it is used with continuous long and complex movements. To alleviate such drawbacks, and inspired by the motor equivalence theory and a conceivable visual feedback, this paper proposes a novel framework to extract the Sigma-Lognormal parameters, namely iDeLog. Specifically, iDeLog consists of two steps. The first one, influenced by the motor equivalence model, separately derives an initial action plan defined by a set of virtual points and angles from the trajectory and a sequence of lognormals from the velocity. In the second step, based on a hypothetical visual feedback compatible with an open-loop motor control, the virtual target points of the action plan are iteratively moved to improve the matching between the observed and reconstructed trajectory and velocity. During experiments conducted with handwritten signatures, iDeLog obtained promising results as compared to the previous development of the Sigma-Lognormal.Comment: Accepted Version published by Transactions on Pattern Analysis and Machine Intelligenc
    • 

    corecore