
    Transcribing Content from Structural Images with Spotlight Mechanism

    Transcribing content from structural images, e.g., writing notes from music scores, is a challenging task because not only must the content objects be recognized, but the internal structure must also be preserved. Existing image recognition methods mainly work on images with simple content (e.g., text lines with characters), but are not capable of identifying images with more complex content (e.g., structured symbols), which often follow a fine-grained grammar. To this end, in this paper, we propose a hierarchical Spotlight Transcribing Network (STN) framework followed by a two-stage "where-to-what" solution. Specifically, we first decide "where-to-look" through a novel spotlight mechanism that focuses on different areas of the original image following its structure. Then, we decide "what-to-write" by developing a GRU-based network that uses the spotlight areas to transcribe the content accordingly. Moreover, we propose two implementations on the basis of STN, i.e., STNM and STNR, where the spotlight movement follows the Markov property and recurrent modeling, respectively. We also design a reinforcement method to refine the framework by self-improving the spotlight mechanism. We conduct extensive experiments on many structural image datasets, where the results clearly demonstrate the effectiveness of the STN framework. Comment: Accepted by KDD2018 Research Track. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'18)

    Off-line Recognition of Printed Mathematical Expressions Using Stochastic Context-Free Grammars

    Off-line recognition of printed mathematical expressions consists of three major steps: segmentation, symbol recognition and structural analysis. In this work we study an approach based on a two-dimensional extension of context-free grammar parsing. Finally, some experiments are reported to evaluate the developed system. Álvaro Muñoz, F. (2010). Off-line Recognition of Printed Mathematical Expressions Using Stochastic Context-Free Grammars. http://hdl.handle.net/10251/13732

    IMEGE: Image-based Mathematical Expression Global Error

    Mathematical expression recognition is an active research field that is related to document image analysis and typesetting. Several approaches have been proposed to tackle this problem, and automatic methods for performance evaluation are required. Mathematical expressions are usually represented as a coded string, such as LaTeX or MathML, for evaluation purposes. This representation has ambiguity problems, given that the same expression can be coded in several ways. For that reason, past approaches either analyzed recognition results manually or reported partial errors such as the symbol error rate. In this study, we present a novel global performance evaluation measure for mathematical expressions based on image matching. Using an image representation resolves the representation ambiguity in the same way that human beings do. The proposed evaluation method is a global error measure that also provides local information about the recognition result. Álvaro Muñoz, F.; Sánchez Peiró, JA.; Benedí Ruiz, JM. (2011). IMEGE: Image-based Mathematical Expression Global Error. http://hdl.handle.net/10251/1308

    Query-Driven Global Graph Attention Model for Visual Parsing: Recognizing Handwritten and Typeset Math Formulas

    We present a new visual parsing method based on standard Convolutional Neural Networks (CNNs) for handwritten and typeset mathematical formulas. The Query-Driven Global Graph Attention (QD-GGA) parser employs multi-task learning, using a single feature representation for locating, classifying, and relating symbols. QD-GGA parses formulas by first constructing a Line-Of-Sight (LOS) graph over the input primitives (e.g., handwritten strokes or connected components in images). Second, class distributions for LOS nodes and edges are obtained using query-specific feature filters (i.e., attention) in a single feed-forward pass. This allows end-to-end structure learning using a joint loss over primitive node and edge class distributions. Finally, a Maximum Spanning Tree (MST) is extracted from the weighted graph using Edmonds' Arborescence Algorithm. The model may be run recurrently over the input graph, updating attention to focus on symbols detected in the previous iteration. QD-GGA does not require additional grammar rules; the language model is learned from the sets of symbols and relationships in the training set and the statistics over them. We benchmark our system against both handwritten and typeset state-of-the-art math recognition systems. Our preliminary results show that this is a promising new approach for visual parsing of math formulas. Using recurrent execution, symbol detection is near perfect for both handwritten and typeset formulas: we obtain a symbol f-measure of over 99.4% for both the CROHME (handwritten) and INFTYMCCDB-2 (typeset formula image) datasets. Our method is also much faster in both training and execution than state-of-the-art RNN-based formula parsers. The unlabeled structure detection of QD-GGA is competitive with encoder-decoder models, but QD-GGA symbol and relationship classification is weaker. We believe this may be addressed through increased use of spatial features and global context.
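
The final step described above, extracting a maximum spanning tree from a weighted directed graph, can be illustrated with a compact version of the Chu-Liu/Edmonds procedure. This is only a sketch under stated assumptions, not the QD-GGA authors' code: the function name `max_arborescence_weight`, the integer node ids, and the `(source, target, weight)` edge format are all hypothetical, and the function returns just the total weight of the best tree rather than its edges.

```python
def max_arborescence_weight(n, root, edges):
    """Total weight of a maximum spanning arborescence rooted at `root`.

    Hypothetical helper for illustration. `edges` is a list of
    (source, target, weight) tuples over nodes 0..n-1. Returns None when
    some node cannot be reached from the root.
    """
    inf = float("inf")
    # Solve the equivalent minimum-cost problem on negated weights.
    edges = [(u, v, -w) for u, v, w in edges]
    total = 0.0
    while True:
        # Step 1: pick the cheapest incoming edge for every node.
        best = [inf] * n
        parent = [-1] * n
        for u, v, w in edges:
            if u != v and w < best[v]:
                best[v], parent[v] = w, u
        if any(v != root and best[v] == inf for v in range(n)):
            return None
        best[root] = 0.0

        # Step 2: add the chosen costs and look for cycles among the picks.
        comp = [-1] * n      # contracted component id per node
        seen = [-1] * n      # id of the walk that last visited each node
        count = 0
        for v in range(n):
            total += best[v]
            x = v
            while x != root and seen[x] != v and comp[x] == -1:
                seen[x] = v
                x = parent[x]
            if x != root and comp[x] == -1:
                # The walk returned to a node of its own path: a cycle.
                y = parent[x]
                while y != x:
                    comp[y] = count
                    y = parent[y]
                comp[x] = count
                count += 1
        if count == 0:
            return -total    # no cycles left: the picks form a tree

        # Step 3: contract each cycle to one node and reweight the edges.
        for v in range(n):
            if comp[v] == -1:
                comp[v] = count
                count += 1
        edges = [(comp[u], comp[v], w - best[v])
                 for u, v, w in edges if comp[u] != comp[v]]
        root, n = comp[root], count
```

For example, with root 0 and edges `[(0, 1, 0.9), (0, 2, 0.2), (1, 2, 0.8), (2, 1, 0.95)]`, the greedy picks form a cycle between nodes 1 and 2; after contraction the procedure settles on the tree 0→1→2.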

    Scanning Single Shot Detector for Math in Document Images

    We introduce the Scanning Single Shot Detector (ScanSSD) for detecting both embedded and displayed math expressions in document images using a single-stage network that does not require page layout, font, or character information. ScanSSD uses sliding windows to generate sub-images of large document page images rendered at 600 dpi and applies a Single Shot Detector (SSD) to each sub-image. Detection results from sub-images are pooled to generate page-level results. For pooling sub-image level detections, we introduce new methods based on the confidence scores and density of detections. ScanSSD is a modular architecture that can be easily applied to detecting other objects in document images. For the math expression detection task, we have created a new dataset called TFD-ICDAR 2019 from the existing GTDB datasets. Our dataset has 569 pages for training with 26,396 math expressions and 236 pages for testing with 11,885 math expressions. ScanSSD achieves an 80.19% F-score at IOU50 and a 72.96% F-score at IOU75 on the TFD-ICDAR 2019 test dataset. An earlier version of ScanSSD placed 2nd in the ICDAR 2019 competition on Typeset Formula Detection (TFD). Our data and code are publicly available at https://github.com/MaliParag/TFD-ICDAR2019 and https://github.com/MaliParag/ScanSSD, respectively.
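
Two ideas in the abstract above lend themselves to a short illustration: tiling a large page into overlapping sub-images, and the intersection-over-union (IoU) score that underlies the IOU50 and IOU75 thresholds. The sketch below is not taken from the ScanSSD repository; the function names, the `(x0, y0, x1, y1)` box format, and the window and stride sizes used in the example are assumptions chosen for illustration.

```python
def window_starts(length, win, stride):
    """Start offsets of windows of size `win` covering [0, length)."""
    if length <= win:
        return [0]
    starts = list(range(0, length - win + 1, stride))
    if starts[-1] != length - win:
        starts.append(length - win)  # extra window flush with the far edge
    return starts


def sliding_windows(page_w, page_h, win, stride):
    """Boxes (x0, y0, x1, y1) tiling a page, row by row."""
    return [(x, y, min(x + win, page_w), min(y + win, page_h))
            for y in window_starts(page_h, win, stride)
            for x in window_starts(page_w, win, stride)]


def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0


# Hypothetical page and window sizes, for illustration only.
windows = sliding_windows(2500, 1200, win=1200, stride=1000)

# A detection counts as correct at IOU50 when its IoU with a ground-truth
# box is at least 0.5, and at IOU75 when it is at least 0.75.
overlap = iou((0, 0, 10, 10), (5, 0, 15, 10))
```

Overlapping windows mean the same expression can be detected in more than one sub-image, which is why the abstract describes a separate pooling step to produce page-level results.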

    The development of local solar irradiance for outdoor computer graphics rendering

    Atmospheric effects are approximated by solving the light transfer equation (LTE) along a given viewing path. The accumulated spectral energy in the visible band that arrives at the observer's eyes defines the colour of the object currently on the line of sight. Because it is convenient to use a single rendering equation to solve the LTE for both the daylight sky and distant objects (aerial perspective), recent methods have opted for this kind of approach. However, the burden of real-time calculation has forced these methods to make simplifications that are not in line with real-world observation. Consequently, the results of these methods are laden with visual errors. The two most common simplifications are: i) treating the atmosphere as a full-scattering medium only and ii) assuming a single-density atmosphere profile. This research explored the possibility of replacing the real-time calculation involved in solving the LTE with an analytical approach, so that the two simplifications made by the previous real-time methods can be avoided. The model was implemented on top of a flight simulator prototype system, since the requirements of such a system match the objectives of this study. Results were verified against actual images of daylight skies. Comparison was also made with the results of previous methods to showcase the proposed model's strengths and advantages over its peers.