Transcribing Content from Structural Images with Spotlight Mechanism
Transcribing content from structural images, e.g., writing notes from music
scores, is a challenging task because not only must the content objects be
recognized, but the internal structure must also be preserved. Existing image
recognition methods mainly work on images with simple content (e.g., text lines
of characters) and cannot handle images with more complex content
(e.g., structured symbols), which often follows a fine-grained grammar.
To this end, in this paper, we propose a hierarchical Spotlight Transcribing
Network (STN) framework built on a two-stage "where-to-what" solution.
Specifically, we first decide "where-to-look" through a novel spotlight
mechanism to focus on different areas of the original image following its
structure. Then, we decide "what-to-write" by developing a GRU-based network
with the spotlight areas for transcribing the content accordingly. Moreover, we
propose two implementations on the basis of STN, i.e., STNM and STNR, where the
spotlight movement follows the Markov property and Recurrent modeling,
respectively. We also design a reinforcement method to refine the framework by
self-improving the spotlight mechanism. We conduct extensive experiments on
many structural image datasets, where the results clearly demonstrate the
effectiveness of the STN framework.
Comment: Accepted by KDD2018 Research Track. In Proceedings of the 24th ACM
SIGKDD International Conference on Knowledge Discovery and Data Mining
(KDD'18)
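The "where-to-look" step can be pictured as a soft, spotlight-shaped weighting over an image feature map that pools the features under the spotlight into a single vector. The following is a minimal sketch, assuming a Gaussian spotlight with centre (cx, cy) and radius sigma over a (height, width, channels) feature grid; this parameterisation and the names spotlight_weights and spotlight_glance are hypothetical, not taken from the STN code, and the GRU "what-to-write" decoder is omitted.

```python
import math
from typing import List

Grid = List[List[float]]


def spotlight_weights(height: int, width: int, cx: float, cy: float, sigma: float) -> Grid:
    """Hypothetical Gaussian spotlight centred at (cx, cy), normalised to sum to 1."""
    raw = [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
            for x in range(width)]
           for y in range(height)]
    total = sum(sum(row) for row in raw)
    return [[v / total for v in row] for row in raw]


def spotlight_glance(features: List[List[List[float]]], cx: float, cy: float,
                     sigma: float) -> List[float]:
    """Weighted sum of a (height, width, channels) feature grid under the spotlight."""
    height, width, channels = len(features), len(features[0]), len(features[0][0])
    weights = spotlight_weights(height, width, cx, cy, sigma)
    glance = [0.0] * channels
    for y in range(height):
        for x in range(width):
            for c in range(channels):
                glance[c] += weights[y][x] * features[y][x][c]
    return glance
```

In a Markov variant such as STNM, the next (cx, cy, sigma) would be predicted from the current step only, whereas a recurrent variant such as STNR would carry a hidden state across steps; neither predictor is shown here.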
Off-line Recognition of Printed Mathematical Expressions Using Stochastic Context-Free Grammars
Off-line recognition of printed mathematical expressions consists of three major steps: segmentation,
symbol recognition and structural analysis. In this work we study an approach based on a two-dimensional
extension of stochastic context-free grammar parsing. Finally, experiments are reported to
evaluate the developed system.
Álvaro Muñoz, F. (2010). Off-line Recognition of Printed Mathematical Expressions Using Stochastic Context-Free Grammars. http://hdl.handle.net/10251/13732
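Parsing with a stochastic context-free grammar usually means finding the most probable derivation with a CYK-style dynamic program. The sketch below is the standard one-dimensional Viterbi CYK for a grammar in Chomsky normal form, not the two-dimensional parser of this work; in two-dimensional extensions, binary rules typically also carry a spatial relation (e.g., right-of, superscript, below) and combine image regions instead of contiguous token spans. The toy grammar and the function name viterbi_cyk are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

BinaryRule = Tuple[str, str, str, float]            # (A, B, C, P(A -> B C))
LexicalRules = Dict[str, List[Tuple[str, float]]]   # terminal -> [(A, P(A -> terminal))]


def viterbi_cyk(tokens: List[str], binary: List[BinaryRule],
                lexical: LexicalRules, start: str = "S") -> float:
    """Probability of the best parse of `tokens` rooted at `start` (0.0 if none)."""
    n = len(tokens)
    # best[i][j][A] = highest probability that nonterminal A derives tokens[i:j]
    best: List[List[DefaultDict[str, float]]] = [
        [defaultdict(float) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, tok in enumerate(tokens):
        for lhs, p in lexical.get(tok, []):
            cell = best[i][i + 1]
            cell[lhs] = max(cell[lhs], p)
    # Combine adjacent spans bottom-up, keeping only the best score per nonterminal.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for lhs, left, right, p in binary:
                    score = p * best[i][k][left] * best[k][j][right]
                    if score > best[i][j][lhs]:
                        best[i][j][lhs] = score
    return best[0][n][start]
```

For example, with the hypothetical rules S -> T R (0.9), R -> P T (1.0), T -> x | y (0.5 each) and P -> + (1.0), the input x + y scores 0.9 × 0.5 × 0.5 = 0.225.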
IMEGE: Image-based Mathematical Expression Global Error
Mathematical expression recognition is an active research field that is related to document image analysis and typesetting. Several approaches have been proposed to tackle this problem, and automatic methods for evaluating their performance are required. Mathematical expressions are usually represented as a coded string such as LaTeX or MathML for evaluation purposes. This representation is ambiguous, because the same expression can be coded in several ways. For that reason, previously proposed approaches either analyzed recognition results manually or reported only partial errors, such as the symbol error rate. In this study,
we present a novel global performance evaluation measure for mathematical expressions based on image matching. Comparing images resolves the ambiguity of the coded representation, much as a human reader does. The proposed evaluation method is a global error measure that also provides local information about the recognition result.
Álvaro Muñoz, F.; Sánchez Peiró, JA.; Benedí Ruiz, JM. (2011). IMEGE: Image-based Mathematical Expression Global Error. http://hdl.handle.net/10251/1308
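Comparing images sidesteps the coding ambiguity because two codings of the same expression render to the same picture. The sketch below is a deliberately simplified stand-in, not the IMEGE matching procedure: it assumes both expressions have already been rendered to binary images (here text-art grids where '#' marks ink), matches ink pixels within a small tolerance radius, and reports 1 minus an F-measure as the global error, with the unmatched pixels returned as local information. The function names and the radius parameter are hypothetical.

```python
from typing import List, Set, Tuple

Pixel = Tuple[int, int]


def foreground(image: List[str]) -> Set[Pixel]:
    """Collect (row, col) of ink pixels ('#') in a text-art binary image."""
    return {(r, c) for r, row in enumerate(image) for c, ch in enumerate(row) if ch == "#"}


def unmatched(source: Set[Pixel], target: Set[Pixel], radius: int) -> Set[Pixel]:
    """Pixels of `source` with no `target` pixel inside a (2*radius+1)^2 window."""
    return {
        (r, c) for (r, c) in source
        if not any((r + dr, c + dc) in target
                   for dr in range(-radius, radius + 1)
                   for dc in range(-radius, radius + 1))
    }


def global_error(reference: List[str], hypothesis: List[str], radius: int = 1):
    """Return (1 - F-measure, missed reference pixels, spurious hypothesis pixels)."""
    ref, hyp = foreground(reference), foreground(hypothesis)
    missed = unmatched(ref, hyp, radius)      # local info: reference ink not covered
    spurious = unmatched(hyp, ref, radius)    # local info: extra ink in the hypothesis
    recall = 1.0 - len(missed) / len(ref) if ref else 1.0
    precision = 1.0 - len(spurious) / len(hyp) if hyp else 1.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return 1.0 - f, missed, spurious
```

The tolerance radius absorbs small rendering offsets, so a one-pixel shift is not counted as an error, while the missed and spurious sets show where the recognition result differs.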
Query-Driven Global Graph Attention Model for Visual Parsing: Recognizing Handwritten and Typeset Math Formulas
We present a new visual parsing method based on standard Convolutional Neural Networks (CNNs) for handwritten and typeset mathematical formulas. The Query-Driven Global Graph Attention (QD-GGA) parser employs multi-task learning, using a single feature representation for locating, classifying, and relating symbols. QD-GGA parses formulas by first constructing a Line-Of-Sight (LOS) graph over the input primitives (e.g., handwritten strokes or connected components in images). Second, class distributions for LOS nodes and edges are obtained using query-specific feature filters (i.e., attention) in a single feed-forward pass. This allows end-to-end structure learning using a joint loss over primitive node and edge class distributions. Finally, a Maximum Spanning Tree (MST) is extracted from the weighted graph using Edmonds' Arborescence Algorithm. The model may be run recurrently over the input graph, updating attention to focus on symbols detected in the previous iteration. QD-GGA does not require additional grammar rules; the language model is learned from the sets of symbols/relationships and the statistics over them in the training set.
We benchmark our system against both handwritten and typeset state-of-the-art math recognition systems. Our preliminary results show that this is a promising new approach for visual parsing of math formulas. Using recurrent execution, symbol detection is near perfect for both handwritten and typeset formulas: we obtain a symbol f-measure of over 99.4% for both the CROHME (handwritten) and INFTYMCCDB-2 (typeset formula image) datasets. Our method is also much faster in both training and execution than state-of-the-art RNN-based formula parsers. The unlabeled structure detection of QD-GGA is competitive with encoder-decoder models, but QD-GGA symbol and relationship classification is weaker. We believe this may be addressed through increased use of spatial features and global context.
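Edmonds' algorithm (also known as Chu-Liu/Edmonds) extracts a maximum-weight spanning arborescence from a directed, weighted graph: each node greedily keeps its best incoming edge, any resulting cycle is contracted and re-weighted, and the smaller problem is solved recursively. The sketch below is a textbook recursive version, not the QD-GGA code; it assumes edge weights are given as a dictionary keyed by (source, target), that a designated root node is known (how QD-GGA chooses it is not stated here), and that every other node is reachable from that root.

```python
from typing import Dict, Hashable, List, Optional, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def _find_cycle(parent: Dict[Hashable, Hashable]) -> Optional[List[Hashable]]:
    """Follow chosen parent pointers; return the nodes of a cycle if one exists."""
    for start in parent:
        path: List[Hashable] = []
        node = start
        while node in parent and node not in path:
            path.append(node)
            node = parent[node]
        if node in path:
            return path[path.index(node):]
    return None


def max_arborescence(edges: Dict[Edge, float], root: Hashable) -> Set[Edge]:
    """Chu-Liu/Edmonds maximum-weight spanning arborescence rooted at `root`."""
    # 1. Every non-root node greedily keeps its heaviest incoming edge.
    best_in: Dict[Hashable, Hashable] = {}
    for (u, v), w in edges.items():
        if v != root and u != v and (v not in best_in or w > edges[(best_in[v], v)]):
            best_in[v] = u
    cycle = _find_cycle(best_in)
    if cycle is None:
        return {(u, v) for v, u in best_in.items()}

    # 2. Contract the cycle into a fresh super-node and re-weight entering edges.
    in_cycle = set(cycle)
    super_node = object()  # unique placeholder label for the contracted cycle
    contracted: Dict[Edge, float] = {}
    origin: Dict[Edge, Edge] = {}  # contracted edge -> edge it stands for
    for (u, v), w in edges.items():
        if u in in_cycle and v in in_cycle:
            continue
        if v in in_cycle:  # entering: gain relative to the cycle edge it would replace
            key, w = (u, super_node), w - edges[(best_in[v], v)]
        elif u in in_cycle:  # leaving the cycle
            key = (super_node, v)
        else:
            key = (u, v)
        if key not in contracted or w > contracted[key]:
            contracted[key], origin[key] = w, (u, v)

    # 3. Solve the smaller problem, then expand the super-node again.
    result: Set[Edge] = set()
    entry_node = None
    for edge in max_arborescence(contracted, root):
        u, v = origin[edge]
        result.add((u, v))
        if edge[1] is super_node:
            entry_node = v
    # Keep every cycle edge except the one into the node where the tree enters.
    result.update((best_in[v], v) for v in cycle if v != entry_node)
    return result
```

For example, with root 0 and weights 0→1: 5, 0→2: 1, 0→3: 1, 1→2: 10, 2→1: 20, 2→3: 3, 3→2: 2, the greedy choice forms the cycle 1↔2, and the final tree is {0→2, 2→1, 2→3} with total weight 24.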
Scanning Single Shot Detector for Math in Document Images
We introduce the Scanning Single Shot Detector (ScanSSD) for detecting both embedded and displayed math expressions in document images using a single-stage network that does not require page layout, font, or character information. ScanSSD uses sliding windows to generate sub-images of large document page images rendered at 600 dpi and applies Single Shot Detector (SSD) on each sub-image. Detection results from sub-images are pooled to generate page-level results. For pooling sub-image level detections, we introduce new methods based on the confidence scores and density of detections. ScanSSD is a modular architecture that can be easily applied to detecting other objects in document images.
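The sliding-window pipeline (tile the page, detect per sub-image, map boxes back to page coordinates, pool) can be sketched as follows. Assumptions: the window size and stride are hypothetical parameters, the last row and column of windows are clamped to the page edge, boxes are axis-aligned (x0, y0, x1, y1) in pixels, and pooling is plain greedy non-maximum suppression, a generic stand-in rather than the confidence- and density-based pooling methods ScanSSD introduces. The SSD detector itself is not shown.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in page pixels


def window_origins(page_w: int, page_h: int, win: int, stride: int) -> List[Tuple[int, int]]:
    """Top-left corners of win x win sub-images covering the page."""
    def starts(length: int) -> List[int]:
        last = max(length - win, 0)
        pts = list(range(0, last + 1, stride))
        if pts[-1] != last:  # clamp a final window to the page edge
            pts.append(last)
        return pts
    return [(x, y) for y in starts(page_h) for x in starts(page_w)]


def to_page(box: Box, origin: Tuple[int, int]) -> Box:
    """Translate a sub-image detection into page coordinates."""
    ox, oy = origin
    x0, y0, x1, y1 = box
    return (x0 + ox, y0 + oy, x1 + ox, y1 + oy)


def iou(a: Box, b: Box) -> float:
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def pool_nms(dets: List[Tuple[Box, float]], thresh: float = 0.5) -> List[Tuple[Box, float]]:
    """Greedy NMS over page-level detections: keep the highest-scoring box of each overlap group."""
    kept: List[Tuple[Box, float]] = []
    for box, score in sorted(dets, key=lambda d: d[1], reverse=True):
        if all(iou(box, k) < thresh for k, _ in kept):
            kept.append((box, score))
    return kept
```

Overlapping windows mean the same expression is often detected twice near a window border; the pooling step is what merges those duplicates into one page-level box.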
For the math expression detection task, we have created a new dataset called TFD-ICDAR 2019 from the existing GTDB datasets. Our dataset has 569 pages for training with 26,396 math expressions and 236 pages for testing with 11,885 math expressions. ScanSSD achieves an 80.19% F-score at IOU50 and a 72.96% F-score at IOU75 on TFD-ICDAR 2019 test dataset. An earlier version of ScanSSD placed 2nd in the ICDAR 2019 competition on the Typeset Formula Detection (TFD). Our data and code are publicly available at https://github.com/MaliParag/TFD-ICDAR2019 and https://github.com/MaliParag/ScanSSD, respectively
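The F-scores at IOU50 and IOU75 count a detection as correct when its intersection-over-union with a ground-truth box reaches 0.5 or 0.75. The sketch below assumes greedy one-to-one matching of detections (in the order given, e.g., by confidence) to still-unmatched ground-truth boxes; the exact matching protocol of the TFD-ICDAR 2019 scorer is not described here, so treat this as a common convention rather than the official evaluation. The function name f_score is hypothetical.

```python
from typing import List, Set, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def iou(a: Box, b: Box) -> float:
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def f_score(preds: List[Box], gts: List[Box], thresh: float) -> float:
    """F-measure where a prediction is a true positive if it matches an unused GT box at IoU >= thresh."""
    matched: Set[int] = set()
    tp = 0
    for p in preds:
        best_j, best_iou = None, thresh
        for j, g in enumerate(gts):
            if j in matched:
                continue
            v = iou(p, g)
            if v >= best_iou:
                best_j, best_iou = j, v
        if best_j is not None:
            matched.add(best_j)
            tp += 1
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(gts) if gts else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```

A box covering 100 of a ground-truth box's 140 pixels has IoU ≈ 0.71, so it counts at IOU50 but not at IOU75, which is why the stricter threshold yields the lower F-score.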
The development of local solar irradiance for outdoor computer graphics rendering
Atmospheric effects are approximated by solving the light transfer equation (LTE) along a given viewing path. The resulting accumulated spectral energy (in its visible band) arriving at the observer’s eyes defines the colour of the object currently on the line of sight. Because a single rendering equation can conveniently solve the LTE for both the daylight sky and distant objects (aerial perspective), recent methods have opted for a similar kind of approach. However, the burden of real-time calculation has forced these methods to make simplifications that are not in line with real-world observation. Consequently, the results of these methods are laden with visual errors. The two most common simplifications are: i) assuming that the atmosphere is a purely scattering medium, and ii) assuming a single-density atmospheric profile. This research explored the possibility of replacing the real-time calculation involved in solving the LTE with an analytical approach, so that the two simplifications made by previous real-time methods can be avoided. The model was implemented on top of a flight simulator prototype system, since the requirements of such a system match the objectives of this study. Results were verified against actual images of daylight skies. A comparison with the results of previous methods was also made to showcase the strengths and advantages of the proposed model over its peers.
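Both simplifications can be made concrete with the Beer-Lambert law: transmittance along a path is exp(-τ), where the optical depth τ integrates the extinction coefficient (scattering plus absorption) over the path. The sketch below compares a single, constant density with an exponentially decreasing density on a vertical path; the exponential case has a closed-form integral that a ray marcher would otherwise approximate sample by sample. The exponential profile, the parameter names, and the numbers in the usage note are illustrative assumptions; the analytical model of this research is not reproduced here.

```python
import math


def optical_depth_exponential(beta0: float, scale_height: float, h0: float, h1: float) -> float:
    """Closed-form optical depth of a vertical path from h0 to h1 (h1 > h0) through
    a medium whose extinction falls off as beta0 * exp(-h / scale_height)."""
    return beta0 * scale_height * (math.exp(-h0 / scale_height) - math.exp(-h1 / scale_height))


def optical_depth_constant(beta0: float, h0: float, h1: float) -> float:
    """Same path under a single-density assumption (sea-level extinction everywhere)."""
    return beta0 * (h1 - h0)


def optical_depth_numeric(beta0: float, scale_height: float, h0: float, h1: float,
                          steps: int = 10_000) -> float:
    """Midpoint-rule integration of the exponential profile, i.e. per-sample ray marching."""
    dh = (h1 - h0) / steps
    return sum(beta0 * math.exp(-(h0 + (i + 0.5) * dh) / scale_height) * dh
               for i in range(steps))


def transmittance(tau: float) -> float:
    """Beer-Lambert attenuation for optical depth tau."""
    return math.exp(-tau)
```

With assumed values beta0 = 1e-5 per metre and a scale height of 8000 m, a path from 0 to 100 km gives τ ≈ 0.08 for the exponential profile but τ ≈ 1.0 under the single-density assumption, and the closed form agrees with the 10,000-step numeric integral without any per-sample work.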
- …