    Stroke order normalization for improving recognition of online handwritten mathematical expressions

    We present a technique based on stroke order normalization for improving recognition of online handwritten mathematical expressions (MEs). A stroke-order-dependent system has lower time complexity than a stroke-order-free system, but it must incorporate special grammar rules to cope with stroke order variations. Stroke order normalization solves this problem, and also handles unexpected stroke order variations, without increasing the time complexity of ME recognition. To normalize stroke order, the X-Y cut method is modified, since its original form causes problems when structural components in an ME overlap. First, vertically ordered strokes are located by detecting vertical symbols and their upper/lower components, which are treated as MEs and reordered recursively. Second, unordered strokes on the left side of the vertical symbols are reordered as horizontally ordered strokes. Third, the remaining strokes are reordered recursively. The horizontally ordered strokes are reordered from left to right, and the vertically ordered strokes are reordered from top to bottom. Finally, the proposed stroke order normalization is combined with the stroke-order-dependent ME recognition system. Evaluations on the CROHME 2014 database show that the ME recognition system incorporating stroke order normalization outperforms all other systems that use only CROHME 2014 for training, while the processing time is kept low.
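
The reordering idea can be illustrated with the plain (unmodified) X-Y cut that the abstract starts from. This is only a minimal sketch, not the paper's modified method: it does not detect vertical symbols or handle overlapping components. The bounding-box representation, the function names `_split` and `xy_cut_order`, and the assumption that y grows downward are all illustrative choices.

```python
from typing import List, Tuple

# Assumption: each stroke is reduced to its bounding box
# (x_min, y_min, x_max, y_max), with y growing downward.
Box = Tuple[float, float, float, float]


def _split(boxes: List[Box], axis: int) -> List[List[Box]]:
    """Group boxes into runs separated by an empty gap along `axis` (0 = x, 1 = y)."""
    ordered = sorted(boxes, key=lambda b: b[axis])
    groups = [[ordered[0]]]
    reach = ordered[0][axis + 2]  # furthest extent covered so far
    for box in ordered[1:]:
        if box[axis] > reach:  # empty gap found: start a new group
            groups.append([box])
        else:
            groups[-1].append(box)
        reach = max(reach, box[axis + 2])
    return groups


def xy_cut_order(boxes: List[Box], axis: int = 0) -> List[Box]:
    """Order boxes left to right, then top to bottom, by recursive X-Y cuts."""
    if len(boxes) <= 1:
        return list(boxes)
    groups = _split(boxes, axis)
    if len(groups) == 1:
        # No cut on this axis, so try the other one.
        axis = 1 - axis
        groups = _split(boxes, axis)
        if len(groups) == 1:
            # No cut on either axis (overlapping components):
            # fall back to a plain left-to-right sort.
            return sorted(boxes, key=lambda b: (b[0], b[1]))
    ordered: List[Box] = []
    for group in groups:
        ordered.extend(xy_cut_order(group, 1 - axis))
    return ordered


# Hypothetical strokes for "a over b, plus c", given in scrambled writing order.
num, bar, den = (1, 0, 2, 1), (0, 1.5, 3, 1.6), (1, 2, 2, 3)
plus, c = (4, 1, 5, 2), (6, 1, 7, 2)
normalized = xy_cut_order([c, den, plus, num, bar])
```

For the fraction above, the horizontal cut isolates the fraction from `+` and `c`, and the vertical cut then orders numerator, bar, and denominator from top to bottom. The fallback branch marks exactly the case the abstract says the original X-Y cut handles poorly.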

    Features and Algorithms for Visual Parsing of Handwritten Mathematical Expressions

    Math expressions are an essential part of scientific documents. Handwritten math expression recognition can benefit human-computer interaction, especially in the education domain, and is a critical part of document recognition and analysis. Parsing the spatial arrangement of symbols is an essential part of math expression recognition. A variety of parsing techniques have been developed during the past three decades, and they fall into two groups. The first group is graph-based parsing, which selects a path or sub-graph that obeys some rule to form a possible interpretation of the given expression. The second group is grammar-driven parsing, in which grammars and related parameters are defined manually for different tasks. The time complexity of both groups is high, and they often impose strict constraints to reduce computation. The aim of this thesis is to work towards a straightforward and effective parser with as few constraints as possible. First, we propose using a line of sight graph to represent the layout of strokes and symbols in math expressions. It achieves a higher F-score than other graph representations and reduces the search space for parsing. Second, we modify the shape context feature with Parzen window density estimation. This feature set works well for symbol segmentation, symbol classification and symbol layout analysis. We obtain a higher symbol segmentation F-score than other systems on the CROHME 2014 dataset. Finally, we develop a Maximum Spanning Tree (MST) based parser using Edmonds' algorithm, which extracts an MST from the directed line of sight graph in two passes: first symbols are segmented, and then symbols and spatial relationships are labeled. The time complexity of our MST-based parsing is lower than that of CYK parsing with context-free grammars. Also, our MST-based parsing obtains a higher structure rate and expression rate than CYK parsing when symbol segmentation is accurate. Correct structure means that the structure of the symbol layout tree is correct, even though the labels of edges in the symbol layout tree might be wrong. The performance of our math expression recognition system with MST-based parsing is competitive on the CROHME 2012 and 2014 datasets. For future work, how to incorporate symbol classifier results and correct segmentation errors in MST-based parsing needs more research.
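
The MST extraction step can be sketched with a small pure-Python version of the Chu-Liu/Edmonds procedure for a maximum spanning arborescence. This is an illustration under stated assumptions, not the thesis implementation: the function names, the dictionary-based graph format, and the edge scores are hypothetical, and the sketch assumes every node is reachable from the root.

```python
from typing import Dict, Hashable, List, Optional, Tuple

Edge = Tuple[Hashable, Hashable]  # (parent, child)


def _find_cycle(parent: Dict[Hashable, Hashable]) -> Optional[List[Hashable]]:
    """Return the nodes of one cycle in a child -> parent map, or None."""
    for start in parent:
        path, seen, node = [], set(), start
        while node in parent and node not in seen:
            seen.add(node)
            path.append(node)
            node = parent[node]
        if node in seen:  # walked back onto the current path
            return path[path.index(node):]
    return None


def max_arborescence(weights: Dict[Edge, float], root: Hashable) -> Dict[Hashable, Hashable]:
    """Chu-Liu/Edmonds: return a child -> parent map of maximum total weight."""
    # Step 1: greedily pick the best incoming edge of every non-root node.
    best: Dict[Hashable, Hashable] = {}
    for (u, v), w in weights.items():
        if v == root or u == v:
            continue
        if v not in best or w > weights[(best[v], v)]:
            best[v] = u
    cycle = _find_cycle(best)
    if cycle is None:
        return best
    # Step 2: contract the cycle into one fresh node and re-score edges.
    in_cycle = set(cycle)
    merged = ("contracted", tuple(cycle))
    new_weights: Dict[Edge, float] = {}
    origin: Dict[Edge, Edge] = {}
    for (u, v), w in weights.items():
        if v == root or u == v or (u in in_cycle and v in in_cycle):
            continue
        if v in in_cycle:
            # Entering the cycle replaces the cycle edge that pointed at v.
            key, score = (u, merged), w - weights[(best[v], v)]
        elif u in in_cycle:
            key, score = (merged, v), w
        else:
            key, score = (u, v), w
        if key not in new_weights or score > new_weights[key]:
            new_weights[key] = score
            origin[key] = (u, v)
    # Step 3: solve the smaller problem, then expand the contracted node.
    contracted = max_arborescence(new_weights, root)
    parent: Dict[Hashable, Hashable] = {}
    for v, u in contracted.items():
        orig_u, orig_v = origin[(u, v)]
        parent[orig_v] = orig_u
    for v in cycle:
        parent.setdefault(v, best[v])  # keep cycle edges except the broken one
    return parent


# Hypothetical edge scores on a tiny directed graph.
scores = {
    ("root", "x"): 5, ("root", "y"): 1,
    ("x", "y"): 11, ("y", "x"): 10,
    ("y", "z"): 4, ("x", "z"): 3,
}
tree = max_arborescence(scores, "root")
```

In this example the greedy choice first creates the cycle x ↔ y, which is contracted and then broken at x, giving root → x → y → z with total weight 20. In the abstract's setting the nodes would be strokes or symbols and the edges would come from the line of sight graph.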

    Applying Hierarchical Contextual Parsing with Visual Density and Geometric Features to Typeset Formula Recognition

    We demonstrate that recognition of scanned typeset mathematical expression images can be done by extracting maximum spanning trees from line of sight graphs weighted using geometric and visual density features. The approach used is hierarchical contextual parsing (HCP): hierarchical in that it starts with connected components and builds up to the symbol level using visual, spatial, and contextual features of connected components. Once connected components have been segmented into symbols, a new set of spatial, visual, and contextual features is extracted. One set of visual features is used for symbol classification, and another for parsing. The features are used in parsing to assign classifications and confidences to edges in a line of sight symbol graph. Layout trees describe expression structure in terms of spatial relations between symbols, such as horizontal, subscript, and superscript. Edmonds' algorithm is used to extract a maximum spanning tree from the weighted graph. Segmentation and parsing are done without using symbol classification information, and symbol classification is done independently of expression structure recognition. What the recognition processes have in common is the type of features they use, the visual densities. These visual densities are used for shape, spatial, and contextual information. The contextual information is shown to help in segmentation, parsing, and symbol recognition. Hierarchical contextual parsing has been implemented in the Python and Graph-based Online/Offline Recognizer for Math (Pythagor^m) system and tested on the InftyMCCDB-2 dataset. We created InftyMCCDB-2 from InftyCDB-2 as an open source dataset for scanned typeset math expression recognition. In building InftyMCCDB-2, modified formula structure representations were used to better capture the spatial positioning of symbols in the expression structures. Namely, baseline punctuation and symbol accents were moved out of horizontal baselines, as their positions are not horizontally aligned with symbols on a writing line. With the transformed spatial layouts and HCP, 95.97% of expressions were parsed correctly when given symbols, and 93.95% were parsed correctly when requiring symbol segmentation from connected components. Overall, HCP reached a 90.83% expression recognition rate from connected components.

    Pen-based Methods For Recognition and Animation of Handwritten Physics Solutions

    There has been considerable interest in constructing pen-based intelligent tutoring systems due to the natural interaction metaphor and low cognitive load afforded by pen-based interaction. We believe that pen-based intelligent tutoring systems can be further enhanced by integrating animation techniques. In this work, we explore methods for recognizing and animating sketched physics diagrams. Our methodologies enable an Intelligent Tutoring System (ITS) to understand the scenario and requirements posed by a given problem statement and to couple this knowledge with a computational model of the student's handwritten solution. These pieces of information are used to construct meaningful animations and feedback mechanisms that can highlight errors in student solutions. We have constructed a prototype ITS that can recognize mathematics and diagrams in a handwritten solution and infer implicit relationships among diagram elements, mathematics and annotations such as arrows and dotted lines. We use natural language processing to identify the domain of a given problem, and use this information to select one or more of four domain-specific physics simulators to animate the user's sketched diagram. We enable students to use their answers to guide animation behavior and also describe a novel algorithm for checking recognized student solutions. We provide examples of scenarios that can be modeled using our prototype system and discuss the strengths and weaknesses of our current prototype. Additionally, we present the findings of a user study that aimed to identify animation requirements for physics tutoring systems. We describe a taxonomy for categorizing different types of animations for physics problems and highlight how the taxonomy can be used to define requirements for 50 physics problems chosen from a university textbook. We also present a discussion of 56 handwritten solutions acquired from physics students and describe how suitable animations could be constructed for each of them.

    Drawing from calculators.

    AutoGraff: towards a computational understanding of graffiti writing and related art forms.

    The aim of this thesis is to develop a system that generates letters and pictures with a style that is immediately recognizable as graffiti art or calligraphy. The proposed system can be used similarly to, and in tight integration with, conventional computer-aided geometric design tools and can be used to generate synthetic graffiti content for urban environments in games and in movies, and to guide robotic or fabrication systems that can materialise the output of the system with physical drawing media. The thesis is divided into two main parts. The first part describes a set of stroke primitives, building blocks that can be combined to generate different designs that resemble graffiti or calligraphy. These primitives mimic the process typically used to design graffiti letters and exploit well known principles of motor control to model the way in which an artist moves when incrementally tracing stylised letter forms. The second part demonstrates how these stroke primitives can be automatically recovered from input geometry defined in vector form, such as the digitised traces of writing made by a user, or the glyph outlines in a font. This procedure converts the input geometry into a seed that can be transformed into a variety of calligraphic and graffiti stylisations, which depend on parametric variations of the strokes

    Geometristen muotojen reaaliaikainen tunnistus (Real-time recognition of geometric shapes)

    Effective sketch recognition is the basis for pen and touch-based human-computer interfaces. In this thesis the concepts, common methods and earlier work in the research area of online symbol recognition are presented. A set of shape recognition approaches proposed in the past by various research teams is briefly introduced. An online shape recognizer using global geometric features is described. The preprocessing and feature extraction algorithms as well as the shape classification method are described in detail. Recognition heuristics for two simple shapes (arrow and star) are suggested. A graphical user interface was implemented, and a group of subjects used it to obtain realistic results on the computational performance and recognition accuracy of the system: the implemented system is computationally fast, but the results on recognition accuracy were ambiguous. Finally, the problems and restrictions of the approach are discussed.

    Spatial vs. Graph-Based Formula Retrieval

    Recently, math formula search engines have become a useful tool for novice users learning a new topic. While systems with the ability to do formula retrieval already exist, they rely on prefix matching and typed query entries. This can be an obstacle for novice users who are not proficient with languages used to express formulas such as LaTeX, or do not remember the left end of a formula, or wish to match formulas at multiple locations (e.g., using `\int \quad\quad dx' as a query). We generalize a one-dimensional spatial encoding for word spotting in handwritten document images, the Pyramidal Histogram of Characters or PHOC, to obtain the two-dimensional XY-PHOC, which provides robust spatial embeddings with modest storage requirements and without the costly operations used to generate graphs. The spatial representation captures the relative position of symbols without needing to store explicit edges between symbols. Our spatial representation is able to match queries that are disjoint subgraphs within indexed formulas. Existing graph and tree-based formula retrieval models are not designed to handle disjoint graphs, and relationships may be added to a query that do not exist in the final formula, making it less similar for matching. XY-PHOC embeddings are a simple spatial embedding that gives competitive results in formula similarity search and autocompletion, and they support queries comprised of symbols in two dimensions without the need to form a connected graph for search.
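
A pyramidal spatial encoding of this kind can be sketched as follows. This is a simplified illustration, not the XY-PHOC defined in the work: it assigns each symbol to a region by its center point only, uses hypothetical pyramid levels 1 to 3, and assumes coordinates normalized to [0, 1]. The names `xy_phoc` and `cosine`, the vocabulary, and the symbol positions are all made up for the example.

```python
import math
from typing import List, Sequence, Tuple

# Assumption: a symbol is (label, x, y) with its center normalized to [0, 1].
Symbol = Tuple[str, float, float]


def xy_phoc(symbols: Sequence[Symbol], vocab: Sequence[str],
            levels: Sequence[int] = (1, 2, 3)) -> List[int]:
    """Binary vector: for each axis, level and region, which labels occur there."""
    index = {label: i for i, label in enumerate(vocab)}
    vector: List[int] = []
    for axis in (1, 2):  # tuple position 1 is x, position 2 is y
        for level in levels:
            for region in range(level):
                bits = [0] * len(vocab)
                for sym in symbols:
                    # Region holding the symbol center at this pyramid level.
                    hit = min(int(sym[axis] * level), level - 1)
                    if hit == region and sym[0] in index:
                        bits[index[sym[0]]] = 1
                vector.extend(bits)
    return vector


def cosine(a: Sequence[int], b: Sequence[int]) -> float:
    """Cosine similarity between two embeddings (0.0 if either is all zeros)."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm else 0.0


vocab = ["x", "y", "2", "+"]
# Hypothetical layout of "x^2 + y": the exponent sits higher (smaller y).
formula = [("x", 0.1, 0.5), ("2", 0.25, 0.2), ("+", 0.5, 0.5), ("y", 0.9, 0.5)]
# A disconnected query: only the two end symbols, with nothing linking them.
query = [("x", 0.1, 0.5), ("y", 0.9, 0.5)]

embedding = xy_phoc(formula, vocab)
score = cosine(xy_phoc(query, vocab), embedding)
```

No edges between symbols are stored, so the disconnected query still produces a comparable vector. That is the property the abstract highlights for matching disjoint parts of a formula.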