8,334 research outputs found

    Optical Music Recognition with Convolutional Sequence-to-Sequence Models

    Get PDF
    Optical Music Recognition (OMR) is an important technology within Music Information Retrieval. Deep learning models show promising results on OMR tasks, but symbol-level annotated data sets of sufficient size to train such models are not available and are difficult to develop. We present a deep learning architecture called a Convolutional Sequence-to-Sequence model that both moves towards an end-to-end trainable OMR pipeline and applies a learning process that trains on full sentences of sheet music instead of individually labeled symbols. The model is trained and evaluated on a human-generated data set, with various image augmentations based on real-world scenarios. This data set is the first publicly available set in OMR research of sufficient size to train and evaluate deep learning models. With the introduced augmentations, a pitch recognition accuracy of 81% and a duration accuracy of 94% are achieved, resulting in a note-level accuracy of 80%. Finally, the model is compared to commercially available methods, showing a large improvement over these applications. Comment: ISMIR 201
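    A minimal sketch of the general idea described above: a convolutional encoder collapses the staff image height and treats the width as a time axis, and a recurrent decoder emits a token sequence over it. The layer sizes, module names, and 64-pixel input height are assumptions for illustration, not the authors' exact architecture.

    ```python
    import torch
    import torch.nn as nn

    class ConvSeq2Seq(nn.Module):
        def __init__(self, vocab_size: int, hidden: int = 256):
            super().__init__()
            # CNN encoder: reduces the image, keeping width as the sequence axis.
            self.encoder = nn.Sequential(
                nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 2)),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)),
            )
            self.project = nn.Linear(64 * 16, hidden)      # assumes 64-pixel-high input
            self.decoder = nn.GRU(hidden, hidden, batch_first=True)
            self.classify = nn.Linear(hidden, vocab_size)  # pitch/duration token logits

        def forward(self, staff_image: torch.Tensor) -> torch.Tensor:
            # staff_image: (batch, 1, 64, width)
            feats = self.encoder(staff_image)              # (batch, 64, 16, width/2)
            b, c, h, w = feats.shape
            seq = feats.permute(0, 3, 1, 2).reshape(b, w, c * h)
            out, _ = self.decoder(self.project(seq))
            return self.classify(out)                      # (batch, width/2, vocab_size)

    logits = ConvSeq2Seq(vocab_size=100)(torch.randn(2, 1, 64, 256))
    print(logits.shape)  # torch.Size([2, 128, 100])
    ```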

    Math Search for the Masses: Multimodal Search Interfaces and Appearance-Based Retrieval

    Full text link
    We summarize math search engines and search interfaces produced by the Document and Pattern Recognition Lab in recent years, in particular the min math search interface and the Tangent search engine. Source code for both systems is publicly available. "The Masses" refers to our emphasis on creating systems for mathematical non-experts, who may be looking to define unfamiliar notation or to browse documents based on the visual appearance of formulae rather than their mathematical semantics. Comment: Paper for Invited Talk at 2015 Conference on Intelligent Computer Mathematics (July, Washington DC

    Off-line Recognition of Printed Mathematical Expressions Using Stochastic Context-Free Grammars

    Full text link
    Off-line recognition of printed mathematical expressions consists of three major steps: segmentation, symbol recognition and structural analysis. In this work we study an approach based on a two-dimensional extension of context-free grammar parsing. Finally, some experiments are reported to evaluate the developed system. Álvaro Muñoz, F. (2010). Off-line Recognition of Printed Mathematical Expressions Using Stochastic Context-Free Grammars. http://hdl.handle.net/10251/13732
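    For intuition, here is a minimal probabilistic CYK parser over a one-dimensional token sequence; the approach above extends this kind of stochastic context-free parsing to two dimensions over image regions. The toy grammar and probabilities are purely illustrative, not taken from the paper.

    ```python
    from collections import defaultdict

    # Toy grammar in Chomsky normal form: lexical rules and binary rules.
    LEXICAL = {("EXPR", "x"): 0.3, ("EXPR", "2"): 0.3, ("OP", "+"): 1.0}
    BINARY = {("EXPR", ("EXPR", "PLUS")): 0.4, ("PLUS", ("OP", "EXPR")): 1.0}

    def cyk(tokens):
        """Return the best probability of deriving `tokens` from EXPR."""
        n = len(tokens)
        best = defaultdict(float)          # (start, length, nonterminal) -> probability
        for i, tok in enumerate(tokens):   # fill in lexical rules
            for (lhs, term), p in LEXICAL.items():
                if term == tok:
                    best[(i, 1, lhs)] = max(best[(i, 1, lhs)], p)
        for length in range(2, n + 1):     # combine adjacent spans bottom-up
            for start in range(n - length + 1):
                for split in range(1, length):
                    for (lhs, (b, c)), p in BINARY.items():
                        score = (p * best[(start, split, b)]
                                   * best[(start + split, length - split, c)])
                        best[(start, length, lhs)] = max(best[(start, length, lhs)], score)
        return best[(0, n, "EXPR")]

    print(cyk(["2", "+", "x"]))  # best derivation probability, 0.4 * 0.3 * 0.3 ≈ 0.036
    ```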

    Interactive interpretation of structured documents: Application to the recognition of handwritten architectural plans

    Get PDF
    This paper addresses a whole architecture, including the IMISketch method. IMISketch combines two aspects: document analysis and interactivity. The paper gives a global view of all parts of the project. IMISketch is a generic method for the interactive interpretation of handwritten sketches. The analysis of complex documents requires the management of uncertainty, and in practice similar methods often induce a large combinatorics; the IMISketch method therefore presents several optimization strategies to reduce this combinatorics. The goal of these optimizations is an analysis time compatible with user expectations. The decision process is able to solicit the user in cases of strong ambiguity: when the analyzer is not sure of making the right decision, the user explicitly validates the correct one, avoiding a tedious a posteriori verification phase caused by the propagation of errors. This interaction requires solving two major problems: how interpretation results will be presented to the user, and how the user will interact with the analysis process. We propose to study the effects of these two aspects. The experiments demonstrate that (i) a progressive presentation of the analysis results, (ii) user interventions during the analysis and (iii) the solicitation of the user by the analysis process form an efficient strategy for the recognition of complex off-line documents. To validate this interactive analysis method, several experiments are reported on off-line handwritten 2D architectural floor plans
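    A minimal sketch of the "solicit the user on strong ambiguity" idea: the analyzer keeps its best hypothesis unless the score margin over the runner-up is too small, in which case the decision is deferred to the user. The margin threshold, labels, and `ask_user` callback are illustrative assumptions, not values from the paper.

    ```python
    def decide(hypotheses, ask_user, margin=0.15):
        """hypotheses: list of (label, score) pairs; returns the chosen label."""
        ranked = sorted(hypotheses, key=lambda h: h[1], reverse=True)
        best = ranked[0]
        second = ranked[1] if len(ranked) > 1 else (None, 0.0)
        if best[1] - second[1] >= margin:
            return best[0]                                   # confident: decide automatically
        return ask_user([label for label, _ in ranked])      # ambiguous: solicit the user

    # Example: a wall segment vs. a door symbol with nearly equal scores.
    choice = decide([("wall", 0.52), ("door", 0.48)],
                    ask_user=lambda options: options[0])     # stand-in for a UI prompt
    print(choice)
    ```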

    Semantic Graph Representation Learning for Handwritten Mathematical Expression Recognition

    Full text link
    Handwritten mathematical expression recognition (HMER) has attracted extensive attention recently. However, current methods cannot explicitly model the interactions between different symbols, and may therefore fail when faced with similar symbols. To alleviate this issue, we propose a simple but efficient method to enhance semantic interaction learning (SIL). Specifically, we first construct a semantic graph based on statistical symbol co-occurrence probabilities. Then we design a semantic aware module (SAM), which projects the visual and classification features into a semantic space. The cosine distance between different projected vectors indicates the correlation between symbols, and jointly optimizing HMER and SIL explicitly enhances the model's understanding of symbol relationships. In addition, SAM can be easily plugged into existing attention-based models for HMER and consistently brings improvement. Extensive experiments on public benchmark datasets demonstrate that the proposed module effectively enhances recognition performance. Our method achieves better recognition performance than prior arts on both the CROHME and HME100K datasets. Comment: 12 Page
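    A minimal sketch of the projection step described above: visual and classification features are mapped into a shared semantic space, and pairwise cosine similarities between the projected symbol vectors serve as the symbol-correlation signal. Dimensions, names, and the use of two separate similarity matrices are assumptions for illustration, not the authors' code.

    ```python
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SemanticAwareModule(nn.Module):
        def __init__(self, visual_dim: int, class_dim: int, semantic_dim: int = 128):
            super().__init__()
            self.visual_proj = nn.Linear(visual_dim, semantic_dim)
            self.class_proj = nn.Linear(class_dim, semantic_dim)

        def forward(self, visual_feat, class_feat):
            # visual_feat: (num_symbols, visual_dim), class_feat: (num_symbols, class_dim)
            v = F.normalize(self.visual_proj(visual_feat), dim=-1)
            c = F.normalize(self.class_proj(class_feat), dim=-1)
            # Pairwise cosine similarities between projected symbol vectors; these
            # could be supervised with co-occurrence statistics from a semantic graph.
            return v @ v.t(), c @ c.t()    # two (num_symbols, num_symbols) matrices

    sam = SemanticAwareModule(visual_dim=256, class_dim=101)
    sim_v, sim_c = sam(torch.randn(5, 256), torch.randn(5, 101))
    print(sim_v.shape, sim_c.shape)  # torch.Size([5, 5]) torch.Size([5, 5])
    ```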

    Applying Hierarchical Contextual Parsing with Visual Density and Geometric Features to Typeset Formula Recognition

    Get PDF
    We demonstrate that recognition of scanned typeset mathematical expression images can be done by extracting maximum spanning trees from line-of-sight graphs weighted using geometric and visual density features. The approach used is hierarchical contextual parsing (HCP): hierarchical in the sense of starting with connected components and building to the symbol level using visual, spatial, and contextual features of connected components. Once connected components have been segmented into symbols, a new set of spatial, visual, and contextual features is extracted. One set of visual features is used for symbol classification, and another for parsing. The features are used in parsing to assign classifications and confidences to edges in a line-of-sight symbol graph. Layout trees describe expression structure in terms of spatial relations between symbols, such as horizontal, subscript, and superscript. From the weighted graph, Edmonds' algorithm is used to extract a maximum spanning tree. Segmentation and parsing are done without using symbol classification information, and symbol classification is done independently of expression structure recognition. The commonality between the recognition processes is the type of features they use, the visual densities. These visual densities provide shape, spatial, and contextual information. The contextual information is shown to help in segmentation, parsing, and symbol recognition. Hierarchical contextual parsing has been implemented in the Python and Graph-based Online/Offline Recognizer for Math (Pythagor^m) system and tested on the InftyMCCDB-2 dataset. We created InftyMCCDB-2 from InftyCDB-2 as an open source dataset for scanned typeset math expression recognition. In building InftyMCCDB-2, modified formula structure representations were used to better capture the spatial positioning of symbols in the expression structures. Namely, baseline punctuation and symbol accents were moved out of horizontal baselines, as their positions are not horizontally aligned with symbols on a writing line. With the transformed spatial layouts and HCP, 95.97% of expressions were parsed correctly when symbols were given, and 93.95% were parsed correctly when symbol segmentation from connected components was required. Overall, HCP reached a 90.83% expression recognition rate from connected components
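    A minimal sketch of the final extraction step: given a directed symbol graph whose edges carry relation confidences, a maximum spanning arborescence (Edmonds' algorithm) is taken as the expression's layout tree. The toy graph, weights, and the use of networkx are illustrative assumptions, not the Pythagor^m implementation.

    ```python
    import networkx as nx

    G = nx.DiGraph()
    # Edges: (parent symbol, child symbol) weighted by the relation confidence.
    G.add_edge("x", "2", weight=0.9)   # superscript relation, e.g. x^2
    G.add_edge("x", "+", weight=0.8)   # horizontal relation
    G.add_edge("+", "y", weight=0.7)   # horizontal relation
    G.add_edge("2", "+", weight=0.2)   # competing, lower-confidence edge

    # Edmonds' algorithm keeps the highest-weight edge set forming a tree.
    layout_tree = nx.maximum_spanning_arborescence(G, attr="weight")
    print(sorted(layout_tree.edges()))  # [('+', 'y'), ('x', '+'), ('x', '2')]
    ```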