Math Search for the Masses: Multimodal Search Interfaces and Appearance-Based Retrieval
We summarize math search engines and search interfaces produced by the
Document and Pattern Recognition Lab in recent years, and in particular the min
math search interface and the Tangent search engine. Source code for both
systems is publicly available. "The Masses" refers to our emphasis on creating
systems for mathematical non-experts, who may be looking to define unfamiliar
notation, or browse documents based on the visual appearance of formulae rather
than their mathematical semantics.
Comment: Paper for Invited Talk at 2015 Conference on Intelligent Computer
Mathematics (July, Washington DC)
A Framework for Globally Optimizing Mixed-Integer Signomial Programs
Mixed-integer signomial optimization problems have broad applicability in engineering. Extending the Global Mixed-Integer Quadratic Optimizer, GloMIQO (Misener, Floudas in J. Glob. Optim., 2012. doi:10.1007/s10898-012-9874-7), this manuscript documents a computational framework for deterministically addressing mixed-integer signomial optimization problems to ε-global optimality. This framework generalizes the GloMIQO strategies of (1) reformulating user input, (2) detecting special mathematical structure, and (3) globally optimizing the mixed-integer nonconvex program. Novel contributions of this paper include: flattening an expression tree towards term-based data structures; introducing additional nonconvex terms to interlink expressions; integrating a dynamic implementation of the reformulation-linearization technique into the branch-and-cut tree; and designing term-based underestimators that specialize relaxation strategies according to variable bounds in the current tree node. Computational results are presented along with a comparison of the computational framework to several state-of-the-art solvers. © 2013 Springer Science+Business Media New York
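For readers unfamiliar with the term, a signomial is a sum of terms, each a real coefficient times a product of strictly positive variables raised to real exponents. The sketch below only evaluates such a function. It is not part of the framework the abstract describes, and the `signomial` helper and its `(coefficient, exponents)` term representation are assumptions made for illustration.

```python
from math import prod


def signomial(terms, x):
    """Evaluate sum_k c_k * prod_i x_i ** a_ki.

    terms: list of (coefficient, exponents) pairs (hypothetical representation)
    x: sequence of strictly positive variable values
    """
    if any(v <= 0 for v in x):
        raise ValueError("signomial variables must be strictly positive")
    return sum(
        c * prod(v ** a for v, a in zip(x, exps))
        for c, exps in terms
    )


# f(x, y) = 3 * x^2 * y^-1 - 0.5 * x^0.5 * y
terms = [(3.0, (2.0, -1.0)), (-0.5, (0.5, 1.0))]
value = signomial(terms, (4.0, 2.0))
```

The negative coefficient and the fractional and negative exponents are what distinguish a signomial from a polynomial or a posynomial, and they are the source of the nonconvexity that a global solver has to relax.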
Probabilistic mathematical formula recognition using a 2D context-free graph grammar
We present a probabilistic framework for the mathematical expression recognition problem. The developed system is flexible: its graph grammar can be extended easily because it eliminates the need for specifying rule precedence. It is also optimal in the sense that all possible interpretations of the expressions are expanded without making early commitments or hard decisions. In this paper, we give an overview of the whole system and describe in detail the graph grammar and the parsing process used in the system, along with some preliminary results on character, structure and expression recognition performance.
An Image-Based Measure for Evaluation of Mathematical Expression Recognition
The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-642-38628-2_81
Mathematical expression recognition is an active research field that is related to document image analysis and typesetting. In this study, we present a novel global performance evaluation measure for mathematical expression recognition based on image matching. Evaluating on an image representation aims to overcome representation ambiguity, much as human readers do. The results of a recent competition were used to perform several experiments in order to analyze the benefits and drawbacks of this measure.
This work was partially supported by the Spanish MEC under the STraDA research project (TIN2012-37475-C02-01), the MITTRAL (TIN2009-14633-C03-01) project, the FPU grant (AP2009-4363), by the Generalitat Valenciana under the grant Prometeo/2009/014, and through the EU 7th Framework Programme grant tranScriptorium (Ref: 600707).
Álvaro Muñoz, F.; Sánchez Peiró, JA.; Benedí Ruiz, JM. (2013). An Image-Based Measure for Evaluation of Mathematical Expression Recognition. In Pattern Recognition and Image Analysis. Springer. 682-690. https://doi.org/10.1007/978-3-642-38628-2_81
Improving the Representation and Conversion of Mathematical Formulae by Considering their Textual Context
Mathematical formulae represent complex semantic information in a concise
form. Especially in Science, Technology, Engineering, and Mathematics,
mathematical formulae are crucial to communicate information, e.g., in
scientific papers, and to perform computations using computer algebra systems.
Enabling computers to access the information encoded in mathematical formulae
requires machine-readable formats that can represent both the presentation and
content, i.e., the semantics, of formulae. Exchanging such information between
systems additionally requires conversion methods for mathematical
representation formats. We analyze how the semantic enrichment of formulae
improves the format conversion process and show that considering the textual
context of formulae reduces the error rate of such conversions. Our main
contributions are: (1) providing an openly available benchmark dataset for the
mathematical format conversion task consisting of a newly created test
collection, an extensive, manually curated gold standard and task-specific
evaluation metrics; (2) performing a quantitative evaluation of
state-of-the-art tools for mathematical format conversions; (3) presenting a
new approach that considers the textual context of formulae to reduce the error
rate for mathematical format conversions. Our benchmark dataset facilitates
future research on mathematical format conversions as well as research on many
problems in mathematical information retrieval. Because we annotated and linked
all components of formulae, e.g., identifiers, operators and other entities, to
Wikidata entries, the gold standard can, for instance, be used to train methods
for formula concept discovery and recognition. Such methods can then be applied
to improve mathematical information retrieval systems, e.g., for semantic
formula search, recommendation of mathematical content, or detection of
mathematical plagiarism.
Comment: 10 pages, 4 figures
Generalizing input-driven languages: theoretical and practical benefits
Regular languages (RL) are the simplest family in Chomsky's hierarchy. Thanks
to their simplicity they enjoy various nice algebraic and logic properties that
have been successfully exploited in many application fields. Practically all of
their related problems are decidable, so that they support automatic
verification algorithms. Also, they can be recognized in real-time.
Context-free languages (CFL) are another major family well-suited to
formalize programming, natural, and many other classes of languages; their
increased generative power w.r.t. RL, however, causes the loss of several
closure properties and of the decidability of important problems; furthermore
they need complex parsing algorithms. Thus, various subclasses thereof have
been defined with different goals, spanning from efficient, deterministic
parsing to closure properties, logic characterization and automatic
verification techniques.
Among CFL subclasses, so-called structured ones, i.e., those where the
typical tree-structure is visible in the sentences, exhibit many of the
algebraic and logic properties of RL, whereas deterministic CFL have been
thoroughly exploited in compiler construction and other application fields.
After surveying and comparing the main properties of those various language
families, we go back to operator precedence languages (OPL), an old family
through which R. Floyd pioneered deterministic parsing, and we show that they
offer unexpected properties in two fields so far investigated in totally
independent ways: they enable parsing parallelization in a more effective way
than traditional sequential parsers, and exhibit the same algebraic and logic
properties so far obtained only for less expressive language families.
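As a rough illustration of the operator precedence idea mentioned above, the sketch below drives a shift-reduce evaluator from a table of precedence relations between terminals. It is a simplified, sequential toy, not the parallel parsing scheme the abstract refers to. The grammar (`+`, `*`, parentheses, non-negative integers), the `REL` table, and the `tokenize` and `evaluate` helpers are all assumptions made for illustration. Operands go straight onto a value stack rather than being treated as terminals in the table.

```python
import re

# Relation between the terminal on top of the stack (outer key) and the
# incoming terminal (inner key): '<' yields precedence (shift), '=' equal
# precedence (matching parentheses), '>' takes precedence (reduce).
# Hypothetical table for E -> E + E | E * E | ( E ) | number.
REL = {
    '+': {'+': '>', '*': '<', '(': '<', ')': '>', '$': '>'},
    '*': {'+': '>', '*': '>', '(': '<', ')': '>', '$': '>'},
    '(': {'+': '<', '*': '<', '(': '<', ')': '='},
    '$': {'+': '<', '*': '<', '(': '<'},
}


def tokenize(text):
    """Split the input into numbers and operators, ending with '$'."""
    tokens = re.findall(r"\d+|[+*()]", text)
    if "".join(tokens) != text.replace(" ", ""):
        raise ValueError("unexpected character in input")
    return tokens + ['$']


def evaluate(text):
    """Evaluate an arithmetic expression using the precedence relations."""
    ops, vals = ['$'], []
    tokens = tokenize(text)
    i = 0
    while True:
        tok = tokens[i]
        if tok.isdigit():
            vals.append(int(tok))  # operands bypass the relation table
            i += 1
            continue
        top = ops[-1]
        if top == '$' and tok == '$':
            if len(vals) != 1:
                raise ValueError("malformed expression")
            return vals[0]
        rel = REL[top].get(tok)
        if rel == '<':
            ops.append(tok)  # shift
            i += 1
        elif rel == '=':
            ops.pop()  # discard the matching '('
            i += 1
        elif rel == '>':
            op = ops.pop()  # reduce the operator on top of the stack
            if len(vals) < 2:
                raise ValueError("malformed expression")
            right, left = vals.pop(), vals.pop()
            vals.append(left + right if op == '+' else left * right)
        else:
            raise ValueError(f"no precedence relation between {top!r} and {tok!r}")
```

For example, `evaluate("2+3*4")` shifts `*` over `+` because `+` yields precedence to `*`, so the product is reduced first. Each decision depends only on a pair of adjacent terminals, which is the locality property that makes this family attractive for splitting the input into independently parsed chunks.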
Dynamic analysis of a system of hinge-connected rigid bodies with nonrigid appendages
Equations of motion are derived for use in simulating a spacecraft or other complex electromechanical system amenable to idealization as a set of hinge-connected rigid bodies of tree topology, with rigid axisymmetric rotors and nonrigid appendages attached to each rigid body in the set. In conjunction with a previously published report on finite-element appendage vibration equations, this report provides a complete minimum-dimension formulation suitable for generic programming for digital computer numerical integration.