Progress in Recognizing Typeset Mathematics

Abstract

Printed mathematics has a number of features that distinguish it from conventional text. These include structure in two dimensions (fractions, exponents, limits), frequent font changes, symbols with variable shape (quotient bars), and substantially differing notational conventions from source to source. When compounded with more generic problems such as noise and merged or broken characters, printed mathematics offers a challenging arena for recognition. Our project was initially driven by the goal of scanning and parsing some 5,000 pages of elaborate mathematics (tables of definite integrals). While our prototype system demonstrates success in translating noise-free typeset equations into Lisp expressions appropriate for further processing, a more semantic top-down approach appears necessary for higher levels of performance. Such an approach may benefit the incorporation of these programs into a more general document processing viewpoint. We intend to release to the public our somewh..
