Extracting Words and Multi-part Symbols in Graphics Rich Documents

Abstract

We present an algorithm for grouping multipart symbols, dashed lines, and character strings for extraction from line drawings. The image first undergoes a lossless raster-to-vector conversion that produces an undirected graph, the so-called run graph, as its vector representation. Next, the image elements of the run graph are extracted and classified probabilistically into sets by a decision tree, based upon their geometric features. An area Voronoi tessellation of the members of these sets is constructed, from which a neighborhood graph is derived that is guaranteed to be minimal and complete. The graph is then traversed to group the members of the various sets for extraction and input to different recognition modules. No a priori font or other domain-specific information is required for the grouping, and no special geometrical relationships among the elements are assumed. Results are presented with example images taken from those used by our Swiss cadastral map understanding system.
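
The tessellation, neighborhood graph, and traversal steps can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the image elements are already given as integer labels on a pixel grid, approximates the area Voronoi tessellation with a multi-source breadth-first search under city-block distance, and groups elements with a simple gap threshold. The function names and the `max_gap` parameter are hypothetical, and the sketch makes no claim about the minimality or completeness of the graph it builds.

```python
from collections import deque


def area_voronoi(labels):
    """Assign every pixel to its nearest labeled element (0 = background).

    Discrete approximation using 4-connected (city-block) distance.
    Returns (owner, dist) grids of the same shape as `labels`.
    """
    h, w = len(labels), len(labels[0])
    owner = [row[:] for row in labels]
    dist = [[0 if v else None for v in row] for row in labels]
    queue = deque((y, x) for y in range(h) for x in range(w) if labels[y][x])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
            if 0 <= ny < h and 0 <= nx < w and dist[ny][nx] is None:
                dist[ny][nx] = dist[y][x] + 1
                owner[ny][nx] = owner[y][x]
                queue.append((ny, nx))
    return owner, dist


def neighborhood_graph(owner, dist):
    """Connect elements whose Voronoi cells touch.

    Each edge stores the smallest gap found along the shared cell
    boundary, measured in background pixels between the two elements.
    """
    h, w = len(owner), len(owner[0])
    edges = {}
    for y in range(h):
        for x in range(w):
            for ny, nx in ((y + 1, x), (y, x + 1)):
                if ny < h and nx < w and owner[y][x] != owner[ny][nx]:
                    key = tuple(sorted((owner[y][x], owner[ny][nx])))
                    gap = dist[y][x] + dist[ny][nx]
                    edges[key] = min(gap, edges.get(key, gap))
    return edges


def group_elements(edges, ids, max_gap):
    """Traverse the graph and group elements linked by gaps <= max_gap.

    The threshold is an illustrative assumption, not the paper's criterion.
    """
    adj = {i: set() for i in ids}
    for (a, b), gap in edges.items():
        if gap <= max_gap:
            adj[a].add(b)
            adj[b].add(a)
    seen, groups = set(), []
    for start in sorted(ids):
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:
            node = stack.pop()
            group.append(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append(sorted(group))
    return groups


# Toy one-row image: elements 1 and 2 are close, element 3 is farther away.
image = [[1, 0, 2, 0, 0, 0, 0, 3]]
owner, dist = area_voronoi(image)
edges = neighborhood_graph(owner, dist)
groups = group_elements(edges, ids={1, 2, 3}, max_gap=2)
```

On this toy input, elements 1 and 2 end up in one group and element 3 in its own; raising `max_gap` to 4 merges all three. A real system would replace the threshold with criteria derived from the classified element sets.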

Last time updated on 22/10/2014

This paper was published in CiteSeerX.
