Extracting Words and Multi-part Symbols in Graphics Rich Documents
- Publication date: 1995
- Publisher: Springer Verlag
Abstract
We present an algorithm for grouping multi-part symbols, dashed lines, and character strings for extraction from line drawings. The image first undergoes a lossless raster-to-vector conversion, which produces an undirected graph, the so-called run graph, as its vector representation. Next, the image elements of the run graph are extracted and classified probabilistically, based on their geometric features, using a decision tree. An area Voronoi tessellation of the members of the resulting sets is then constructed, and from it a neighborhood graph is derived that is guaranteed to be minimal and complete. The graph is then traversed to group the members of the various sets for extraction and input to different recognition modules. No a priori font or other domain-specific information is required for the grouping, and no special geometrical relationships among the elements are assumed. Results are presented with example images taken from those used by our Swiss cadastral map understanding system.
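The last two steps of the abstract (area Voronoi tessellation, then traversal of the derived neighborhood graph) can be illustrated with a minimal sketch. It is not the authors' implementation: the brute-force discrete tessellation on a pixel grid, the function names, the class labels `"char"` and `"dash"`, and the grouping rule (join components that are Voronoi neighbors and share a class) are all assumptions made for illustration, since the abstract does not state the actual grouping criteria.

```python
from collections import deque
from itertools import product


def area_voronoi_labels(components, width, height):
    """Label each grid cell with the component owning the nearest pixel.

    Brute-force stand-in for an area Voronoi tessellation (assumption).
    """
    labels = {}
    for x, y in product(range(width), range(height)):
        labels[(x, y)] = min(
            components,
            key=lambda name: min(
                (x - px) ** 2 + (y - py) ** 2 for px, py in components[name]
            ),
        )
    return labels


def neighborhood_graph(labels):
    """Two components are neighbors if their Voronoi regions share a border."""
    edges = set()
    for (x, y), a in labels.items():
        for nb in ((x + 1, y), (x, y + 1)):
            b = labels.get(nb)
            if b is not None and b != a:
                edges.add(frozenset((a, b)))
    return edges


def group_same_class(edges, classes):
    """Breadth-first traversal that joins neighbors of the same class.

    The same-class rule is a hypothetical grouping criterion.
    """
    adjacency = {name: set() for name in classes}
    for edge in edges:
        a, b = tuple(edge)
        if classes[a] == classes[b]:
            adjacency[a].add(b)
            adjacency[b].add(a)
    groups, seen = [], set()
    for start in classes:
        if start in seen:
            continue
        seen.add(start)
        queue, group = deque([start]), []
        while queue:
            node = queue.popleft()
            group.append(node)
            for nxt in sorted(adjacency[node]):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        groups.append(sorted(group))
    return groups


# Toy input: three character blobs and one dash segment on a 12 x 3 grid.
components = {
    "c1": [(1, 1)],
    "c2": [(3, 1)],
    "dash": [(6, 1), (7, 1)],
    "c3": [(10, 1)],
}
classes = {"c1": "char", "c2": "char", "dash": "dash", "c3": "char"}

labels = area_voronoi_labels(components, width=12, height=3)
edges = neighborhood_graph(labels)
groups = group_same_class(edges, classes)
print(groups)  # [['c1', 'c2'], ['dash'], ['c3']]
```

In this toy input `c3` stays in its own group because its only Voronoi neighbor is the dash segment. The example uses no font or layout knowledge, only adjacency in the tessellation, which is in the spirit of the claim that no a priori domain information is needed.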