Search CORE

357 research outputs found

Text Line Segmentation of Historical Documents: a Survey

Author: A. Amin
A. Bozzi
A. Downton
A. Jain
A. Kolcz
Abderrazak Zahour
Bruno Taconet
C.L. Tan
C.V. Lakshmi
E. Cohen
E. Oztop
G. Seni
I.-K. Kim
K. Wong
L. Likforman-Sulem
L. Likforman-Sulem
L. Likforman-Sulem
L. O’Gorman
L.A. Fletcher
Laurence Likforman-Sulem
R. Plamondon
R.D. Lins
U. Pal
V. Shapiro
Ventadert Gusnard de de
Y. Solihin
Y.H. Tseng
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 10/04/2007
Field of study

There is a huge amount of historical documents in libraries and in various National Archives that have not been exploited electronically. Although automatic reading of complete pages remains, in most cases, a long-term objective, tasks such as word spotting, text/image alignment, authentication and extraction of specific fields are in use today. For all these tasks, a major step is document segmentation into text lines. Because of the low quality and the complexity of these documents (background noise, artifacts due to aging, interfering lines),automatic text line segmentation remains an open research field. The objective of this paper is to present a survey of existing methods, developed during the last decade, and dedicated to documents of historical interest.Comment: 25 pages, submitted version, To appear in International Journal on Document Analysis and Recognition, On line version available at http://www.springerlink.com/content/k2813176280456k3

arXiv.org e-Print Archive

Crossref

Automatic Chinese Postal Address Block Location Using Proximity Descriptors and Cooperative Profit Random Forests.

Author: Dong J.
Dong X.
Sun Jianyuan
Tao D.
Zhou H.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 19/10/2017
Field of study

Locating the destination address block is key to automated sorting of mails. Due to the characteristics of Chinese envelopes used in mainland China, we here exploit proximity cues in order to describe the investigated regions on envelopes. We propose two proximity descriptors encoding spatial distributions of the connected components obtained from the binary envelope images. To locate the destination address block, these descriptors are used together with cooperative profit random forests (CPRFs). Experimental results show that the proposed proximity descriptors are superior to two component descriptors, which only exploit the shape characteristics of the individual components, and the CPRF classifier produces higher recall values than seven state-of-the-art classifiers. These promising results are due to the fact that the proposed descriptors encode the proximity characteristics of the binary envelope images, and the CPRF classifier uses an effective tree node split approach

Queen's University Belfast Research Portal

Crossref

OPUS - University of Technology Sydney

Bournemouth University Research Online

Neural representation of speech segmentation and syntactic structure discrimination

Author: Bai F.
Publication venue: Radboud University Nijmegen
Publication date: 01/01/2022
Field of study

MPG.PuRe

A comprehensive survey of handwritten document benchmarks: structure, usage and evaluation

Author
Publication venue: Springer
Publication date: 24/12/2015
Field of study

Springer - Publisher Connector

Character Recognition

Author
Publication venue: 'IntechOpen'
Publication date: 20/04/2021
Field of study

Character recognition is one of the pattern recognition technologies that are most widely used in practical applications. This book presents recent advances that are relevant to character recognition, from technical topics such as image processing, feature extraction or classification, to new applications including human-computer interfaces. The goal of this book is to provide a reference source for academic research and for professionals working in the character recognition field

Directory of Open Access Books (DOAB)

Information Preserving Processing of Noisy Handwritten Document Images

Author: Chen Jin
Publication venue: Lehigh Preserve
Publication date
Field of study

Many pre-processing techniques that normalize artifacts and clean noise induce anomalies due to discretization of the document image. Important information that could be used at later stages may be lost. A proposed composite-model framework takes into account pre-printed information, user-added data, and digitization characteristics. Its benefits are demonstrated by experiments with statistically significant results. Separating pre-printed ruling lines from user-added handwriting shows how ruling lines impact people\u27s handwriting and how they can be exploited for identifying writers. Ruling line detection based on multi-line linear regression reduces the mean error of counting them from 0.10 to 0.03, 6.70 to 0.06, and 0.13 to 0.02, com- pared to an HMM-based approach on three standard test datasets, thereby reducing human correction time by 50%, 83%, and 72% on average. On 61 page images from 16 rule-form templates, the precision and recall of form cell recognition are increased by 2.7% and 3.7%, compared to a cross-matrix approach. Compensating for and exploiting ruling lines during feature extraction rather than pre-processing raises the writer identification accuracy from 61.2% to 67.7% on a 61-writer noisy Arabic dataset. Similarly, counteracting page-wise skew by subtracting it or transforming contours in a continuous coordinate system during feature extraction improves the writer identification accuracy. An implementation study of contour-hinge features reveals that utilizing the full probabilistic probability distribution function matrix improves the writer identification accuracy from 74.9% to 79.5%

Lehigh University: Lehigh Preserve