A Study of Techniques and Challenges in Text Recognition Systems
Text recognition is a core component of Natural Language Processing (NLP) and digitization. These systems are critical for bridging the digitization gaps left by non-editable documents, and they contribute to finance, health care, machine translation, digital libraries, and a variety of other fields. In addition, the pandemic has increased the amount of digital information in the education sector, necessitating the deployment of text recognition systems to handle it. Text recognition systems work on three categories of text: (a) machine-printed, (b) offline handwritten, and (c) online handwritten. The major goal of this research is to examine the process of typewritten text recognition systems. The availability of historical documents and other traditional materials in many types of text poses another major challenge. Although this research examines a variety of languages, the Gurmukhi language receives the most focus. This paper presents an analysis of all prior text recognition algorithms for the Gurmukhi language. In addition, work on degraded texts in various languages is evaluated based on accuracy and F-measure.
A new framework for recognition of heavily degraded characters in historical typewritten documents based on semi-supervised clustering
This paper presents a new semi-supervised clustering framework for the recognition of heavily degraded characters in historical typewritten documents, where off-the-shelf OCR typically fails. The constraints are generated using typographical (collection-independent) domain knowledge and are used to guide both sample (glyph set) partitioning and metric learning. Experimental results using simple features provide encouraging evidence that this approach can lead to significantly improved clustering results compared to simple K-Means clustering, as well as to clustering using a state-of-the-art OCR engine.
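
To illustrate how pairwise constraints can steer a partitioning, here is a minimal sketch of a COP-KMeans-style constrained k-means. It is not the framework from the paper: the metric-learning step is omitted, the function names (`violates`, `constrained_kmeans`) are hypothetical, and the constraints and 2-D points are hand-written toy values, whereas the paper derives its constraints from typographical domain knowledge.

```python
from math import dist


def violates(i, cluster, labels, must_link, cannot_link):
    """True if placing point i in `cluster` breaks a constraint
    with a point that has already been assigned in this pass."""
    for a, b in must_link:
        other = b if a == i else a if b == i else None
        if other is not None and labels[other] not in (None, cluster):
            return True
    for a, b in cannot_link:
        other = b if a == i else a if b == i else None
        if other is not None and labels[other] == cluster:
            return True
    return False


def constrained_kmeans(points, centers, must_link=(), cannot_link=(), iters=10):
    """COP-KMeans-style loop: each point takes the nearest center
    that does not violate a constraint; centers are then re-averaged."""
    centers = [tuple(c) for c in centers]
    labels = [None] * len(points)
    for _ in range(iters):
        new_labels = [None] * len(points)
        for i, p in enumerate(points):
            # Try clusters from nearest to farthest.
            order = sorted(range(len(centers)), key=lambda k: dist(p, centers[k]))
            for k in order:
                if not violates(i, k, new_labels, must_link, cannot_link):
                    new_labels[i] = k
                    break
            else:
                raise ValueError(f"no feasible cluster for point {i}")
        # Recompute each center as the mean of its members.
        for k in range(len(centers)):
            members = [points[i] for i, lab in enumerate(new_labels) if lab == k]
            if members:
                centers[k] = tuple(sum(x) / len(members) for x in zip(*members))
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
    return labels, centers


# Toy data: the last point lies between the two groups.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (2.4, 2.4)]
init = [(0, 0), (5, 5)]

plain, _ = constrained_kmeans(points, init)
guided, _ = constrained_kmeans(points, init, cannot_link=[(0, 4)])
```

In this toy run, the unconstrained call places the ambiguous last point with the first group, while the single cannot-link constraint between points 0 and 4 moves it to the second group. This is the sense in which constraints guide the partitioning.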