3,265 research outputs found

    Text Line Segmentation of Historical Documents: a Survey

    Libraries and various National Archives hold a huge number of historical documents that have not been exploited electronically. Although automatic reading of complete pages remains, in most cases, a long-term objective, tasks such as word spotting, text/image alignment, authentication and extraction of specific fields are in use today. For all these tasks, a major step is document segmentation into text lines. Because of the low quality and the complexity of these documents (background noise, artifacts due to aging, interfering lines), automatic text line segmentation remains an open research field. The objective of this paper is to present a survey of existing methods, developed during the last decade, and dedicated to documents of historical interest. Comment: 25 pages, submitted version, to appear in International Journal on Document Analysis and Recognition; online version available at http://www.springerlink.com/content/k2813176280456k3

    A Study of Techniques and Challenges in Text Recognition Systems

    Text recognition is the core system for Natural Language Processing (NLP) and digitization. These systems are critical in bridging the digitization gaps created by non-editable documents, and they contribute to finance, health care, machine translation, digital libraries, and a variety of other fields. In addition, as a result of the pandemic, the amount of digital information in the education sector has increased, necessitating the deployment of text recognition systems to deal with it. Text recognition systems work on three different categories of text: (a) machine-printed, (b) offline handwritten, and (c) online handwritten text. The major goal of this research is to examine the process of typewritten text recognition systems. The availability of historical documents and other traditional materials in many types of texts is another major challenge for convergence. Although this research examines a variety of languages, the Gurmukhi language receives the most focus. This paper presents an analysis of all prior text recognition algorithms for the Gurmukhi language. In addition, work on degraded texts in various languages is evaluated based on accuracy and F-measure.

    A Bottom Up Procedure for Text Line Segmentation of Latin Script

    In this paper we present a bottom-up procedure for segmentation of text lines written or printed in the Latin script. The proposed method uses a combination of image morphology, feature extraction and a Gaussian mixture model to perform this task. The experimental results show the validity of the procedure. Comment: Accepted and presented at the IEEE conference "International Conference on Advances in Computing, Communications and Informatics (ICACCI) 2017"

    Research and Development of Feature Extraction from Myanmar Palm Leaf Manuscripts for the Myanmar Character Recognition System

    This paper proposes a handwriting OCR system for Myanmar palm-leaf manuscripts. Each text area in the manuscript is segmented, and the segmented character images must then be recognized as Myanmar handwritten characters, which carry Myanmar's precious and invaluable historical information. The paper covers two essential steps: preprocessing and feature extraction. Preprocessing automatically extracts the palm-leaf manuscript region of interest from images taken by a camera and supplies enhanced images to the subsequent stages of Myanmar character recognition. A one-dimensional segmentation approach is used to crop the leaf area from the high-resolution image. Line count analysis is also performed to extract a region containing a sufficient number of lines. After that, line segmentation is carried out using an Object Frequency Histogram along the horizontal lines, which finds the optimal cut points between the lines. The same technique, applied vertically, isolates each character or smallest group of characters. In total, 18 features are extracted to recognize the Myanmar palm-leaf manuscript characters. Although the experimental results are good, some difficulties related to connected components still need to be addressed.
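
    The horizontal-then-vertical histogram segmentation described in this abstract can be illustrated with a minimal sketch. It assumes that the "Object Frequency Histogram" behaves like a plain projection profile (a count of foreground pixels per row or column), which the abstract does not confirm. The function names, the `min_pixels` threshold and the toy image are hypothetical and are not taken from the paper.

```python
def row_profile(binary_image):
    """Count foreground pixels (value 1) in each row of a binary image."""
    return [sum(row) for row in binary_image]


def segment_lines(binary_image, min_pixels=1):
    """Return (start_row, end_row) pairs, end exclusive, one per text line.

    A text line is taken to be a maximal run of rows whose foreground
    count is at least `min_pixels` (a hypothetical threshold).
    """
    profile = row_profile(binary_image)
    lines = []
    start = None
    for y, count in enumerate(profile):
        if count >= min_pixels and start is None:
            start = y                      # a text line begins
        elif count < min_pixels and start is not None:
            lines.append((start, y))       # a gap closes the line
            start = None
    if start is not None:                  # line runs to the bottom edge
        lines.append((start, len(profile)))
    return lines


def segment_characters(binary_image, line, min_pixels=1):
    """Apply the same run detection to the columns of one text line."""
    top, bottom = line
    band = binary_image[top:bottom]
    columns = [list(col) for col in zip(*band)]  # transpose the band
    return segment_lines(columns, min_pixels)


# Toy 6x6 page: one two-row text line and one single-row text line.
page = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 0],
    [1, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
lines = segment_lines(page)
first_line_chars = segment_characters(page, lines[0])
print(lines)             # [(1, 3), (4, 5)]
print(first_line_chars)  # [(0, 2), (4, 6)]
```

    A sketch like this splits only where a row or column is empty, so touching or overlapping components stay merged. That is consistent with the connected-component difficulties the abstract mentions.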

    Recognition of Marathi Newsprint Text Using Neural Network and Genetic Algorithm

    Nowadays, newly emerging application areas demand new methodologies, and many techniques exist for recognizing handprinted Devanagari, Bengali, Tamil, Chinese, and other characters. However, very little research addresses printed material. In our project we therefore propose the recognition of printed Devanagari text using a neural network and a genetic algorithm. In India, more than 300 million people use the Devanagari script for documentation. Research on the recognition of printed as well as handwritten Devanagari text has improved significantly in the past few years. All feature-extraction techniques, as well as the training, classification and matching techniques useful for recognition, are discussed in various sections of the paper. An attempt is made to address the most important results reported so far and to highlight beneficial directions of the research to date. Moreover, the paper contains a comprehensive bibliography of selected papers that appeared in reputed journals and conference proceedings, as an aid for researchers working on the recognition of printed Devanagari text using neural networks and genetic algorithms. DOI: 10.17762/ijritcc2321-8169.160411

    Junction Point Detection And Identification Of Broken Character In Touching Arabic Handwritten Text Using Overlapping Set Theory

    Touching characters are formed when two or more characters share the same space. Segmenting these touching characters is a very challenging research topic, especially for degraded handwritten Arabic documents, and is one of the key issues in the recognition of handwritten Arabic text. Segmenting touching handwritten Arabic characters is therefore considered very important for making a recognition system more effective. In this research, a new method is proposed that identifies the junction, or common point, of a touching Arabic word image by applying the overlapping (intersection) operation of set theory. This helps to trace the correct boundary of the touching characters, identify broken characters, and segment the touching handwritten text efficiently. The proposed method has been evaluated on touching handwritten Arabic characters taken from handwritten datasets. The results show the efficiency of the proposed method. The proposed method is applicable to both degraded handwritten documents and printed documents.
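
    The set-intersection idea in this abstract can be sketched as follows. The abstract does not say how the two sets are built, so this example assumes that each character's stroke is already available as its own binary mask, which in practice is the hard part. The helper names `pixel_set` and `junction_points` and the toy masks are hypothetical and do not come from the paper.

```python
def pixel_set(mask):
    """Return the set of (row, col) coordinates of foreground pixels."""
    return {
        (y, x)
        for y, row in enumerate(mask)
        for x, value in enumerate(row)
        if value
    }


def junction_points(stroke_a, stroke_b):
    """Pixels shared by both strokes, i.e. the intersection of the two sets."""
    return pixel_set(stroke_a) & pixel_set(stroke_b)


# Two toy strokes that touch at a single pixel, (row 1, col 2).
stroke_a = [
    [1, 1, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
stroke_b = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0],
]
junction = junction_points(stroke_a, stroke_b)
print(junction)  # {(1, 2)}
```

    Under this assumption, an empty intersection would mean the two strokes do not touch, and a non-empty one marks candidate cut points for separating them.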