Use of colour for hand-filled form analysis and recognition

Author: Allen T
Sherkat N
Wong WS
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 21/07/2005

Colour information is currently underutilised in form analysis. As technology has advanced and computing costs have fallen, processing forms in colour has become practicable. This paper describes a novel colour-based approach to extracting filled-in data from colour form images. Images are first quantised to reduce colour complexity, and data is then extracted by examining the colour characteristics of the images. The improved performance of the proposed method has been verified by comparing its processing time, recognition rate, extraction precision and recall rate with those of an equivalent black-and-white system.

Nottingham Trent Institutional Repository (IRep)

Recognizing Handwriting Styles in a Historical Scanned Document Using Unsupervised Fuzzy Clustering

Author: Brick Aaron
Majumdar Sriparna
Publication date: 28/06/2023

The forensic attribution of the handwriting in a digitized document to multiple scribes is a challenging, high-dimensional problem. Handwriting styles may differ in a blend of several factors, including character size, stroke width, loops, ductus, slant angles, and cursive ligatures. Previous work on labeled data with hidden Markov models, support vector machines, and semi-supervised recurrent neural networks has achieved moderate to high success. In this study, we successfully detect hand shifts in a historical manuscript through fuzzy soft clustering in combination with linear principal component analysis. This advance demonstrates the successful deployment of unsupervised methods for writer attribution of historical documents and forensic document analysis. Comment: 26 pages in total, 5 figures and 2 tables.

arXiv.org e-Print Archive

Image-based logical document structure recognition

Author: Grzegorz Kamola
Mariusz Paradowski
Michal Spytkowski
Urszula Markowska-Kaczmar
Publication venue: Springer Nature
Publication date: 01/01/2014

Springer - Publisher Connector

The application of new methods for offline recognition in printed Arabic documents

Author: Bouressace Hassina
Publication date: 29/05/2020

SZTE Doktori Értekezések Repozitórium (SZTE Repository of Dissertations)

Image-based logical document structure recognition

Author: A Simon
AK Jain
C Strouthopoulos
G Carpenter
G Sainz Palmero
Grzegorz Kamola
J Liang
K Sain
L O’Gorman
LA Fletcher
M Dillencourt
Mariusz Paradowski
Michal Spytkowski
N Nikolaou
NX Bach
P Fankhuser
R Rangayyan
S Marinai
Urszula Markowska-Kaczmar
Y Wang
Publication venue: 'Springer Science and Business Media LLC'

Crossref

User-driven Page Layout Analysis of historical printed Books

Author: Busson Sébastien
Demonet Marie-Luce
Ramel Jean-Yves
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/04/2007

In this paper, based on a study of the specific features of historical printed books, we first explain the main sources of error in classical page layout analysis methods. We show that each method (bottom-up and top-down) provides different types of useful information that should not be ignored if we want to obtain both a generic method and good segmentation results. Next, we propose a hybrid segmentation algorithm that builds two maps: a shape map that focuses on connected components, and a background map that provides information about the white areas corresponding to block separations in the page. Using this first segmentation, the extracted blocks can be classified according to scenarios produced by the user. These scenarios are defined very simply during an interactive stage. The user can build processing sequences adapted to the different kinds of images they are likely to meet and to their needs. The proposed “user-driven approach” segments and labels the high-level concepts required by the user efficiently, and has achieved above 93% accuracy over the different data sets tested. User feedback and experimental results demonstrate the effectiveness and usability of our framework, mainly because the extraction rules can be defined without difficulty and the parameters are not sensitive to page layout variation.

HAL Université de Tours

Adaptive Methods for Robust Document Image Understanding

Author: Konya Iuliu
Publication venue: Universitäts- und Landesbibliothek Bonn

A vast amount of digital document material is continuously being produced as part of major digitization efforts around the world. In this context, generic and efficient automatic solutions for document image understanding are a pressing necessity. We propose a generic framework for document image understanding systems, usable for practically any document type available in digital form. Following the introduced workflow, we turn our attention to each of the following processing stages in turn: quality assurance, image enhancement, color reduction and binarization, skew and orientation detection, page segmentation and logical layout analysis. We review the state of the art in each area, identify current deficiencies, point out promising directions and give specific guidelines for future investigation. We address some of the identified issues by means of novel algorithmic solutions, with a special focus on generality, computational efficiency and the exploitation of all available sources of information. More specifically, we introduce the following original methods: a fully automatic detection of color reference targets in digitized material, accurate foreground extraction from color historical documents, font enhancement for hot metal typeset prints, a theoretically optimal solution for the document binarization problem from both a computational complexity and a threshold selection point of view, a layout-independent skew and orientation detection, a robust and versatile page segmentation method, a semi-automatic front page detection algorithm and a complete framework for article segmentation in periodical publications. The proposed methods are experimentally evaluated on large datasets consisting of real-life heterogeneous document scans. The obtained results show that a document understanding system combining these modules is able to robustly process a wide variety of documents with good overall accuracy.

bonndoc – Der Publikationsserver der Universität Bonn

A word image coding technique and its applications in information retrieval from imaged documents

Author: ZHANG LI
Publication date: 22/09/2004

Master's, MASTER OF SCIENCE

ScholarBank@NUS