1,350 research outputs found
Recommended from our members
Use of colour for hand-filled form analysis and recognition
Colour information in form analysis is currently under utilised. As technology has advanced and computing costs have reduced, the processing of forms in colour has now become practicable. This paper describes a novel colour-based approach to the extraction of filled data from colour form images. Images are first quantised to reduce the colour complexity and data is extracted by examining the colour characteristics of the images. The improved performance of the proposed method has been verified by comparing the processing time, recognition rate, extraction precision and recall rate to that of an equivalent black and white system
Recognizing Handwriting Styles in a Historical Scanned Document Using Unsupervised Fuzzy Clustering
The forensic attribution of the handwriting in a digitized document to
multiple scribes is a challenging problem of high dimensionality. Unique
handwriting styles may be dissimilar in a blend of several factors including
character size, stroke width, loops, ductus, slant angles, and cursive
ligatures. Previous work on labeled data with Hidden Markov models, support
vector machines, and semi-supervised recurrent neural networks have provided
moderate to high success. In this study, we successfully detect hand shifts in
a historical manuscript through fuzzy soft clustering in combination with
linear principal component analysis. This advance demonstrates the successful
deployment of unsupervised methods for writer attribution of historical
documents and forensic document analysis.Comment: 26 pages in total, 5 figures and 2 table
User-driven Page Layout Analysis of historical printed Books
International audienceIn this paper, based on the study of the specificity of historical printed books, we first explain the main error sources in classical methods used for page layout analysis. We show that each method (bottom-up and top-down) provides different types of useful information that should not be ignored, if we want to obtain both a generic method and good segmentation results. Next, we propose to use a hybrid segmentation algorithm that builds two maps: a shape map that focuses on connected components and a background map, which provides information about white areas corresponding to block separations in the page. Using this first segmentation, a classification of the extracted blocks can be achieved according to scenarios produced by the user. These scenarios are defined very simply during an interactive stage. The user is able to make processing sequences adapted to the different kinds of images he is likely to meet and according to the user needs. The proposed âuser-driven approachâ is capable of doing segmentation and labelling of the required user high level concepts efficiently and has achieved above 93% accurate results over different data sets tested. User feedbacks and experimental results demonstrate the effectiveness and usability of our framework mainly because the extraction rules can be defined without difficulty and parameters are not sensitive to page layout variation
Adaptive Methods for Robust Document Image Understanding
A vast amount of digital document material is continuously being produced as part of major digitization efforts around the world. In this context, generic and efficient automatic solutions for document image understanding represent a stringent necessity. We propose a generic framework for document image understanding systems, usable for practically any document types available in digital form. Following the introduced workflow, we shift our attention to each of the following processing stages in turn: quality assurance, image enhancement, color reduction and binarization, skew and orientation detection, page segmentation and logical layout analysis. We review the state of the art in each area, identify current defficiencies, point out promising directions and give specific guidelines for future investigation. We address some of the identified issues by means of novel algorithmic solutions putting special focus on generality, computational efficiency and the exploitation of all available sources of information. More specifically, we introduce the following original methods: a fully automatic detection of color reference targets in digitized material, accurate foreground extraction from color historical documents, font enhancement for hot metal typesetted prints, a theoretically optimal solution for the document binarization problem from both computational complexity- and threshold selection point of view, a layout-independent skew and orientation detection, a robust and versatile page segmentation method, a semi-automatic front page detection algorithm and a complete framework for article segmentation in periodical publications. The proposed methods are experimentally evaluated on large datasets consisting of real-life heterogeneous document scans. The obtained results show that a document understanding system combining these modules is able to robustly process a wide variety of documents with good overall accuracy
A word image coding technique and its applications in information retrieval from imaged documents
Master'sMASTER OF SCIENC
- âŠ