
2 research outputs found

Re-Typograph Phase I: a Proof-of-Concept for Typeface Parameter Extraction from Historical Documents

Authors: Blégean Julien; Bouville Thomas; Cao Hongliu; Ghamizi Salah; Houpin Romain; Lamiroy Bart; Lloyd Matthias
Publication venue: 'Instytut Dermatologii Radoslaw Spiewak'
Publication date: 11/02/2015

This paper reports on the first phase of an attempt to create a full retro-engineering pipeline that aims to construct a complete set of coherent typographic parameters defining the typefaces used in a printed, homogeneous text. It should be stressed that this process cannot reasonably be expected to be fully automatic and that it is designed to include human interaction. Although font design is governed by a set of quite robust and formal geometric rulesets, it still relies heavily on subjective human interpretation. Furthermore, different parameters applied to the generic rulesets may result in typefaces that are quite similar and visually difficult to distinguish, making the retro-engineering an inverse problem that becomes ill-conditioned once shape distortions (related to the printing and/or scanning process) come into play. This work is the first phase of a long iterative process, in which we will progressively study and assess the state-of-the-art techniques best suited to our problem and investigate new directions when they prove not quite adequate. As a first step, this is more of a feasibility proof-of-concept that will allow us to clearly pinpoint the items that will require more in-depth research over the next iterations.

INRIA a CCSD electronic archive server

Word Retrieval in Historical Document using Character-Primitives

Authors: Ragot Nicolas; Ramel Jean-Yves; Roy Partha Pratim
Publication venue: HAL CCSD
Publication date: 18/09/2011

Word searching and indexing in historical document collections is a challenging problem because characters in these documents are often touching or broken due to degradation and ageing effects. For efficient searching in such historical documents, this paper presents a novel approach to word spotting using string matching of character primitives. We describe the text string as a sequence of primitives, each of which consists of a single character or a part of a character. Primitive segmentation is performed by analyzing text background information obtained with the water reservoir technique. Next, the primitives are clustered using template matching, and a codebook of representative primitives is built. Using this primitive codebook, the text information in the document images is encoded and stored. A query word is segmented into primitives and encoded as a string of representative primitives from the codebook. Finally, approximate string matching is applied to find similar words, and the matching similarity is used to rank the retrieved words. The proposed method is tested on historical books in the French alphabet, and we have obtained encouraging results from the experiment.
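
The final retrieval step described in this abstract (approximate string matching over codebook-encoded words, followed by ranking) can be illustrated with a minimal sketch. The abstract does not say which matching algorithm or similarity measure the authors use, so this sketch assumes plain Levenshtein distance normalised by length; the function names, the integer codebook indices, and the `min_similarity` threshold are all hypothetical.

```python
from typing import List, Sequence, Tuple


def edit_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Levenshtein distance between two sequences of codebook indices."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete a primitive
                            curr[j - 1] + 1,      # insert a primitive
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Assumed similarity score in [0, 1]: 1 minus the normalised distance."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def rank_words(query: Sequence[int],
               indexed_words: List[Tuple[str, Sequence[int]]],
               min_similarity: float = 0.5) -> List[Tuple[str, float]]:
    """Return (word_id, score) pairs above the threshold, best match first."""
    scored = [(word_id, similarity(query, codes))
              for word_id, codes in indexed_words]
    hits = [(word_id, score) for word_id, score in scored
            if score >= min_similarity]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

Here each indexed word is a hypothetical identifier paired with its sequence of codebook indices. A query encoded the same way is compared against every stored sequence, so a word with one broken or merged primitive can still be retrieved, only with a lower score.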

Crossref

HAL Université de Tours