Search CORE

795 research outputs found

Text Line Segmentation of Historical Documents: a Survey

Author: A. Amin
A. Bozzi
A. Downton
A. Jain
A. Kolcz
Abderrazak Zahour
Bruno Taconet
C.L. Tan
C.V. Lakshmi
E. Cohen
E. Oztop
G. Seni
I.-K. Kim
K. Wong
L. Likforman-Sulem
L. Likforman-Sulem
L. Likforman-Sulem
L. O’Gorman
L.A. Fletcher
Laurence Likforman-Sulem
R. Plamondon
R.D. Lins
U. Pal
V. Shapiro
Ventadert Gusnard de de
Y. Solihin
Y.H. Tseng
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 10/04/2007
Field of study

There is a huge amount of historical documents in libraries and in various National Archives that have not been exploited electronically. Although automatic reading of complete pages remains, in most cases, a long-term objective, tasks such as word spotting, text/image alignment, authentication and extraction of specific fields are in use today. For all these tasks, a major step is document segmentation into text lines. Because of the low quality and the complexity of these documents (background noise, artifacts due to aging, interfering lines),automatic text line segmentation remains an open research field. The objective of this paper is to present a survey of existing methods, developed during the last decade, and dedicated to documents of historical interest.Comment: 25 pages, submitted version, To appear in International Journal on Document Analysis and Recognition, On line version available at http://www.springerlink.com/content/k2813176280456k3

arXiv.org e-Print Archive

Crossref

Decision Fusion and Contextual Information for Arabic Words Recognition for Computing and Informatics

Author: Farah Nadir
Sellami Mokhtar
Souici Labiba
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 06/02/2012
Field of study

The study of multiple classifier systems has become recently an area of intensive research in pattern recognition. Also in handwriting recognition, systems combining several classifiers have been investigated. An approach for recognizing the legal amount for handwritten Arabic bank check is described in this article. The solution uses multiple information sources to recognize words. The recognition step is preformed with a parallel combination of three kinds of classifiers using holistic word structural features. The classification stage results are first normalized, and the sum combination is performed as a decision fusion scheme, after which a syntactic analyzer makes final decision on the candidate words. Using this approach, the obtained results are very interesting and promising

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

Dissimilarity Gaussian Mixture Models for Efficient Offline Handwritten Text-Independent Identification using SIFT and RootSIFT Descriptors

Author: Bouridane Ahmed
Khan Faraz
Khelifi Fouad
Tahir Muhammad
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/02/2019
Field of study

Handwriting biometrics is the science of identifying the behavioural aspect of an individual’s writing style and exploiting it to develop automated writer identification and verification systems. This paper presents an efficient handwriting identification system which combines Scale Invariant Feature Transform (SIFT) and RootSIFT descriptors in a set of Gaussian mixture models (GMM). In particular, a new concept of similarity and dissimilarity Gaussian mixture models (SGMM and DGMM) is introduced. While a SGMM is constructed for every writer to describe the intra-class similarity that is exhibited between the handwritten texts of the same writer, a DGMM represents the contrast or dissimilarity that exists between the writer’s style on one hand and other different handwriting styles on the other hand. Furthermore, because the handwritten text is described by a number of key point descriptors where each descriptor generates a SGMM/DGMM score, a new weighted histogram method is proposed to derive the intermediate prediction score for each writer’s GMM. The idea of weighted histogram exploits the fact that handwritings from the same writer should exhibit more similar textual patterns than dissimilar ones, hence, by penalizing the bad scores with a cost function, the identification rate can be significantly enhanced. Our proposed system has been extensively assessed using six different public datasets (including three English, two Arabic and one hybrid language) and the results have shown the superiority of the proposed system over state-of-the-art techniques

Northumbria Research Link

A novel approach for handedness detection from off-line handwriting using fuzzy conceptual reduction

Author: A Bensefia
Ali Jaoua
B Ganter
BV Dasarathy
EN Zois
Fethi Ferjani
G Salton
HE Said
I Siddiqi
J Riguet
L Breiman
L Wang
M Bulacu
M Liwicki
M Liwicki
N Arica
N Otsu
R Belohlavek
R Belohlavek
R Plamondon
RO Duda
S Al Maadeed
S Elloumi
S Impedovo
Samir Elloumi
SN Srihari
Somaya Al-Maadeed
U Bhattacharya
U Pal
UV Marti
VK Govindan
Publication venue: 'Springer Science and Business Media LLC'
Publication date
Field of study

Crossref

Character Recognition

Author
Publication venue: 'IntechOpen'
Publication date: 20/04/2021
Field of study

Character recognition is one of the pattern recognition technologies that are most widely used in practical applications. This book presents recent advances that are relevant to character recognition, from technical topics such as image processing, feature extraction or classification, to new applications including human-computer interfaces. The goal of this book is to provide a reference source for academic research and for professionals working in the character recognition field

Directory of Open Access Books (DOAB)

Novel Perspectives for the Management of Multilingual and Multialphabetic Heritages through Automatic Knowledge Extraction: The DigitalMaktaba Approach

Author: Federico Ruozzi
Luca Sala
Matteo Vanzini
Riccardo Amerigo Vigliermo
Riccardo Martoglia
Sonia Bergamaschi
Stefania De Nardis
Publication venue: 'MDPI AG'
Publication date: 01/01/2022
Field of study

The linguistic and social impact of multiculturalism can no longer be neglected in any sector, creating the urgent need of creating systems and procedures for managing and sharing cultural heritages in both supranational and multi-literate contexts. In order to achieve this goal, text sensing appears to be one of the most crucial research areas. The long-term objective of the DigitalMaktaba project, born from interdisciplinary collaboration between computer scientists, historians, librarians, engineers and linguists, is to establish procedures for the creation, management and cataloguing of archival heritage in non-Latin alphabets. In this paper, we discuss the currently ongoing design of an innovative workflow and tool in the area of text sensing, for the automatic extraction of knowledge and cataloguing of documents written in non-Latin languages (Arabic, Persian and Azerbaijani). The current prototype leverages different OCR, text processing and information extraction techniques in order to provide both a highly accurate extracted text and rich metadata content (including automatically identified cataloguing metadata), overcoming typical limitations of current state of the art approaches. The initial tests provide promising results. The paper includes a discussion of future steps (e.g., AI-based techniques further leveraging the extracted data/metadata and making the system learn from user feedback) and of the many foreseen advantages of this research, both from a technical and a broader cultural-preservation and sharing point of view

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

PubMed Central

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia

Reconocimiento de notación matemática escrita a mano fuera de línea

Author: RAMIREZ PIÑA CARLOS
RAMIREZ PIÑA CARLOS
Publication venue: 'Universidad Autonoma del Estado de Mexico'
Publication date: 16/10/2017
Field of study

El reconocimiento automático de expresiones matemáticas es uno de los problemas de reconocimiento de patrones, debido a que las matemáticas representan una fuente valiosa de información en muchos a ́reas de investigación. La escritura de expresiones matemáticas a mano es un medio de comunicación utilizado para la transmisión de información y conocimiento, con la cual se pueden generar de una manera sencilla escritos que contienen notación matemática. Este proceso puede volverse tedioso al ser escrito en lenguaje de composición tipográfica que pueda ser procesada por una computadora, tales como LATEX, MathML, entre otros. En los sistemas de reconocimiento de expresiones matem ́aticas existen dos m ́etodos diferentes a saber: fuera de l ́ınea y en l ́ınea. En esta tesis, se estudia el desempen ̃o de un sistema fuera de l ́ınea en donde se describen los pasos b ́asicos para lograr una mejor precisio ́n en el reconocimiento, las cuales esta ́n divididas en dos pasos principales: recono- cimiento de los s ́ımbolos de las ecuaciones matema ́ticas y el ana ́lisis de la estructura en que est ́an compuestos. Con el fin de convertir una expresi ́on matema ́tica escrita a mano en una expresio ́n equivalente en un sistema de procesador de texto, tal como TEX

Red Mexicana de Repositorios Institucionales

Repositorio Institucional de la Universidad Autónoma del Estado de México

Feature Extraction Methods for Character Recognition

Author: Yampolskiy Roman V
Publication venue: RIT Scholar Works
Publication date: 01/01/2004
Field of study

Not Include

RIT Scholar Works