Search CORE


A hypothesize-and-verify framework for Text Recognition using Deep Recurrent Neural Networks

Authors: Chaudhury Santanu, Rajeswar Sai, Ray Anupama
Publication date: 26/02/2015

Deep LSTM is an ideal candidate for text recognition. However, text recognition involves initial image-processing steps, such as line and word segmentation, that can introduce errors into the recognition system. Without segmentation, learning very long-range context is difficult and becomes computationally intractable. Therefore, alternative soft decisions are needed at the pre-processing level. This paper proposes a hybrid text recognizer that combines a deep recurrent neural network, with multiple layers of abstraction and long-range context, and a language model that verifies the deep neural network's output. We construct a multi-hypothesis tree architecture whose branches hold candidate segments of line sequences produced by different segmentation algorithms. The deep neural network is trained on perfectly segmented data and is then run on each candidate segment, generating Unicode sequences. In the verification step, these Unicode sequences are validated by substring matching against the language model, and best-first search is used to find the best combination of alternative hypotheses in the tree. The verification framework thus uses language models to eliminate wrong segmentation outputs and filter recognition errors.

arXiv.org e-Print Archive

Crossref
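The verification step described in the first abstract (score each candidate's recognized text against a language model, then run best-first search over the hypothesis tree) can be illustrated with a minimal sketch. It assumes a very simple model: each level of the tree is one region of the page, each branch at that level is the recognizer's output for one segmentation algorithm, and the "language model" is just a reference corpus against which words are substring-matched. The names `lm_score` and `best_first`, the corpus representation, and the cost `1 - score` are all hypothetical choices made for illustration, not the authors' implementation.

```python
import heapq
from typing import List, Tuple


def lm_score(text: str, corpus: str) -> float:
    """Fraction of words in `text` found as substrings of `corpus`.

    Hypothetical stand-in for the paper's language-model substring match.
    """
    words = text.split()
    if not words:
        return 0.0
    return sum(1 for w in words if w in corpus) / len(words)


def best_first(levels: List[List[str]], corpus: str) -> Tuple[float, List[int]]:
    """Pick one hypothesis per tree level, minimising total LM mismatch.

    levels[i] lists the recognizer's outputs for the alternative
    segmentations of region i (one branch per segmentation algorithm).
    Returns (total cost, index of the chosen branch at each level).
    """
    # Priority queue of partial paths: (accumulated cost, branch choices so far).
    frontier: List[Tuple[float, List[int]]] = [(0.0, [])]
    while frontier:
        cost, choices = heapq.heappop(frontier)
        depth = len(choices)
        if depth == len(levels):
            # Step costs are non-negative, so the first complete path popped is optimal.
            return cost, choices
        for k, text in enumerate(levels[depth]):
            # Assumed cost: how badly this candidate fails the LM check.
            step = 1.0 - lm_score(text, corpus)
            heapq.heappush(frontier, (cost + step, choices + [k]))
    raise ValueError("some level of the hypothesis tree has no candidates")
```

For example, with `corpus = "the quick brown fox jumps over the lazy dog"` and `levels = [["the quick", "th equick"], ["brown fax", "brown fox"]]`, `best_first(levels, corpus)` returns `(0.0, [0, 1])`: the search keeps the branch whose words all match the corpus at each level and discards the mis-segmented "th equick" and the misrecognized "brown fax". Because the priority queue always expands the cheapest partial path first, the search can stop as soon as a complete path reaches the front of the queue.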

The use of new technologies to access to handwritten historical information in digital form. Galeón Project

Authors: Alonso Villalobos Carlos, Márquez Carmona Lourdes, Pastor Gadea Moisés, Vidal Enrique
Publication venue: Ministerio de Educación, Cultura y Deporte. Secretaría General Técnica (España)
Publication date: 01/01/2014

Historical research in archives requires reviewing thousands of documents that, in many cases, are unrelated to the subject of study, which costs considerable time and resources. To address this problem for the study of underwater archaeological heritage, the CAS-IAPH has devised the Galeón Project, which aims to develop innovative solutions for querying large collections of digitized handwritten historical documents. Automated transcription of large volumes of handwritten document images is not currently possible, but technological advances in formal word recognition can simplify this process. To this end, a theoretical model for Keyword Search (Búsqueda de Palabras Claves, BPC) based on Word Graphs (Grafos de Palabras, GP) has been devised which, beyond maritime cultural heritage, could be applied to other research topics.

Activos Digitales IAPH
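The second abstract only states that keyword search is based on word graphs. One common way to score a keyword against a word graph, assumed here rather than taken from the project, is to compute each edge's posterior probability with the forward-backward algorithm and use the maximum posterior among edges labeled with the keyword as the relevance score for that line image. The edge encoding, the topological node numbering, and the names `edge_posteriors` and `keyword_score` are hypothetical; the max-posterior score is an approximation of the probability that the keyword appears, not an exact value.

```python
from typing import List, Tuple

# (source node, target node, word, transition probability); hypothetical encoding.
Edge = Tuple[int, int, str, float]


def edge_posteriors(num_nodes: int, edges: List[Edge]) -> List[float]:
    """Posterior probability of each edge, via the forward-backward algorithm.

    Assumes nodes 0..num_nodes-1 are topologically ordered (every edge has
    source < target), with node 0 as the start and the last node as the end.
    """
    alpha = [0.0] * num_nodes  # probability mass of all paths from start to node
    beta = [0.0] * num_nodes   # probability mass of all paths from node to end
    alpha[0] = 1.0
    beta[-1] = 1.0
    # Forward pass: edges sorted by source, so alpha[u] is final before use.
    for u, v, _, p in sorted(edges, key=lambda e: e[0]):
        alpha[v] += alpha[u] * p
    # Backward pass: edges sorted by decreasing target, so beta[v] is final before use.
    for u, v, _, p in sorted(edges, key=lambda e: e[1], reverse=True):
        beta[u] += p * beta[v]
    total = alpha[-1]
    if total == 0.0:
        raise ValueError("no complete path through the word graph")
    # Posterior of an edge = mass of all paths through it / mass of all paths.
    return [alpha[u] * p * beta[v] / total for u, v, _, p in edges]


def keyword_score(num_nodes: int, edges: List[Edge], keyword: str) -> float:
    """Assumed relevance of `keyword` for one line: max posterior over its edges."""
    posts = edge_posteriors(num_nodes, edges)
    return max((q for q, e in zip(posts, edges) if e[2] == keyword), default=0.0)
```

For a three-node graph with edges `(0, 1, "galeon", 0.6)`, `(0, 1, "galera", 0.4)` and `(1, 2, "cadiz", 1.0)`, `keyword_score(3, edges, "galeon")` is 0.6, "cadiz" scores 1.0, and a word absent from the graph scores 0.0. Ranking line images by such scores lets a user search a collection for a keyword without first producing a full transcription, which is the motivation the abstract gives.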