Recognizing Degraded Handwritten Characters
In this paper, Slavonic manuscripts from the 11th century written in Glagolitic script are investigated. State-of-the-art optical character recognition methods produce poor results for degraded handwritten document images. This is largely due to a lack of suitable results from basic pre-processing steps such as binarization and image segmentation. Therefore, a new, binarization-free approach is presented that is independent of pre-processing deficiencies. It additionally incorporates local information in order to also recognize fragmented or faded characters. The proposed algorithm consists of two steps: character classification and character localization. First, scale-invariant feature transform (SIFT) features are extracted and classified using support vector machines. On this basis, interest points are clustered according to their spatial information. Then, characters are localized and eventually recognized by a weighted voting scheme of pre-classified local descriptors. Preliminary results show that the proposed system can handle highly degraded manuscript images with background noise, e.g. stains, tears, and faded characters.
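The clustering and voting steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the interest points are assumed to be already classified (for example by an SVM over SIFT descriptors), and the greedy centroid clustering, the `radius` parameter, and the names `cluster_points`, `weighted_vote`, and `recognize` are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from math import hypot


def cluster_points(points, radius):
    """Greedy spatial clustering (an assumed stand-in for the paper's method):
    a point joins the first cluster whose centroid lies within `radius`."""
    clusters = []
    for p in points:
        for c in clusters:
            cx = sum(q["x"] for q in c) / len(c)
            cy = sum(q["y"] for q in c) / len(c)
            if hypot(p["x"] - cx, p["y"] - cy) <= radius:
                c.append(p)
                break
        else:
            # No existing cluster is close enough: start a new one.
            clusters.append([p])
    return clusters


def weighted_vote(cluster):
    """Sum the classifier weights per label and return the winning label."""
    scores = defaultdict(float)
    for p in cluster:
        scores[p["label"]] += p["weight"]
    return max(scores, key=scores.get)


def recognize(points, radius):
    """One recognized character label per spatial cluster."""
    return [weighted_vote(c) for c in cluster_points(points, radius)]


# Hypothetical pre-classified interest points: position, label, confidence.
points = [
    {"x": 0, "y": 0, "label": "a", "weight": 0.9},
    {"x": 1, "y": 1, "label": "b", "weight": 0.4},
    {"x": 1, "y": 0, "label": "a", "weight": 0.3},
    {"x": 50, "y": 50, "label": "b", "weight": 0.8},
    {"x": 51, "y": 50, "label": "a", "weight": 0.5},
    {"x": 50, "y": 51, "label": "b", "weight": 0.2},
]
labels = recognize(points, radius=5.0)
print(labels)
```

Because each cluster is decided by summed weights rather than by a single descriptor, a few misclassified interest points on a faded or fragmented character need not change the final label.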
Computation and Palaeography: Potentials and Limits
This manifesto documents the program and outcomes of Dagstuhl Seminar 12382 ‘Perspectives Workshop: Computation and Palaeography: Potentials and Limits’. The workshop focused on the interaction of palaeography, the study of ancient and medieval documents, with computerised tools, particularly those developed for analysis of digital images and text mining. The goal of this marriage of disciplines is to provide efficient solutions to time- and labor-consuming palaeographic tasks. It furthermore attempts to provide scholars with quantitative evidence for palaeographical arguments, consequently facilitating a better understanding of our cultural heritage through the unique perspective of ancient and medieval documents. The workshop provided a vital opportunity for palaeographers to interact and discuss the potential of digital methods with computer scientists specialising in machine vision and statistical data analysis. This was essential not only in suggesting new directions and ideas for improving palaeographic research, but also in identifying questions which scholars working individually, in their respective fields, would not have asked without directly communicating with colleagues from outside their research community.
Detection of Graphical Patterns in Images of Ancient Documents
Graphical pattern detection consists of searching a collection of document images for the occurrences most similar to an image query. In this article, we propose an unsupervised system for pattern detection that requires no prior segmentation, drawing on recent techniques in computer vision. Our approach relies on decomposing the documents into windows of various sizes and describing these windows with bags of visual words, all done offline to reduce computation time. A data compression technique, proposed very recently in image retrieval, keeps memory usage reasonable but requires approximating the computation of the distance to the query. Encouraging first results are obtained on DocExplore, a collection of medieval documents. Abstract: Pattern spotting consists of retrieving the most similar graphical patterns from a collection of document images. Inspired by recent advances in computer vision and word spotting techniques, we propose in this paper an unsupervised, segmentation-free pattern spotting system. Overall, the system includes a powerful patch-based framework, the bag-of-visual-words model, with an offline sliding window mechanism to avoid a heavy computational burden during the retrieval process. Our system takes advantage of recent compression and distance approximation techniques (product quantization and asymmetric distance computation) to efficiently index the large number of sub-windows produced by sliding windows, and makes it possible to retrieve small queries in a large indexed corpus.
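The compression step named in this abstract, product quantization with asymmetric distance computation, can be sketched as follows. This is a toy illustration and not the authors' system: the codebooks are fixed by hand here (in practice they would typically be learned, for example with k-means), and the function names `encode`, `adc_table`, and `adc_distance` are hypothetical.

```python
def split(vec, m):
    """Split a vector into m equal-length sub-vectors."""
    d = len(vec) // m
    return [vec[i * d:(i + 1) * d] for i in range(m)]


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode(vec, codebooks):
    """Replace each sub-vector by the index of its nearest centroid."""
    subs = split(vec, len(codebooks))
    return [
        min(range(len(cb)), key=lambda k: sqdist(sub, cb[k]))
        for sub, cb in zip(subs, codebooks)
    ]


def adc_table(query, codebooks):
    """Per-subspace lookup table of distances from the uncompressed query
    to every centroid (the 'asymmetric' part: only the database is coded)."""
    subs = split(query, len(codebooks))
    return [[sqdist(sub, c) for c in cb] for sub, cb in zip(subs, codebooks)]


def adc_distance(table, code):
    """Approximate squared distance: one table lookup per subspace."""
    return sum(table[j][k] for j, k in enumerate(code))


# Assumed toy codebooks: 2 subspaces, 2 centroids each, 2 dimensions each.
codebooks = [
    [[0, 0], [10, 10]],
    [[0, 0], [5, 5]],
]
code_a = encode([9, 11, 1, 0], codebooks)
code_b = encode([1, 0, 4, 6], codebooks)

table = adc_table([10, 10, 0, 0], codebooks)
dist_a = adc_distance(table, code_a)
dist_b = adc_distance(table, code_b)
print(code_a, dist_a, code_b, dist_b)
```

Each indexed sub-window is stored as a short list of centroid indices rather than a full descriptor, and the query is compared against all of them using one small lookup table, which is what keeps memory and retrieval time manageable at the cost of an approximate distance.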