Search CORE

4 research outputs found

Progressive learning for symbol recognition in graphic documents (Apprentissage progressif pour la reconnaissance de symboles dans les documents graphiques)

Author: Barrat Sabine
Tabbone Salvatore
Publication venue: HAL CCSD
Publication date: 25/01/2006

National audience. Current symbol recognition methods give good results when the task is to recognize a small number of different symbols that contain little noise and are often disconnected from the graphics. In real applications, however, these methods are still poorly mastered when they must discriminate, in large databases, among several hundred different symbols that are often complex, noisy, and embedded in the graphic layers. In this context, learning methods are needed. In this article we present a progressive learning method for symbol recognition that improves its own recognition rate as new symbols are recognized in the documents. To do so, we propose a new use of discriminant analysis, which derives assignment rules from a training sample in which class memberships are known (supervised learning). This method is effective only if the training sample and the subsequent data are observed under the same conditions, an assumption that rarely holds in real conditions. To overcome this problem, we adapted a recent conditional discriminant analysis approach that adds to each observation the observation of a random vector representing the parasitic effects seen in classical discriminant analysis.
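The abstract describes discriminant analysis that builds assignment rules from a labelled training sample and keeps improving as newly recognized symbols are added. The sketch below illustrates that general idea with plain Gaussian linear discriminant analysis: it uses a shared (pooled) covariance and class priors estimated from counts, and its statistics are updated one sample at a time. It is an illustration under stated assumptions, not the authors' method. It does not implement the conditional discriminant analysis they adapt, where an extra random vector models parasitic effects. The class `IncrementalLDA`, the helper `solve`, the `ridge` term, and the toy 2-D features and labels are all hypothetical.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical sketch: plain LDA with incrementally updated statistics.
# It is NOT the conditional discriminant analysis described in the abstract.


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


class IncrementalLDA:
    """Gaussian LDA whose per-class sums are updated one labelled sample at a time."""

    def __init__(self, dim: int, ridge: float = 1e-6):
        self.dim = dim
        self.ridge = ridge  # hypothetical diagonal term keeping the covariance invertible
        self.count: Dict[str, int] = {}
        self.sum: Dict[str, List[float]] = {}
        self.outer: Dict[str, List[List[float]]] = {}

    def update(self, x: Sequence[float], label: str) -> None:
        """Add one labelled feature vector, e.g. a newly recognized symbol."""
        if label not in self.count:
            self.count[label] = 0
            self.sum[label] = [0.0] * self.dim
            self.outer[label] = [[0.0] * self.dim for _ in range(self.dim)]
        self.count[label] += 1
        for i in range(self.dim):
            self.sum[label][i] += x[i]
            for j in range(self.dim):
                self.outer[label][i][j] += x[i] * x[j]

    def _pooled_cov(self, means: Dict[str, List[float]]) -> List[List[float]]:
        dof = max(sum(self.count.values()) - len(self.count), 1)
        cov = [[0.0] * self.dim for _ in range(self.dim)]
        for k, n in self.count.items():
            mu = means[k]
            for i in range(self.dim):
                for j in range(self.dim):
                    # within-class scatter: sum(x x^T) - n * mu mu^T
                    cov[i][j] += self.outer[k][i][j] - n * mu[i] * mu[j]
        for i in range(self.dim):
            for j in range(self.dim):
                cov[i][j] /= dof
            cov[i][i] += self.ridge
        return cov

    def predict(self, x: Sequence[float]) -> str:
        """Assign x to the class with the largest linear discriminant score."""
        means = {k: [s / self.count[k] for s in self.sum[k]] for k in self.count}
        cov = self._pooled_cov(means)
        n_total = sum(self.count.values())
        best, best_score = "", -math.inf
        for k, mu in means.items():
            w = solve(cov, mu)  # w = Sigma^{-1} mu_k
            score = (sum(wi * xi for wi, xi in zip(w, x))
                     - 0.5 * sum(wi * mi for wi, mi in zip(w, mu))
                     + math.log(self.count[k] / n_total))
            if score > best_score:
                best, best_score = k, score
        return best


# Usage with toy 2-D features and hypothetical symbol labels.
clf = IncrementalLDA(dim=2)
for feat, lab in [([0.0, 0.1], "valve"), ([0.2, -0.1], "valve"), ([0.1, 0.0], "valve"),
                  ([3.0, 3.1], "pump"), ([2.9, 2.8], "pump"), ([3.2, 3.0], "pump")]:
    clf.update(feat, lab)
recognized = clf.predict([2.7, 3.3])
clf.update([2.7, 3.3], recognized)  # progressive step: feed the recognized symbol back
```

Keeping only counts, sums, and sums of outer products means adding a recognized symbol costs O(d²) and never revisits earlier samples. This is one simple way to let a recognizer refine itself as documents are processed.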

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Symbol Recognition: Current Advances and Perspectives

Author: Enric Martí
Ernest Valveny
Gemma Sánchez
Josep Lladós
Publication date: 01/01/2002

Abstract. The recognition of symbols in graphic documents is an active research area in the pattern recognition and document analysis community. A key issue in the interpretation of maps, engineering drawings, diagrams, etc. is the recognition of domain-dependent symbols according to a symbol database. In this work we first review the most outstanding symbol recognition methods from two points of view: application domains and pattern recognition methods. In the second part of the paper, we describe open and unaddressed problems in symbol recognition, analyze their current state of the art, and discuss future research challenges. Issues such as symbol representation, matching, segmentation, learning, scalability of recognition methods, and performance evaluation are addressed. Finally, we discuss the perspectives of symbol recognition with respect to new paradigms such as user interfaces on handheld computers and the indexing of document databases and the WWW by graphical content.

CiteSeerX

Information Preserving Processing of Noisy Handwritten Document Images

Author: Chen Jin
Publication venue: Lehigh Preserve

Many pre-processing techniques that normalize artifacts and clean noise introduce anomalies caused by the discretization of the document image, and information that could be useful at later stages may be lost. A proposed composite-model framework takes into account pre-printed information, user-added data, and digitization characteristics. Its benefits are demonstrated by experiments with statistically significant results. Separating pre-printed ruling lines from user-added handwriting shows how ruling lines affect people's handwriting and how they can be exploited to identify writers. Ruling line detection based on multi-line linear regression reduces the mean error in counting ruling lines from 0.10 to 0.03, 6.70 to 0.06, and 0.13 to 0.02 on three standard test datasets, compared to an HMM-based approach, thereby reducing human correction time by 50%, 83%, and 72% on average. On 61 page images from 16 rule-form templates, the precision and recall of form cell recognition increase by 2.7% and 3.7%, respectively, compared to a cross-matrix approach. Compensating for and exploiting ruling lines during feature extraction, rather than during pre-processing, raises writer identification accuracy from 61.2% to 67.7% on a noisy 61-writer Arabic dataset. Similarly, counteracting page-wise skew during feature extraction, either by subtracting it or by transforming contours in a continuous coordinate system, improves writer identification accuracy. An implementation study of contour-hinge features reveals that using the full probability distribution function matrix improves writer identification accuracy from 74.9% to 79.5%.
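The abstract does not say how its multi-line linear regression is formulated. One plausible reading, assumed here, is that the ruling lines on a page are parallel. All of them can then be fitted jointly with a single shared slope and one intercept per line, and the shared slope also gives a page-skew estimate. The function `fit_parallel_lines`, the assumption that pixels are already grouped by line, and the synthetic data are hypothetical. Detecting and grouping ruling-line pixels is outside this sketch.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def fit_parallel_lines(groups: List[List[Point]]) -> Tuple[float, List[float]]:
    """Least-squares fit of y = slope * x + b_k with one slope shared by all groups.

    Hypothetical helper: each group holds the (x, y) coordinates assumed to
    belong to one ruling line.
    """
    num = 0.0  # pooled within-line covariance of x and y
    den = 0.0  # pooled within-line variance of x
    centroids = []
    for pts in groups:
        mx = sum(x for x, _ in pts) / len(pts)
        my = sum(y for _, y in pts) / len(pts)
        centroids.append((mx, my))
        for x, y in pts:
            num += (x - mx) * (y - my)
            den += (x - mx) ** 2
    slope = num / den
    # Each intercept makes its line pass through that group's centroid.
    intercepts = [my - slope * mx for mx, my in centroids]
    return slope, intercepts


# Synthetic page: three ruling lines sharing a slight skew (slope 0.02).
groups = [[(float(x), 0.02 * x + b) for x in range(0, 500, 50)] for b in (100.0, 140.0, 180.0)]
slope, intercepts = fit_parallel_lines(groups)
skew_deg = math.degrees(math.atan(slope))
```

For a fixed slope, each intercept is optimal at b_k = ȳ_k - slope·x̄_k. Substituting that back leaves a one-parameter problem. Its solution is the pooled within-line covariance divided by the pooled within-line variance of x, which is exactly what the loop accumulates. Sharing the slope pools evidence across lines, which can help when individual lines are short or broken by handwriting.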

Lehigh University: Lehigh Preserve