309 research outputs found

    Apprentissage profond de formes manuscrites pour la reconnaissance et le repérage efficace de l'écriture dans les documents numérisés

    Get PDF
    Malgré les efforts importants de la communauté d’analyse de documents, définir une representation robuste pour les formes manuscrites demeure un défi de taille. Une telle representation ne peut pas être définie explicitement par un ensemble de règles, et doit plutôt être obtenue avec une extraction intelligente de caractéristiques de haut niveau à partir d’images de documents. Dans cette thèse, les modèles d’apprentissage profond sont investigués pour la representation automatique de formes manuscrites. Les représentations proposées par ces modèles sont utilisées pour définir un système de reconnaissance et de repérage de mots individuels dans les documents. Le choix de traiter les mots individuellement est motivé par le fait que n’importe quel texte peut être segmenté en un ensemble de mots séparés. Dans une première contribution, une représentation non supervisée profonde est proposée pour la tâche de repérage de mots manuscrits. Cette représentation se base sur l’algorithme de regroupement spherical k-means, qui est employé pour construire une hiérarchie de fonctions paramétriques encodant les images de documents. Les avantages de cette représentation sont multiples. Tout d’abord, elle est définie de manière non supervisée, ce qui évite la nécessité d’avoir des données annotées pour l’entraînement. Ensuite, elle se calcule rapidement et est de taille compacte, permettant ainsi de repérer des mots efficacement. Dans une deuxième contribution, un modèle de bout en bout est développé pour la reconnaissance de mots manuscrits. Ce modèle est composé d’un réseau de neurones convolutifs qui prend en entrée l’image d’un mot et produit en sortie une représentation du texte reconnu. Ce texte est représenté sous la forme d’un ensemble de sous-sequences bidirectionnelles de caractères formant une hiérarchie. Cette représentation se distingue des approches existantes dans la littérature et offre plusieurs avantages par rapport à celles-ci. Notamment, elle est binaire et a une taille fixe, ce qui la rend robuste à la taille du texte. Par ailleurs, elle capture la distribution des sous-séquences de caractères dans le corpus d’entraînement, et permet donc au modèle entraîné de transférer cette connaissance à de nouveaux mots contenant les memes sous-séquences. Dans une troisième et dernière contribution, un modèle de bout en bout est proposé pour résoudre simultanément les tâches de repérage et de reconnaissance. Ce modèle intègre conjointement les textes et les images de mots dans un seul espace vectoriel. Une image est projetée dans cet espace via un réseau de neurones convolutifs entraîné à détecter les différentes forms de caractères. De même, un mot est projeté dans cet espace via un réseau de neurones récurrents. Le modèle proposé est entraîné de manière à ce que l’image d’un mot et son texte soient projetés au même point. Dans l’espace vectoriel appris, les tâches de repérage et de reconnaissance peuvent être traitées efficacement comme un problème de recherche des plus proches voisins

    Advances in Image Processing, Analysis and Recognition Technology

    Get PDF
    For many decades, researchers have been trying to make computers’ analysis of images as effective as the system of human vision is. For this purpose, many algorithms and systems have previously been created. The whole process covers various stages, including image processing, representation and recognition. The results of this work can be applied to many computer-assisted areas of everyday life. They improve particular activities and provide handy tools, which are sometimes only for entertainment, but quite often, they significantly increase our safety. In fact, the practical implementation of image processing algorithms is particularly wide. Moreover, the rapid growth of computational complexity and computer efficiency has allowed for the development of more sophisticated and effective algorithms and tools. Although significant progress has been made so far, many issues still remain, resulting in the need for the development of novel approaches

    OTS: A One-shot Learning Approach for Text Spotting in Historical Manuscripts

    Full text link
    Historical manuscript processing poses challenges like limited annotated training data and novel class emergence. To address this, we propose a novel One-shot learning-based Text Spotting (OTS) approach that accurately and reliably spots novel characters with just one annotated support sample. Drawing inspiration from cognitive research, we introduce a spatial alignment module that finds, focuses on, and learns the most discriminative spatial regions in the query image based on one support image. Especially, since the low-resource spotting task often faces the problem of example imbalance, we propose a novel loss function called torus loss which can make the embedding space of distance metric more discriminative. Our approach is highly efficient and requires only a few training samples while exhibiting the remarkable ability to handle novel characters, and symbols. To enhance dataset diversity, a new manuscript dataset that contains the ancient Dongba hieroglyphics (DBH) is created. We conduct experiments on publicly available VML-HD, TKH, NC datasets, and the new proposed DBH dataset. The experimental results demonstrate that OTS outperforms the state-of-the-art methods in one-shot text spotting. Overall, our proposed method offers promising applications in the field of text spotting in historical manuscripts

    Handwritten Document Image Retrieval

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH

    Arabic Manuscripts Analysis and Retrieval

    Get PDF

    Text-detection and -recognition from natural images

    Get PDF
    Text detection and recognition from images could have numerous functional applications for document analysis, such as assistance for visually impaired people; recognition of vehicle license plates; evaluation of articles containing tables, street signs, maps, and diagrams; keyword-based image exploration; document retrieval; recognition of parts within industrial automation; content-based extraction; object recognition; address block location; and text-based video indexing. This research exploited the advantages of artificial intelligence (AI) to detect and recognise text from natural images. Machine learning and deep learning were used to accomplish this task.In this research, we conducted an in-depth literature review on the current detection and recognition methods used by researchers to identify the existing challenges, wherein the differences in text resulting from disparity in alignment, style, size, and orientation combined with low image contrast and a complex background make automatic text extraction a considerably challenging and problematic task. Therefore, the state-of-the-art suggested approaches obtain low detection rates (often less than 80%) and recognition rates (often less than 60%). This has led to the development of new approaches. The aim of the study was to develop a robust text detection and recognition method from natural images with high accuracy and recall, which would be used as the target of the experiments. This method could detect all the text in the scene images, despite certain specific features associated with the text pattern. Furthermore, we aimed to find a solution to the two main problems concerning arbitrarily shaped text (horizontal, multi-oriented, and curved text) detection and recognition in a low-resolution scene and with various scales and of different sizes.In this research, we propose a methodology to handle the problem of text detection by using novel combination and selection features to deal with the classification algorithms of the text/non-text regions. The text-region candidates were extracted from the grey-scale images by using the MSER technique. A machine learning-based method was then applied to refine and validate the initial detection. The effectiveness of the features based on the aspect ratio, GLCM, LBP, and HOG descriptors was investigated. The text-region classifiers of MLP, SVM, and RF were trained using selections of these features and their combinations. The publicly available datasets ICDAR 2003 and ICDAR 2011 were used to evaluate the proposed method. This method achieved the state-of-the-art performance by using machine learning methodologies on both databases, and the improvements were significant in terms of Precision, Recall, and F-measure. The F-measure for ICDAR 2003 and ICDAR 2011 was 81% and 84%, respectively. The results showed that the use of a suitable feature combination and selection approach could significantly increase the accuracy of the algorithms.A new dataset has been proposed to fill the gap of character-level annotation and the availability of text in different orientations and of curved text. The proposed dataset was created particularly for deep learning methods which require a massive completed and varying range of training data. The proposed dataset includes 2,100 images annotated at the character and word levels to obtain 38,500 samples of English characters and 12,500 words. Furthermore, an augmentation tool has been proposed to support the proposed dataset. The missing of object detection augmentation tool encroach to proposed tool which has the ability to update the position of bounding boxes after applying transformations on images. This technique helps to increase the number of samples in the dataset and reduce the time of annotations where no annotation is required. The final part of the thesis presents a novel approach for text spotting, which is a new framework for an end-to-end character detection and recognition system designed using an improved SSD convolutional neural network, wherein layers are added to the SSD networks and the aspect ratio of the characters is considered because it is different from that of the other objects. Compared with the other methods considered, the proposed method could detect and recognise characters by training the end-to-end model completely. The performance of the proposed method was better on the proposed dataset; it was 90.34. Furthermore, the F-measure of the method’s accuracy on ICDAR 2015, ICDAR 2013, and SVT was 84.5, 91.9, and 54.8, respectively. On ICDAR13, the method achieved the second-best accuracy. The proposed method could spot text in arbitrarily shaped (horizontal, oriented, and curved) scene text.</div
    corecore