
    Handwritten OCR for Indic Scripts: A Comprehensive Overview of Machine Learning and Deep Learning Techniques

    The potential uses of handwritten optical character recognition (OCR) in a number of industries, particularly document digitization, archiving, and even language preservation, have attracted considerable interest lately. The goal of this research is to provide a thorough understanding of both cutting-edge OCR methods and the unique difficulties presented by Indic scripts. A thorough literature search was conducted for this study, covering relevant publications, conference proceedings, and scientific archives up to the year 2023. Inclusion criteria developed to concentrate on studies addressing handwritten OCR for Indic scripts yielded 53 research publications. The review provides a thorough analysis of the methodologies and approaches employed in the selected studies. Deep neural networks, conventional feature-based methods, machine learning techniques, and hybrid systems have all been investigated as viable answers to the problem of effectively deciphering Indic scripts, which are notoriously challenging to recognize. To operate, these systems require pre-processing techniques, segmentation schemes, and language models. The outcomes of this systematic examination demonstrate that, although handwritten OCR for Indic scripts has advanced significantly, room for improvement remains. Future research could focus on developing trustworthy models that can handle a range of writing styles and on enhancing accuracy for less-studied Indic scripts. The field may advance further with the creation of shared datasets and defined evaluation standards.
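    As a concrete illustration of the deep-learning approaches the survey covers, the sketch below outlines a CRNN-style recognizer (convolutional features, a bidirectional LSTM sequence model, and a CTC-ready output layer), a common design for handwritten text recognition. The layer sizes and the character-set size are hypothetical assumptions for illustration, not taken from any reviewed system.

```python
# Illustrative sketch of a CRNN-style recognizer of the kind surveyed above
# (CNN feature extractor + BiLSTM sequence model + CTC-ready output).
# All layer sizes and the charset size are hypothetical assumptions.
import torch
import torch.nn as nn

class CRNN(nn.Module):
    def __init__(self, num_classes: int, img_height: int = 32):
        super().__init__()
        # Convolutional feature extractor: halves height and width twice.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        feat_height = img_height // 4
        self.rnn = nn.LSTM(128 * feat_height, 256,
                           bidirectional=True, batch_first=True)
        self.fc = nn.Linear(512, num_classes + 1)  # +1 for the CTC blank

    def forward(self, x):                   # x: (batch, 1, H, W) line images
        f = self.cnn(x)                     # (batch, C, H/4, W/4)
        b, c, h, w = f.shape
        f = f.permute(0, 3, 1, 2).reshape(b, w, c * h)  # one step per column
        out, _ = self.rnn(f)
        return self.fc(out)                 # (batch, W/4, num_classes + 1)

model = CRNN(num_classes=100)               # hypothetical charset size
logits = model(torch.randn(2, 1, 32, 128))  # dummy batch of two line images
print(logits.shape)                         # torch.Size([2, 32, 101])
```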

    Adaptive combinations of classifiers with application to on-line handwritten character recognition

    Classifier combining is an effective way of improving classification performance. User adaptation is clearly another valid approach for improving performance in a user-dependent system, and even though adaptation is usually performed at the classifier level, adaptive committees can also be very effective. Adaptive committees have the distinct ability to perform adaptation without detailed knowledge of the member classifiers. Adaptation can therefore be used even with classification systems that are intrinsically not suited for adaptation, whether due to lack of access to the workings of the classifier or simply a classification scheme not suitable for continuous learning. This thesis proposes methods for the adaptive combination of classifiers in the setting of on-line handwritten character recognition. The central part of the work introduces adaptive classifier combination schemes, of which the two most prominent are the Dynamically Expanding Context (DEC) committee and the Class-Confidence Critic Combining (CCCC) committee. Both have been shown to be capable of successful adaptation to the user in the task of on-line handwritten character recognition. The highly modular CCCC framework in particular has shown impressive performance even in a doubly-adaptive setting, in which adaptive member classifiers are combined by an adaptive committee. In support of this main topic, the thesis also discusses a methodology for deducing correct character labels from user actions. Proper labeling is paramount for effective adaptation, and deducing the labels from the user's actions is necessary to perform adaptation transparently to the user; this way, the user does not need to give explicit feedback on the correctness of the recognition results. An overview is also presented of adaptive classification methods for single-classifier adaptation in handwritten character recognition developed at the Laboratory of Computer and Information Science of the Helsinki University of Technology (CIS-HCR). Classifiers based on the CIS-HCR system have been used in the adaptive committee experiments both as member classifiers and as a reference level. Finally, two distinct approaches for further improving the performance of committee classifiers are discussed. First, methods for committee rejection are presented and evaluated. Second, measures of classifier diversity for classifier selection, based on the concept of diversity of errors, are presented and evaluated. The topic of this thesis hence covers three important aspects of pattern recognition: on-line adaptation, combining classifiers, and a practical evaluation setting of handwritten character recognition. A novel approach combining these three core ideas has been developed and is presented in the introductory text and the included publications. To reiterate, the main contributions of this thesis are: 1) the introduction of novel adaptive committee classification methods, 2) the introduction of novel methods for measuring classifier diversity, 3) the presentation of methods for implementing committee rejection, 4) the discussion and introduction of a method for effective label deduction from on-line user actions, and, as a side product, 5) an overview of the CIS-HCR adaptive on-line handwritten character recognition system.
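    The following is a minimal sketch of the adaptive-committee idea in the spirit of CCCC: each member classifier gets per-class critic scores that are updated on-line from the deduced correct label, and member votes are weighted accordingly. The running-accuracy update rule and the toy member classifiers are assumptions for illustration, not the thesis's actual algorithm.

```python
# Minimal sketch of an adaptive committee: per-member, per-class "critics"
# weight each member's vote, and are updated on-line once the true label
# has been deduced from user actions. Update rule is an assumption.
from collections import defaultdict

class AdaptiveCommittee:
    def __init__(self, members):
        self.members = members  # callables: sample -> predicted label
        # critic[i][label] = [correct, seen] for member i; mild prior of 1/2
        self.critic = [defaultdict(lambda: [1, 2]) for _ in members]

    def classify(self, sample):
        votes = defaultdict(float)
        self.last_preds = [m(sample) for m in self.members]
        for i, label in enumerate(self.last_preds):
            correct, seen = self.critic[i][label]
            votes[label] += correct / seen   # weight vote by critic confidence
        return max(votes, key=votes.get)

    def adapt(self, true_label):
        # Called once the correct label has been deduced from user actions.
        for i, pred in enumerate(self.last_preds):
            stats = self.critic[i][pred]
            stats[0] += int(pred == true_label)
            stats[1] += 1

# Usage with two toy member classifiers (hypothetical):
c1 = lambda s: "a" if s < 0.5 else "b"
c2 = lambda s: "a"
committee = AdaptiveCommittee([c1, c2])
committee.classify(0.7)          # members disagree on the first sample
committee.adapt("b")             # true label deduced from the user's actions
print(committee.classify(0.7))   # c1's critic for "b" now outweighs c2's vote
```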

    Reliable pattern recognition system with novel semi-supervised learning approach

    Over the past decade, there has been considerable progress in the design of statistical machine learning strategies, including Semi-Supervised Learning (SSL) approaches. However, researchers still have difficulties in applying most of these learning strategies when two or more classes overlap, and/or when each class has a bimodal or multimodal distribution. In this thesis, an efficient, robust, and reliable recognition system with a novel SSL scheme has been developed to overcome the problems of overlap between two classes and of bimodal distributions within each class. This system was based on the nature of category learning and recognition, to enhance the system's performance in relevant applications. In the training procedure, besides the supervised learning strategy, an unsupervised learning approach was applied to retrieve the "extra information" that could not be obtained from the images themselves. This approach was very helpful for classification between two confusable classes. In this SSL scheme, both the training data and the test data were utilized in the final classification. The thesis first presents the design of a promising supervised learning model built on state-of-the-art technologies, and defines a novel rejection measurement for the verification of rejected samples, the Linear Discriminant Analysis Measurement (LDAM). Experiments on CENPARMI's Hindu-Arabic Handwritten Numeral Database, CENPARMI's Numerals Database, and NIST's Numerals Database were conducted in order to evaluate the efficiency of LDAM. Moreover, multiple verification modules, including a Writing Style Verification (WSV) module, have been developed according to four newly defined error categories, where the error categorization is based on the different costs of misclassification. The WSV module uses the unsupervised learning approach to automatically retrieve a person's writing styles so that the rejected samples can be classified and verified accordingly. As a result, errors on CENPARMI's Hindu-Arabic Handwritten Numeral Database (24,784 training samples, 6,199 testing samples) were reduced drastically from 397 to 59, and the final recognition rate reached 99.05%, a significantly higher rate compared to other experiments on the same database. When the rejection option was applied on this database, the recognition rate, error rate, and reliability were 97.89%, 0.63%, and 99.28%, respectively.
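    Since the abstract does not spell out LDAM's exact form, the sketch below only illustrates the general idea of an LDA-based rejection measure: a sample is rejected, and passed on to a verification module, when the gap between its top two class posteriors in the LDA model falls below a threshold. The synthetic data, the threshold, and the gap criterion are all assumptions for illustration.

```python
# Hedged sketch of an LDA-based rejection measure in the spirit of LDAM:
# reject a sample when the gap between its top two class posteriors,
# under a fitted LDA model, falls below a threshold. Purely illustrative.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 8)), rng.normal(1.5, 1, (100, 8))])
y = np.array([0] * 100 + [1] * 100)        # two partially overlapping classes

lda = LinearDiscriminantAnalysis().fit(X, y)

def classify_with_rejection(x, threshold=0.2):
    p = np.sort(lda.predict_proba(x.reshape(1, -1))[0])
    if p[-1] - p[-2] < threshold:           # confusable: send to verification
        return None                         # rejected
    return int(lda.predict(x.reshape(1, -1))[0])

decisions = [classify_with_rejection(x) for x in X]
rejected = sum(d is None for d in decisions)
accepted = len(X) - rejected
correct = sum(d == t for d, t in zip(decisions, y) if d is not None)
print(f"recognition {correct/len(X):.2%}, rejection {rejected/len(X):.2%}, "
      f"reliability {correct/accepted:.2%}")  # reliability on accepted samples
```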

    Neural Networks for Document Image and Text Processing

    Nowadays, the main libraries and document archives are investing considerable effort in digitizing their collections. Indeed, most of them are scanning the documents and publishing the resulting images without their corresponding transcriptions, which seriously limits the document exploitation possibilities. When the transcription is necessary, it is performed manually by human experts, which is a very expensive and error-prone task. Obtaining transcriptions of the required quality demands the intervention of human experts to review and correct the output of the recognition engines, so it is extremely useful to provide interactive tools for obtaining and editing the transcription. Although text recognition is the final goal, several previous steps (known as preprocessing) are necessary in order to get a fine transcription from a digitized image. Document cleaning, enhancement, and binarization (if they are needed) are the first stages of the recognition pipeline. Historical handwritten documents, in addition, show several degradations, stains, ink bleed-through, and other artifacts. Therefore, more sophisticated and elaborate methods are required when dealing with these kinds of documents; in some cases expert supervision is even needed. Once images have been cleaned, the main zones of the image have to be detected: those that contain text, and other parts such as images, decorations, and versal letters. Moreover, the relations among them and with the final text have to be detected. These preprocessing steps are critical for the final performance of the system, since an error at this point will be propagated through the rest of the transcription process. The ultimate goal of the Document Image Analysis pipeline is to obtain the transcription of the text (Optical Character Recognition and Handwritten Text Recognition). In this thesis we aimed to improve the main stages of the recognition pipeline, from the scanned documents as input to the final transcription. We focused our effort on applying Neural Networks and deep learning techniques directly to the document images to extract suitable features for the different tasks dealt with in the following work: Image Cleaning and Enhancement (Document Image Binarization), Layout Extraction, Text Line Extraction, Text Line Normalization, and finally decoding (or text line recognition). The work thus focuses on incremental improvements across the several Document Image Analysis stages, but it also addresses some of the real challenges: historical manuscripts, documents without clear layouts, and very degraded documents. Neural Networks are a central topic for the whole work collected in this document. Different convolutional models have been applied for document image cleaning and enhancement. Connectionist models have been used as well for text line extraction: first, for detecting interest points, combining them into text segments, and finally extracting the lines by means of aggregation techniques; and second, for pixel labeling to extract the main body area of the text and then the limits of the lines. For text line preprocessing, i.e., to normalize the text lines before recognizing them, similar models have been used to detect the main body area and then to height-normalize the images, giving more importance to the central area of the text. Finally, Convolutional Neural Networks and deep multilayer perceptrons have been combined with hidden Markov models to improve our transcription engine significantly. The suitability of all these approaches has been tested with different corpora for each of the stages addressed, giving competitive results for most of the methodologies presented.
    Pastor Pellicer, J. (2017). Neural Networks for Document Image and Text Processing [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/90443
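    As a small illustration of the convolutional pixel-labeling approach described for document image cleaning and binarization, the sketch below classifies each pixel of a grayscale page patch as ink or background. The architecture is an assumption for illustration and is not the thesis's actual model.

```python
# Illustrative sketch (not the thesis's exact architecture) of a convolutional
# pixel-labeling model for document image binarization: every pixel of a
# grayscale page patch is scored as ink or background.
import torch
import torch.nn as nn

class BinarizerCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 5, padding=2), nn.ReLU(),
            nn.Conv2d(32, 32, 5, padding=2), nn.ReLU(),
            nn.Conv2d(32, 1, 1),            # per-pixel ink logit
        )

    def forward(self, page):                 # page: (batch, 1, H, W) in [0, 1]
        return torch.sigmoid(self.net(page))

model = BinarizerCNN()
page = torch.rand(1, 1, 64, 64)              # dummy degraded page patch
mask = (model(page) > 0.5).float()           # binarized output
print(mask.shape)                            # torch.Size([1, 1, 64, 64])
# Training would minimize nn.BCELoss() against ground-truth ink masks.
```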

    Multi-classifier systems for off-line signature verification

    Handwritten signatures are behavioural biometric traits that are known to incorporate a considerable amount of intra-class variability. The Hidden Markov Model (HMM) has been successfully employed in many off-line signature verification (SV) systems due to the sequential nature and variable size of the signature data. In particular, the left-to-right topology of HMMs is well adapted to the dynamic characteristics of occidental handwriting, in which the hand movements are always from left to right. As with most generative classifiers, HMMs require a considerable amount of training data to achieve a high level of generalization performance. Unfortunately, the number of signature samples available to train an off-line SV system is very limited in practice. Moreover, only random forgeries are employed to train the system, which must in turn discriminate between genuine samples and random, simple, and skilled forgeries during operation; these last two forgery types are not available during the training phase. The approaches proposed in this thesis employ the concept of multi-classifier systems (MCS) based on HMMs to learn signatures at several levels of perception. By extracting a high number of features, a pool of diversified classifiers can be generated using random subspaces, which overcomes the problem of having a limited amount of training data. Based on the multi-hypotheses principle, a new approach for combining classifiers in the ROC space is proposed. A technique to repair concavities in ROC curves allows for overcoming the problem of having a limited number of genuine samples and, especially, for evaluating the performance of biometric systems more accurately. A second important contribution is the proposal of a hybrid generative-discriminative classification architecture. The use of HMMs as feature extractors in the generative stage, followed by Support Vector Machines (SVMs) as classifiers in the discriminative stage, allows for a better design not only of the genuine class, but also of the impostor class. Moreover, this approach provides more robust learning than a traditional HMM-based approach when a limited amount of training data is available. The last contribution of this thesis is the proposal of two new strategies for the dynamic selection (DS) of ensembles of classifiers. Experiments performed with the PUCPR and GPDS signature databases indicate that the proposed DS strategies achieve a higher level of performance in off-line SV than other reference DS and static selection (SS) strategies from the literature.
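    The sketch below illustrates the random-subspace idea used above to generate a pool of diversified classifiers from a high-dimensional feature set when training data are limited. SVMs are used here as stand-in members and the data are synthetic; the thesis itself builds the pool from HMMs and combines the members in the ROC space rather than by a simple vote.

```python
# Hedged sketch of the random-subspace ensemble idea: each member classifier
# is trained on a random subset of the full feature set, creating diversity
# despite a small training set. SVM members and synthetic data are stand-ins.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(42)
X = rng.normal(size=(60, 100))             # 60 signatures x 100 features (dummy)
X[30:] += 1.0                              # crude class separation
y = np.array([0] * 30 + [1] * 30)          # random forgery (0) vs genuine (1)

def random_subspace_pool(X, y, n_members=10, subspace_dim=20):
    pool = []
    for _ in range(n_members):
        dims = rng.choice(X.shape[1], size=subspace_dim, replace=False)
        clf = SVC().fit(X[:, dims], y)     # each member sees 20 of 100 features
        pool.append((dims, clf))
    return pool

def majority_vote(pool, x):
    votes = [clf.predict(x[dims].reshape(1, -1))[0] for dims, clf in pool]
    return int(round(np.mean(votes)))      # simple vote; the thesis combines
                                           # members in the ROC space instead
pool = random_subspace_pool(X, y)
print(majority_vote(pool, X[0]))           # expected 0 for this dummy sample
```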

    Multi-feature approach for writer-independent offline signature verification

    Some of the fundamental problems facing handwritten signature verification are the large number of users, the large number of features, the limited number of reference signatures for training, the high intra-personal variability of the signatures, and the unavailability of forgeries as counterexamples. This research first presents a survey of offline signature verification techniques, focusing on the feature extraction and verification strategies. The goal is to present the most important advances, as well as the current challenges, in this field. Of particular interest are the techniques that allow for designing a signature verification system based on a limited amount of data. Next, a novel offline signature verification system is presented, based on multiple feature extraction techniques, dichotomy transformation, and boosting feature selection. Using multiple feature extraction techniques increases the diversity of information extracted from the signature, thereby producing features that mitigate intra-personal variability, while dichotomy transformation ensures writer-independent classification, thus relieving the verification system from the burden of a large number of users. Finally, using boosting feature selection allows for a low-cost writer-independent verification system that selects features while learning. As such, the proposed system provides a practical framework to explore and learn from problems with numerous potential features. Comparison with simulation results from systems found in the literature confirms the viability of the proposed system, even when only a single reference signature is available. The proposed system provides an efficient solution to a wide range of problems (e.g., biometric authentication) with limited training samples, new training samples emerging during operations, numerous classes, and few or no counterexamples.
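    The dichotomy transformation mentioned above maps a pair of feature vectors to their absolute difference, u = |x_i - x_j|, so that within-writer pairs form one class and between-writer pairs the other; a single two-class classifier can then verify signatures of writers never seen in training. The sketch below illustrates this with synthetic features and a generic AdaBoost classifier standing in for boosting feature selection (each decision stump implicitly selects one feature).

```python
# Sketch of the dichotomy transformation: pairs of feature vectors are mapped
# to the distance space |x_i - x_j|, so one global two-class classifier
# (within-writer vs between-writer) replaces per-writer models. Synthetic
# features and a generic AdaBoost stand in for boosting feature selection.
import numpy as np
from itertools import combinations
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(7)
# Dummy data: 5 writers, 4 reference signatures each, 30 features.
signatures = {w: rng.normal(loc=w, size=(4, 30)) for w in range(5)}

pairs, labels = [], []
for w, sigs in signatures.items():
    for a, b in combinations(sigs, 2):          # within-writer -> genuine (1)
        pairs.append(np.abs(a - b)); labels.append(1)
for w1, w2 in combinations(signatures, 2):      # between-writer -> forgery (0)
    pairs.append(np.abs(signatures[w1][0] - signatures[w2][0])); labels.append(0)

clf = AdaBoostClassifier(n_estimators=50).fit(np.array(pairs), labels)

# Verify a questioned signature against one reference of an unseen writer:
reference, questioned = rng.normal(9, 1, 30), rng.normal(9, 1, 30)
print(clf.predict(np.abs(reference - questioned).reshape(1, -1)))  # 1 = genuine
```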