9 research outputs found

    Optimizing the neural network training for OCR error correction of historical Hebrew texts

    Over the past few decades, large archives of paper-based documents such as books and newspapers have been digitized using Optical Character Recognition (OCR). This technology is error-prone, especially for historical documents. To correct OCR errors, post-processing algorithms have been proposed based on natural language analysis and machine learning techniques such as neural networks. A disadvantage of neural networks is the vast amount of manually labeled data required for training, which is often unavailable. This paper proposes an innovative method for training a lightweight neural network for Hebrew OCR post-correction using significantly less manually created data. The main research goal is to develop a method for automatically generating language- and task-specific training data to improve the neural network's results for OCR post-correction, and to investigate which type of dataset is most effective for OCR post-correction of historical documents. To this end, a series of experiments using several datasets was conducted. The evaluation corpus was based on Hebrew newspapers from the JPress project. An analysis of historical OCRed newspapers was performed to learn common language- and corpus-specific OCR errors. We found that training the network using the proposed method is more effective than using randomly generated errors. The results also show that the performance of the neural network for OCR post-correction strongly depends on the genre and area of the training data. Moreover, neural networks trained with the proposed method outperform other state-of-the-art neural networks for OCR post-correction as well as complex spellcheckers. These results may have practical implications for many digital humanities projects.
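    The core idea above — generating training data by injecting corpus-specific OCR errors rather than random noise — can be sketched as a weighted character-confusion substitution. The confusion table below is a hypothetical illustration with Latin placeholders (the paper learns real Hebrew confusions from the JPress corpus); the error rate and weights are assumptions, not the paper's values.

    ```python
    import random

    # Hypothetical confusion table: maps a character sequence to OCR
    # misreadings with relative frequencies. The paper derives such a
    # table from historical newspapers; these entries are placeholders.
    CONFUSIONS = {
        "rn": [("m", 4)],
        "l": [("1", 3), ("i", 1)],
        "o": [("0", 2), ("c", 1)],
    }

    def inject_ocr_errors(text, error_rate=0.1, rng=None):
        """Corrupt clean text with corpus-specific (not random) OCR errors."""
        rng = rng or random.Random(0)
        out, i = [], 0
        while i < len(text):
            # Try the longest matching source sequence first (e.g. "rn" -> "m").
            for span in (2, 1):
                seg = text[i:i + span]
                subs = CONFUSIONS.get(seg)
                if subs and rng.random() < error_rate:
                    choices, weights = zip(*subs)
                    out.append(rng.choices(choices, weights=weights)[0])
                    i += span
                    break
            else:
                out.append(text[i])  # no confusion applied: keep as-is
                i += 1
        return "".join(out)
    ```

    Pairs of (corrupted, clean) strings produced this way would then serve as training examples for the post-correction network, which is the sense in which the method needs far less manually labeled data.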

    Machine learning using an LSTM model as an aid for orientation and speed control of a mobile robot

    Undergraduate final-year thesis, Universidade de Brasília, Faculdade de Tecnologia, Curso de Graduação em Engenharia de Controle e Automação, 2019. Over the last few years there has been a massive popularization of robotics. Many robots, autonomous or teleoperated, are used daily for tasks that are too dangerous or too repetitive for humans. However, one disadvantage of teleoperated robots compared with autonomous robots is the need to train an operator. Just as robotics became popular, deep neural networks have also recently become popular, opening the way for research on artificial intelligence applied to robotics. Knowing that LSTM networks are good at learning sequences, and that a pilot's driving profile can be seen as a temporal sequence of commands, this thesis proposes training a deep LSTM network to create a steering-assistance module from driving data collected from an experienced pilot. More specifically, this thesis has three objectives. The first is to build a database of driving data from an experienced pilot, taken as the ideal driving, for the Pioneer 3-AT robot. The second is to propose, train, and validate deep LSTM architectures that can learn the patterns of the ideal driving. The third is to propose and validate an algorithm that corrects an inexperienced user's driving in real time. After many experiments, a database composed of the robot's odometry data and the experienced operator's velocity commands was built, and the proposed architectures were trained and validated, showing that a deep LSTM network can learn the patterns of the ideal driving. The best models were then tested in the real-time correction algorithm, which chooses between the user's command and the network's suggestion based on the difference between the two. The algorithm was validated through tests and interviews with users who had no experience driving the robot. These interviews and test observations showed that the corrections imposed by the algorithm produced smoother movements, although some users did not feel comfortable with the imposed corrections.
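    The correction rule described — choosing between the user's command and the network's suggestion based on the difference between the two — can be sketched as a simple threshold test on the linear and angular velocities. The tolerance values below are illustrative assumptions, not the thesis's tuned parameters.

    ```python
    def assisted_command(user_v, user_w, net_v, net_w, tol_v=0.2, tol_w=0.3):
        """Pick between the user's and the network's (v, w) command.

        Sketch of the selection rule described in the thesis: when the
        user's linear (v) or angular (w) velocity strays too far from the
        LSTM's suggestion, the network's command is applied instead.
        Tolerances here are hypothetical placeholders.
        """
        if abs(user_v - net_v) > tol_v or abs(user_w - net_w) > tol_w:
            return net_v, net_w   # override: steer toward the expert profile
        return user_v, user_w     # within tolerance: keep the user's input
    ```

    Run at each control tick, such a rule smooths the trajectory toward the expert profile while leaving small deviations untouched, which matches the reported user feedback of smoother but occasionally intrusive corrections.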

    Neural Networks for Text Correction and Completion in Keyboard Decoding

    No full text
    Despite the ubiquity of mobile and wearable text-messaging applications, the problem of keyboard text decoding has not been tackled sufficiently in light of the enormous success of deep learning Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs) for natural language understanding. In particular, keyboard decoders must operate on devices with memory and processor resource constraints, which makes it challenging to deploy industrial-scale deep neural network (DNN) models. This paper proposes a sequence-to-sequence neural attention network system for automatic text correction and completion. Given an erroneous sequence, our model encodes character-level hidden representations and then decodes the revised sequence, enabling auto-correction and completion. We achieve this with a combination of a character-level CNN and gated recurrent unit (GRU) encoder along with a word-level GRU attention decoder. Unlike traditional language models that learn from billions of words, our corpus size is only 12 million words, an order of magnitude smaller. The memory footprint of our learnt model for inference and prediction is also an order of magnitude smaller than that of conventional language-model-based text decoders. We report baseline performance for neural keyboard decoders in such a limited domain. Our models achieve a word-level accuracy of 90% and a character error rate (CER) of 2.4% on the Twitter typo dataset. We present a novel dataset of noisy-to-corrected mappings by inducing the noise distribution from the Twitter data over the OpenSubtitles 2009 dataset, on which our model predicts with a word-level accuracy of 98% and a sequence accuracy of 68.9%. In our user study, our model achieved an average CER of 2.6%, compared with 1.6% for the state-of-the-art non-neural touch-screen keyboard decoder.
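    The CER figures quoted above are conventionally computed as character-level Levenshtein edit distance normalized by the reference length; a minimal sketch of that metric (a standard definition, not code from the paper):

    ```python
    def levenshtein(a, b):
        # Dynamic-programming edit distance (insertions, deletions,
        # substitutions), keeping only the previous row for O(len(b)) memory.
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1,                 # deletion
                               cur[j - 1] + 1,              # insertion
                               prev[j - 1] + (ca != cb)))   # substitution
            prev = cur
        return prev[-1]

    def cer(hypothesis, reference):
        """Character error rate: edit distance over reference length."""
        return levenshtein(hypothesis, reference) / len(reference)
    ```

    For example, `cer("helo", "hello")` is 1 edit over 5 reference characters, i.e. 0.2, the same normalization under which the paper reports 2.4% on the Twitter typo dataset.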