2,661 research outputs found

    Handling out-of-vocabulary words and recognition errors based on word linguistic context for handwritten sentence recognition

    In this paper we investigate the use of linguistic information given by language models to deal with word recognition errors in handwritten sentences. We focus especially on errors due to out-of-vocabulary (OOV) words. First, word posterior probabilities are computed and used to detect error hypotheses in output sentences. An SVM classifier then categorizes these errors according to defined types. Next, a post-processing step is performed using a language model based on Part-of-Speech (POS) tags, which is combined with the n-gram model used previously. Error hypotheses can thus be further recognized, and POS tags can be assigned to the OOV words. Experiments on on-line handwritten sentences show that the proposed approach yields a significant reduction in word error rate.
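    A minimal sketch of the two-stage error handling described above, assuming hypothetical feature values, a made-up posterior threshold, and toy error-type labels; the paper's actual features and categories are not reproduced here.

```python
# Sketch: flag low-posterior words as error hypotheses, then categorize them
# with an SVM. Threshold, features, and labels are illustrative assumptions.
import numpy as np
from sklearn.svm import SVC

POSTERIOR_THRESHOLD = 0.7  # assumed value: words below this become error hypotheses

def detect_error_hypotheses(words, posteriors, threshold=POSTERIOR_THRESHOLD):
    """Return words whose posterior probability falls below the threshold."""
    return [w for w, p in zip(words, posteriors) if p < threshold]

# Per-word feature vectors (posterior, word length, n-gram log-score) and
# error-type labels, as they might come from annotated training sentences.
X_train = np.array([[0.35, 4, -6.2], [0.62, 9, -3.1], [0.20, 11, -8.4]])
y_train = np.array(["substitution", "correct", "oov"])

clf = SVC(kernel="rbf")
clf.fit(X_train, y_train)

X_test = np.array([[0.28, 10, -7.9]])
print(clf.predict(X_test))  # predicted error category for one hypothesis
```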

    Emotional Storyteller for Vision Impaired and Hearing-Impaired Children

    Tellie is an innovative mobile app designed to offer an immersive and emotionally enriched storytelling experience for children who are visually or hearing impaired. It pursues four main objectives. Text extraction uses the CRAFT model together with Convolutional Neural Networks (CNNs), Connectionist Temporal Classification (CTC), and Long Short-Term Memory (LSTM) networks to accurately extract and recognize text from storybook images. Recognition of emotions in sentences employs BERT to detect and distinguish sentence-level emotions, including happiness, anger, sadness, and surprise. Conversion of text to natural human audio with emotion transforms text into expressive speech using Tacotron2 and WaveGlow, enhancing the synthesized speech with emotional styles to create engaging audio narratives. Conversion of text to sign language caters to the Deaf and hard-of-hearing community by translating text into sign language with CNNs, ensuring alignment with real sign language expressions. Together these objectives make Tellie an app that gives visually and hearing-impaired children access to captivating storytelling experiences, promoting accessibility and inclusivity through the harmonious integration of language, creativity, and technology. This research demonstrates the potential of advanced technologies in fostering inclusive and emotionally engaging storytelling for all children.
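    A hedged sketch of the sentence-level emotion step only, using the Hugging Face pipeline API; the checkpoint name below is a publicly available emotion classifier used as a stand-in, not Tellie's fine-tuned BERT model.

```python
# Sketch: classify the emotion of each story sentence before audio synthesis.
from transformers import pipeline

emotion = pipeline(
    "text-classification",
    model="j-hartmann/emotion-english-distilroberta-base",  # assumed stand-in checkpoint
)

sentences = [
    "The little fox laughed and danced in the rain.",
    "Suddenly, the door slammed shut behind her.",
]
for s in sentences:
    label = emotion(s)[0]["label"]  # top predicted emotion for this sentence
    print(f"{label:>10}: {s}")
```

    In a full pipeline, the predicted label would then select the emotional style used when synthesizing that sentence's audio.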

    On the Use of Neural Text Generation for the Task of Optical Character Recognition

    Optical Character Recognition (OCR) is the extraction of textual data from scanned text documents to facilitate indexing, searching, and editing, and to reduce storage space. Although OCR systems have improved significantly in recent years, they still suffer in situations where the OCR output does not match the text in the original document. Deep learning models have contributed positively to many problems, but their full potential for many others is yet to be explored. In this paper we propose a post-processing approach based on deep learning to improve the accuracy of an OCR system (minimizing its error rate). We report on the use of neural network language models to correct characters and words incorrectly predicted by OCR systems. We applied our approach to the IAM handwriting database. The proposed approach delivers significant improvements over previously reported context-based OCR results on the IAM handwriting database: 20.41% in F-score, 10.86% in character-level comparison using Levenshtein distance, and 20.69% in document-level comparison.
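    A minimal sketch of language-model-based OCR post-processing and the character-level Levenshtein comparison, under assumed interfaces: `lm_score` below is a toy stand-in for a neural language model's fluency score, not the paper's actual model.

```python
# Sketch: pick the correction candidate an (assumed) language model prefers,
# and measure character-level distance between OCR output and ground truth.
def levenshtein(a, b):
    """Edit distance used for character-level comparison of OCR output."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def lm_score(sentence):
    """Hypothetical LM scorer; higher means more fluent (toy word-list proxy)."""
    vocab = {"the", "quick", "brown", "fox", "jumps"}
    return sum(w in vocab for w in sentence.split())

def correct(candidates):
    """Return the candidate correction the language model prefers."""
    return max(candidates, key=lm_score)

print(levenshtein("qu1ck", "quick"))  # 1 character edit
print(correct(["the qu1ck brown fox", "the quick brown fox"]))
```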

    Automatic Scaling of Text for Training Second Language Reading Comprehension

    For children learning their first language, reading is one of the most effective ways to acquire new vocabulary. Studies link students who read more with larger and more complex vocabularies. For second language learners, there is a substantial barrier to reading: even books written for early first-language readers assume a base vocabulary of nearly 7,000 word families and a nuanced understanding of grammar. This project will look at ways technology can help second language learners overcome this high barrier to entry, and at the effectiveness of learning through reading for adults acquiring a foreign language. Through the implementation of Dokusha, an automatic graded reader generator for Japanese, this project will explore how advancements in natural language processing can be used to automatically simplify text for extensive reading in Japanese as a foreign language.
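    A hedged sketch of one idea behind a graded reader generator: flag words outside a learner's known-vocabulary list and substitute simpler synonyms where available. The word lists and synonym mapping below are toy placeholders, not Dokusha's data or method.

```python
# Sketch: grade a sentence against a (toy) known-vocabulary list.
KNOWN_VOCAB = {"i", "read", "a", "book", "every", "day"}
SIMPLER_SYNONYM = {"peruse": "read"}  # assumed simplification mapping

def grade_sentence(sentence, known=KNOWN_VOCAB):
    """Replace out-of-level words where possible; report the rest for glossing."""
    out, needs_gloss = [], []
    for word in sentence.lower().split():
        if word in known:
            out.append(word)
        elif word in SIMPLER_SYNONYM:
            out.append(SIMPLER_SYNONYM[word])
        else:
            needs_gloss.append(word)
            out.append(word)
    return " ".join(out), needs_gloss

simplified, unknown = grade_sentence("I peruse a fascinating book every day")
print(simplified)  # 'i read a fascinating book every day'
print(unknown)     # ['fascinating'] -> left for manual glossing or annotation
```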

    A limited-size ensemble of homogeneous CNN/LSTMs for high-performance word classification

    The strength of the long short-term memory neural networks (LSTMs) applied so far lies more in handling sequences of variable length than in handling geometric variability of the image patterns. In this paper, an end-to-end convolutional LSTM neural network is used to handle both geometric variation and sequence variability. The best results for LSTMs are often based on large-scale training of an ensemble of network instances. We show that high performance can be reached on a common benchmark set with just five such networks, using proper data augmentation, a proper coding scheme, and a proper voting scheme. The networks share a similar architecture: a convolutional neural network (CNN) of five layers and a bidirectional LSTM (BiLSTM) of three layers, followed by a connectionist temporal classification (CTC) processing step. The approach assumes differently scaled input images and different feature map sizes. Three datasets are used: the standard benchmark RIMES dataset (French), a historical handwritten dataset KdK (Dutch), and the standard benchmark George Washington (GW) dataset (English). The final performance on the RIMES word-recognition test was 96.6%, a clear improvement over other state-of-the-art approaches that did not use a pre-trained network. On the KdK and GW datasets, our approach also shows good results. The proposed approach is deployed in the Monk search engine for historical handwriting collections.
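    A minimal sketch of a plurality voting scheme over a small homogeneous ensemble, assuming each network instance already returns a (word, confidence) hypothesis for the same word image; the paper's exact coding and voting rules are not reproduced here.

```python
# Sketch: plurality vote over five network hypotheses, with summed confidence
# as an (assumed) tie-breaker between equally voted words.
from collections import defaultdict

def vote(hypotheses):
    """hypotheses: list of (word, confidence) pairs, one per network instance."""
    votes, conf_sum = defaultdict(int), defaultdict(float)
    for word, conf in hypotheses:
        votes[word] += 1
        conf_sum[word] += conf
    return max(votes, key=lambda w: (votes[w], conf_sum[w]))

# Five network instances classify the same word image.
ensemble_out = [("maison", 0.91), ("maison", 0.88), ("raison", 0.64),
                ("maison", 0.83), ("saison", 0.71)]
print(vote(ensemble_out))  # 'maison'
```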