4,613 research outputs found

    Unconstrained Scene Text and Video Text Recognition for Arabic Script

    Full text link
    Building robust recognizers for Arabic has always been challenging. We demonstrate the effectiveness of an end-to-end trainable CNN-RNN hybrid architecture in recognizing Arabic text in videos and natural scenes. We outperform the previous state of the art on two publicly available video text datasets, ALIF and ACTIV. For the scene text recognition task, we introduce a new Arabic scene text dataset and establish baseline results. For scripts like Arabic, a major challenge in developing robust recognizers is the lack of a large quantity of annotated data. We overcome this by synthesising millions of Arabic text images from a large vocabulary of Arabic words and phrases. Our implementation builds on the model introduced in [37], which has proven quite effective for English scene text recognition. The model follows a segmentation-free, sequence-to-sequence transcription approach: the network transcribes a sequence of convolutional features from the input image into a sequence of target labels. This does away with the need to segment the input image into constituent characters/glyphs, which is often difficult for Arabic script. Further, the ability of RNNs to model contextual dependencies yields superior recognition results. Comment: 5 pages
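
    The segmentation-free, sequence-to-sequence transcription approach described above (convolutional features fed to a recurrent transcription layer, as in the cited English scene-text model [37]) can be sketched roughly as follows. This is a minimal, illustrative CRNN trained with a CTC loss; the layer sizes, alphabet size and input resolution are assumptions for illustration, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class CRNN(nn.Module):
    """Minimal CRNN sketch: conv features -> BiLSTM -> per-timestep label logits."""
    def __init__(self, num_classes, img_height=32):
        super().__init__()
        # Convolutional feature extractor; pooling collapses height, keeps width as the time axis.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),   # H/2, W/2
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2), # H/4, W/4
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1), (2, 1)),                                    # H/8, W/4
        )
        feat_h = img_height // 8
        self.rnn = nn.LSTM(256 * feat_h, 256, num_layers=2,
                           bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * 256, num_classes)

    def forward(self, x):                # x: (B, 1, H, W)
        f = self.cnn(x)                  # (B, C, H', W')
        b, c, h, w = f.shape
        f = f.permute(0, 3, 1, 2).reshape(b, w, c * h)  # width positions become time steps
        out, _ = self.rnn(f)
        return self.fc(out)              # (B, T, num_classes)

# CTC loss ties the unsegmented feature sequence to the target label sequence.
model = CRNN(num_classes=100)            # illustrative glyph-set size, incl. CTC blank
logits = model(torch.randn(2, 1, 32, 128))
log_probs = logits.log_softmax(2).permute(1, 0, 2)        # (T, B, C) as CTCLoss expects
targets = torch.randint(1, 100, (2, 10))
loss = nn.CTCLoss(blank=0)(log_probs, targets,
                           input_lengths=torch.full((2,), logits.size(1), dtype=torch.long),
                           target_lengths=torch.full((2,), 10, dtype=torch.long))
```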

    Diagonal Based Feature Extraction for Handwritten Alphabets Recognition System using Neural Network

    Full text link
    An off-line handwritten alphabetical character recognition system using a multilayer feed-forward neural network is described in this paper. A new method, called diagonal-based feature extraction, is introduced for extracting the features of handwritten alphabets. Fifty data sets, each containing the 26 alphabets written by various people, are used for training the neural network, and 570 different handwritten alphabetical characters are used for testing. The proposed recognition system performs quite well, yielding higher recognition accuracy than systems employing the conventional horizontal and vertical methods of feature extraction. This system will be suitable for converting handwritten documents into structured text form and for recognizing handwritten names.
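
    A minimal sketch of the diagonal-based feature extraction idea as described: the normalized character image is divided into square zones, and each zone contributes one feature obtained by averaging the foreground-pixel sums along its diagonals. The image size, the 10x10 zone size and the binarization in the example are illustrative assumptions, not the paper's exact parameters.

```python
import numpy as np

def diagonal_zone_features(char_img, zone=10):
    """Diagonal-based features: one value per zone, obtained by averaging the
    foreground-pixel sums along every diagonal of that zone.
    `char_img` is a binarized character image (foreground = 1)."""
    h, w = char_img.shape
    feats = []
    for r in range(0, h - h % zone, zone):
        for c in range(0, w - w % zone, zone):
            block = char_img[r:r + zone, c:c + zone]
            # A zone x zone block has 2*zone - 1 diagonals (offsets -(zone-1) .. zone-1).
            diag_sums = [block.diagonal(offset=k).sum()
                         for k in range(-(zone - 1), zone)]
            feats.append(np.mean(diag_sums))
    return np.asarray(feats, dtype=np.float32)

# Illustrative use: a 90x60 normalized character yields 9*6 = 54 zone features,
# which would then feed a multilayer feed-forward classifier.
img = (np.random.rand(90, 60) > 0.7).astype(np.uint8)
features = diagonal_zone_features(img)   # shape (54,)
```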

    Learning to Read by Spelling: Towards Unsupervised Text Recognition

    Full text link
    This work presents a method for visual text recognition without using any paired supervisory data. We formulate the text recognition task as one of aligning the conditional distribution of strings predicted from given text images with lexically valid strings sampled from target corpora. This enables fully automated, unsupervised learning from just line-level text images and unpaired text-string samples, obviating the need for large aligned datasets. We present a detailed analysis of various aspects of the proposed method, namely: (1) the impact of the length of training sequences on convergence, (2) the relation between character frequencies and the order in which they are learnt, (3) the generalisation ability of our recognition network to inputs of arbitrary lengths, and (4) the impact of varying the text corpus on recognition accuracy. Finally, we demonstrate excellent text recognition accuracy on both synthetically generated text images and scanned images of real printed books, using no labelled training examples.
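
    One way to picture the alignment of predicted-string and corpus-string distributions is an adversarial setup: a recognizer emits per-position character distributions for unlabelled line images, while a discriminator tries to tell those sequences apart from strings sampled from a text corpus. The sketch below is only a rough illustration of that idea under assumed sizes and architectures; it is not the paper's actual formulation or model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

V, T = 27, 24          # assumed character-set size and output length (illustrative)

class Recognizer(nn.Module):
    """Maps a line image to a sequence of per-position character distributions."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, T)),
        )
        self.head = nn.Conv1d(128, V, 1)

    def forward(self, img):                       # img: (B, 1, H, W)
        f = self.conv(img).squeeze(2)             # (B, 128, T)
        return F.softmax(self.head(f), dim=1).transpose(1, 2)   # (B, T, V)

class Discriminator(nn.Module):
    """Scores whether a sequence of character vectors looks like valid text."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(V, 128, 5, padding=2), nn.ReLU(),
            nn.Conv1d(128, 1, 5, padding=2),
            nn.AdaptiveAvgPool1d(1),
        )

    def forward(self, seq):                        # seq: (B, T, V)
        return self.net(seq.transpose(1, 2)).squeeze(-1)        # (B, 1)

R, D = Recognizer(), Discriminator()
images = torch.randn(8, 1, 32, 256)                         # unlabelled text-line images
real = F.one_hot(torch.randint(0, V, (8, T)), V).float()    # strings sampled from a corpus

pred = R(images)
d_loss = F.binary_cross_entropy_with_logits(D(real), torch.ones(8, 1)) + \
         F.binary_cross_entropy_with_logits(D(pred.detach()), torch.zeros(8, 1))
r_loss = F.binary_cross_entropy_with_logits(D(pred), torch.ones(8, 1))
# d_loss and r_loss would be minimized alternately; no image-text pairs are used anywhere.
```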

    κΈ°κΈ° μƒμ—μ„œμ˜ 심측 신경망 κ°œμΈν™” 방법

    Get PDF
    Thesis (Master's), Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, February 2019. Egger, Bernhard. There exist several deep neural network (DNN) architectures suitable for embedded inference; however, little work has focused on training neural networks on-device. User customization of DNNs is desirable due to the difficulty of collecting a training set representative of real-world scenarios. Additionally, inter-user variation means that a general model has a limit on its achievable accuracy. In this thesis, a DNN architecture that allows for low-power on-device user customization is proposed. This approach is applied to handwritten character recognition of both the Latin and the Korean alphabets. Experiments show a 3.5-fold reduction of the prediction error after user customization for both alphabets compared to a DNN trained with general data. The architecture is additionally evaluated on a number of embedded processors, demonstrating its practical applicability.
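
    The abstract does not detail the proposed architecture, but a common way to obtain low-power on-device customization is to ship a generally trained, frozen feature extractor and fine-tune only a small user-specific classifier head on the user's own handwriting samples. The sketch below illustrates that pattern under assumed layer sizes and a 26-class Latin alphabet; it is not the thesis's specific design.

```python
import torch
import torch.nn as nn

# Generally trained backbone shipped with the device (illustrative architecture).
backbone = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
)
head = nn.Linear(64 * 7 * 7, 26)       # small user-specific classifier (Latin letters)

# Freeze the backbone: only the head is updated on-device, keeping the
# memory and compute cost of training low.
for p in backbone.parameters():
    p.requires_grad = False

opt = torch.optim.SGD(head.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

def personalize(user_images, user_labels, epochs=5):
    """Fine-tune the head on a handful of the user's own handwriting samples."""
    for _ in range(epochs):
        feats = backbone(user_images)          # no gradients stored for frozen layers
        loss = loss_fn(head(feats), user_labels)
        opt.zero_grad()
        loss.backward()
        opt.step()

# Illustrative call with 28x28 character crops.
personalize(torch.randn(32, 1, 28, 28), torch.randint(0, 26, (32,)))
```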

    Adaptive Algorithms for Automated Processing of Document Images

    Get PDF
    Large-scale document digitization projects continue to motivate interesting document understanding technologies such as script and language identification, page classification, segmentation and enhancement. Typically, however, solutions are still limited to narrow domains or regular formats such as books, forms, articles or letters, and operate best on clean documents scanned in a controlled environment. More general collections of heterogeneous documents challenge the basic assumptions of state-of-the-art technology regarding quality, script, content and layout. Our work explores the use of adaptive algorithms for the automated analysis of noisy and complex document collections. We first propose, implement and evaluate an adaptive clutter detection and removal technique for complex binary documents. Our distance-transform-based technique aims to remove irregular and independent unwanted foreground content while leaving text content untouched. The novelty of this approach lies in its determination of the best approximation to the clutter-content boundary in the presence of text-like structures. Second, we describe a page segmentation technique called Voronoi++ for complex layouts, which builds upon the state-of-the-art method proposed by Kise [Kise1999]. Our approach does not assume structured text zones and is designed to handle multilingual text in both handwritten and printed form. Voronoi++ is a dynamically adaptive and contextually aware approach that considers components' separation features combined with Docstrum-based [O'Gorman1993] angular and neighborhood features to form provisional zone hypotheses. These provisional zones are then verified based on the context built from local separation and high-level content features. Finally, our research proposes a generic model to segment and recognize characters in any complex syllabic or non-syllabic script, using font models. This concept is based on the fact that font files contain all the information necessary to render text, and thus provide a model for how to decompose it. Instead of script-specific routines, this work is a step towards a generic character segmentation and recognition scheme for both Latin and non-Latin scripts.
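
    As a rough illustration of the distance-transform idea behind the clutter removal step: the distance transform gives, per connected component, an estimate of maximum stroke thickness, and components far thicker than the document's typical text stroke can be dropped. The thresholding heuristic below is an assumption for illustration, not the dissertation's actual decision rule.

```python
import numpy as np
from scipy import ndimage

def remove_thick_clutter(binary_img, thickness_factor=3.0):
    """Remove connected components whose strokes are much thicker than typical text.
    `binary_img`: foreground = 1 (text/clutter), background = 0."""
    # Distance transform inside the foreground: the peak value per component
    # approximates half of its maximum stroke thickness.
    dist = ndimage.distance_transform_edt(binary_img)
    labels, n = ndimage.label(binary_img)
    if n == 0:
        return binary_img.copy()
    max_thickness = np.asarray(
        ndimage.maximum(dist, labels, index=np.arange(1, n + 1)))
    typical = np.median(max_thickness)          # crude estimate of text stroke width
    keep = max_thickness <= thickness_factor * typical
    # Keep only components whose thickness is consistent with text strokes.
    keep_mask = np.concatenate(([False], keep))[labels]
    return binary_img * keep_mask

# Illustrative use on a synthetic page: thin "text" strokes plus one thick blob.
page = np.zeros((200, 200), dtype=np.uint8)
page[20:22, 20:180] = 1                   # thin, text-like strokes
page[40:42, 20:180] = 1
page[60:62, 20:180] = 1
page[100:160, 100:160] = 1                # thick clutter blob
cleaned = remove_thick_clutter(page)      # the blob is removed, the strokes are kept
```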