Unconstrained Scene Text and Video Text Recognition for Arabic Script
Building robust recognizers for Arabic has always been challenging. We
demonstrate the effectiveness of an end-to-end trainable CNN-RNN hybrid
architecture in recognizing Arabic text in videos and natural scenes. We
outperform the previous state of the art on two publicly available video text
datasets, ALIF and ACTIV. For the scene text recognition task, we introduce a
new Arabic scene text dataset and establish baseline results. For scripts like
Arabic, a major challenge in developing robust recognizers is the lack of large
quantities of annotated data. We overcome this by synthesising millions of Arabic
text images from a large vocabulary of Arabic words and phrases. Our
implementation builds on the model introduced in [37], which has
proven quite effective for English scene text recognition. The model follows a
segmentation-free, sequence-to-sequence transcription approach. The network
transcribes a sequence of convolutional features from the input image to a
sequence of target labels. This does away with the need for segmenting the input
image into constituent characters/glyphs, which is often difficult for Arabic
script. Further, the ability of RNNs to model contextual dependencies yields
superior recognition results.
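The segmentation-free transcription step described above can be illustrated with a small decoding sketch. The abstract does not name the output layer, so this assumes a CTC-style layer with a blank label at index 0; the function name `ctc_greedy_decode`, the toy alphabet, and the scores are all hypothetical and chosen only for illustration.

```python
from itertools import groupby

BLANK = 0  # assumed index of the blank label (not stated in the abstract)


def ctc_greedy_decode(frame_scores, alphabet):
    """Collapse per-frame label scores into one label sequence.

    frame_scores: one list of scores per frame of the feature sequence;
                  index 0 is the blank, index i maps to alphabet[i - 1].
    """
    # Pick the best-scoring label independently for every frame.
    best_path = [max(range(len(scores)), key=scores.__getitem__)
                 for scores in frame_scores]
    # Merge consecutive repeats first, then drop the blanks.
    collapsed = [label for label, _ in groupby(best_path)]
    return "".join(alphabet[label - 1] for label in collapsed if label != BLANK)


# Toy input: five frames over a two-letter alphabet (hypothetical values).
toy_scores = [
    [0.1, 0.8, 0.1],
    [0.1, 0.7, 0.2],
    [0.9, 0.05, 0.05],
    [0.2, 0.6, 0.2],
    [0.1, 0.2, 0.7],
]
decoded = ctc_greedy_decode(toy_scores, "ab")
```

No character boundaries are supplied anywhere: the blank frame between the two runs of the first label is what keeps the repeated letter from being merged.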
Diagonal Based Feature Extraction for Handwritten Alphabets Recognition System using Neural Network
An off-line handwritten alphabetical character recognition system using a
multilayer feed-forward neural network is described in the paper. A new method,
called diagonal-based feature extraction, is introduced for extracting the
features of the handwritten letters. Fifty data sets, each containing 26
letters written by various people, are used for training the neural network,
and 570 different handwritten alphabetical characters are used for testing. The
proposed recognition system performs quite well, yielding higher
recognition accuracy than systems employing the conventional
horizontal and vertical methods of feature extraction. This system will be
suitable for converting handwritten documents into structural text form and
recognizing handwritten names.
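One way to picture diagonal-based feature extraction is sketched below. The abstract gives no details, so the zoning scheme, the default zone size, and the averaging rule are assumptions made for illustration, and `diagonal_features` is a hypothetical name.

```python
def diagonal_features(image, zone=10):
    """Return one feature per zone x zone block of a binary image.

    image: list of equal-length rows holding 0 (background) or 1 (ink);
           height and width are assumed to be multiples of `zone`.
    """
    height, width = len(image), len(image[0])
    features = []
    for top in range(0, height, zone):
        for left in range(0, width, zone):
            # A zone x zone block has 2 * zone - 1 diagonals; pixels with the
            # same (row + column) offset lie on the same anti-diagonal.
            diagonal_sums = [0] * (2 * zone - 1)
            for r in range(zone):
                for c in range(zone):
                    diagonal_sums[r + c] += image[top + r][left + c]
            # Average the per-diagonal ink counts into a single zone feature.
            features.append(sum(diagonal_sums) / len(diagonal_sums))
    return features


# Tiny 2 x 4 example with 2 x 2 zones (hypothetical data).
tiny = [
    [1, 0, 1, 1],
    [0, 1, 1, 1],
]
tiny_features = diagonal_features(tiny, zone=2)
```

The resulting feature vector, one value per zone, is the kind of input a feed-forward classifier could consume in place of row-wise or column-wise sums.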
Learning to Read by Spelling: Towards Unsupervised Text Recognition
This work presents a method for visual text recognition without using any
paired supervisory data. We formulate the text recognition task as one of
aligning the conditional distribution of strings predicted from given text
images with lexically valid strings sampled from target corpora. This enables
fully automated, unsupervised learning from just line-level text images
and unpaired text-string samples, obviating the need for large aligned
datasets. We present a detailed analysis of various aspects of the proposed
method, namely (1) the impact of the length of training sequences on convergence,
(2) the relation between character frequencies and the order in which they are
learnt, (3) the generalisation ability of our recognition network to inputs of
arbitrary lengths, and (4) the impact of varying the text corpus on recognition
accuracy. Finally, we demonstrate excellent text recognition accuracy on both
synthetically generated text images and scanned images of real printed books,
using no labelled training examples.
On-Device Deep Neural Network Personalization Method
Thesis (Master's) -- Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, 2019. 2. Egger, Bernhard.
Several deep neural network (DNN) architectures are suitable for embedded inference; however, little work has focused on training neural networks on-device.
User customization of DNNs is desirable because it is difficult to collect a training set that is representative of real-world scenarios.
Additionally, inter-user variation limits the accuracy that a general model can achieve.
In this thesis, a DNN architecture that allows for low power on-device user customization is proposed.
This approach is applied to handwritten character recognition of both the Latin and the Korean alphabets.
Experiments show a 3.5-fold reduction of the prediction error after user customization for both alphabets compared to a DNN trained with general data.
This architecture is additionally evaluated on a number of embedded processors, demonstrating its practical applicability.
Abstract
Contents
List of Figures
List of Tables
Chapter 1 Introduction
Chapter 2 Motivation
Chapter 3 Background
3.1 Deep Neural Networks
3.1.1 Inference
3.1.2 Training
3.2 Convolutional Neural Networks
3.3 On-Device Acceleration
3.3.1 Hardware Accelerators
3.3.2 Software Optimization
Chapter 4 Methodology
4.1 Initialization
4.2 On-Device Training
Chapter 5 Implementation
5.1 Pre-processing
5.2 Latin Handwritten Character Recognition
5.2.1 Dataset and BIE Selection
5.2.2 AE Design
5.3 Korean Handwritten Character Recognition
5.3.1 Dataset and BIE Selection
5.3.2 AE Design
Chapter 6 On-Device Acceleration
6.1 Architecture Optimizations
6.2 Compiler Optimizations
Chapter 7 Experimental Setup
Chapter 8 Evaluation
8.1 Latin Handwritten Character Recognition
8.2 Korean Handwritten Character Recognition
8.3 On-Device Acceleration
Chapter 9 Related Work
Chapter 10 Conclusion
Bibliography
Abstract (in Korean)
Acknowledgements
Adaptive Algorithms for Automated Processing of Document Images
Large-scale document digitization projects continue to motivate interesting document understanding technologies such as script and language identification, page classification, segmentation and enhancement. Typically, however, solutions are still limited to narrow domains or regular formats such as books, forms, articles or letters, and operate best on clean documents scanned in a controlled environment. More general collections of heterogeneous documents challenge the basic assumptions of state-of-the-art technology regarding quality, script, content and layout. Our work explores the use of adaptive algorithms for the automated analysis of noisy and complex document collections.
We first propose, implement and evaluate an adaptive clutter detection and removal technique for complex binary documents. Our distance-transform-based technique aims to remove irregular and independent unwanted foreground content while leaving text content untouched. The novelty of this approach lies in how it determines the best approximation of the clutter-content boundary in the presence of text-like structures.
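To make the role of the distance transform concrete, here is a minimal sketch. It is a deliberate simplification, not the boundary-estimation method of the thesis: it only computes a city-block distance transform and flags pixels that lie deeper inside a blob than a text stroke plausibly would. The function names and the `max_stroke_half_width` threshold are hypothetical.

```python
def cityblock_distance_transform(image):
    """Distance from each foreground pixel (1) to the nearest background pixel (0).

    The image border is not treated as background in this sketch.
    """
    height, width = len(image), len(image[0])
    inf = height + width  # larger than any possible city-block distance
    dist = [[0 if image[r][c] == 0 else inf for c in range(width)]
            for r in range(height)]
    # Forward pass: propagate distances from the top and left neighbours.
    for r in range(height):
        for c in range(width):
            if r > 0:
                dist[r][c] = min(dist[r][c], dist[r - 1][c] + 1)
            if c > 0:
                dist[r][c] = min(dist[r][c], dist[r][c - 1] + 1)
    # Backward pass: propagate distances from the bottom and right neighbours.
    for r in reversed(range(height)):
        for c in reversed(range(width)):
            if r < height - 1:
                dist[r][c] = min(dist[r][c], dist[r + 1][c] + 1)
            if c < width - 1:
                dist[r][c] = min(dist[r][c], dist[r][c + 1] + 1)
    return dist


def clutter_core_mask(image, max_stroke_half_width):
    """Flag pixels deeper inside a blob than the assumed stroke half-width."""
    dist = cityblock_distance_transform(image)
    return [[1 if d > max_stroke_half_width else 0 for d in row] for row in dist]


# 7 x 7 test image: a solid 5 x 5 blob surrounded by a background border.
blob = [[1 if 0 < r < 6 and 0 < c < 6 else 0 for c in range(7)] for r in range(7)]
blob_dist = cityblock_distance_transform(blob)
blob_core = clutter_core_mask(blob, max_stroke_half_width=2)
```

Thin strokes never reach large distance values, so a threshold of this kind separates thick blobs from text-like structures; deciding where the clutter ends and the content begins is the harder problem the abstract refers to.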
Second, we describe a page segmentation technique for complex layouts, called Voronoi++, which builds upon the state-of-the-art method proposed by Kise [Kise1999]. Our approach does not assume structured text zones and is designed to handle multi-lingual text in both handwritten and printed form. Voronoi++ is a dynamically adaptive and contextually aware approach that combines components' separation features with Docstrum-based [O'Gorman1993] angular and neighborhood features to form provisional zone hypotheses. These provisional zones are then verified based on the context built from local separation and high-level content features.
Finally, our research proposes a generic model for segmenting and recognizing characters in any complex syllabic or non-syllabic script, using font models. This concept is based on the fact that font files contain all the information necessary to render text and thus provide a model for how to decompose it. Instead of relying on script-specific routines, this work is a step towards a generic character segmentation and recognition scheme for both Latin and non-Latin scripts.