5 research outputs found

    On the Ability of a CNN to Realize Image-to-Image Language Conversion

    This paper investigates how well Convolutional Neural Networks (CNNs) can perform the novel task of image-to-image language conversion. We propose a new network that converts images of Korean Hangul characters directly into images of their phonetic Latin-character equivalents, without the conversion rules between Hangul and the phonetic symbols being explicitly provided. The results show that image-to-image language conversion is possible and that the network can grasp the structural features of Hangul even from limited training data. In addition, we introduce a further network for use when the input and output have significantly different features. Comment: Published at ICDAR 201
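    A minimal sketch of the kind of encoder-decoder CNN such a task implies, assuming paired Hangul/Latin glyph images and a pixel-wise reconstruction loss; the layer sizes, the 64x64 resolution, and the training setup are illustrative and not the architecture published in the paper.

```python
# Sketch (PyTorch): encoder-decoder CNN mapping a character image of one
# script to a character image of another. Illustrative sizes only.
import torch
import torch.nn as nn

class CharImageTranslator(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: compress a 1x64x64 glyph image into a compact feature map.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),   # -> 32x32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # -> 16x16
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(), # -> 8x8
        )
        # Decoder: expand the features back into a glyph image of the target script.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),   # -> 16x16
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),    # -> 32x32
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),  # -> 64x64
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Training step skeleton: pixel-wise loss against the rendered Latin
# transliteration image (paired training data assumed).
model = CharImageTranslator()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()
hangul_batch = torch.rand(8, 1, 64, 64)  # placeholder input glyph images
latin_batch = torch.rand(8, 1, 64, 64)   # placeholder target glyph images
loss = criterion(model(hangul_batch), latin_batch)
loss.backward()
optimizer.step()
```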

    ์ผ๋ฐ˜์ ์ธ ๋ฌธ์ž ์ด๋ฏธ์ง€์˜ ์–ธ์–ด๋ถ„๋ฅ˜

    Doctoral dissertation (Ph.D.) -- Graduate School of Seoul National University, College of Natural Sciences, Interdisciplinary Program in Computational Science, February 2021. Advisor: ๊ฐ•๋ช…์ฃผ. As in other machine learning fields, text detection and recognition, which extract the text information contained in images, have made great progress since the advent of deep learning. When multiple languages are mixed in an image, recognition typically proceeds through detection, language classification, and recognition. This dissertation aims to classify the languages of the image patches produced by text detection. As far as we know, there is no prior research targeting language classification of images specifically, so we started from backbone networks commonly used in general object detection. With a ResNeSt-based network (built on ResNet) and automated pre-processing of the ground-truth data to improve classification performance, we achieve a state-of-the-art result for this task on a public benchmark dataset.
    Contents: Abstract; 1 Introduction (1.1 Optical Character Recognition; 1.2 Deep Learning); 2 Backgrounds (2.1 Detection; 2.2 Recognition; 2.3 Language Classification; 2.4 Multi-lingual Text (MLT); 2.5 Convolutional Neural Network (CNN); 2.6 Attention Mechanism; 2.7 Related Works: Detectors, Recognizers, End-to-end methods (detector + recognizer); 2.8 Dataset: ICDAR MLT, Synthetic data: Gupta, COCO-Text); 3 Proposed Methods (3.1 Base Network Selection: GoogLeNet, ShuffleNet V2, ResNet, Wide ResNet, ResNeXt, ResNeSt (Split-Attention network), DenseNet, EfficientNet, Automatic search: AutoSTR; 3.2 Methods: Ground truth cleansing, Divide-and-stack, Using additional data, OHEM, Network using the number of characters, Use of R-CNN structure, High resolution input, Handling outliers using a variant of OHEM, Variable sized input images using attention, Class balancing, Fine-tuning on specific classes, Optimizer selection; 3.3 Result); 4 Conclusion; Abstract (in Korean)
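    A minimal sketch of a ResNeSt-based language classifier for detected text patches, along the lines described above; the choice of the timm resnest50d backbone, the seven-language label set, the 224x224 crop size, and the SGD settings are assumptions for illustration, not the dissertation's exact network or training recipe.

```python
# Sketch (PyTorch + timm): classify the language of cropped text patches
# with a ResNeSt (Split-Attention) backbone. Illustrative setup only.
import torch
import torch.nn as nn
import timm

LANGUAGES = ["Arabic", "Latin", "Chinese", "Japanese", "Korean", "Bangla", "Hindi"]

# ResNeSt backbone with a classification head over the language set.
model = timm.create_model("resnest50d", pretrained=False, num_classes=len(LANGUAGES))

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Detected text patches, already cropped and resized to a fixed input size.
patches = torch.rand(16, 3, 224, 224)                  # placeholder batch of crops
labels = torch.randint(0, len(LANGUAGES), (16,))       # placeholder ground truth

logits = model(patches)
loss = criterion(logits, labels)
loss.backward()
optimizer.step()

predicted = [LANGUAGES[i] for i in logits.argmax(dim=1).tolist()]
```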

    Large vocabulary off-line handwritten word recognition

    Considerable progress has been made in handwriting recognition technology over the last few years. Thus far, handwriting recognition systems have been limited to small-scale and very constrained applications, where the number of different words that a system can recognize is the key factor in its performance. The capability of dealing with large vocabularies, however, opens up many more applications. In order to translate the gains made by research into large and very-large vocabulary handwriting recognition, it is necessary to further improve the computational efficiency and the accuracy of current recognition strategies and algorithms. In this thesis we focus on efficient and accurate large vocabulary handwriting recognition. The main challenge is to speed up the recognition process while improving the recognition accuracy; however, these two aspects are in mutual conflict. It is relatively easy to improve recognition speed while trading away some accuracy, but it is much harder to improve the recognition speed while preserving the accuracy. First, several strategies are investigated for improving the recognition speed of a baseline recognition system so that it can deal with large and very-large vocabularies. Next, we improve the recognition accuracy while preserving all the original characteristics of the baseline recognition system: omniwriter operation, unconstrained handwriting, and dynamic lexicons. The main contributions of this thesis are novel search strategies and a novel verification approach that allow us to achieve a 120x speedup and a 10% accuracy improvement over a state-of-the-art baseline recognition system on a very-large vocabulary recognition task (80,000 words). The improvements in speed are obtained by the following techniques: lexical tree search, standard and constrained lexicon-driven level-building algorithms, a fast two-level decoding algorithm, and a distributed recognition scheme. The recognition accuracy is improved by post-processing the list of candidate N-best-scoring word hypotheses generated by the baseline recognition system; the list also contains the segmentation of those word hypotheses into characters. A verification module based on a neural network classifier generates a score for each segmented character, and the scores from the baseline recognition system and the verification module are then combined to optimize performance. A rejection mechanism over this combination further improves the word recognition rate to about 95% while rejecting 30% of the word hypotheses.
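    A minimal sketch of the N-best rescoring-and-rejection idea described above: each word hypothesis from the baseline recognizer is rescored with a character-level verifier, the two scores are combined, and the result is rejected if the combined score is too low. The Hypothesis fields, the weighted-sum combination, and the threshold value are illustrative assumptions, not the thesis's exact formulation.

```python
# Sketch: combine baseline recognizer scores with a character-level
# verification score and apply a rejection threshold.
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Hypothesis:
    word: str
    recognizer_score: float       # score from the baseline recognizer
    char_segments: List[object]   # segmented character images for this hypothesis

def rescore_and_reject(
    nbest: List[Hypothesis],
    verify_char: Callable[[object, str], float],  # neural-net score for one character
    weight: float = 0.5,
    reject_threshold: float = -2.0,
) -> Optional[str]:
    best_word, best_score = None, float("-inf")
    for hyp in nbest:
        # Average the verifier's per-character scores over this segmentation.
        char_scores = [verify_char(seg, ch) for seg, ch in zip(hyp.char_segments, hyp.word)]
        verification_score = sum(char_scores) / max(len(char_scores), 1)
        # Weighted combination of recognizer and verifier scores.
        combined = weight * hyp.recognizer_score + (1.0 - weight) * verification_score
        if combined > best_score:
            best_word, best_score = hyp.word, combined
    # Rejection mechanism: abstain when even the best combined score is weak.
    return best_word if best_score >= reject_threshold else None
```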

    Natural Language Processing: Emerging Neural Approaches and Applications

    This Special Issue highlights the most recent research being carried out in the NLP field and discusses related open issues, with a particular focus on emerging approaches for language learning, understanding, production, and grounding, interactively or autonomously from data, in cognitive and neural systems, as well as on their potential or real applications in different domains.