
    ์ผ๋ฐ˜์ ์ธ ๋ฌธ์ž ์ด๋ฏธ์ง€์˜ ์–ธ์–ด๋ถ„๋ฅ˜

    ํ•™์œ„๋…ผ๋ฌธ (๋ฐ•์‚ฌ) -- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ์ž์—ฐ๊ณผํ•™๋Œ€ํ•™ ํ˜‘๋™๊ณผ์ • ๊ณ„์‚ฐ๊ณผํ•™์ „๊ณต, 2021. 2. ๊ฐ•๋ช…์ฃผ.As other machine learning fields, there has been a lot of progress in text detection and recognition to obtain text information contained in images since the deep learning era. When multiple languages are mixed in the im- age, the process of recognition typically goes through a detection, language classification and recognition. This dissertation aims to classify languages of image patches which are the results of text detection. As far as we know, there are no prior research exactly targeting language classification of images. So we started from basic backbone networks that are used commonly in many other general object detection fields. With a ResNeSt-based network which is based on Resnet and automated pre-processing of ground-truth data to improve classification performance, we can achieve state of the art record of this task with a public benchmark dataset.๋‹ค๋ฅธ ๊ธฐ๊ณ„ํ•™์Šต๋ถ„์•ผ์™€ ๋งˆ์ฐฌ๊ฐ€์ง€๋กœ, ์ด๋ฏธ์ง€๊ฐ€ ๋‹ด๊ณ  ์žˆ๋Š” ๋ฌธ์ž์ •๋ณด๋ฅผ ์–ป์–ด ๋‚ด๋ ค๋Š” ๋ฌธ์ž์ธ์‹ ๋ถ„์•ผ์—์„œ๋„ ๋”ฅ๋Ÿฌ๋‹ ์ดํ›„ ๋งŽ์€ ์ง„์ „์ด ์žˆ์—ˆ๋‹ค. ์ธ์‹์˜ ๊ณผ์ •์€ ํ†ต์ƒ์ ์œผ๋กœ ๋ฌธ์ž๊ฒ€์ถœ, ๋ฌธ์ž์ธ์‹์˜ ๊ณผ์ •์„ ์ฐจ๋ก€๋กœ ๊ฑฐ์น˜๋Š”๋ฐ, ๋‹ค์ˆ˜์˜ ์–ธ์–ด๊ฐ€ ํ˜ผ์žฌํ•  ๊ฒฝ์šฐ ๊ฒ€์ถœ๊ณผ ์ธ์‹ ์‚ฌ์ด์— ์–ธ์–ด๋ถ„๋ฅ˜ ๋‹จ๊ณ„๋ฅผ ํ•œ๋ฒˆ ๋” ๊ฑฐ์น˜๋Š” ๊ฒƒ์ด ๋ณดํ†ต ์ด๋‹ค. ๋ณธ์—ฐ๊ตฌ๋Š”๋ฌธ์ž๊ฒ€์ถœ์ดํ›„์˜๋‹จ๊ณ„์—์„œ์ด๋ฏธ์ง€ํŒจ์น˜๋“ค์„๊ฐ์–ธ์–ด์—๋”ฐ๋ผ ๋ถ„๋ฅ˜ํ•˜๋Š” ๊ฒƒ์„ ๋ชฉํ‘œ๋กœ ํ•œ๋‹ค. ๋ถ„๋ฅ˜์ž‘์—…๋งŒ์„ ์ „๋ฌธ์ ์œผ๋กœ ๋‹ค๋ฃฌ ์„ ํ–‰์—ฐ๊ตฌ๊ฐ€ ์—†์œผ ๋ฏ€๋กœ, ์ผ๋ฐ˜์ ์ธ ๊ฐ์ฒด๊ฒ€์ถœ์—์„œ ์“ฐ์ด๋Š” ๋„คํŠธ์›Œํฌ ์ค‘์—์„œ ์ ์ ˆํ•œ ๊ฒƒ์„ ์„ ํƒํ•˜๊ณ  ์‘์šฉํ•˜์˜€๋‹ค. ResNeSt๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœํ•œ ๋„คํŠธ์›Œํฌ์™€ ์ž๋™ํ™”๋œ ์ „์ฒ˜๋ฆฌ ๊ณผ์ •์„ ํ†ตํ•ด ๊ณต๊ฐœ๋œ ๋ฒค์น˜๋งˆํฌ ๋ฐ์ดํ„ฐ์…‹์„ ๊ธฐ์ค€์œผ๋กœ ๊ฐ€์žฅ ์ข‹์€ ๊ธฐ๋ก์„ ๋‹ฌ์„ฑํ•  ์ˆ˜ ์žˆ์—ˆ๋‹ค.Abstract i 1 Introduction 1 1.1 OpticalCharacterRecognition.................. 1 1.2 DeepLearning........................... 2 2 Backgrounds 4 2.1 Detection ............................. 4 2.2 Recognition ............................ 5 2.3 LanguageClassification...................... 6 2.4 Multi-lingualText(MLT)..................... 7 2.5 ConvolutionalNeuralNetwork(CNN) . . . . . . . . . . . . . . 7 2.6 AttentionMechanism....................... 8 2.7 RelatedWorks........................... 9 2.7.1 Detectors ......................... 9 2.7.2 Recognizers ........................ 14 2.7.3 End-to-end methods (detector + recognizer) . . . . . . 14 2.8 Dataset .............................. 15 2.8.1 ICDARMLT ....................... 15 2.8.2 Syntheticdata:Gupta.................. 17 2.8.3 COCO-Text........................ 17 3 Proposed Methods 18 3.1 BaseNetworkSelection...................... 18 3.1.1 Googlenet ......................... 18 3.1.2 ShufflenetV2 ....................... 20 3.1.3 Resnet........................... 21 3.1.4 WideResnet........................ 23 3.1.5 ResNeXt.......................... 24 3.1.6 ResNeSt(Split-Attention network) ............ 24 3.1.7 Densenet.......................... 25 3.1.8 EfficientNet ........................ 25 3.1.9 Automaticsearch:AutoSTR .............. 27 3.2 Methods.............................. 28 3.2.1 Groundtruthcleansing.................. 28 3.2.2 Divide-and-stack ..................... 32 3.2.3 Usingadditionaldata................... 33 3.2.4 OHEM........................... 34 3.2.5 Network using the number of characters . . . . . . . . 
    Table of Contents

    Abstract
    1 Introduction
        1.1 Optical Character Recognition
        1.2 Deep Learning
    2 Backgrounds
        2.1 Detection
        2.2 Recognition
        2.3 Language Classification
        2.4 Multi-lingual Text (MLT)
        2.5 Convolutional Neural Network (CNN)
        2.6 Attention Mechanism
        2.7 Related Works
            2.7.1 Detectors
            2.7.2 Recognizers
            2.7.3 End-to-end methods (detector + recognizer)
        2.8 Dataset
            2.8.1 ICDAR MLT
            2.8.2 Synthetic data: Gupta
            2.8.3 COCO-Text
    3 Proposed Methods
        3.1 Base Network Selection
            3.1.1 Googlenet
            3.1.2 Shufflenet V2
            3.1.3 Resnet
            3.1.4 Wide Resnet
            3.1.5 ResNeXt
            3.1.6 ResNeSt (Split-Attention network)
            3.1.7 Densenet
            3.1.8 EfficientNet
            3.1.9 Automatic search: AutoSTR
        3.2 Methods
            3.2.1 Ground truth cleansing
            3.2.2 Divide-and-stack
            3.2.3 Using additional data
            3.2.4 OHEM
            3.2.5 Network using the number of characters
            3.2.6 Use of R-CNN structure
            3.2.7 High resolution input
            3.2.8 Handling outliers using a variant of OHEM
            3.2.9 Variable sized input images using the attention
            3.2.10 Class balancing
            3.2.11 Fine tuning on specific classes
            3.2.12 Optimizer selection
        3.3 Result
    4 Conclusion
    Abstract (in Korean)