Language Classification of General Text Images (일반적인 문자 이미지의 언어분류)

Abstract

Doctoral dissertation -- Seoul National University Graduate School: College of Natural Sciences, Interdisciplinary Program in Computational Science, February 2021. Advisor: 강명주 (Myungjoo Kang).

As in other machine learning fields, there has been a lot of progress since the advent of deep learning in text detection and recognition, which aim to extract the textual information contained in images. When multiple languages are mixed in an image, the recognition pipeline typically proceeds through detection, language classification, and recognition. This dissertation aims to classify the languages of the image patches produced by text detection. As far as we know, there is no prior research that directly targets language classification of text images, so we started from standard backbone networks commonly used in general object detection. With a ResNeSt-based network (a ResNet variant) and automated pre-processing of the ground-truth data to improve classification performance, we achieve a state-of-the-art result on this task on a public benchmark dataset.

Abstract (in Korean, translated)

As in other machine learning fields, text recognition, which seeks to extract the textual information contained in images, has made great progress since the advent of deep learning. The recognition process usually proceeds through text detection followed by text recognition; when multiple languages are mixed, a language classification step is commonly inserted between detection and recognition. This study aims to classify image patches by language at the stage after text detection. Since there is no prior work dedicated solely to this classification task, we selected and adapted a suitable network from those used in general object detection. With a ResNeSt-based network and an automated pre-processing procedure, we achieved the best reported result on a public benchmark dataset.

Contents

Abstract
1 Introduction
    1.1 Optical Character Recognition
    1.2 Deep Learning
2 Backgrounds
    2.1 Detection
    2.2 Recognition
    2.3 Language Classification
    2.4 Multi-lingual Text (MLT)
    2.5 Convolutional Neural Network (CNN)
    2.6 Attention Mechanism
    2.7 Related Works
        2.7.1 Detectors
        2.7.2 Recognizers
        2.7.3 End-to-end methods (detector + recognizer)
    2.8 Dataset
        2.8.1 ICDAR MLT
        2.8.2 Synthetic data: Gupta
        2.8.3 COCO-Text
3 Proposed Methods
    3.1 Base Network Selection
        3.1.1 Googlenet
        3.1.2 Shufflenet V2
        3.1.3 Resnet
        3.1.4 Wide Resnet
        3.1.5 ResNeXt
        3.1.6 ResNeSt (Split-Attention network)
        3.1.7 Densenet
        3.1.8 EfficientNet
        3.1.9 Automatic search: AutoSTR
    3.2 Methods
        3.2.1 Ground truth cleansing
        3.2.2 Divide-and-stack
        3.2.3 Using additional data
        3.2.4 OHEM
        3.2.5 Network using the number of characters
        3.2.6 Use of R-CNN structure
        3.2.7 High resolution input
        3.2.8 Handling outliers using a variant of OHEM
        3.2.9 Variable sized input images using attention
        3.2.10 Class balancing
        3.2.11 Fine tuning on specific classes
        3.2.12 Optimizer selection
    3.3 Result
4 Conclusion
Abstract (in Korean)
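As an illustration of the approach summarized in the abstract, the sketch below shows how a ResNeSt backbone could be repurposed as a language classifier over cropped text patches. It is a minimal sketch under stated assumptions, not the dissertation's implementation: the `timm` package, the `resnest50d` model name, the seven-language class list, and the `classify_patch` helper are all illustrative choices not taken from the source.

```python
import torch
import timm

# Illustrative class set, loosely following the ICDAR MLT language labels
# (an assumption for this sketch, not the dissertation's exact label set).
LANGUAGES = ["Arabic", "Bangla", "Chinese", "Hindi", "Japanese", "Korean", "Latin"]

# A ResNeSt-50 backbone from timm with a fresh classification head
# sized to the number of language classes.
model = timm.create_model("resnest50d", pretrained=True, num_classes=len(LANGUAGES))
model.eval()

def classify_patch(patch: torch.Tensor) -> str:
    """Classify one cropped text patch.

    patch: a (3, H, W) float tensor, already resized and normalized to the
    backbone's expected input statistics (hypothetical preprocessing step).
    """
    with torch.no_grad():
        logits = model(patch.unsqueeze(0))  # add a batch dimension -> (1, num_classes)
        return LANGUAGES[int(logits.argmax(dim=1))]

# Example usage with a dummy patch standing in for a detected text crop.
if __name__ == "__main__":
    dummy = torch.rand(3, 224, 224)
    print(classify_patch(dummy))
```

In practice the classification head would first be fine-tuned on cropped text patches from a benchmark such as ICDAR MLT before being used for inference.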
