3 research outputs found

DropSample: A New Training Method to Enhance Deep Convolutional Neural Networks for Large-Scale Unconstrained Handwritten Chinese Character Recognition

Authors: Feng Ziyong, Jin Lianwen, Tao Dacheng, Xie Zecheng, Yang Weixin
Publication date: 20/05/2015

Inspired by the theory of Leitner's learning box from the field of psychology, we propose DropSample, a new method for training deep convolutional neural networks (DCNNs), and apply it to large-scale online handwritten Chinese character recognition (HCCR). According to the principle of DropSample, each training sample is associated with a quota function that is dynamically adjusted on the basis of the classification confidence given by the DCNN softmax output. After a learning iteration, samples with low confidence will have a higher probability of being selected as training data in the next iteration; in contrast, well-trained and well-recognized samples with very high confidence will have a lower probability of being involved in the next training iteration and can be gradually eliminated. As a result, the learning process becomes more efficient as it progresses. Furthermore, we investigate the use of domain-specific knowledge to enhance the performance of DCNN by adding a domain knowledge layer before the traditional CNN. By adopting DropSample together with different types of domain-specific knowledge, the accuracy of HCCR can be improved efficiently. Experiments on the CASIA-OLHWDB 1.0, CASIA-OLHWDB 1.1, and ICDAR 2013 online HCCR competition datasets yield outstanding recognition rates of 97.33%, 97.06%, and 97.51% respectively, all of which are significantly better than the previous best results reported in the literature.

Comment: 18 pages, 8 figures, 5 tables
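
The sample-selection idea in the DropSample abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the quota function, so the thresholds (`high`, `low`), the multiplicative `decay`, the `floor`, and the function names `update_quota` and `select_batch` are all assumptions chosen only to show the mechanism of confidence-driven resampling.

```python
import random


def update_quota(quota, confidence, high=0.99, low=0.5, decay=0.5, floor=1e-3):
    """Hypothetical quota update (not the rule from the paper).

    Confidently recognized samples have their quota shrunk, poorly
    recognized samples have it restored to 1.0, and everything in
    between keeps its current quota.
    """
    if confidence >= high:
        # Assumed rule: halve the quota, but keep it above a small floor
        # so the sampling weights never all become zero.
        return max(floor, quota * decay)
    if confidence < low:
        return 1.0
    return quota


def select_batch(quotas, batch_size, rng):
    """Draw sample indices with probability proportional to their quota."""
    indices = list(range(len(quotas)))
    return rng.choices(indices, weights=quotas, k=batch_size)


if __name__ == "__main__":
    rng = random.Random(0)
    quotas = [1.0, 1.0, 1.0]
    # Made-up softmax confidences for three training samples.
    confidences = [0.999, 0.30, 0.80]
    quotas = [update_quota(q, c) for q, c in zip(quotas, confidences)]
    print(quotas)
    print(select_batch(quotas, batch_size=5, rng=rng))
```

In this sketch a sample that is repeatedly classified with very high confidence sees its quota decay toward the floor, so it is drawn less and less often, while a sample that drops below the low threshold returns to full weight.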

arXiv.org e-Print Archive

Synbols: Probing Learning Algorithms with Synthetic Datasets

Authors: Atighehchian Parmida, Branchaud-Charron Frédéric, Caccia Massimo, Charlin Laurent, Craddock Matt, Drouin Alexandre, Lacoste Alexandre, Laradji Issam, Rodríguez Pau, Vázquez David
Publication date: 04/11/2020

Progress in the field of machine learning has been fueled by the introduction of benchmark datasets pushing the limits of existing algorithms. Enabling the design of datasets to test specific properties and failure modes of learning algorithms is thus a problem of high interest, as it has a direct impact on innovation in the field. In this sense, we introduce Synbols -- Synthetic Symbols -- a tool for rapidly generating new datasets with a rich composition of latent features rendered in low-resolution images. Synbols leverages the large number of symbols available in the Unicode standard and the wide range of artistic fonts provided by the open font community. Our tool's high-level interface provides a language for rapidly generating new distributions on the latent features, including various types of textures and occlusions. To showcase the versatility of Synbols, we use it to dissect the limitations and flaws in standard learning algorithms in various learning setups including supervised learning, active learning, out-of-distribution generalization, unsupervised representation learning, and object counting.

arXiv.org e-Print Archive

An Adaptive Bag-of-Features Framework for Arabic Handwriting Recognition

KFUPM ePrints