DropSample: A New Training Method to Enhance Deep Convolutional Neural Networks for Large-Scale Unconstrained Handwritten Chinese Character Recognition
Inspired by the theory of Leitner's learning box from the field of psychology,
we propose DropSample, a new method for training deep convolutional neural
networks (DCNNs), and apply it to large-scale online handwritten Chinese
character recognition (HCCR). According to the principle of DropSample, each
training sample is associated with a quota function that is dynamically
adjusted on the basis of the classification confidence given by the DCNN
softmax output. After a learning iteration, samples with low confidence will
have a higher probability of being selected as training data in the next
iteration; in contrast, well-trained and well-recognized samples with very high
confidence will have a lower probability of being involved in the next training
iteration and can be gradually eliminated. As a result, the learning process
becomes more efficient as it progresses. Furthermore, we investigate the use of
domain-specific knowledge to enhance the performance of the DCNN by adding a domain
knowledge layer before the traditional CNN. By adopting DropSample together
with different types of domain-specific knowledge, the accuracy of HCCR can be
improved efficiently. Experiments on the CASIA-OLHWDB 1.0, CASIA-OLHWDB 1.1,
and ICDAR 2013 online HCCR competition datasets yield outstanding recognition
rates of 97.33%, 97.06%, and 97.51%, respectively, all of which are
significantly better than the previous best results reported in the literature.
Comment: 18 pages, 8 figures, 5 tables
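The confidence-driven resampling idea above can be sketched in a few lines. This is an illustrative reading, not the authors' exact formulation: the quota update rule, thresholds (`high`, `low`), and decay/boost factors are assumptions chosen for clarity. Each sample carries a quota that shrinks when the softmax confidence is very high (the sample is gradually eliminated) and grows when confidence is low (the sample is drawn more often next iteration).

```python
import random

def update_quotas(quotas, confidences, high=0.99, low=0.5,
                  decay=0.5, boost=1.5, floor=0.01):
    """Adjust per-sample quotas from the latest softmax confidences.

    Thresholds and factors here are illustrative assumptions, not values
    from the paper. High-confidence samples have their quota decayed
    toward a small floor; low-confidence samples are boosted.
    """
    new = {}
    for i, q in quotas.items():
        c = confidences[i]
        if c >= high:                 # well recognized: phase out gradually
            q = max(q * decay, floor)
        elif c < low:                 # poorly recognized: sample more often
            q = min(q * boost, 1.0)
        new[i] = q
    return new

def draw_minibatch(quotas, batch_size, rng=random):
    """Sample indices with probability proportional to their quotas."""
    idx = list(quotas)
    weights = [quotas[i] for i in idx]
    return rng.choices(idx, weights=weights, k=batch_size)
```

In use, one would call `update_quotas` after each learning iteration with the DCNN's softmax confidences, then build the next minibatch with `draw_minibatch`, so easy samples fade out of training while hard ones keep appearing.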
Synbols: Probing Learning Algorithms with Synthetic Datasets
Progress in the field of machine learning has been fueled by the introduction
of benchmark datasets pushing the limits of existing algorithms. Enabling the
design of datasets to test specific properties and failure modes of learning
algorithms is thus a problem of high interest, as it has a direct impact on
innovation in the field. In this sense, we introduce Synbols -- Synthetic
Symbols -- a tool for rapidly generating new datasets with a rich composition
of latent features rendered in low resolution images. Synbols leverages the
large amount of symbols available in the Unicode standard and the wide range of
artistic fonts provided by the open font community. Our tool's high-level
interface provides a language for rapidly generating new distributions on the
latent features, including various types of textures and occlusions. To
showcase the versatility of Synbols, we use it to dissect the limitations and
flaws in standard learning algorithms in various learning setups including
supervised learning, active learning, out-of-distribution generalization,
unsupervised representation learning, and object counting.
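The kind of latent-feature sampling such a generator performs can be sketched as follows. This is not the Synbols API; the attribute names, ranges, and placeholder font names are assumptions for illustration only. The point is that each symbol is fully described by a small dictionary of latent attributes drawn from explicit, easily modified distributions, which is what makes it possible to design datasets that probe specific failure modes.

```python
import random

# Placeholder inventories; a real generator would draw on the Unicode
# standard and open-community fonts, as the Synbols tool does.
ALPHABET = [chr(c) for c in range(ord("a"), ord("z") + 1)]
FONTS = ["FontA", "FontB", "FontC"]
TEXTURES = ["solid", "gradient", "camouflage"]

def sample_latents(rng=random):
    """Draw one symbol's latent attributes from simple priors."""
    return {
        "char": rng.choice(ALPHABET),
        "font": rng.choice(FONTS),
        "rotation": rng.uniform(-30.0, 30.0),  # degrees
        "scale": rng.uniform(0.5, 1.0),
        "texture": rng.choice(TEXTURES),
        "occluded": rng.random() < 0.2,        # 20% occlusion rate
    }

def make_dataset(n, seed=0):
    """Generate n latent-attribute records with a fixed seed."""
    rng = random.Random(seed)
    return [sample_latents(rng) for _ in range(n)]
```

Because every attribute is sampled from a named distribution, shifting one prior (e.g. the occlusion rate, or restricting fonts between train and test splits) yields a controlled dataset for testing a specific property of a learning algorithm.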