
    Open Set Chinese Character Recognition using Multi-typed Attributes

    Recognition of off-line Chinese characters is still a challenging problem, especially in historical documents: not only is the number of classes extremely large in comparison to contemporary image retrieval settings, but new, unseen classes can also be expected under open learning conditions (even for CNNs). Chinese character recognition with zero or only a few training samples is a difficult problem and has not been studied yet. In this paper, we propose a new Chinese character recognition method based on multi-typed attributes, derived from the pronunciation, structure and radicals of Chinese characters, and apply it to character recognition in historical books. This intermediate attribute code has a strong advantage over the common `one-hot' class representation because it allows complex and unseen patterns to be understood symbolically through their attributes. First, each character is represented by four groups of attribute types to cover a wide range of character properties: the Pinyin label, the layout structure, the number of strokes, codes from three different input methods (Cangjie, Zhengma and Wubi), and a four-corner encoding. A convolutional neural network (CNN) is trained to predict these attributes. Subsequently, characters can be recognized from the predicted attributes using a distance metric and a complete lexicon encoded in attribute space. We evaluate the proposed method on two open sets (printed Chinese characters for zero-shot learning and historical characters for few-shot learning) and one closed set (handwritten Chinese characters). Experimental results show good overall classification of seen classes as well as very promising generalization to unseen characters. Comment: 29 pages, submitted to Pattern Recognition
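    A minimal, hedged sketch of the attribute-space matching step described above, assuming a CNN whose sigmoid outputs form one flat attribute vector per image; the lexicon entries, the 8-dimensional codes and the recognize() helper are illustrative stand-ins, not the paper's actual encoding:

    import numpy as np

    # Hypothetical lexicon: each character maps to the concatenation of its
    # attribute encodings (Pinyin, layout structure, strokes, input-method
    # codes, ...), reduced here to one short binary vector for brevity.
    lexicon = {
        "天": np.array([1, 0, 0, 1, 0, 1, 0, 0], dtype=float),
        "地": np.array([0, 1, 0, 0, 1, 0, 1, 0], dtype=float),
        "人": np.array([0, 0, 1, 1, 0, 0, 0, 1], dtype=float),
    }

    def recognize(predicted_attributes):
        """Return the lexicon character whose attribute code is nearest
        (Euclidean distance) to the CNN's predicted attribute vector."""
        chars = list(lexicon)
        codes = np.stack([lexicon[c] for c in chars])
        distances = np.linalg.norm(codes - predicted_attributes, axis=1)
        return chars[int(np.argmin(distances))]

    # A CNN prediction (sigmoid outputs) closest to the code for "人":
    pred = np.array([0.1, 0.05, 0.9, 0.8, 0.1, 0.2, 0.1, 0.7])
    print(recognize(pred))  # -> "人"

    Because unseen characters also have attribute codes in the lexicon, the same nearest-code lookup covers the zero-shot case.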

    Neural Priming for Sample-Efficient Adaptation

    We propose Neural Priming, a technique for adapting large pretrained models to distribution shifts and downstream tasks given few or no labeled examples. Presented with class names or unlabeled test samples, Neural Priming enables the model to recall and condition its parameters on relevant data seen throughout pretraining, thereby priming it for the test distribution. Neural Priming can be performed at test time, even for pretraining datasets as large as LAION-2B. Performing lightweight updates on the recalled data significantly improves accuracy across a variety of distribution shift and transfer learning benchmarks. Concretely, in the zero-shot setting, we see a 2.45% improvement in accuracy on ImageNet and a 3.81% accuracy improvement on average across standard transfer learning benchmarks. Further, using our test-time inference scheme, we see a 1.41% accuracy improvement on ImageNetV2. These results demonstrate the effectiveness of Neural Priming in addressing the common challenge of limited labeled data and changing distributions. Code is available at github.com/RAIVNLab/neural-priming. Comment: 18 pages, 8 figures, 9 tables
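    The released code is at github.com/RAIVNLab/neural-priming; the sketch below is only a hedged reconstruction of the two stages the abstract names (recall by embedding similarity, then a lightweight update), where the frozen-encoder embeddings, top-k retrieval rule, linear head and SGD settings are all illustrative assumptions:

    import torch
    import torch.nn.functional as F

    def prime(head, pool_feats, pool_labels, test_feats, k=64, steps=10, lr=1e-3):
        # 1) Recall: the k pool examples most similar (cosine) to any test sample.
        sims = F.normalize(test_feats, dim=1) @ F.normalize(pool_feats, dim=1).T
        idx = sims.max(dim=0).values.topk(k).indices
        feats, labels = pool_feats[idx], pool_labels[idx]
        # 2) Lightweight update: a few gradient steps on the recalled subset only.
        opt = torch.optim.SGD(head.parameters(), lr=lr)
        for _ in range(steps):
            loss = F.cross_entropy(head(feats), labels)
            opt.zero_grad(); loss.backward(); opt.step()
        return head

    # Toy usage; random tensors stand in for a frozen encoder's embeddings.
    pool_feats = torch.randn(10_000, 512)          # pretraining-pool embeddings
    pool_labels = torch.randint(0, 10, (10_000,))  # labels derived for the pool
    test_feats = torch.randn(32, 512)              # unlabeled test embeddings
    prime(torch.nn.Linear(512, 10), pool_feats, pool_labels, test_feats)

    In the zero-shot setting, test_feats would instead be text embeddings of the class names.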

    Exploring 3D Data and Beyond in a Low Data Regime

    3D object classification of point clouds is an essential task now that laser scanners and other depth sensors producing point clouds are a commodity on, e.g., autonomous vehicles, surveying vehicles, service robots, and drones. Deep learning methods have advanced less for point clouds than for 2D images and videos, partly because the points in a point cloud are typically unordered, unlike the pixels in a 2D image, which makes standard deep learning architectures unsuitable. Additionally, we identify a shortage of labelled 3D data in many computer vision tasks, as collecting 3D data is significantly more costly and difficult. This motivates zero- and few-shot learning approaches, where some classes have been observed rarely or not at all during training.

    As our first objective, we study 3D object classification of point clouds in a supervised setting, where there are labelled samples for each class in the dataset. To this end, we introduce the 3DCapsule, a 3D extension of the Capsule concept recently introduced by Hinton et al. that makes it applicable to unordered point sets. The 3DCapsule is a drop-in replacement for the commonly used fully connected classifier. We demonstrate that when the 3DCapsule is applied to contemporary 3D point set classification architectures, it consistently yields an improvement, in particular on noisy data.

    We then turn our attention to 3D object classification of point clouds in a Zero-Shot Learning (ZSL) setting, where no labelled data exist for some classes. Several recent 3D point cloud recognition algorithms are adapted to the ZSL setting with some necessary changes to their respective architectures. To the best of our knowledge, this was at the time the first attempt to classify unseen 3D point cloud objects in a ZSL setting. We also propose a standard evaluation protocol, which includes the choice of datasets and determines the seen/unseen split.

    In the next contribution, we address the hubness problem on 3D point cloud data, i.e., when a model is biased to predict only a few particular labels for most of the test instances. To this end, we propose a loss function that is useful for both Zero-Shot and Generalized Zero-Shot Learning. We also tackle 3D object classification of point clouds in the transductive setting, wherein the test samples may be observed during the training stage, albeit as unlabelled data. We extend, for the first time, transductive Zero-Shot Learning (ZSL) and Generalized Zero-Shot Learning (GZSL) approaches to the domain of 3D point cloud classification by developing a novel triplet loss that takes advantage of the unlabelled test data, as sketched below. While designed for the task of 3D point cloud classification, the method is also shown to be applicable to the more common use case of 2D image classification.

    Lastly, we study the Generalized Zero-Shot Learning (GZSL) problem in the 2D image domain, and demonstrate that our proposed method is also applicable to 3D point cloud data. We propose a mixture of subspaces that represents input features and semantic information in a way that reduces the imbalance between seen and unseen prediction scores. The subspaces define the cluster structure of the visual domain and help describe the visual and semantic domains in terms of the overall distribution of the data.
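    As a rough illustration of the transductive triplet idea, this hedged sketch pseudo-labels each unlabeled test feature with its nearest class semantic embedding and enforces a margin over the runner-up class; it is a reconstruction from the abstract, not the thesis's exact loss, and all names, shapes and hyperparameters are assumptions:

    import torch
    import torch.nn.functional as F

    def transductive_triplet_loss(unlabeled_feats, class_embeds, margin=0.2):
        """unlabeled_feats: projected test features, shape (N, d).
        class_embeds: semantic embeddings of all seen + unseen classes, (C, d)."""
        sims = F.normalize(unlabeled_feats, dim=1) @ F.normalize(class_embeds, dim=1).T
        top2 = sims.topk(2, dim=1).values
        pos, neg = top2[:, 0], top2[:, 1]   # nearest class vs. runner-up
        # Pushing each sample to commit to its nearest class by at least
        # `margin` exploits the unlabeled test data without needing labels.
        return F.relu(margin - (pos - neg)).mean()

    # Toy usage: 16 test features, 40 classes, 300-d semantic space.
    print(transductive_triplet_loss(torch.randn(16, 300), torch.randn(40, 300)))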

    f-VAEGAN-D2: A Feature Generating Framework for Any-Shot Learning

    When labeled training data is scarce, a promising data augmentation approach is to generate visual features of unknown classes using their attributes. To learn the class-conditional distribution of CNN features, these models rely on pairs of image features and class attributes; hence, they cannot make use of the abundance of unlabeled data samples. In this paper, we tackle any-shot learning problems, i.e. zero-shot and few-shot, in a unified feature generating framework that operates in both inductive and transductive learning settings. We develop a conditional generative model that combines the strengths of VAEs and GANs and, in addition, learns the marginal feature distribution of unlabeled images via an unconditional discriminator. We empirically show that our model learns highly discriminative CNN features for five datasets, i.e. CUB, SUN, AWA and ImageNet, and establishes a new state of the art in any-shot learning, i.e. in inductive and transductive (generalized) zero- and few-shot learning settings. We also demonstrate that our learned features are interpretable: we visualize them by inverting them back to pixel space, and we explain them by generating textual arguments for why they are associated with a certain label. Comment: Accepted at CVPR 2019
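    A hedged sketch of the objective structure described above: a conditional VAE and a conditional GAN share one generator (the VAE decoder), while a second, unconditional discriminator, the "D2" of the title, is trained on unlabeled real features so that the generator also matches the marginal feature distribution. Single linear layers stand in for the paper's MLPs, and critic regularization (e.g., a WGAN-style gradient penalty) is omitted for brevity:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    FEAT, ATTR, Z = 2048, 312, 64                # CNN feature, attribute, latent dims
    enc = nn.Linear(FEAT + ATTR, 2 * Z)          # VAE encoder -> (mu, logvar)
    gen = nn.Linear(Z + ATTR, FEAT)              # shared decoder / generator
    d1 = nn.Linear(FEAT + ATTR, 1)               # conditional discriminator
    d2 = nn.Linear(FEAT, 1)                      # unconditional discriminator (D2)

    def generator_losses(x, a):
        # VAE branch: reconstruct labeled features conditioned on attributes.
        mu, logvar = enc(torch.cat([x, a], 1)).chunk(2, dim=1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        x_rec = gen(torch.cat([z, a], 1))
        kld = -0.5 * torch.mean(1 + logvar - mu**2 - logvar.exp())
        l_vae = F.mse_loss(x_rec, x) + kld
        # GAN branch: fool both the conditional and the unconditional critic.
        x_fake = gen(torch.cat([torch.randn_like(mu), a], 1))
        return l_vae - d1(torch.cat([x_fake, a], 1)).mean() - d2(x_fake).mean()

    def critic_losses(x, a, x_unlabeled):
        x_fake = gen(torch.cat([torch.randn(x.size(0), Z), a], 1)).detach()
        # D1 scores (feature, attribute) pairs from labeled data; D2 scores
        # features alone, so unlabeled samples can serve as its "real" inputs.
        l_d1 = d1(torch.cat([x_fake, a], 1)).mean() - d1(torch.cat([x, a], 1)).mean()
        l_d2 = d2(x_fake).mean() - d2(x_unlabeled).mean()
        return l_d1 + l_d2

    # Toy usage with random features standing in for CNN outputs.
    x, a, xu = torch.randn(8, FEAT), torch.randn(8, ATTR), torch.randn(8, FEAT)
    print(critic_losses(x, a, xu).item(), generator_losses(x, a).item())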