
    Few-Shot Image Recognition by Predicting Parameters from Activations

    In this paper, we are interested in the few-shot learning problem. In particular, we focus on a challenging scenario where the number of categories is large and the number of examples per novel category is very limited, e.g., 1, 2, or 3. Motivated by the close relationship between the parameters and the activations in a neural network associated with the same category, we propose a novel method that adapts a pre-trained neural network to novel categories by directly predicting the parameters from the activations. No training is required to adapt to novel categories, and fast inference is realized by a single forward pass. We evaluate our method on few-shot image recognition on the ImageNet dataset, where it surpasses the previous state-of-the-art classification accuracy on novel categories by a significant margin while keeping comparable performance on the large-scale categories. We also test our method on the MiniImageNet dataset, where it strongly outperforms the previous state-of-the-art methods.
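
    The core mechanism is a learned mapping from a novel category's activations to its classifier parameters. The sketch below illustrates that idea under stated assumptions: the toy backbone, the single-layer `param_predictor`, and all tensor shapes are placeholders for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

# Hypothetical sketch: predict a novel category's classifier weights
# directly from the mean activation of its few support examples.
feature_dim, num_base_classes = 512, 1000

backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, feature_dim))  # stand-in feature extractor
param_predictor = nn.Linear(feature_dim, feature_dim)  # learned mapping phi: activations -> weights
base_weights = torch.randn(num_base_classes, feature_dim)  # pre-trained base-class classifier weights

def add_novel_category(support_images: torch.Tensor) -> torch.Tensor:
    """Predict a weight vector for a novel class from 1-3 support images.

    No gradient steps are taken: adaptation is a single forward pass.
    """
    with torch.no_grad():
        acts = backbone(support_images)      # (k, feature_dim) activations
        mean_act = acts.mean(dim=0)          # pool the few support examples
        return param_predictor(mean_act)     # predicted classifier parameters

# Usage: extend the classifier with the predicted weights, then score a
# query image against base and novel classes in one forward pass.
support = torch.randn(3, 3, 32, 32)          # a 3-shot novel category
w_novel = add_novel_category(support)
all_weights = torch.cat([base_weights, w_novel.unsqueeze(0)], dim=0)
query = backbone(torch.randn(1, 3, 32, 32))
logits = query @ all_weights.t()             # scores over base + novel classes
print(logits.argmax(dim=1))
```

    Because adding a class is just one pass through the predictor, this style of adaptation has the zero-training, single-forward-pass property the abstract emphasizes.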

    Learning to Associate Words and Images Using a Large-scale Graph

    We develop an approach for unsupervised learning of associations between co-occurring perceptual events using a large graph, and we applied it to solve the image captcha of China's railroad system. The approach is based on the principle of suspicious coincidence. In this problem, a user is presented with a deformed picture of a Chinese phrase and eight low-resolution images, and must quickly select the relevant images in order to purchase a train ticket. The problem presents several challenges: (1) teaching labels for the Chinese phrases and the images are not available for supervised learning, (2) no pre-trained deep convolutional neural networks are available for recognizing these Chinese phrases or the presented images, and (3) each captcha must be solved within a few seconds. We collected 2.6 million captchas, comprising 2.6 million deformed Chinese phrases and over 21 million images. From these data, we constructed an association graph of over 6 million vertices, linking vertices based on co-occurrence information and on feature similarity between pairs of images. We then trained a deep convolutional neural network to learn a projection of the Chinese phrases onto a 230-dimensional latent space. Using label propagation, we computed the likelihood of each of the eight images conditioned on the latent-space projection of the deformed phrase for each captcha. The resulting system solved captchas with 77% accuracy in 2 seconds on average. In answering this practical challenge, our work illustrates the power of this class of unsupervised association-learning techniques, which may be related to the brain's general strategy for associating language stimuli with visual objects on the principle of suspicious coincidence.
    Comment: 8 pages, 7 figures, 14th Conference on Computer and Robot Vision 2017
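
    The ranking step rests on label propagation over the association graph. The toy sketch below, a minimal illustration rather than the paper's pipeline, seeds score mass on the query-phrase vertex of a small random affinity matrix (a stand-in for the 6-million-vertex graph) and propagates it to rank the eight candidate images; the function name and graph are assumptions.

```python
import numpy as np

def label_propagation(W: np.ndarray, seeds: np.ndarray,
                      alpha: float = 0.85, iters: int = 50) -> np.ndarray:
    """Standard propagation iteration: f <- alpha * S @ f + (1 - alpha) * seeds."""
    d = W.sum(axis=1)
    d[d == 0] = 1.0
    S = W / np.sqrt(np.outer(d, d))        # symmetrically normalized affinity
    f = seeds.copy()
    for _ in range(iters):
        f = alpha * S @ f + (1 - alpha) * seeds
    return f

# Toy graph: vertex 0 is the deformed-phrase query, vertices 1-8 are the
# eight candidate images, vertex 9 is an unrelated vertex.
n = 10
rng = np.random.default_rng(0)
W = rng.random((n, n)); W = (W + W.T) / 2; np.fill_diagonal(W, 0)

seeds = np.zeros(n); seeds[0] = 1.0        # seed mass on the phrase vertex
scores = label_propagation(W, seeds)
ranked = np.argsort(-scores[1:9]) + 1      # rank the eight candidates by score
print("candidates by relevance:", ranked)
```

    In the real system the edge weights would come from co-occurrence statistics and image-feature similarity, and the seed would sit at the phrase's 230-dimensional latent-space projection rather than a single vertex.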

    Structured Learning with Manifold Representations of Natural Data Variations

    According to the manifold hypothesis, natural variations in high-dimensional data lie on or near a low-dimensional, nonlinear manifold. Additionally, many identity-preserving transformations are shared among classes of data, which allows for an efficient representation of data variations: a limited set of transformations can describe the majority of variations in many classes. This work demonstrates the learning of generative models of identity-preserving transformations on data manifolds in order to analyze, generate, and exploit the natural variations in data for machine learning tasks. The introduced transformation representations are incorporated into several novel models to highlight the ability to generate realistic samples of semantically meaningful transformations, to generalize transformations beyond their source domain, and to estimate transformations between data samples. We first develop a model for learning 3D manifold-based transformations from 2D projected inputs, which can be used to infer depth from moving 2D inputs. We then confirm that our generative model of transformations can be generalized across classes by defining two transfer learning tasks that map transformations learned from a rich dataset to previously unseen data. Next, we develop the manifold autoencoder, which learns low-dimensional manifold structure from complex data in the latent space of an autoencoder and adapts the latent space to accommodate this structure. Finally, we introduce the Variational Autoencoder with Learned Latent Structure (VAELLS), which incorporates a learnable manifold model into the fully probabilistic generative framework of a variational autoencoder.
    Ph.D. thesis
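
    One common way to parameterize such learned manifold transformations, in the spirit of the manifold autoencoder described above, is a transport-operator model: a point z in a latent space is moved along the manifold by z' = expm(sum_m c_m Psi_m) z. The sketch below illustrates only this mechanic; the dictionary `Psi`, the coefficients `c`, and the latent code are random placeholders rather than trained parameters, and the transport-operator form is one assumed instantiation, not necessarily the thesis's exact model.

```python
import numpy as np
from scipy.linalg import expm

latent_dim, num_operators = 8, 3
rng = np.random.default_rng(1)

# Placeholder dictionary of learned operators (one matrix per transformation).
Psi = rng.normal(scale=0.1, size=(num_operators, latent_dim, latent_dim))

def transport(z: np.ndarray, c: np.ndarray) -> np.ndarray:
    """Apply an identity-preserving transformation with coefficients c."""
    A = np.tensordot(c, Psi, axes=1)   # weighted sum of the operator dictionary
    return expm(A) @ z                 # matrix exponential generates the path

z = rng.normal(size=latent_dim)        # latent code from an encoder (assumed)
z_transformed = transport(z, c=np.array([0.5, 0.0, -0.2]))

# Sweeping a coefficient traces a smooth path on the learned manifold, which
# is how such models extrapolate a transformation beyond its source samples.
path = [transport(z, c=np.array([t, 0.0, 0.0])) for t in np.linspace(0, 1, 5)]
print(np.round(z_transformed, 3))
```

    Because the same operator dictionary can be applied to latent codes from unseen classes, this parameterization matches the transfer-learning behavior the abstract describes.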