4 research outputs found
Few-Shot Image Recognition by Predicting Parameters from Activations
In this paper, we are interested in the few-shot learning problem. In
particular, we focus on a challenging scenario where the number of categories
is large and the number of examples per novel category is very limited, e.g. 1,
2, or 3. Motivated by the close relationship between the parameters and the
activations in a neural network associated with the same category, we propose a
novel method that can adapt a pre-trained neural network to novel categories by
directly predicting the parameters from the activations. Zero training is
required in adaptation to novel categories, and fast inference is realized by a
single forward pass. We evaluate our method by doing few-shot image recognition
on the ImageNet dataset, which achieves the state-of-the-art classification
accuracy on novel categories by a significant margin while keeping comparable
performance on the large-scale categories. We also test our method on the
MiniImageNet dataset and it strongly outperforms the previous state-of-the-art
methods
Learning to Associate Words and Images Using a Large-scale Graph
We develop an approach for unsupervised learning of associations between
co-occurring perceptual events using a large graph. We applied this approach to
successfully solve the image captcha of China's railroad system. The approach
is based on the principle of suspicious coincidence. In this particular
problem, a user is presented with a deformed picture of a Chinese phrase and
eight low-resolution images. They must quickly select the relevant images in
order to purchase their train tickets. This problem presents several
challenges: (1) the teaching labels for both the Chinese phrases and the images
were not available for supervised learning, (2) no pre-trained deep
convolutional neural networks are available for recognizing these Chinese
phrases or the presented images, and (3) each captcha must be solved within a
few seconds. We collected 2.6 million captchas, with 2.6 million deformed
Chinese phrases and over 21 million images. From these data, we constructed an
association graph, composed of over 6 million vertices, and linked these
vertices based on co-occurrence information and feature similarity between
pairs of images. We then trained a deep convolutional neural network to learn a
projection of the Chinese phrases onto a 230-dimensional latent space. Using
label propagation, we computed the likelihood of each of the eight images
conditioned on the latent space projection of the deformed phrase for each
captcha. The resulting system solved captchas with 77% accuracy in 2 seconds on
average. Our work, in answering this practical challenge, illustrates the power
of this class of unsupervised association learning techniques, which may be
related to the brain's general strategy for associating language stimuli with
visual objects on the principle of suspicious coincidence.Comment: 8 pages, 7 figures, 14th Conference on Computer and Robot Vision 201
Structured Learning with Manifold Representations of Natural Data Variations
According to the manifold hypothesis, natural variations in high-dimensional data lie on or near a low-dimensional, nonlinear manifold. Additionally, many identity-preserving transformations are shared among classes of data which can allow for an efficient representation of data variations: a limited set of transformations can describe a majority of variations in many classes. This work demonstrates the learning of generative models of identity-preserving transformations on data manifolds in order to analyze, generate, and exploit the natural variations in data for machine learning tasks. The introduced transformation representations are incorporated into several novel models to highlight the ability to generate realistic samples of semantically meaningful transformations, to generalize transformations beyond their source domain, and to estimate transformations between data samples. We first develop a model for learning 3D manifold-based transformations from 2D projected inputs which can be used to perform depth inference from 2D moving inputs. We then confirm that our generative model of transformations can be generalized across classes by defining two transfer learning tasks that map transformations learned from a rich dataset to previously unseen data. Next, we develop the manifold autoencoder, which learns low-dimensional manifold structure from complex data in the latent space of an autoencoder and adapts the latent space to accommodate this structure. Finally, we introduce the Variational Autoencoder with Learned Latent Structure (VAELLS) which incorporates a learnable manifold model into the fully probabilistic generative framework of a variational autoencoder.Ph.D