One-shot learning of object categories
Learning visual models of object categories notoriously requires hundreds or thousands of training examples. We show that it is possible to learn much information about a category from just one, or a handful, of images. The key insight is that, rather than learning from scratch, one can take advantage of knowledge coming from previously learned categories, no matter how different these categories might be. We explore a Bayesian implementation of this idea. Object categories are represented by probabilistic models. Prior knowledge is represented as a probability density function on the parameters of these models. The posterior model for an object category is obtained by updating the prior in the light of one or more observations. We test a simple implementation of our algorithm on a database of 101 diverse object categories. We compare category models learned by an implementation of our Bayesian approach to models learned by maximum likelihood (ML) and maximum a posteriori (MAP) methods. We find that on a database of more than 100 categories, the Bayesian approach produces informative models when the number of training examples is too small for other methods to operate successfully.
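The prior-to-posterior update described in this abstract can be illustrated with a deliberately small sketch. It is not the paper's object model: it assumes a single scalar feature with Gaussian noise of known variance and a Gaussian prior on the category mean, and the names `posterior_mean_params` and `ml_mean` are hypothetical. It only shows how a prior, standing in for knowledge from earlier categories, tempers an estimate built from one example.

```python
def posterior_mean_params(prior_mean, prior_var, noise_var, observations):
    """Conjugate update for the mean of a Gaussian with known noise variance.

    Toy assumption: the prior N(prior_mean, prior_var) stands in for
    knowledge gathered from previously learned categories.
    """
    n = len(observations)
    if n == 0:
        # No data: the posterior is just the prior.
        return prior_mean, prior_var
    sample_mean = sum(observations) / n
    # Precisions (inverse variances) add under a Gaussian prior and likelihood.
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mean = post_var * (prior_mean / prior_var + n * sample_mean / noise_var)
    return post_mean, post_var


def ml_mean(observations):
    """Maximum-likelihood estimate of the mean: the plain sample average."""
    return sum(observations) / len(observations)


# One training example only: ML trusts it completely, while the posterior
# sits between the prior mean and the observation.
one_shot = [2.0]
print(ml_mean(one_shot))                               # → 2.0
print(posterior_mean_params(0.0, 1.0, 1.0, one_shot))  # → (1.0, 0.5)
```

With a single observation, the ML estimate offers no measure of uncertainty, whereas the posterior still carries a variance. That is one simplified reading of why a Bayesian treatment can remain informative with very few examples.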
Closing the Generalization Gap in One-Shot Object Detection
Despite substantial progress in object detection and few-shot learning,
detecting objects based on a single example - one-shot object detection -
remains a challenge: trained models exhibit a substantial generalization gap,
where object categories used during training are detected much more reliably
than novel ones. Here we show that this generalization gap can be nearly closed
by increasing the number of object categories used during training. Our results
show that the models switch from memorizing individual categories to learning
object similarity over the category distribution, enabling strong
generalization at test time. Importantly, in this regime standard methods to
improve object detection models like stronger backbones or longer training
schedules also benefit novel categories, which was not the case for smaller
datasets like COCO. Our results suggest that the key to strong few-shot
detection models may not lie in sophisticated metric learning approaches, but
instead in scaling the number of categories. Future data annotation efforts
should therefore focus on wider datasets and annotate a larger number of
categories rather than gathering more images or instances per category.
Analogy-Forming Transformers for Few-Shot 3D Parsing
We present Analogical Networks, a model that encodes domain knowledge
explicitly, in a collection of structured labelled 3D scenes, in addition to
implicitly, as model parameters, and segments 3D object scenes with analogical
reasoning: instead of mapping a scene to part segments directly, our model
first retrieves related scenes from memory and their corresponding part
structures, and then predicts analogous part structures for the input scene,
via an end-to-end learnable modulation mechanism. By conditioning on more than
one retrieved memory, the model predicts compositions of structures that mix and
match parts across the retrieved memories. One-shot, few-shot or many-shot
learning are treated uniformly in Analogical Networks, by conditioning on the
appropriate set of memories, whether taken from a single, few or many memory
exemplars, and inferring analogous parses. We show Analogical Networks are
competitive with state-of-the-art 3D segmentation transformers in many-shot
settings, and outperform them, as well as existing paradigms of meta-learning
and few-shot learning, in few-shot settings. Analogical Networks successfully
segment instances of novel object categories simply by expanding their memory,
without any weight updates. Our code and models are publicly available on the
project webpage: http://analogicalnets.github.io/.
Comment: ICLR 202
Improving Siamese Networks for One Shot Learning using Kernel Based Activation functions
The lack of a large amount of training data has always been the constraining
factor in solving a lot of problems in machine learning, making One Shot
Learning one of the most intriguing ideas in machine learning. It aims to learn
information about object categories from one, or only a few training examples.
In deep learning, this is usually accomplished with a suitable objective
function (i.e., the loss function) and a suitable embedding extractor (i.e., the
architecture). In this paper, we discuss metric-based deep learning
architectures for one-shot learning, such as Siamese neural networks, and present
a method to improve their accuracy using Kafnets (kernel-based
non-parametric activation functions for neural networks) by learning proper
embeddings in relatively fewer epochs. Using kernel activation
functions, we achieve strong results that exceed those of ReLU-based
deep learning models in terms of embedding structure, loss convergence,
and accuracy.
Comment: 15 pages, 8 figure
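To make the idea of a kernel-based activation function concrete, here is a minimal scalar sketch. It assumes the commonly described form in which the activation is a weighted sum of Gaussian kernels centred on a fixed dictionary of points, with the mixing weights being the learned part. The function name `kaf`, the dictionary values and the bandwidth `gamma` are illustrative assumptions, not details taken from this abstract.

```python
import math


def kaf(s, alpha, dictionary, gamma):
    """Kernel activation sketch: f(s) = sum_i alpha[i] * exp(-gamma * (s - d_i)^2).

    alpha      -- mixing coefficients (these would be trained in a network)
    dictionary -- fixed kernel centres d_i (assumed here, e.g. a grid around 0)
    gamma      -- Gaussian kernel bandwidth (assumed hyperparameter)
    """
    return sum(a * math.exp(-gamma * (s - d) ** 2)
               for a, d in zip(alpha, dictionary))


# Illustrative values only: five centres on a grid, one active coefficient.
centres = [-2.0, -1.0, 0.0, 1.0, 2.0]
weights = [0.0, 0.0, 1.0, 0.0, 0.0]
print(kaf(0.0, weights, centres, gamma=1.0))  # → 1.0
```

Unlike ReLU, whose shape is fixed, the shape of this function changes as the coefficients change, which is what allows each unit to adapt its own nonlinearity during training.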