66,609 research outputs found
Meta learning for few shot learning
Few-shot learning aims to scale visual recognition to open-ended growth of new classes with limited labelled examples, thus alleviating data and computation bottleneck of conventional deep learning. This thesis proposes a meta learning (a.k.a. learning to learn), paradigm to tackle the real-world few shot learning challenges.
Firstly, we present a parameterized multi-metric based meta learning algorithm (RelationNet2). Existing metric learning algorithms are always based on training a global deep embedding and metric to support image similarity matching, but we propose a deep comparison network comprised of embedding and relation modules learning multiple non-linear distance metrics based on different levels of features simultaneously. Furthermore, images are represented as \todo{a} distribution rather than vectors via learning parameterized Gaussian noise regularization, reducing overfitting and enable the use of deeper embeddings.
We next consider the fact that several recent competitors develop effective few-shot learners through strong conventional representations in combination with very simple classifiers, questioning whether “meta-learning” is necessary or highly effective features are sufficient. To defend meta-learning, we take an approach agnostic to the off-the-shelf features, and focus exclusively on meta-learning the final classifier layer. Specifically, we introduce MetaQDA, a Bayesian meta-learning extension of quadratic discriminant analysis classifier, that is complementary to advances in feature representations, leading to high accuracy and state-of-the-art uncertainty calibration performance in predictions.
Finally, we investigate the extension of MetaQDA to more generalized real-world scenarios beyond the narrow standard few-shot benchmarks. Our model achieves both many-shot and few-shot classification accuracy in generalized few-shot learning. In terms of few-shot class-incremental learning, MetaQDA is inherently suitable to novel classes growing \todo{scenarios}. As for open-set recognition, we calculate the probability belonging to novel class by Bayes' Rule, maintaining high accuracy in both close-set recognition and open-set rejection.
Overall, our contributions in few-shot meta-learning advance state of the art under both accuracy and calibration metrics, explore a series of increasingly realistic problem settings, to support more researchers and practitioners in future exploration
Sparse Spatial Transformers for Few-Shot Learning
Learning from limited data is a challenging task since the scarcity of data
leads to a poor generalization of the trained model. The classical global
pooled representation is likely to lose useful local information. Recently,
many few shot learning methods address this challenge by using deep descriptors
and learning a pixel-level metric. However, using deep descriptors as feature
representations may lose the contextual information of the image. And most of
these methods deal with each class in the support set independently, which
cannot sufficiently utilize discriminative information and task-specific
embeddings. In this paper, we propose a novel Transformer based neural network
architecture called Sparse Spatial Transformers (SSFormers), which can find
task-relevant features and suppress task-irrelevant features. Specifically, we
first divide each input image into several image patches of different sizes to
obtain dense local features. These features retain contextual information while
expressing local information. Then, a sparse spatial transformer layer is
proposed to find spatial correspondence between the query image and the entire
support set to select task-relevant image patches and suppress task-irrelevant
image patches. Finally, we propose to use an image patch matching module for
calculating the distance between dense local representations, thus to determine
which category the query image belongs to in the support set. Extensive
experiments on popular few-shot learning benchmarks show that our method
achieves the state-of-the-art performance
Learning Knowledge-Enhanced Contextual Language Representations for Domain Natural Language Understanding
Knowledge-Enhanced Pre-trained Language Models (KEPLMs) improve the
performance of various downstream NLP tasks by injecting knowledge facts from
large-scale Knowledge Graphs (KGs). However, existing methods for pre-training
KEPLMs with relational triples are difficult to be adapted to close domains due
to the lack of sufficient domain graph semantics. In this paper, we propose a
Knowledge-enhanced lANGuAge Representation learning framework for various
clOsed dOmains (KANGAROO) via capturing the implicit graph structure among the
entities. Specifically, since the entity coverage rates of closed-domain KGs
can be relatively low and may exhibit the global sparsity phenomenon for
knowledge injection, we consider not only the shallow relational
representations of triples but also the hyperbolic embeddings of deep
hierarchical entity-class structures for effective knowledge fusion.Moreover,
as two closed-domain entities under the same entity-class often have locally
dense neighbor subgraphs counted by max point biconnected component, we further
propose a data augmentation strategy based on contrastive learning over
subgraphs to construct hard negative samples of higher quality. It makes the
underlying KELPMs better distinguish the semantics of these neighboring
entities to further complement the global semantic sparsity. In the
experiments, we evaluate KANGAROO over various knowledge-aware and general NLP
tasks in both full and few-shot learning settings, outperforming various KEPLM
training paradigms performance in closed-domains significantly.Comment: emnlp 202
Few-Shot Single-View 3-D Object Reconstruction with Compositional Priors
The impressive performance of deep convolutional neural networks in
single-view 3D reconstruction suggests that these models perform non-trivial
reasoning about the 3D structure of the output space. However, recent work has
challenged this belief, showing that complex encoder-decoder architectures
perform similarly to nearest-neighbor baselines or simple linear decoder models
that exploit large amounts of per category data in standard benchmarks. On the
other hand settings where 3D shape must be inferred for new categories with few
examples are more natural and require models that generalize about shapes. In
this work we demonstrate experimentally that naive baselines do not apply when
the goal is to learn to reconstruct novel objects using very few examples, and
that in a \emph{few-shot} learning setting, the network must learn concepts
that can be applied to new categories, avoiding rote memorization. To address
deficiencies in existing approaches to this problem, we propose three
approaches that efficiently integrate a class prior into a 3D reconstruction
model, allowing to account for intra-class variability and imposing an implicit
compositional structure that the model should learn. Experiments on the popular
ShapeNet database demonstrate that our method significantly outperform existing
baselines on this task in the few-shot setting
- …