Prototypical Fine-tuning: Towards Robust Performance Under Varying Data Sizes
In this paper, we move towards combining large parametric models with
non-parametric prototypical networks. We propose prototypical fine-tuning, a
novel prototypical framework for fine-tuning pretrained language models (LMs),
which automatically learns a bias to improve predictive performance for varying
data sizes, especially low-resource settings. Our prototypical fine-tuning
approach can automatically adjust the model capacity according to the number of
data points and the model's inherent attributes. Moreover, we propose four
principles for effective prototype fine-tuning towards the optimal solution.
Experimental results across various datasets show that our work achieves
significant performance improvements under various low-resource settings, as
well as comparable and usually better performance in high-resource scenarios.
Comment: Published as a conference paper at AAAI 202
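For readers unfamiliar with the non-parametric prototypical networks this abstract builds on, the following is a minimal sketch of the general nearest-prototype idea: each class is represented by the mean of its example embeddings, and a query is assigned to the closest prototype. It is not the paper's fine-tuning method. The function names, the toy 2-D embeddings, and the Euclidean distance are all illustrative assumptions.

```python
import math


def class_prototypes(embeddings, labels):
    """Average the embeddings of each class to get one prototype per class."""
    sums, counts = {}, {}
    for vec, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def nearest_prototype(query, prototypes):
    """Return the label whose prototype is closest in Euclidean distance."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))


# Toy support set (hypothetical embeddings, two examples per class).
support = [[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [1.0, 0.8]]
support_labels = ["neg", "neg", "pos", "pos"]

prototypes = class_prototypes(support, support_labels)
prediction = nearest_prototype([0.9, 0.9], prototypes)
```

Because the classifier is defined entirely by class means, it introduces no per-class weight vectors, which is one reason prototype-based heads are commonly considered for low-resource settings.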
Breaking Sticks and Ambiguities with Adaptive Skip-gram
The recently proposed Skip-gram model is a powerful method for learning
high-dimensional word representations that capture rich semantic relationships
between words. However, Skip-gram, like most prior work on learning word
representations, does not take word ambiguity into account and maintains only
a single representation per word. Although a number of Skip-gram modifications
have been proposed to overcome this limitation and learn multi-prototype word
representations, they either require a known number of word meanings or learn
them using greedy heuristic approaches. In this paper we propose the Adaptive
Skip-gram model, a nonparametric Bayesian extension of Skip-gram
capable of automatically learning the required number of representations for all
words at a desired semantic resolution. We derive an efficient online variational
learning algorithm for the model and empirically demonstrate its efficiency on
the word-sense induction task.
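The abstract does not spell out the prior it uses, but the title's "breaking sticks" suggests the stick-breaking construction commonly used for Dirichlet process priors. The sketch below shows that construction in general, truncated to a fixed number of components. It is not the paper's model or inference algorithm. The names `alpha` and `truncation` and the chosen values are illustrative assumptions.

```python
import random


def stick_breaking_weights(alpha, truncation, rng):
    """Draw truncated stick-breaking weights.

    Each step breaks off a Beta(1, alpha) fraction of the remaining stick;
    the broken-off piece is the weight of the next component (word sense).
    """
    weights = []
    remaining = 1.0
    for _ in range(truncation):
        fraction = rng.betavariate(1.0, alpha)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    return weights


rng = random.Random(0)  # fixed seed so the draw is reproducible
weights = stick_breaking_weights(alpha=0.5, truncation=10, rng=rng)
```

In this construction a smaller `alpha` tends to concentrate mass on the first few components, while a larger `alpha` spreads it across more of them. This is one way a model can control how many senses per word receive non-negligible weight.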
Learning to Learn Variational Semantic Memory
In this paper, we introduce variational semantic memory into meta-learning to
acquire long-term knowledge for few-shot learning. The variational semantic
memory accrues and stores semantic information for the probabilistic inference
of class prototypes in a hierarchical Bayesian framework. The semantic memory
is grown from scratch and gradually consolidated by absorbing information from
tasks it experiences. By doing so, it is able to accumulate long-term, general
knowledge that enables it to learn new concepts of objects. We formulate memory
recall as the variational inference of a latent memory variable from addressed
contents, which offers a principled way to adapt the knowledge to individual
tasks. Our variational semantic memory, as a new long-term memory module,
confers principled recall and update mechanisms that enable semantic
information to be efficiently accrued and adapted for few-shot learning.
Experiments demonstrate that the probabilistic modelling of prototypes achieves
a more informative representation of object classes compared to deterministic
vectors. The consistent new state-of-the-art performance on four benchmarks
shows the benefit of variational semantic memory in boosting few-shot
recognition.
Comment: accepted to NeurIPS 2020; code is available at
https://github.com/YDU-uva/VS
Amortized Bayesian prototype meta-learning: A new probabilistic meta-learning approach to few-shot image classification
Probabilistic meta-learning methods have recently achieved impressive success in few-shot image classification. However, they introduce a huge number of random variables for neural network weights and thus pose severe computational and inferential challenges. In this paper, we propose a novel probabilistic meta-learning method called amortized Bayesian prototype meta-learning. In contrast to previous methods, we introduce only a small number of random variables for latent class prototypes rather than a huge number for network weights. We learn the posterior distributions of these latent prototypes via amortized inference, with no need for an extra amortization network, so that their posteriors can easily be approximated conditional on a few labeled samples, whether at the meta-training or meta-testing stage. The proposed method can be trained end-to-end without any pre-training. Compared with other probabilistic meta-learning methods, our approach is more interpretable, with far fewer random variables, while still achieving competitive performance on few-shot image classification problems across various benchmark datasets. Its robustness and predictive uncertainty are also demonstrated through ablation studies.