Low-Shot Learning with Imprinted Weights
Human vision is able to immediately recognize novel visual categories after
seeing just one or a few training examples. We describe how to add a similar
capability to ConvNet classifiers by directly setting the final layer weights
from novel training examples during low-shot learning. We call this process
weight imprinting as it directly sets weights for a new category based on an
appropriately scaled copy of the embedding layer activations for that training
example. The imprinting process is a valuable complement to training with stochastic gradient descent, as it yields good classification performance immediately and provides an initialization for any further fine-tuning. We
show how this imprinting process is related to proxy-based embeddings. However,
it differs in that only a single imprinted weight vector is learned for each
novel category, rather than relying on a nearest-neighbor distance to training
instances, as is typically done with embedding methods. Our experiments show that averaging imprinted weights provides better generalization than using nearest-neighbor instance embeddings.
Comment: CVPR 2018
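To make the mechanism concrete, here is a minimal PyTorch sketch of the imprinting step, assuming a trained backbone that emits embeddings and a bias-free cosine classifier; the function name, toy backbone, and dimensions are illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn.functional as F

def imprint_weights(classifier: torch.nn.Linear,
                    embeddings: torch.Tensor,
                    labels: torch.Tensor) -> None:
    """Set the classifier rows for novel classes to the averaged,
    re-normalized embeddings of their low-shot examples."""
    with torch.no_grad():
        for c in labels.unique():
            # Average the normalized embeddings of all examples of class c;
            # re-normalizing keeps the imprinted weight on the unit sphere.
            proto = F.normalize(embeddings[labels == c], dim=1).mean(dim=0)
            classifier.weight[c] = F.normalize(proto, dim=0)

# Usage: one forward pass over the few novel examples suffices to give the
# new categories (7 and 8 here) immediately usable classifier weights.
backbone = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(784, 64))
classifier = torch.nn.Linear(64, 10, bias=False)
x = torch.randn(5, 1, 28, 28)        # five low-shot training examples
y = torch.tensor([7, 7, 8, 8, 8])    # two novel categories
imprint_weights(classifier, backbone(x), y)
```

Averaging before re-normalizing is the step the abstract credits with better generalization than keeping per-instance embeddings, and the imprinted rows double as the initialization for any later fine-tuning.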
Event sequence metric learning
In this paper we consider the challenging problem of learning discriminative vector representations for event sequences generated by real-world users. Vector representations map raw behavioral client data to fixed-length, low-dimensional vectors in a latent space. We propose a novel method of learning these vector embeddings based on a metric-learning approach, together with a strategy for generating subsequences of the raw data that allows the metric-learning approach to be applied in a fully self-supervised way. We evaluated the method on several public bank-transaction datasets and showed that the self-supervised embeddings outperform other methods when applied to downstream classification tasks. Moreover, the embeddings are compact and provide additional user privacy protection.
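As a rough illustration of the self-supervised strategy, the sketch below treats two random subsequences of one client's event sequence as a positive pair under a cosine-margin loss; the GRU encoder, the loss, and all dimensions are illustrative stand-ins assumed for concreteness, not the paper's exact design.

```python
import random
import torch
import torch.nn.functional as F

def random_subsequence(seq: torch.Tensor, min_len: int = 8) -> torch.Tensor:
    """Sample a random contiguous slice of an event sequence of shape (T, features)."""
    length = random.randint(min_len, seq.size(0))
    start = random.randint(0, seq.size(0) - length)
    return seq[start:start + length]

class SeqEncoder(torch.nn.Module):
    """Encode a variable-length event sequence into a fixed-length unit vector."""
    def __init__(self, n_features: int, dim: int = 32):
        super().__init__()
        self.rnn = torch.nn.GRU(n_features, dim, batch_first=True)

    def forward(self, seq: torch.Tensor) -> torch.Tensor:
        _, h = self.rnn(seq.unsqueeze(0))   # last hidden state as the embedding
        return F.normalize(h[-1, 0], dim=0)

def pair_loss(a, b, same_client: bool, margin: float = 0.5) -> torch.Tensor:
    # Pull subsequences of the same client together; push different clients
    # apart until they are at least `margin` away in cosine distance.
    d = 1.0 - torch.dot(a, b)
    return d if same_client else F.relu(margin - d)

encoder = SeqEncoder(n_features=4)
client = torch.randn(50, 4)             # one client's raw event sequence
a = encoder(random_subsequence(client))
b = encoder(random_subsequence(client))
loss = pair_loss(a, b, same_client=True)
```

Because positives come from slicing a single client's history, no labels are needed, which is what makes the procedure fully self-supervised.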
ICAR: Image-based Complementary Auto Reasoning
Scene-aware Complementary Item Retrieval (CIR) is a challenging task that requires generating a set of compatible items across domains. Because compatibility is subjective, it is difficult to set up a rigorous standard for either data collection or learning objectives. To address this task, we propose a visual compatibility concept composed of similarity (resemblance in color, geometry, texture, etc.) and complementarity (different items, such as a table and a chair, completing a group). Based on this notion, we propose a compatibility learning framework for scene-based set compatibility reasoning: a category-aware Flexible Bidirectional Transformer (FBT) that takes cross-domain visual-similarity inputs and performs auto-regressive complementary item generation. The FBT consists of an encoder with flexible masking, a category prediction arm, and an auto-regressive visual embedding prediction arm. Its inputs are cross-domain visual-similarity-invariant embeddings, which makes the framework broadly generalizable. Furthermore, the FBT learns inter-object compatibility from a large set of scene images in a self-supervised way. Compared with SOTA methods, this approach achieves improvements of up to 5.3% and 9.6% in FITB score and of 22.3% and 31.8% in SFID on fashion and furniture, respectively.
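The two-arm design can be sketched as follows, assuming one embedding per scene item, a learned token substituted at masked slots (standing in for flexible masking), and a linear head per arm; every name, dimension, and the masking rule here are assumptions for illustration, not the paper's specification.

```python
import torch

class FBTSketch(torch.nn.Module):
    def __init__(self, dim: int = 64, n_categories: int = 20):
        super().__init__()
        layer = torch.nn.TransformerEncoderLayer(d_model=dim, nhead=4,
                                                 batch_first=True)
        self.encoder = torch.nn.TransformerEncoder(layer, num_layers=2)
        self.mask_token = torch.nn.Parameter(torch.zeros(dim))
        self.category_arm = torch.nn.Linear(dim, n_categories)  # category prediction
        self.embedding_arm = torch.nn.Linear(dim, dim)          # visual embedding prediction

    def forward(self, items: torch.Tensor, mask: torch.Tensor):
        # Replace masked item slots with the learned mask token, then encode
        # the scene so every slot attends to the visible items.
        x = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(items), items)
        h = self.encoder(x)
        return self.category_arm(h), self.embedding_arm(h)

# Usage: a scene of 6 item embeddings with the last 2 slots masked; at
# inference the embedding arm's outputs for masked slots would be decoded
# one at a time to generate complementary items auto-regressively.
scene = torch.randn(1, 6, 64)
mask = torch.tensor([[False, False, False, False, True, True]])
cat_logits, pred_embeddings = FBTSketch()(scene, mask)
```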