Improved Robust Algorithms for Learning with Discriminative Feature Feedback
Discriminative Feature Feedback is a setting proposed by Dasgupta et al.
(2018), which provides a protocol for interactive learning based on feature
explanations that are provided by a human teacher. The features distinguish
between the labels of pairs of possibly similar instances. That work has shown
that learning in this model can have considerable statistical and computational
advantages over learning in standard label-based interactive learning models.
In this work, we provide new robust interactive learning algorithms for the
Discriminative Feature Feedback model, with mistake bounds that are
significantly lower than those of previous robust algorithms for this setting.
In the adversarial setting, we reduce the dependence on the number of protocol
exceptions from quadratic to linear. In addition, we provide an algorithm for a
slightly more restricted model, which obtains an even smaller mistake bound for
large models with many exceptions.
In the stochastic setting, we provide the first algorithm that converges to
the exception rate with a polynomial sample complexity. Our algorithm and
analysis for the stochastic setting involve a new construction that we call
Feature Influence, which may be of wider applicability.
Comment: AISTATS 202
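For readers unfamiliar with the setting, one round of discriminative-feature-feedback-style interaction can be caricatured as follows. This is a toy sketch only (a rule list as the learner's state, with the teacher supplying one discriminative feature on each mistake); the function names are hypothetical and this is not the algorithm analyzed in the paper:

```python
# Toy sketch of one round of a Discriminative-Feature-Feedback-style
# protocol (hypothetical interface, not the paper's algorithm).

def predict(x, rules, default_label):
    """Predict using the most recently learned rule whose feature matches x."""
    for feature, label in reversed(rules):
        if feature in x:
            return label
    return default_label

def dff_round(x, true_label, teacher_feature, rules, default_label):
    """One interaction: predict; on a mistake, store the teacher's
    discriminative feature as a new (feature -> label) rule."""
    y_hat = predict(x, rules, default_label)
    if y_hat != true_label:
        rules.append((teacher_feature, true_label))
    return y_hat
```

After a mistake, the learner immediately covers all future instances that exhibit the teacher's feature, which is the source of the statistical advantage the abstract alludes to.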
Do text-free diffusion models learn discriminative visual representations?
While many unsupervised learning models focus on one family of tasks, either
generative or discriminative, we explore the possibility of a unified
representation learner: a model which addresses both families of tasks
simultaneously. We identify diffusion models, a state-of-the-art method for
generative tasks, as a prime candidate. Such models involve training a U-Net to
iteratively predict and remove noise, and the resulting model can synthesize
high-fidelity, diverse, novel images. We find that the intermediate feature
maps of the U-Net are diverse, discriminative feature representations. We
propose a novel attention mechanism for pooling feature maps and further
leverage this mechanism as DifFormer, a transformer feature fusion of features
from different diffusion U-Net blocks and noise steps. We also develop DifFeed,
a novel feedback mechanism tailored to diffusion. We find that diffusion models
are better than GANs, and, with our fusion and feedback mechanisms, can compete
with state-of-the-art unsupervised image representation learning methods for
discriminative tasks - image classification with full and semi-supervision,
transfer for fine-grained classification, object detection and segmentation,
and semantic segmentation. Our project website
(https://mgwillia.github.io/diffssl/) and code
(https://github.com/soumik-kanad/diffssl) are available publicly.
Comment: Website: see https://mgwillia.github.io/diffssl/ . Code: see
https://github.com/soumik-kanad/diffssl . The first two authors contributed
equally. 15 pages, 9 figures, 15 tables. Submission under review. (this
article supersedes arXiv:2307.08702)
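The pooling idea above can be illustrated with a toy sketch: softmax attention over the spatial positions of a single intermediate feature map. This is a minimal stand-in for learned attention pooling, not the DifFormer implementation; the query vector and all names are illustrative assumptions:

```python
import numpy as np

def attention_pool(feat, query):
    """Pool a (C, H, W) feature map into a C-vector by softmax attention
    over spatial positions (toy stand-in for learned attention pooling)."""
    C, H, W = feat.shape
    tokens = feat.reshape(C, H * W)        # C x N spatial tokens
    scores = query @ tokens                # one similarity score per position
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()               # softmax over positions
    return tokens @ weights                # attention-weighted sum -> (C,)
```

A linear classifier fit on such pooled vectors is the usual way intermediate generative features are probed for discriminative quality.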
Latent Embedding Feedback and Discriminative Features for Zero-Shot Classification
© 2020, Springer Nature Switzerland AG. Zero-shot learning strives to classify unseen categories for which no data is available during training. In the generalized variant, the test samples can further belong to seen or unseen categories. The state-of-the-art relies on Generative Adversarial Networks that synthesize unseen class features by leveraging class-specific semantic embeddings. During training, they generate semantically consistent features, but discard this constraint during feature synthesis and classification. We propose to enforce semantic consistency at all stages of (generalized) zero-shot learning: training, feature synthesis and classification. We first introduce a feedback loop, from a semantic embedding decoder, that iteratively refines the generated features during both the training and feature synthesis stages. The synthesized features together with their corresponding latent embeddings from the decoder are then transformed into discriminative features and utilized during classification to reduce ambiguities among categories. Experiments on (generalized) zero-shot object and action classification reveal the benefit of semantic consistency and iterative feedback, outperforming existing methods on six zero-shot learning benchmarks. Source code at https://github.com/akshitac8/tfvaegan
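The feedback loop described above, in which a semantic embedding decoder iteratively refines generated features, can be caricatured in a few lines. This is a minimal numpy sketch assuming a differentiable decoder and a simple additive correction, not the authors' generative training procedure; the function names are hypothetical:

```python
import numpy as np

def refine_features(f, a, decode, step=0.5, iters=3):
    """Toy feedback loop: nudge a synthesized feature f so that the
    decoder's reconstructed semantic embedding moves toward the
    class embedding a."""
    for _ in range(iters):
        a_hat = decode(f)            # decoded semantic embedding
        f = f + step * (a - a_hat)   # feedback step: reduce semantic error
    return f
```

With an identity decoder the feature converges geometrically toward the target embedding, which mirrors the role of the feedback loop: keeping synthesized features semantically consistent at every stage rather than only during training.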
Sparse Transfer Learning for Interactive Video Search Reranking
Visual reranking is effective to improve the performance of the text-based
video search. However, existing reranking algorithms can only achieve limited
improvement because of the well-known semantic gap between low level visual
features and high level semantic concepts. In this paper, we adopt interactive
video search reranking to bridge the semantic gap by introducing user's
labeling effort. We propose a novel dimension reduction tool, termed sparse
transfer learning (STL), to effectively and efficiently encode the user's
labeling information. STL is particularly designed for interactive video search
reranking. Technically, it a) considers the pairwise discriminative
information to maximally separate labeled query-relevant samples from labeled
query-irrelevant ones, b) achieves a sparse representation of the subspace
that encodes the user's intention by applying the elastic net penalty, and c)
propagates the user's labeling information from labeled samples to unlabeled
samples using knowledge of the data distribution. We conducted extensive
experiments on the
TRECVID 2005, 2006 and 2007 benchmark datasets and compared STL with popular
dimension reduction algorithms. We report superior performance by using the
proposed STL-based interactive video search reranking.
Comment: 17 page
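The elastic net penalty mentioned in point b) combines an L1 term, which zeroes out entries of the projection and yields the sparse subspace, with an L2 term that stabilizes the solution. A minimal sketch of such an objective follows; the least-squares data term and all names are illustrative assumptions, not the STL formulation itself:

```python
import numpy as np

def elastic_net_objective(W, X, Y, alpha=1.0, l1_ratio=0.5):
    """Elastic-net regularized least squares: the L1 term drives entries
    of the projection W to zero (sparsity), the L2 term keeps the
    solution stable when features are correlated."""
    resid = X @ W - Y
    loss = 0.5 * np.mean(resid ** 2)        # data-fit term
    l1 = np.abs(W).sum()                    # sparsity-inducing penalty
    l2 = 0.5 * (W ** 2).sum()               # ridge penalty
    return loss + alpha * (l1_ratio * l1 + (1 - l1_ratio) * l2)
```

Setting l1_ratio to 1 recovers a pure lasso penalty and 0 a pure ridge penalty; the mixture is what lets the learned subspace be sparse yet well-conditioned.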