Being Negative but Constructively: Lessons Learnt from Creating Better Visual Question Answering Datasets
Visual question answering (Visual QA) has attracted a lot of attention
lately, seen essentially as a form of (visual) Turing test that artificial
intelligence should strive to achieve. In this paper, we study a crucial
component of this task: how can we design good datasets for the task? We focus
on the design of multiple-choice based datasets where the learner has to select
the right answer from a set of candidate ones, including the target (i.e., the
correct one) and the decoys (i.e., the incorrect ones). Through careful analysis
of the results attained by state-of-the-art learning models and human
annotators on existing datasets, we show that the design of the decoy answers
has a significant impact on how and what the learning models learn from the
datasets. In particular, the resulting learner can ignore the visual
information, the question, or both while still doing well on the task. Inspired
by this, we propose automatic procedures to remedy such design deficiencies. We
apply the procedures to re-construct decoy answers for two popular Visual QA
datasets as well as to create a new Visual QA dataset from the Visual Genome
project, resulting in the largest dataset for this task. Extensive empirical
studies show that the design deficiencies have been alleviated in the remedied
datasets and the performance on them is likely a more faithful indicator of the
difference among learning models. The datasets are released and publicly
available via http://www.teds.usc.edu/website_vqa/.
Comment: Accepted for Oral Presentation at NAACL-HLT 2018
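To make the deficiency and its remedy concrete, here is a minimal Python sketch of decoy re-sampling in the spirit described above (the field names, the question-prefix similarity heuristic, and the half-and-half split are my assumptions, not the paper's actual procedure): each decoy should be plausible given the image alone or the question alone, so a model that ignores either input can no longer separate the target from the decoys.

```python
import random
from collections import defaultdict

def build_decoys(examples, num_decoys=3, seed=0):
    """Re-sample decoys so that neither the image nor the question
    alone gives the answer away.

    `examples` is a list of dicts with keys 'image_id', 'question',
    'answer' (illustrative names, not the paper's API). Half of the
    decoys are answers to *other* questions about the same image
    (they fit the image, so image-only shortcuts fail); the rest are
    answers to questions sharing the same leading words (they fit
    the question, so question-only shortcuts fail).
    """
    rng = random.Random(seed)
    by_image = defaultdict(set)
    by_qprefix = defaultdict(set)
    for ex in examples:
        by_image[ex['image_id']].add(ex['answer'])
        prefix = ' '.join(ex['question'].lower().split()[:3])
        by_qprefix[prefix].add(ex['answer'])

    for ex in examples:
        prefix = ' '.join(ex['question'].lower().split()[:3])
        image_pool = by_image[ex['image_id']] - {ex['answer']}
        question_pool = by_qprefix[prefix] - {ex['answer']} - image_pool
        k = num_decoys // 2
        decoys = rng.sample(sorted(image_pool), min(k, len(image_pool)))
        rest = num_decoys - len(decoys)
        decoys += rng.sample(sorted(question_pool),
                             min(rest, len(question_pool)))
        ex['choices'] = decoys + [ex['answer']]
        rng.shuffle(ex['choices'])
    return examples
```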
An Empirical Study and Analysis of Generalized Zero-Shot Learning for Object Recognition in the Wild
Zero-shot learning (ZSL) methods have been studied in the unrealistic setting
where test data are assumed to come from unseen classes only. In this paper, we
advocate studying the problem of generalized zero-shot learning (GZSL) where
the test data's class memberships are unconstrained. We show empirically that
naively using the classifiers constructed by ZSL approaches does not perform
well in the generalized setting. Motivated by this, we propose a simple but
effective calibration method that can be used to balance two conflicting
forces: recognizing data from seen classes versus those from unseen ones. We
develop a performance metric to characterize such a trade-off and examine the
utility of this metric in evaluating various ZSL approaches. Our analysis
further shows that there is a large gap between the performance of existing
approaches and an upper bound established via idealized semantic embeddings,
suggesting that improving class semantic embeddings is vital to GZSL.
Comment: ECCV 2016 camera-ready
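The calibration and the trade-off it induces can be illustrated in a few lines of NumPy. This is a minimal sketch under my own assumptions (the additive penalty on seen-class scores and the function names are illustrative, not necessarily the paper's exact formulation):

```python
import numpy as np

def calibrated_predict(scores, seen_mask, gamma):
    """Pick a class after penalizing seen classes by a calibration
    factor gamma (a sketch of the calibration idea).

    scores:    (num_samples, num_classes) classifier scores
    seen_mask: (num_classes,) boolean, True for seen classes
    gamma:     scalar; larger values favor unseen classes
    """
    adjusted = scores - gamma * seen_mask.astype(scores.dtype)
    return adjusted.argmax(axis=1)

def seen_unseen_curve(scores, labels, seen_mask, gammas):
    """Trace the seen-vs-unseen accuracy trade-off by sweeping gamma
    (assumes both seen and unseen test samples are present). The
    area under this curve can serve as a single-number metric."""
    curve = []
    for g in gammas:
        pred = calibrated_predict(scores, seen_mask, g)
        correct = pred == labels
        acc_seen = correct[seen_mask[labels]].mean()
        acc_unseen = correct[~seen_mask[labels]].mean()
        curve.append((acc_seen, acc_unseen))
    return curve
```

Sweeping gamma from small to large moves predictions from favoring seen classes toward favoring unseen ones, which makes the conflict between the two forces explicit.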
Large-Margin Determinantal Point Processes
Determinantal point processes (DPPs) offer a powerful approach to modeling
diversity in many applications where the goal is to select a diverse subset. We
study the problem of learning the parameters (the kernel matrix) of a DPP from
labeled training data. We make two contributions. First, we show how to
reparameterize a DPP's kernel matrix with multiple kernel functions, thus
enhancing modeling flexibility. Second, we propose a novel parameter estimation
technique based on the principle of large margin separation. In contrast to the
state-of-the-art method of maximum likelihood estimation, our large-margin loss
function explicitly models errors in selecting the target subsets, and it can
be customized to trade off different types of errors (precision vs. recall).
Extensive empirical studies validate our contributions, including applications
on challenging document and video summarization, where flexibility in modeling
the kernel matrix and balancing different errors is indispensable.
Comment: 15 pages
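Both contributions can be sketched compactly. The following hedged NumPy illustration uses the standard L-ensemble likelihood; the exponential weight parameterization and the hinge form of the loss are my assumptions, not the paper's exact objective:

```python
import numpy as np

def dpp_log_prob(L, subset):
    """Log-probability of `subset` (a list of item indices) under an
    L-ensemble DPP: log det(L_Y) - log det(L + I)."""
    n = L.shape[0]
    _, logdet_Y = np.linalg.slogdet(L[np.ix_(subset, subset)])
    _, logdet_Z = np.linalg.slogdet(L + np.eye(n))
    return logdet_Y - logdet_Z

def combined_kernel(weights, kernels):
    """Reparameterize the DPP kernel as a nonnegative combination of
    base kernels (one way to realize the multiple-kernel idea)."""
    # exp keeps each mixing weight positive, so L stays PSD
    return sum(np.exp(w) * K for w, K in zip(weights, kernels))

def margin_violation(L, target, competitor, cost):
    """Hinge-style large-margin loss: the target subset should beat a
    competitor subset by a margin scaled by the competitor's cost
    (e.g., a precision/recall-weighted error against the target)."""
    gap = dpp_log_prob(L, target) - dpp_log_prob(L, competitor)
    return max(0.0, cost - gap)
```

Weighting the cost term differently for missed items versus spurious items is one way to trade off precision against recall, in the spirit of the abstract.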
How to Train Your MAML to Excel in Few-Shot Classification
Model-agnostic meta-learning (MAML) is arguably the most popular
meta-learning algorithm nowadays, given its flexibility to incorporate various
model architectures and to be applied to different problems. Nevertheless, its
performance on few-shot classification is far behind many recent algorithms
dedicated to the problem. In this paper, we point out several key facets of how
to train MAML to excel in few-shot classification. First, we find that a large
number of gradient steps are needed for the inner loop update, which
contradicts the common usage of MAML for few-shot classification. Second, we
find that MAML is sensitive to the permutation of class assignments in
meta-testing: for a few-shot task of $N$ classes, there are exponentially many
ways to assign the learned initialization of the $N$-way classifier to the $N$
classes, leading to an unavoidably huge variance. Third, we investigate several
ways for permutation invariance and find that learning a shared classifier
initialization for all the classes performs the best. On benchmark datasets
such as MiniImageNet and TieredImageNet, our approach, which we name
UNICORN-MAML, performs on a par with or even outperforms state-of-the-art
algorithms, while keeping the simplicity of MAML without adding any extra
sub-networks.
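A minimal PyTorch sketch of the shared-initialization idea follows (class and method names are mine; the authors' implementation differs in detail): one meta-learned row is cloned into every row of the $N$-way head at the start of each task, so the initialization is identical under any permutation of class-to-label assignments, while the inner loop still adapts each row separately.

```python
import torch
import torch.nn as nn

class SharedInitHead(nn.Module):
    """N-way linear head whose per-class rows all start from one
    shared, meta-learned vector, making the initialization invariant
    to how classes are assigned to labels (illustrative sketch)."""
    def __init__(self, feat_dim):
        super().__init__()
        # One shared, meta-learned row and bias instead of N of them.
        self.w = nn.Parameter(torch.zeros(feat_dim))
        self.b = nn.Parameter(torch.zeros(1))

    def materialize(self, num_classes):
        # Clone the shared row into an N-way weight matrix; inner-loop
        # gradients w.r.t. these tensors flow back to the shared init.
        weight = self.w.unsqueeze(0).expand(num_classes, -1).clone()
        bias = self.b.expand(num_classes).clone()
        return weight, bias

# One MAML-style inner step on a support set (hypothetical shapes):
#   head = SharedInitHead(feat_dim=64)
#   weight, bias = head.materialize(num_classes=5)
#   loss = nn.functional.cross_entropy(feats @ weight.t() + bias, labels)
#   gw, gb = torch.autograd.grad(loss, [weight, bias], create_graph=True)
#   weight, bias = weight - inner_lr * gw, bias - inner_lr * gb
```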