41,321 research outputs found
Being Negative but Constructively: Lessons Learnt from Creating Better Visual Question Answering Datasets
Visual question answering (Visual QA) has attracted a lot of attention
lately, seen essentially as a form of (visual) Turing test that artificial
intelligence should strive to achieve. In this paper, we study a crucial
component of this task: how can we design good datasets for the task? We focus
on the design of multiple-choice based datasets where the learner has to select
the right answer from a set of candidate ones including the target (\ie the
correct one) and the decoys (\ie the incorrect ones). Through careful analysis
of the results attained by state-of-the-art learning models and human
annotators on existing datasets, we show that the design of the decoy answers
has a significant impact on how and what the learning models learn from the
datasets. In particular, the resulting learner can ignore the visual
information, the question, or both while still doing well on the task. Inspired
by this, we propose automatic procedures to remedy such design deficiencies. We
apply the procedures to re-construct decoy answers for two popular Visual QA
datasets as well as to create a new Visual QA dataset from the Visual Genome
project, resulting in the largest dataset for this task. Extensive empirical
studies show that the design deficiencies have been alleviated in the remedied
datasets and the performance on them is likely a more faithful indicator of the
difference among learning models. The datasets are released and publicly
available via http://www.teds.usc.edu/website_vqa/.Comment: Accepted for Oral Presentation at NAACL-HLT 201
Multiple Instance Learning: A Survey of Problem Characteristics and Applications
Multiple instance learning (MIL) is a form of weakly supervised learning
where training instances are arranged in sets, called bags, and a label is
provided for the entire bag. This formulation is gaining interest because it
naturally fits various problems and allows to leverage weakly labeled data.
Consequently, it has been used in diverse application fields such as computer
vision and document classification. However, learning from bags raises
important challenges that are unique to MIL. This paper provides a
comprehensive survey of the characteristics which define and differentiate the
types of MIL problems. Until now, these problem characteristics have not been
formally identified and described. As a result, the variations in performance
of MIL algorithms from one data set to another are difficult to explain. In
this paper, MIL problem characteristics are grouped into four broad categories:
the composition of the bags, the types of data distribution, the ambiguity of
instance labels, and the task to be performed. Methods specialized to address
each category are reviewed. Then, the extent to which these characteristics
manifest themselves in key MIL application areas are described. Finally,
experiments are conducted to compare the performance of 16 state-of-the-art MIL
methods on selected problem characteristics. This paper provides insight on how
the problem characteristics affect MIL algorithms, recommendations for future
benchmarking and promising avenues for research
Learning and Acting in Peripersonal Space: Moving, Reaching, and Grasping
The young infant explores its body, its sensorimotor system, and the
immediately accessible parts of its environment, over the course of a few
months creating a model of peripersonal space useful for reaching and grasping
objects around it. Drawing on constraints from the empirical literature on
infant behavior, we present a preliminary computational model of this learning
process, implemented and evaluated on a physical robot. The learning agent
explores the relationship between the configuration space of the arm, sensing
joint angles through proprioception, and its visual perceptions of the hand and
grippers. The resulting knowledge is represented as the peripersonal space
(PPS) graph, where nodes represent states of the arm, edges represent safe
movements, and paths represent safe trajectories from one pose to another. In
our model, the learning process is driven by intrinsic motivation. When
repeatedly performing an action, the agent learns the typical result, but also
detects unusual outcomes, and is motivated to learn how to make those unusual
results reliable. Arm motions typically leave the static background unchanged,
but occasionally bump an object, changing its static position. The reach action
is learned as a reliable way to bump and move an object in the environment.
Similarly, once a reliable reach action is learned, it typically makes a
quasi-static change in the environment, moving an object from one static
position to another. The unusual outcome is that the object is accidentally
grasped (thanks to the innate Palmar reflex), and thereafter moves dynamically
with the hand. Learning to make grasps reliable is more complex than for
reaches, but we demonstrate significant progress. Our current results are steps
toward autonomous sensorimotor learning of motion, reaching, and grasping in
peripersonal space, based on unguided exploration and intrinsic motivation.Comment: 35 pages, 13 figure
- …