7,802 research outputs found
VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation
Rich and dense human labeled datasets are among the main enabling factors for
the recent advance on vision-language understanding. Many seemingly distant
annotations (e.g., semantic segmentation and visual question answering (VQA))
are inherently connected in that they reveal different levels and perspectives
of human understandings about the same visual scenes --- and even the same set
of images (e.g., of COCO). The popularity of COCO correlates those annotations
and tasks. Explicitly linking them up may significantly benefit both individual
tasks and the unified vision and language modeling. We present the preliminary
work of linking the instance segmentations provided by COCO to the questions
and answers (QAs) in the VQA dataset, and name the collected links visual
questions and segmentation answers (VQS). They transfer human supervision
between the previously separate tasks, offer more effective leverage to
existing problems, and also open the door for new research problems and models.
We study two applications of the VQS data in this paper: supervised attention
for VQA and a novel question-focused semantic segmentation task. For the
former, we obtain state-of-the-art results on the VQA real multiple-choice task
by simply augmenting the multilayer perceptrons with some attention features
that are learned using the segmentation-QA links as explicit supervision. To
put the latter in perspective, we study two plausible methods and compare them
to an oracle method assuming that the instance segmentations are given at the
test stage.Comment: To appear on ICCV 201
Recommended from our members
The effect of multiple knowledge sources on learning and teaching
Current paradigms for machine-based learning and teaching tend to perform their task in isolation from a rich context of existing knowledge. In contrast, the research project presented here takes the view that bringing multiple sources of knowledge to bear is of central importance to learning in complex domains. As a consequence teaching must both take advantage of and beware of interactions between new and existing knowledge. The central process which connects learning to its context is reasoning by analogy, a primary concern of this research. In teaching, the connection is provided by the explicit use of a learning model to reason about the choice of teaching actions. In this learning paradigm, new concepts are incrementally refined and integrated into a body of expertise, rather than being evaluated against a static notion of correctness. The domain chosen for this experimentation is that of learning to solve "algebra story problems." A model of acquiring problem solving skills in this domain is described, including: representational structures for background knowledge, a problem solving architecture, learning mechanisms, and the role of analogies in applying existing problem solving abilities to novel problems. Examples of learning are given for representative instances of algebra story problems. After relating our views to the psychological literature, we outline the design of a teaching system. Finally, we insist on the interdependence of learning and teaching and on the synergistic effects of conducting both research efforts in parallel
Physics Inspired Optimization on Semantic Transfer Features: An Alternative Method for Room Layout Estimation
In this paper, we propose an alternative method to estimate room layouts of
cluttered indoor scenes. This method enjoys the benefits of two novel
techniques. The first one is semantic transfer (ST), which is: (1) a
formulation to integrate the relationship between scene clutter and room layout
into convolutional neural networks; (2) an architecture that can be end-to-end
trained; (3) a practical strategy to initialize weights for very deep networks
under unbalanced training data distribution. ST allows us to extract highly
robust features under various circumstances, and in order to address the
computation redundance hidden in these features we develop a principled and
efficient inference scheme named physics inspired optimization (PIO). PIO's
basic idea is to formulate some phenomena observed in ST features into
mechanics concepts. Evaluations on public datasets LSUN and Hedau show that the
proposed method is more accurate than state-of-the-art methods.Comment: To appear in CVPR 2017. Project Page:
https://sites.google.com/view/st-pio
Pathways: Augmenting interoperability across scholarly repositories
In the emerging eScience environment, repositories of papers, datasets,
software, etc., should be the foundation of a global and natively-digital
scholarly communications system. The current infrastructure falls far short of
this goal. Cross-repository interoperability must be augmented to support the
many workflows and value-chains involved in scholarly communication. This will
not be achieved through the promotion of single repository architecture or
content representation, but instead requires an interoperability framework to
connect the many heterogeneous systems that will exist.
We present a simple data model and service architecture that augments
repository interoperability to enable scholarly value-chains to be implemented.
We describe an experiment that demonstrates how the proposed infrastructure can
be deployed to implement the workflow involved in the creation of an overlay
journal over several different repository systems (Fedora, aDORe, DSpace and
arXiv).Comment: 18 pages. Accepted for International Journal on Digital Libraries
special issue on Digital Libraries and eScienc
- …