Evaluating the Representational Hub of Language and Vision Models
The multimodal models used in the emerging field at the intersection of
computational linguistics and computer vision implement the bottom-up
processing of the 'Hub and Spoke' architecture proposed in cognitive science to
represent how the brain processes and combines multi-sensory inputs. In
particular, the Hub is implemented as a neural network encoder. We investigate
the effect on this encoder of various vision-and-language tasks proposed in the
literature: visual question answering, visual reference resolution, and
visually grounded dialogue. To measure the quality of the representations
learned by the encoder, we use two kinds of analyses. First, we evaluate the
encoder pre-trained on the different vision-and-language tasks on an existing
diagnostic task designed to assess multimodal semantic understanding. Second,
we carry out a battery of analyses aimed at studying how the encoder merges and
exploits the two modalities.
Comment: Accepted to IWCS 201
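The diagnostic analysis described above, i.e. testing how well task-relevant information can be read out of a pre-trained encoder's frozen representations, is commonly done with a simple linear probe. The sketch below is illustrative only (the function and toy data are not from the paper): it fits a closed-form least-squares probe on frozen features and reports how accurately the diagnostic labels can be decoded.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear_probe_accuracy(features, labels):
    """Fit a least-squares linear probe on frozen features and return its
    training accuracy: a crude proxy for how linearly decodable the
    diagnostic labels are from the representation."""
    # Add a bias column to the frozen features.
    X = np.hstack([features, np.ones((features.shape[0], 1))])
    # One-hot targets allow a closed-form least-squares fit.
    Y = np.eye(labels.max() + 1)[labels]
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    preds = (X @ W).argmax(axis=1)
    return float((preds == labels).mean())

# Toy stand-in for frozen "encoder outputs": class-dependent mean shift
# plus noise, so the labels are partly decodable from the features.
labels = rng.integers(0, 2, size=200)
features = labels[:, None] * 1.5 + rng.normal(size=(200, 8))
acc = linear_probe_accuracy(features, labels)
```

A higher probe accuracy on one pre-training task than another would suggest that task leaves more of the diagnostic information linearly accessible in the encoder.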
Fooling Vision and Language Models Despite Localization and Attention Mechanism
Adversarial attacks are known to succeed on classifiers, but it has been an
open question whether more complex vision systems are vulnerable. In this
paper, we study adversarial examples for vision and language models, which
incorporate natural language understanding and complex structures such as
attention, localization, and modular architectures. In particular, we
investigate attacks on a dense captioning model and on two visual question
answering (VQA) models. Our evaluation shows that we can generate adversarial
examples with a high success rate (i.e., > 90%) for these models. Our work
sheds new light on understanding adversarial attacks on vision systems which
have a language component and shows that attention, bounding box localization,
and compositional internal structures are vulnerable to adversarial attacks.
These observations will inform future work towards building effective defenses.
Comment: CVPR 201
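The attacks above rely on the standard adversarial-example recipe: perturb the input a small step in the direction of the loss gradient. A minimal sketch of this core idea (one-step FGSM on a toy logistic classifier, not the paper's dense-captioning or VQA attack) is:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, w, b, y, eps):
    """One-step sign-gradient (FGSM-style) attack on binary logistic loss.
    Moves x by eps in the sign of the input gradient of the loss."""
    p = sigmoid(x @ w + b)
    # d(cross-entropy)/dx with a logistic output is (p - y) * w.
    grad_x = (p - y) * w
    return x + eps * np.sign(grad_x)

w = np.array([2.0, -1.0])
b = 0.0
x = np.array([1.0, 0.5])   # clean input, classified positive (logit 1.5)
y = 1.0                    # true label
x_adv = fgsm(x, w, b, y, eps=0.9)
logit_adv = x_adv @ w + b  # perturbed input now crosses the boundary
```

For the multimodal models in the paper, the same gradient signal is simply taken through the full pipeline (attention, localization, language decoder) back to the image pixels.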
Style Transfer in Text: Exploration and Evaluation
Style transfer is an important problem in natural language processing (NLP).
However, progress in language style transfer lags behind other
domains, such as computer vision, mainly because of the lack of parallel data
and principled evaluation metrics. In this paper, we propose to learn style
transfer with non-parallel data. We explore two models to achieve this goal,
and the key idea behind the proposed models is to learn separate content
representations and style representations using adversarial networks. We also
propose novel evaluation metrics which measure two aspects of style transfer:
transfer strength and content preservation. We assess our models and the
evaluation metrics on two tasks: paper-news title transfer, and
positive-negative review transfer. Results show that the proposed content
preservation metric is highly correlated with human judgments, and the proposed
models are able to generate sentences with higher style transfer strength and a
similar content preservation score compared to an auto-encoder.
Comment: To appear in AAAI-1
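The two proposed metrics can be sketched schematically: transfer strength as the fraction of outputs a style classifier assigns to the target style, and content preservation as the mean cosine similarity between source and output sentence embeddings. The classifier scores and embeddings below are toy stand-ins, not the paper's models.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def transfer_strength(style_scores, target_style):
    """Fraction of outputs the style classifier labels with the target style."""
    return float((style_scores.argmax(axis=1) == target_style).mean())

def content_preservation(src_embs, out_embs):
    """Mean cosine similarity between source and output sentence embeddings."""
    return float(np.mean([cosine(s, o) for s, o in zip(src_embs, out_embs)]))

# Toy data: 4 transferred sentences, 2 styles, 5-dim embeddings.
scores = np.array([[0.2, 0.8], [0.4, 0.6], [0.9, 0.1], [0.3, 0.7]])
src = np.eye(4, 5)   # source sentence embeddings
out = src + 0.1      # outputs close to their sources -> high preservation
ts = transfer_strength(scores, target_style=1)
cp = content_preservation(src, out)
```

The trade-off the abstract reports (higher transfer strength at a similar preservation score than an auto-encoder) is exactly a comparison of these two numbers across models.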
