Thinking in Islands; the Portuguese Perception of the Indonesian Archipelago and Particularly of Sunda in Early Texts and Charts
This article discusses various early sources on the Indonesian archipelago. It starts with the state of knowledge before the first voyage of the Portuguese to the Moluccas, drawn from accounts of travellers to insular Southeast Asia in the Middle Ages and from the picture on world maps produced by European cartographers. Comparing that view with the text sources and the resulting geographic material of the first Portuguese expeditions provides an insight into contemporary mechanisms of knowledge transfer. Certain effects can be traced and are repeated on different levels of access to the original facts, mainly because most maps were drawn up in Europe but based on the geographic descriptions provided by text accounts. An abundance and multiplication of failures and mistakes is evident, partly related to the scarcity of sources and partly due to reproduction techniques.
Context-aware Captions from Context-agnostic Supervision
We introduce an inference technique to produce discriminative context-aware
image captions (captions that describe differences between images or visual
concepts) using only generic context-agnostic training data (captions that
describe a concept or an image in isolation). For example, given images and
captions of "siamese cat" and "tiger cat", we generate language that describes
the "siamese cat" in a way that distinguishes it from "tiger cat". Our key
novelty is that we show how to do joint inference over a language model that is
context-agnostic and a listener which distinguishes closely-related concepts.
We first apply our technique to a justification task, namely to describe why an
image contains a particular fine-grained category as opposed to another
closely-related category of the CUB-200-2011 dataset. We then study
discriminative image captioning to generate language that uniquely refers to
one of two semantically-similar images in the COCO dataset. Evaluations with
discriminative ground truth for justification and human studies for
discriminative image captioning reveal that our approach outperforms baseline
generative and speaker-listener approaches for discrimination.
Comment: Accepted to CVPR 2017 (Spotlight
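The joint inference over a context-agnostic language model and a listener described in this abstract can be illustrated with a minimal reranking sketch. It is not the authors' implementation: the function names, the trade-off weight `lam`, and the use of a log-probability ratio as the listener score are all assumptions made for illustration.

```python
def discriminative_score(logp_target, logp_distractor, lam=0.5):
    """Blend a speaker term with a listener term (hypothetical scoring rule).

    logp_target:     log p(caption | target concept) from a generic captioner
    logp_distractor: log p(caption | distractor concept) from the same model
    lam:             assumed trade-off; 1.0 ignores the distractor entirely
    """
    # Listener term: how much more likely the caption is under the target
    # than under the distractor (a log-likelihood ratio).
    listener = logp_target - logp_distractor
    return lam * logp_target + (1.0 - lam) * listener


def rerank(candidates, lam=0.5):
    """Sort (caption, logp_target, logp_distractor) tuples, best first."""
    return sorted(
        candidates,
        key=lambda c: discriminative_score(c[1], c[2], lam),
        reverse=True,
    )


# Made-up log-probabilities for two candidate captions.
candidates = [
    ("a cat", -2.0, -2.1),
    ("a cat with blue eyes", -3.0, -6.0),
]
best_joint = rerank(candidates, lam=0.5)[0][0]
best_speaker_only = rerank(candidates, lam=1.0)[0][0]
```

With these made-up numbers, the generic caption wins when only the speaker term is used, while the more specific caption wins once the listener term is weighted in.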
Reinforced Video Captioning with Entailment Rewards
Sequence-to-sequence models have shown promising improvements on the temporal
task of video captioning, but they optimize word-level cross-entropy loss
during training. First, using policy gradient and mixed-loss methods for
reinforcement learning, we directly optimize sentence-level task-based metrics
(as rewards), achieving significant improvements over the baseline, based on
both automatic metrics and human evaluation on multiple datasets. Next, we
propose a novel entailment-enhanced reward (CIDEnt) that corrects
phrase-matching based metrics (such as CIDEr) to only allow for
logically-implied partial matches and avoid contradictions, achieving further
significant improvements over the CIDEr-reward model. Overall, our
CIDEnt-reward model achieves the new state-of-the-art on the MSR-VTT dataset.
Comment: EMNLP 2017 (9 pages
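The two ideas in this abstract, an entailment-corrected reward and a mixed loss, can be sketched as follows. This is a hedged illustration only: the abstract does not give the formulas, so the thresholding rule, the values of `threshold`, `penalty`, and `gamma`, and the function names are assumptions rather than the paper's reported settings.

```python
def cident_reward(cider, entailment_prob, threshold=0.33, penalty=0.45):
    """Entailment-corrected reward (assumed form).

    cider:           phrase-matching score of the sampled caption
    entailment_prob: probability that a reference entails the caption,
                     as produced by some entailment classifier
    threshold, penalty: illustrative values, not taken from the abstract
    """
    # Penalise captions that match phrases but are not logically implied.
    if entailment_prob < threshold:
        return cider - penalty
    return cider


def mixed_loss(xe_loss, rl_loss, gamma=0.9):
    """Weighted blend of word-level cross-entropy and sentence-level RL loss."""
    return gamma * rl_loss + (1.0 - gamma) * xe_loss


# A caption with high phrase overlap but low entailment is penalised.
contradictory = cident_reward(cider=1.0, entailment_prob=0.1)
entailed = cident_reward(cider=1.0, entailment_prob=0.8)
blended = mixed_loss(xe_loss=2.0, rl_loss=1.0, gamma=0.5)
```

In a policy-gradient setup, a reward of this kind would scale the log-probability of the sampled caption, and the cross-entropy term would keep the output fluent.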
Memcapacitive Devices in Logic and Crossbar Applications
Over the last decade, memristive devices have been widely adopted in
computing for various conventional and unconventional applications. While the
integration density, memory property, and nonlinear characteristics have many
benefits, reducing the energy consumption is limited by the resistive nature of
the devices. Memcapacitors would address that limitation while still having all
the benefits of memristors. Recent work has shown that with adjusted parameters
during the fabrication process, a metal-oxide device can indeed exhibit a
memcapacitive behavior. We introduce novel memcapacitive logic gates and
memcapacitive crossbar classifiers as a proof of concept that such applications
can outperform memristor-based architectures. The results illustrate that,
compared to memristive logic gates, our memcapacitive gates consume about 7x
less power. The memcapacitive crossbar classifier achieves similar
classification performance but reduces the power consumption by a factor of
about 1,500x for the MNIST dataset and a factor of about 1,000x for the
CIFAR-10 dataset compared to a memristive crossbar. Our simulation results
demonstrate that memcapacitive devices have great potential for both Boolean
logic and analog low-power applications.
Coherent Multi-Sentence Video Description with Variable Level of Detail
Humans can easily describe what they see in a coherent way and at varying
levels of detail. However, existing approaches for automatic video description
are mainly focused on single sentence generation and produce descriptions at a
fixed level of detail. In this paper, we address both of these limitations: for
a variable level of detail we produce coherent multi-sentence descriptions of
complex videos. We follow a two-step approach where we first learn to predict a
semantic representation (SR) from video and then generate natural language
descriptions from the SR. To produce consistent multi-sentence descriptions, we
model across-sentence consistency at the level of the SR by enforcing a
consistent topic. We also contribute both to the visual recognition of objects,
proposing a hand-centric approach, and to the robust generation of
sentences using a word lattice. Human judges rate our multi-sentence
descriptions as more readable, correct, and relevant than related work. To
understand the difference between more detailed and shorter descriptions, we
collect and analyze a video description corpus of three levels of detail.
Comment: 10 page