Unified Pragmatic Models for Generating and Following Instructions
We show that explicit pragmatic inference aids in correctly generating and
following natural language instructions for complex, sequential tasks. Our
pragmatics-enabled models reason about why speakers produce certain
instructions, and about how listeners will react upon hearing them. Like
previous pragmatic models, we use learned base listener and speaker models to
build a pragmatic speaker that uses the base listener to simulate the
interpretation of candidate descriptions, and a pragmatic listener that reasons
counterfactually about alternative descriptions. We extend these models to
tasks with sequential structure. Evaluation of language generation and
interpretation shows that pragmatic inference improves state-of-the-art
listener models (at correctly interpreting human instructions) and speaker
models (at producing instructions correctly interpreted by humans) in diverse
settings. Comment: NAACL 2018, camera-ready version
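The core mechanism of the pragmatic speaker described above can be illustrated with a toy sketch. All names and probability values below are hypothetical, not the paper's trained models: a base speaker proposes candidate instructions, a base listener simulates how each candidate would be interpreted, and the pragmatic speaker picks the candidate most likely to elicit the intended action sequence.

```python
# Hypothetical sketch of pragmatic reranking (not the paper's models):
# the pragmatic speaker scores each candidate instruction by simulating
# the base listener's interpretation of it.

def pragmatic_speaker(target_actions, candidates, listener_prob):
    """Choose the instruction whose simulated interpretation best
    matches the intended actions.

    listener_prob(instruction, actions) -> P(actions | instruction)
    under the base listener model.
    """
    return max(candidates,
               key=lambda instr: listener_prob(instr, target_actions))

# Toy base listener: a lookup table of interpretation probabilities.
# "go left" is unambiguous; "turn" could go either way.
table = {
    ("go left", ("left",)): 0.9,
    ("go left", ("right",)): 0.1,
    ("turn",    ("left",)): 0.5,
    ("turn",    ("right",)): 0.5,
}
listener = lambda instr, acts: table.get((instr, acts), 0.0)

best = pragmatic_speaker(("left",), ["go left", "turn"], listener)
print(best)  # "go left": the less ambiguous candidate wins
```

A plain base speaker might emit either candidate; the listener simulation is what penalizes the ambiguous one.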
Semantic Visual Localization
Robust visual localization under a wide range of viewing conditions is a
fundamental problem in computer vision. Handling the difficult cases of this
problem is not only very challenging but also of high practical relevance,
e.g., in the context of life-long localization for augmented reality or
autonomous robots. In this paper, we propose a novel approach based on a joint
3D geometric and semantic understanding of the world, enabling it to succeed
under conditions where previous approaches failed. Our method leverages a novel
generative model for descriptor learning, trained on semantic scene completion
as an auxiliary task. The resulting 3D descriptors are robust to missing
observations by encoding high-level 3D geometric and semantic information.
Experiments on several challenging large-scale localization datasets
demonstrate reliable localization under extreme viewpoint, illumination, and
geometry changes.
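Once descriptors are available, the retrieval step common to descriptor-based localization is a nearest-neighbor search against a database with known poses. This is a hedged sketch of that generic step only; the paper's learned 3D semantic descriptors and the names below are not reproduced from it.

```python
import numpy as np

# Generic descriptor-based localization sketch (illustrative names):
# match a query descriptor against database descriptors with known
# poses and return the nearest neighbor's pose.

def localize(query_desc, db_descs, db_poses):
    """Return the pose whose descriptor is nearest to the query."""
    dists = np.linalg.norm(db_descs - query_desc, axis=1)
    return db_poses[int(np.argmin(dists))]

# Toy database of two descriptors with associated poses.
db_descs = np.array([[1.0, 0.0], [0.0, 1.0]])
db_poses = ["pose_A", "pose_B"]

print(localize(np.array([0.9, 0.1]), db_descs, db_poses))  # pose_A
```

The paper's contribution is to make the descriptors themselves robust to viewpoint, illumination, and geometry changes, so that this matching step succeeds where appearance-only descriptors fail.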
Learning Generative Models across Incomparable Spaces
Generative Adversarial Networks have shown remarkable success in learning a
distribution that faithfully recovers a reference distribution in its entirety.
However, in some cases, we may want to only learn some aspects (e.g., cluster
or manifold structure), while modifying others (e.g., style, orientation or
dimension). In this work, we propose an approach to learn generative models
across such incomparable spaces, and demonstrate how to steer the learned
distribution towards target properties. A key component of our model is the
Gromov-Wasserstein distance, a notion of discrepancy that compares
distributions relationally rather than absolutely. While this framework
subsumes current generative models in identically reproducing distributions,
its inherent flexibility allows application to tasks in manifold learning,
relational learning, and cross-domain learning. Comment: International Conference on Machine Learning (ICML)
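The Gromov-Wasserstein discrepancy that the abstract centers on compares distributions relationally: each space contributes only its intra-space pairwise distance matrix, so the two spaces may differ in dimension. The sketch below evaluates the GW objective for a given coupling; it is illustrative only, and solving for the optimal coupling (e.g. with entropic regularization) is left to libraries such as POT.

```python
import numpy as np

# Evaluate the Gromov-Wasserstein objective for a fixed coupling T:
#   GW(C1, C2, T) = sum_{i,j,k,l} (C1[i,k] - C2[j,l])^2 T[i,j] T[k,l]
# C1 and C2 are intra-space distance matrices, so the underlying
# spaces can have entirely different dimensionality.

def gw_cost(C1, C2, T):
    n, m = T.shape
    cost = 0.0
    for i in range(n):
        for k in range(n):
            for j in range(m):
                for l in range(m):
                    cost += (C1[i, k] - C2[j, l]) ** 2 * T[i, j] * T[k, l]
    return cost

# The same two-point configuration sensed in 2-D and in 3-D.
X = np.array([[0.0, 0.0], [1.0, 0.0]])
Y = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
C1 = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
C2 = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)

# Coupling that matches point i to point i with mass 1/2 each.
T = np.eye(2) / 2
print(gw_cost(C1, C2, T))  # 0.0: identical relational structure
```

A zero cost despite the dimension mismatch is exactly the "relational rather than absolute" comparison the abstract describes.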
Cross-Domain Image Retrieval with Attention Modeling
With the proliferation of e-commerce websites and the ubiquity of smartphones, cross-domain image retrieval, which uses photos taken with a smartphone as queries to search for products on e-commerce websites, is emerging as a popular
application. One challenge of this task is to locate the attention of both the
query and database images. In particular, database images, e.g. of fashion
products, on e-commerce websites are typically displayed with other
accessories, and the images taken by users contain noisy backgrounds and large
variations in orientation and lighting. Consequently, their attention is
difficult to locate. In this paper, we exploit the rich tag information
available on the e-commerce websites to locate the attention of database
images. For query images, we use each candidate image in the database as the
context to locate the query attention. Novel deep convolutional neural network
architectures, namely TagYNet and CtxYNet, are proposed to learn the attention
weights and then extract effective representations of the images. Experimental
results on public datasets confirm that our approaches have significant
improvement over the existing methods in terms of the retrieval accuracy and
efficiency. Comment: 8 pages with an extra reference page
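The attention mechanism sketched in the abstract can be illustrated in miniature. This is not the TagYNet or CtxYNet architecture; all names and numbers below are hypothetical. The idea shown is only the common building block: spatial CNN features are weighted by their softmax similarity to a guiding embedding (a tag for database images, a candidate image for queries), and the weighted sum forms the representation.

```python
import numpy as np

# Illustrative attention pooling (not the paper's architecture):
# weight region features by softmax similarity to a guide embedding.

def attention_pool(features, guide):
    """features: (num_regions, dim); guide: (dim,) tag/context embedding."""
    scores = features @ guide                  # similarity per region
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                   # softmax attention weights
    return weights @ features                  # attended representation

# Three toy regions; the guide embedding points along the second axis,
# so the pooled representation should be dominated by regions 2 and 3.
feats = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
guide = np.array([0.0, 5.0])
rep = attention_pool(feats, guide)
print(rep)  # second component dominates
```

The effect is that background regions, which score low against the guide, contribute little to the final descriptor.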
Dynamical modeling of collective behavior from pigeon flight data: flock cohesion and dispersion
Several models of flocking have been proposed on the basis of simulations that exhibit qualitatively naturalistic behavior. In this paper we provide the first direct
application of computational modeling methods to infer flocking behavior from
experimental field data. We show that this approach is able to infer general
rules for interaction, or lack of interaction, among members of a flock or,
more generally, any community. Using experimental field measurements of homing
pigeons in flight we demonstrate the existence of a basic distance dependent
attraction/repulsion relationship and show that this rule is sufficient to
explain collective behavior observed in nature. Positional data of individuals
over time are used as input data to a computational algorithm capable of
building complex nonlinear functions that can represent the system behavior.
Topological nearest neighbor interactions are considered to characterize the
components within this model. The efficacy of this method is demonstrated with
simulated noisy data generated from the classical (two dimensional) Vicsek
model. When applied to experimental data from homing pigeon flights we show
that the more complex three dimensional models are capable of predicting and
simulating trajectories, as well as exhibiting realistic collective dynamics.
The simulations of the reconstructed models are used to extract properties of
the collective behavior in pigeons, and how it is affected by changing the
initial conditions of the system. Our results demonstrate that this approach
may be applied to construct models capable of simulating trajectories and
collective dynamics using experimental field measurements of herd movement.
From these models, the behavior of the individual agents (animals) may be
inferred.
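The classical 2D Vicsek model used above to validate the inference method is compact enough to state directly. The sketch below uses illustrative parameter values and a metric (radius-based) neighborhood for simplicity, whereas the abstract also considers topological nearest-neighbor interactions.

```python
import numpy as np

# Minimal 2-D Vicsek model: constant-speed agents align their heading
# with the average heading of neighbors within radius r, plus angular
# noise. Parameter values here are illustrative.

def vicsek_step(pos, theta, speed=0.03, r=1.0, eta=0.1, box=5.0,
                rng=np.random.default_rng(0)):
    n = len(pos)
    new_theta = np.empty(n)
    for i in range(n):
        near = np.linalg.norm(pos - pos[i], axis=1) < r  # includes self
        # circular mean of neighbor headings
        new_theta[i] = np.arctan2(np.sin(theta[near]).mean(),
                                  np.cos(theta[near]).mean())
    new_theta += eta * (rng.random(n) - 0.5)             # angular noise
    vel = speed * np.column_stack([np.cos(new_theta), np.sin(new_theta)])
    return (pos + vel) % box, new_theta                  # periodic box

rng = np.random.default_rng(1)
pos = rng.random((50, 2)) * 5.0
theta = rng.random(50) * 2 * np.pi
for _ in range(200):
    pos, theta = vicsek_step(pos, theta)

# Order parameter: magnitude of the mean heading vector (1 = aligned).
order = np.hypot(np.cos(theta).mean(), np.sin(theta).mean())
print(round(order, 2))
```

Trajectories generated this way, with noise added, are the kind of synthetic input on which the inference algorithm is first tested before being applied to the pigeon data.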
Geometric Cross-Modal Comparison of Heterogeneous Sensor Data
In this work, we address the problem of cross-modal comparison of aerial data
streams. A variety of simulated automobile trajectories are sensed using two
different modalities: full-motion video, and radio-frequency (RF) signals
received by detectors at various locations. The information represented by the
two modalities is compared using self-similarity matrices (SSMs) corresponding
to time-ordered point clouds in feature spaces of each of these data sources;
we note that these feature spaces can be of entirely different scale and
dimensionality. Several metrics for comparing SSMs are explored, including a
cutting-edge time-warping technique that can simultaneously handle local time
warping and partial matches, while also controlling for the change in geometry
between feature spaces of the two modalities. We note that this technique is
quite general, and does not depend on the choice of modalities. In this
particular setting, we demonstrate that the cross-modal distance between SSMs
corresponding to the same trajectory type is smaller than the cross-modal
distance between SSMs corresponding to distinct trajectory types, and we
formalize this observation via precision-recall metrics in experiments.
Finally, we comment on promising implications of these ideas for future
integration into multiple-hypothesis tracking systems. Comment: 10 pages, 13 figures, Proceedings of IEEE Aeroconf 201
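The self-similarity matrix (SSM) construction underlying the comparison above is simple to sketch. An SSM records only intra-modal distances between time-ordered samples, which is why SSMs built from feature spaces of different scale and dimensionality can be compared at all; the specific metrics and time-warping scheme from the paper are not reproduced here.

```python
import numpy as np

# SSM of a time-ordered point cloud: SSM[i, j] is the distance between
# samples i and j in that modality's own feature space.

def ssm(X):
    """X: (num_timesteps, dim) time-ordered features -> (T, T) SSM."""
    return np.linalg.norm(X[:, None] - X[None, :], axis=-1)

# The same circular trajectory "sensed" in 2-D and in an embedded 3-D
# space, standing in for two modalities of different dimensionality.
t = np.linspace(0, 2 * np.pi, 64, endpoint=False)
traj2d = np.column_stack([np.cos(t), np.sin(t)])
traj3d = np.column_stack([np.cos(t), np.sin(t), np.zeros_like(t)])

# Identical relational structure -> identical SSMs despite the
# dimension gap between the two feature spaces.
print(np.allclose(ssm(traj2d), ssm(traj3d)))  # True
```

In the paper's setting the two modalities are video features and RF features; the SSMs abstract away the incompatible feature spaces so that only the trajectory's shape in time remains to be compared.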