2 research outputs found
Talk2Nav: Long-Range Vision-and-Language Navigation with Dual Attention and Spatial Memory
The role of robots in society keeps expanding, bringing with it the necessity
of interacting and communicating with humans. In order to keep such interaction
intuitive, we provide automatic wayfinding based on verbal navigational
instructions. Our first contribution is the creation of a large-scale dataset
with verbal navigation instructions. To this end, we have developed an
interactive visual navigation environment based on Google Street View; we
further design an annotation method to highlight mined anchor landmarks and
local directions between them in order to help annotators formulate typical,
human references to those. The annotation task was crowdsourced on the AMT
platform, to construct a new Talk2Nav dataset with routes. Our second
contribution is a new learning method. Inspired by spatial cognition research
on the mental conceptualization of navigational instructions, we introduce a
soft dual attention mechanism defined over the segmented language instructions
to jointly extract two partial instructions -- one for matching the next
upcoming visual landmark and the other for matching the local directions to the
next landmark. Along similar lines, we also introduce a spatial memory scheme to
encode the local directional transitions. Our work takes advantage of
advances in two lines of research: mental formalization of verbal navigational
instructions and training neural network agents for automatic wayfinding.
Extensive experiments show that our method significantly outperforms previous
navigation methods. For the demo video, dataset and code, please refer to our
project page: https://www.trace.ethz.ch/publications/2019/talk2nav/index.html
Comment: 20 pages, 10 Figures, Demo Video:
https://people.ee.ethz.ch/~arunv/resources/talk2nav.mp
Navigation using special buildings as signposts
Navigation has been greatly improved by positioning systems, but visualization still relies on maps. Yet because they only represent an abstract street network, maps are sometimes difficult to read. Conversely, tourist maps, which are enriched with landmark drawings, have been shown to be much more intuitive to understand. However, outside the very centres of cities, major landmarks are too sparse to be helpful. In this work, we present a method to automatically augment maps with the most locally prominent such buildings, at multiple scales. Further, we generate a characterization which helps emphasize the special attributes of these buildings. Descriptive features are extracted from facades, analyzed and re-ranked to match human perception. To do so, we collected over 5900 human annotations to characterize 117 facades across 3 different cities. Finally, the characterizations are also used to produce natural language descriptions of the facades.
Weissenberg J., Gygli M., Riemenschneider H., Van Gool L., "Navigation using special buildings as signposts", MapInteract 2014 - 2nd ACM SIGSPATIAL workshop on interacting with maps, pp. 8-14, November 4-7, 2014, Dallas, Texas, USA.
status: publishe