4 research outputs found

    Learning to Generate Unambiguous Spatial Referring Expressions for Real-World Environments

    Referring to objects in a natural and unambiguous manner is crucial for effective human-robot interaction. Previous research on learning-based referring expressions has focused primarily on comprehension tasks, while generating referring expressions is still mostly limited to rule-based methods. In this work, we propose a two-stage approach that relies on deep learning for estimating spatial relations to describe an object naturally and unambiguously with a referring expression. We compare our method to the state-of-the-art algorithm in ambiguous environments (e.g., environments that include very similar objects with similar relationships). We show that our method generates referring expressions that people find to be more accurate (~30% better) and would prefer to use (~32% more often).
    Comment: International Conference on Intelligent Robots and Systems (IROS 2019). Demo 1: Finding the described object (https://youtu.be/BE6-F6chW0w), Demo 2: Referring to the pointed object (https://youtu.be/nmmv6JUpy8M), Supplementary Video (https://youtu.be/sFjBa_MHS98).
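    The abstract names the two stages only at a high level, so the following is a hypothetical sketch of how the second (generation) stage could consume relation scores produced by a learned stage-one model: it searches for the relation/landmark pair whose phrase fits the target well but fits every distractor poorly. `Obj`, `relation_score`, and the relation inventory are illustrative assumptions, not the authors' API.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical relation inventory; the paper's set may differ.
RELATIONS = ["left of", "right of", "in front of", "behind"]

@dataclass
class Obj:
    name: str  # plus whatever features the relation model consumes

def describe(target, objects, relation_score):
    # relation_score(a, b, rel) -> assumed probability (from the
    # learned spatial-relation model) that "a is <rel> b" holds.
    best, best_margin = None, float("-inf")
    for rel, landmark in product(RELATIONS, objects):
        if landmark is target:
            continue
        fit = relation_score(target, landmark, rel)
        # Penalize phrases that also fit some distractor: this is
        # what makes the chosen expression unambiguous.
        confusion = max(
            (relation_score(o, landmark, rel)
             for o in objects if o not in (target, landmark)),
            default=0.0,
        )
        if fit - confusion > best_margin:
            best_margin = fit - confusion
            best = f"the {target.name} {rel} the {landmark.name}"
    return best
```

    A margin-style objective like this is one simple way to trade off naturalness (the phrase should be true of the target) against ambiguity (it should not also be true of a distractor); the paper's actual scoring and surface realization may differ.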

    Continuous Spatial and Temporal Representations in Machine Vision

    This thesis explores continuous spatial and temporal representations in machine vision. For spatial representations, we explore the Spatial Semantic Pointer (SSP) as a biologically plausible representation of continuous space and its use in performing spatial memory and reasoning tasks. We show that SSPs can be used to encode visual images into high-dimensional memory vectors. These vectors can be used to store, retrieve, and manipulate spatial information, as well as perform search and scanning tasks within the vector-algebra space. We also demonstrate the psychological plausibility of these representations by qualitatively reproducing Kosslyn's famous map-scanning experiment. For temporal representations, we extend the original 1D Legendre Memory Unit (LMU) to take multi-dimensional input signals and compare its ability to store temporal information against the Long Short-Term Memory (LSTM) unit on the task of video action recognition. We show that the multi-dimensional LMU is able to match the LSTM in representing visual data over time. In particular, we demonstrate that the LMU achieves much better performance when the total number of parameters is limited, and that the LMU architecture allows it to continue operating with fewer parameters than the LSTM.
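    As a concrete illustration of the spatial half, here is a minimal NumPy sketch of the standard fractional-binding construction behind Spatial Semantic Pointers: a 2D point (x, y) is encoded as X^x circularly convolved with Y^y, where X and Y are fixed random unitary vectors and exponentiation acts element-wise on the Fourier spectrum. Function names are illustrative; this follows the SSP literature rather than the thesis's exact code.

```python
import numpy as np

def make_unitary(d, rng):
    # Real vector with all Fourier magnitudes equal to 1, so that
    # circular convolution with it preserves vector length.
    phases = rng.uniform(-np.pi, np.pi, d // 2 + 1)
    coeffs = np.exp(1j * phases)
    coeffs[0] = 1.0          # DC coefficient must be real
    if d % 2 == 0:
        coeffs[-1] = 1.0     # Nyquist coefficient must be real
    return np.fft.irfft(coeffs, n=d)

def fpower(v, exponent):
    # Fractional binding: raise the spectrum to a real-valued power.
    return np.fft.irfft(np.fft.rfft(v) ** exponent, n=len(v))

def ssp(X, Y, x, y):
    # SSP(x, y) = X^x (*) Y^y, with (*) = circular convolution,
    # i.e. an element-wise product in the Fourier domain.
    spec = np.fft.rfft(fpower(X, x)) * np.fft.rfft(fpower(Y, y))
    return np.fft.irfft(spec, n=len(X))

rng = np.random.default_rng(0)
X, Y = make_unitary(512, rng), make_unitary(512, rng)
p = ssp(X, Y, 1.3, -0.7)
print(np.dot(p, ssp(X, Y, 1.3, -0.7)))  # ~1.0: same location
print(np.dot(p, ssp(X, Y, 4.0, 2.0)))   # ~0.0: distant location
```

    For the temporal half, the multi-dimensional LMU can be sketched by running one q-dimensional Legendre memory per input channel with shared state-space matrices from the original LMU derivation (Voelker et al., 2019). The forward-Euler discretization below is a simplification for illustration; the thesis may well use a different scheme (the original LMU paper uses zero-order hold).

```python
import numpy as np

def lmu_matrices(q, theta):
    # Continuous-time (A, B) from the LMU delay-line derivation,
    # then a crude Euler discretization with dt = 1.
    n = np.arange(q)
    scale = (2 * n + 1)[:, None] / theta
    i, j = np.meshgrid(n, n, indexing="ij")
    A = np.where(i < j, -1.0, (-1.0) ** (i - j + 1)) * scale
    B = ((-1.0) ** n)[:, None] * scale
    return np.eye(q) + A, B

def lmu_step(m, u, Abar, Bbar):
    # m: (q, D) memory, one Legendre memory per input channel.
    # u: (D,) multi-dimensional input at the current time step.
    return Abar @ m + Bbar @ u[None, :]

Abar, Bbar = lmu_matrices(q=8, theta=100.0)
m = np.zeros((8, 3))
for u in np.random.default_rng(1).normal(size=(50, 3)):
    m = lmu_step(m, u, Abar, Bbar)  # m now summarizes the window
```

    Because (A, B) are fixed by the delay-line derivation rather than learned, the trainable weights largely reduce to the encodings into and out of the memory, which is one plausible reading of why the LMU keeps working where an equally sized LSTM runs out of capacity.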