Exploiting Scene-specific Features for Object Goal Navigation
Can the intrinsic relation between an object and the room in which it is
usually located help agents in the Visual Navigation task? We study this
question in the context of Object Goal Navigation, a problem in which an agent
has to reach an object of a specific class while moving in a complex domestic
environment. In this paper, we introduce a new reduced dataset that speeds up
the training of navigation models, a notoriously complex task. Our proposed
dataset permits the training of models that do not exploit online-built maps
in reasonable time, even without huge computational resources. This reduced
dataset therefore provides a meaningful benchmark and can be used to identify
promising models that could then be tried on bigger and more challenging
datasets. Subsequently, we propose the SMTSC model, an attention-based model
capable of exploiting the correlation between scenes and the objects they
contain, and we show quantitatively that this idea is sound.

Comment: Accepted at the ACVR 2020 workshop at ECCV 2020
How hard is it to cross the room? -- Training (Recurrent) Neural Networks to steer a UAV
This work explores the feasibility of steering a drone with a (recurrent)
neural network, based on input from a forward-looking camera, in the context
of a high-level navigation task. We set up a generic framework for training a
network to perform navigation tasks based on imitation learning; it can be
applied to both aerial and land vehicles. As a proof of concept we apply it to
a UAV (Unmanned Aerial Vehicle) in a simulated environment, learning to cross
a room containing a number of obstacles. So far, only feedforward neural
networks (FNNs) have been used to train UAV control. To cope with more complex
tasks, we propose the use of recurrent neural networks (RNNs) instead, and
successfully train an LSTM (Long Short-Term Memory) network for controlling
UAVs. Vision-based control is a sequential prediction problem, known for its
highly correlated input data. This correlation makes training a network hard,
especially an RNN. To overcome this issue, we investigate an alternative
sampling method during training, namely window-wise truncated backpropagation
through time (WW-TBPTT). Furthermore, end-to-end training requires large
amounts of data, which are often unavailable. We therefore compare the
performance of retraining only the fully connected (FC) and LSTM control
layers with that of networks trained end-to-end. Performing the relatively
simple task of crossing a room already reveals important guidelines and good
practices for training neural control networks. Different visualizations help
to explain the learned behavior.

Comment: 12 pages, 30 figures
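The sampling idea behind WW-TBPTT can be sketched independently of any particular network: rather than unrolling the RNN over a whole flight, gradients are computed on short windows drawn at random offsets, which decorrelates consecutive training samples. Window length and the sampling routine below are illustrative assumptions, not the paper's exact procedure:

```python
# Sketch of the window sampling behind window-wise truncated
# backpropagation through time (WW-TBPTT): draw short contiguous
# windows from random offsets in a long demonstration sequence,
# then run truncated BPTT on each window instead of the full unroll.
import numpy as np

def sample_windows(sequence, window, n_windows, rng):
    """Draw n_windows random contiguous windows from a (T, D) sequence."""
    T = len(sequence)
    starts = rng.integers(0, T - window + 1, size=n_windows)
    return np.stack([sequence[s:s + window] for s in starts])

rng = np.random.default_rng(0)
seq = rng.normal(size=(1000, 4))   # e.g. 1000 frames of camera features
batch = sample_windows(seq, window=20, n_windows=8, rng=rng)
print(batch.shape)  # (8, 20, 4)
```

Each (window, D) slice would then be fed to the LSTM with gradients truncated at the window boundary, so consecutive minibatches are no longer near-duplicates of one another.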
Virtual Reference for Video Collections: System Infrastructure, User Interface and Pilot User Study
A new video-based Virtual Reference (VR) tool called VideoHelp was designed and developed to support video
navigation escorting, a function that enables librarians to co-navigate a digital video with patrons in a web-based
environment. A client/server infrastructure was adopted for the VideoHelp system, and timestamps were used to
synchronize video playback between librarians and patrons. A pilot usability study of the VideoHelp prototype in video seeking was conducted; the preliminary results showed that the system is easy to learn and use, and that real-time assistance from virtual librarians in video navigation is desirable on a conditional basis.
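Timestamp-based co-navigation of this kind reduces to a small decision rule: the escorting client periodically reports its playback position, and the escorted player seeks whenever its own position drifts past a tolerance. The threshold and function below are illustrative assumptions, not details from the VideoHelp system:

```python
# Sketch of timestamp-based playback synchronization: the librarian's
# client broadcasts its playback position (seconds); the patron's
# player seeks only when drift exceeds a tolerance, to avoid constant
# small corrections. Tolerance value is an illustrative choice.
def sync_action(librarian_ts, patron_ts, tolerance=1.0):
    """Return a seek target (seconds), or None if playback is in sync."""
    if abs(librarian_ts - patron_ts) > tolerance:
        return librarian_ts
    return None

print(sync_action(120.0, 117.5))  # 120.0 (drifted 2.5 s, seek)
print(sync_action(120.0, 119.6))  # None  (within tolerance)
```

Keeping a dead band around the shared timestamp is what makes the escorting feel smooth: the patron's player only jumps when the two streams have visibly diverged.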
Navigation: am I really lost or virtually there?
Data is presented from virtual environment (VE) navigation studies that used building- and chessboard-type layouts. Participants learned by repeated navigation, spending several hours in each environment. While some participants quickly learned to navigate efficiently, others remained almost totally disoriented. In the virtual buildings this disorientation was illustrated by mean direction-estimate errors of approximately 90°, and in the chessboard VEs it was highlighted by the large number of rooms that some participants visited. Part of the cause of disorientation, and of generally slow spatial learning, lies in the difficulty participants had in learning the paths they had followed through the VEs.
AmIE: An Ambient Intelligent Environment for Assisted Living
In the modern world of technology, Internet-of-Things (IoT) systems strive to
provide extensive, interconnected, and automated solutions for almost every
aspect of life. This paper proposes an IoT context-aware system that presents
an Ambient Intelligence (AmI) environment, such as an apartment, house, or
building, to assist blind, visually impaired, and elderly people. The proposed
system aims to provide an easy-to-use voice-controlled system to locate,
navigate, and assist users indoors. Its main purpose is to provide indoor
positioning, assisted navigation, outside weather information, room
temperature, people availability, phone calls, and emergency evacuation when
needed. The system enhances users' awareness of the surrounding environment by
feeding them relevant information through a wearable device. In addition, the
system is voice-controlled in both English and Arabic, and information is
delivered as audio messages in both languages. The system design,
implementation, and evaluation consider the constraints of common types of
premises in Kuwait as well as challenges such as the training needed by users.
This paper presents cost-effective implementation options through the adoption
of a Raspberry Pi microcomputer, Bluetooth Low Energy devices, and an Android
smartwatch.

Comment: 6 pages, 8 figures, 1 table
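Room-level positioning with Bluetooth Low Energy beacons of the kind the abstract mentions is often done by attributing the user to the room whose beacon is heard most strongly. A minimal sketch under that assumption, with hypothetical beacon IDs and RSSI values (the paper does not specify its positioning algorithm):

```python
# Sketch of room-level indoor positioning with BLE beacons: each room
# hosts a beacon, and the wearable attributes the user to the room
# whose beacon has the strongest RSSI (dBm; higher, i.e. less
# negative, means closer). Beacon IDs and readings are illustrative.
def locate_room(rssi_readings, beacon_rooms):
    """Map {beacon_id: rssi_dBm} readings to the nearest beacon's room."""
    nearest = max(rssi_readings, key=rssi_readings.get)
    return beacon_rooms[nearest]

rooms = {"b1": "kitchen", "b2": "bedroom", "b3": "hallway"}
readings = {"b1": -82, "b2": -60, "b3": -75}
print(locate_room(readings, rooms))  # bedroom
```

A deployed system would smooth RSSI over a time window before deciding, since raw readings fluctuate with body shadowing and multipath; the single-sample rule above only shows the core idea.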
Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments
A robot that can carry out a natural-language instruction has been a dream
since before the Jetsons cartoon series imagined a life of leisure mediated by
a fleet of attentive robot helpers. It is a dream that remains stubbornly
distant. However, recent advances in vision and language methods have made
incredible progress in closely related areas. This is significant because a
robot interpreting a natural-language navigation instruction on the basis of
what it sees is carrying out a vision and language process that is similar to
Visual Question Answering. Both tasks can be interpreted as visually grounded
sequence-to-sequence translation problems, and many of the same methods are
applicable. To enable and encourage the application of vision and language
methods to the problem of interpreting visually-grounded navigation
instructions, we present the Matterport3D Simulator -- a large-scale
reinforcement learning environment based on real imagery. Using this simulator,
which can in future support a range of embodied vision and language tasks, we
provide the first benchmark dataset for visually-grounded natural language
navigation in real buildings -- the Room-to-Room (R2R) dataset.

Comment: CVPR 2018 Spotlight presentation