
    Exploiting Scene-specific Features for Object Goal Navigation

    Can the intrinsic relation between an object and the room in which it is usually located help agents in the Visual Navigation task? We study this question in the context of Object Navigation, a problem in which an agent has to reach an object of a specific class while moving in a complex domestic environment. In this paper, we introduce a new reduced dataset that speeds up the training of navigation models, a notoriously complex task. The proposed dataset permits training models that do not exploit online-built maps in reasonable time, even without huge computational resources. This reduced dataset therefore provides a meaningful benchmark and can be used to identify promising models to be tried later on larger and more challenging datasets. We then propose SMTSC, an attention-based model capable of exploiting the correlation between scenes and the objects they contain, and show quantitatively that the idea is sound. Comment: Accepted at ACVR2020, ECCV2020 Workshop
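The abstract does not spell out the SMTSC architecture, but the scene-object correlation it relies on can be illustrated with a hypothetical room-prior lookup. The prior table, values, and function below are illustrative assumptions for this sketch, not the paper's actual model:

```python
# Hypothetical prior P(object class | room type): in a domestic scene,
# a "pillow" is far more likely in a bedroom than in a kitchen.
SCENE_PRIOR = {
    "kitchen": {"mug": 0.6, "toaster": 0.3, "pillow": 0.01},
    "bedroom": {"pillow": 0.7, "mug": 0.05},
}

def room_preference(target, rooms, prior=SCENE_PRIOR):
    """Rank candidate rooms by how likely they contain the target class."""
    return sorted(rooms,
                  key=lambda r: prior.get(r, {}).get(target, 0.0),
                  reverse=True)

# An agent asked to find a pillow would explore the bedroom first.
print(room_preference("pillow", ["kitchen", "bedroom"]))  # ['bedroom', 'kitchen']
```

An attention mechanism like the one the abstract describes would learn such correlations from data rather than read them from a fixed table; the sketch only shows why the prior is useful for prioritizing exploration.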

    How hard is it to cross the room? -- Training (Recurrent) Neural Networks to steer a UAV

    This work explores the feasibility of steering a drone with a (recurrent) neural network, based on input from a forward-looking camera, in the context of a high-level navigation task. We set up a generic framework for training a network to perform navigation tasks based on imitation learning; it can be applied to both aerial and land vehicles. As a proof of concept we apply it to a UAV (Unmanned Aerial Vehicle) in a simulated environment, learning to cross a room containing a number of obstacles. So far only feedforward neural networks (FNNs) have been used to train UAV control. To cope with more complex tasks, we propose the use of recurrent neural networks (RNNs) instead and successfully train an LSTM (Long Short-Term Memory) network for controlling UAVs. Vision-based control is a sequential prediction problem, known for its highly correlated input data. This correlation makes training a network hard, especially an RNN. To overcome this issue, we investigate an alternative sampling method during training, namely window-wise truncated backpropagation through time (WW-TBPTT). Further, end-to-end training requires a lot of data, which is often not available. Therefore, we compare the performance of retraining only the fully connected (FC) and LSTM control layers with that of networks trained end-to-end. Performing the relatively simple task of crossing a room already reveals important guidelines and good practices for training neural control networks. Different visualizations help to explain the learned behavior. Comment: 12 pages, 30 figures
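The abstract only names WW-TBPTT; the general idea behind window-wise truncated backpropagation through time is to train the RNN on short, randomly placed windows of the flight sequence instead of the full correlated trajectory. A minimal sketch of that sampling step, with stand-in arrays instead of real camera frames and steering labels:

```python
import numpy as np

def sample_windows(frames, controls, window, batch, rng):
    """Sample random fixed-length windows from a demonstration sequence.

    Truncated BPTT unrolls the RNN over short windows instead of the
    whole flight; drawing window start points at random (rather than
    sequentially) decorrelates consecutive training batches.
    """
    T = len(frames)
    starts = rng.integers(0, T - window + 1, size=batch)
    x = np.stack([frames[s:s + window] for s in starts])
    y = np.stack([controls[s:s + window] for s in starts])
    return x, y  # each of shape (batch, window, feature_dim)

rng = np.random.default_rng(0)
frames = np.arange(100).reshape(100, 1)    # stand-in for camera features
controls = np.arange(100).reshape(100, 1)  # stand-in for steering labels
x, y = sample_windows(frames, controls, window=10, batch=4, rng=rng)
print(x.shape)  # (4, 10, 1)
```

The exact windowing and gradient-truncation details of the paper's WW-TBPTT are not given in the abstract; this sketch shows only the batch-sampling idea.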

    Virtual Reference for Video Collections: System Infrastructure, User Interface and Pilot User Study

    A new video-based Virtual Reference (VR) tool called VideoHelp was designed and developed to support video navigation escorting, a function that enables librarians to co-navigate a digital video with patrons in a web-based environment. A client/server infrastructure was adopted for the VideoHelp system, and timestamps were used to synchronize video playback between librarians and patrons. A pilot usability study of the VideoHelp prototype in video seeking was conducted; the preliminary results demonstrated that the system is easy to learn and use, and that real-time assistance from virtual librarians in video navigation is desirable on a conditional basis.
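The abstract says only that timestamps drive the synchronization. One plausible client-side rule is drift correction against the librarian's broadcast playhead; the function name and tolerance below are assumptions for this sketch, not VideoHelp's actual protocol:

```python
def resync_target(patron_t, librarian_t, tolerance=1.0):
    """Return the timestamp the patron's player should seek to, or None.

    The librarian's client periodically broadcasts its playhead time (in
    seconds); each patron client seeks only when drift exceeds the
    tolerance, so minor network jitter does not cause constant seeking.
    """
    drift = abs(patron_t - librarian_t)
    return librarian_t if drift > tolerance else None

print(resync_target(12.0, 14.5))  # drift 2.5 s > 1.0 s tolerance -> seek to 14.5
print(resync_target(14.4, 14.5))  # drift 0.1 s -> None, playback stays as-is
```

A tolerance band like this is a common trade-off in co-viewing systems: tight enough that both parties see the same scene, loose enough to avoid stuttering from repeated seeks.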

    Navigation: am I really lost or virtually there?

    Data is presented from virtual environment (VE) navigation studies that used building- and chessboard-type layouts. Participants learned by repeated navigation, spending several hours in each environment. While some participants quickly learned to navigate efficiently, others remained almost totally disoriented. In the virtual buildings this disorientation was illustrated by mean direction-estimate errors of approximately 90°, and in the chessboard VEs it was highlighted by the large number of rooms that some participants visited. Part of the cause of disorientation, and of generally slow spatial learning, lies in the difficulty participants had in learning the paths they had followed through the VEs.

    AmIE: An Ambient Intelligent Environment for Assisted Living

    In the modern world of technology, Internet-of-Things (IoT) systems strive to provide extensive interconnected and automated solutions for almost every aspect of life. This paper proposes an IoT context-aware system that realizes an Ambient Intelligence (AmI) environment, such as an apartment, house, or building, to assist blind, visually impaired, and elderly people. The proposed system aims at providing an easy-to-use voice-controlled system to locate, navigate, and assist users indoors. The main purpose of the system is to provide indoor positioning, assisted navigation, outside weather information, room temperature, people availability, phone calls, and emergency evacuation when needed. The system enhances the user's awareness of the surrounding environment by feeding them relevant information through a wearable device. In addition, the system is voice-controlled in both English and Arabic, and the information is delivered as audio messages in both languages. The system design, implementation, and evaluation consider the constraints of common types of premises in Kuwait as well as challenges such as the training needed by the users. This paper presents cost-effective implementation options through the adoption of a Raspberry Pi microcomputer, Bluetooth Low Energy devices, and an Android smart watch. Comment: 6 pages, 8 figures, 1 table

    Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments

    A robot that can carry out a natural-language instruction has been a dream since before the Jetsons cartoon series imagined a life of leisure mediated by a fleet of attentive robot helpers. It is a dream that remains stubbornly distant. However, recent advances in vision and language methods have made incredible progress in closely related areas. This is significant because a robot interpreting a natural-language navigation instruction on the basis of what it sees is carrying out a vision-and-language process similar to Visual Question Answering. Both tasks can be interpreted as visually grounded sequence-to-sequence translation problems, and many of the same methods are applicable. To enable and encourage the application of vision and language methods to the problem of interpreting visually grounded navigation instructions, we present the Matterport3D Simulator -- a large-scale reinforcement learning environment based on real imagery. Using this simulator, which can in the future support a range of embodied vision and language tasks, we provide the first benchmark dataset for visually grounded natural language navigation in real buildings -- the Room-to-Room (R2R) dataset. Comment: CVPR 2018 Spotlight presentation