Learning Deployable Navigation Policies at Kilometer Scale from a Single Traversal
Model-free reinforcement learning has recently been shown to be effective at
learning navigation policies from complex image input. However, these
algorithms tend to require large amounts of interaction with the environment,
which can be prohibitively costly to obtain on robots in the real world. We
present an approach for efficiently learning goal-directed navigation policies
on a mobile robot, from only a single coverage traversal of recorded data. The
navigation agent learns an effective policy over a diverse action space in a
large heterogeneous environment consisting of more than 2km of travel, through
buildings and outdoor regions that collectively exhibit large variations in
visual appearance, self-similarity, and connectivity. We compare pretrained
visual encoders that enable precomputation of visual embeddings to achieve a
throughput of tens of thousands of transitions per second at training time on a
commodity desktop computer, allowing agents to learn from millions of
trajectories of experience in a matter of hours. We propose multiple forms of
computationally efficient stochastic augmentation to enable the learned policy
to generalise beyond these precomputed embeddings, and demonstrate successful
deployment of the learned policy on the real robot without fine tuning, despite
environmental appearance differences at test time. The dataset and code
required to reproduce these results and apply the technique to other datasets
and robots are made publicly available at rl-navigation.github.io/deployable
A Similarity Measure for Material Appearance
We present a model to measure the similarity in appearance between different
materials, which correlates with human similarity judgments. We first create a
database of 9,000 rendered images depicting objects with varying materials,
shape and illumination. We then gather data on perceived similarity from
crowdsourced experiments; our analysis of over 114,840 answers suggests that
indeed a shared perception of appearance similarity exists. We feed this data
to a deep learning architecture with a novel loss function, which learns a
feature space for materials that correlates with such perceived appearance
similarity. Our evaluation shows that our model outperforms existing metrics.
Last, we demonstrate several applications enabled by our metric, including
appearance-based search for material suggestions, database visualization,
clustering and summarization, and gamut mapping.
Comment: 12 pages, 17 figures
ABC-CNN: An Attention Based Convolutional Neural Network for Visual Question Answering
We propose a novel attention-based deep learning architecture for the visual
question answering (VQA) task. Given an image and an image-related natural
language question, VQA generates the natural language answer for the question.
Generating the correct answers requires the model's attention to focus on the
regions corresponding to the question, because different questions inquire
about the attributes of different image regions. We introduce an
attention-based configurable convolutional neural network (ABC-CNN) to learn such
question-guided attention. ABC-CNN determines an attention map for an
image-question pair by convolving the image feature map with configurable
convolutional kernels derived from the question's semantics. We evaluate the
ABC-CNN architecture on three benchmark VQA datasets: Toronto COCO-QA, DAQUAR,
and the VQA dataset. The ABC-CNN model achieves significant improvements over
state-of-the-art methods on these datasets. The question-guided attention
generated by ABC-CNN is also shown to reflect the regions that are highly
relevant to the questions.
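As a rough illustration of the question-guided attention described in this abstract, the sketch below convolves an image feature map with a kernel derived from a question embedding and normalises the result into an attention map. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the sigmoid projection used to build the kernel, and the spatial softmax are all hypothetical choices made for the example.

```python
import math

def question_kernel(q_emb, proj, channels, k):
    """Build a C x k x k kernel from a question embedding.

    Assumption: the kernel is sigmoid(proj @ q_emb), where proj is a
    hypothetical (channels * k * k) x D projection matrix.
    """
    flat = [1.0 / (1.0 + math.exp(-sum(w * q for w, q in zip(row, q_emb))))
            for row in proj]
    return [[flat[(c * k + a) * k:(c * k + a) * k + k] for a in range(k)]
            for c in range(channels)]

def attention_map(feat, kernel):
    """Convolve a C x H x W feature map with a C x k x k kernel (odd k,
    zero padding) and apply a softmax over all spatial positions."""
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])
    k = len(kernel[0])
    pad = k // 2
    scores = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            s = 0.0
            for c in range(C):
                for di in range(k):
                    for dj in range(k):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < H and 0 <= x < W:
                            s += feat[c][y][x] * kernel[c][di][dj]
            scores[i][j] = s
    # Numerically stable softmax over the whole H x W grid.
    m = max(max(row) for row in scores)
    exps = [[math.exp(v - m) for v in row] for row in scores]
    z = sum(sum(row) for row in exps)
    return [[v / z for v in row] for row in exps]
```

Different questions yield different kernels, so the same image feature map produces a different attention map for each question.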
Long-range UAV Thermal Geo-localization with Satellite Imagery
Onboard sensors, such as cameras and thermal sensors, have emerged as
effective alternatives to Global Positioning System (GPS) for geo-localization
in Unmanned Aerial Vehicle (UAV) navigation. Since GPS can suffer from signal
loss and spoofing problems, researchers have explored camera-based techniques
such as Visual Geo-localization (VG) using satellite RGB imagery. Additionally,
thermal geo-localization (TG) has become crucial for long-range UAV flights in
low-illumination environments. This paper proposes a novel thermal
geo-localization framework using satellite RGB imagery, which includes multiple
domain adaptation methods to address the limited availability of paired thermal
and satellite images. The experimental results demonstrate the effectiveness of
the proposed approach in achieving reliable thermal geo-localization
performance, even in thermal images with indistinct self-similar features. We
evaluate our approach on real data collected onboard a UAV. We also release the
code and Boson-nighttime, a dataset of paired satellite-thermal and
unpaired satellite images for thermal geo-localization with satellite imagery.
To the best of our knowledge, this work is the first to propose a thermal
geo-localization method using satellite RGB imagery in long-range flights.
Comment: 8 pages, 6 figures, IROS 202
Electronic Document Navigation Assistance Using Markings and/or Non-Uniform Scrolling
Generally, the present disclosure is directed to assisting in navigation within an electronic document. In particular, in some implementations, the systems and methods of the present disclosure can include or otherwise leverage one or more machine-learned models to predict a location within an electronic document to be marked based on user interaction with the electronic document.
Deep Learning Perspectives on Efficient Image Matching in Natural Image Databases
With the proliferation of digital content, efficient image matching in natural image databases has become paramount. Traditional image matching techniques, while effective to a certain extent, face challenges in dealing with the high variability inherent in natural images. This research delves into the application of deep learning models, particularly Convolutional Neural Networks (CNNs), Siamese Networks, and Triplet Networks, to address these challenges. We introduce various techniques to enhance efficiency, such as data augmentation, transfer learning, dimensionality reduction, efficient sampling, and the amalgamation of traditional computer vision strategies with deep learning. Our experimental results, garnered from a specific dataset, demonstrate significant improvements in image matching efficiency, as quantified by metrics such as precision, recall, F1-score, and matching time. The findings underscore the potential of deep learning as a transformative tool for natural image database matching, setting the stage for further research and optimization in this domain.
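For reference, the precision, recall, and F1 metrics named in this abstract can be computed for a single query as sketched below. This uses the standard set-based definitions; the function name and the example image identifiers are hypothetical and are not taken from the paper.

```python
def matching_metrics(predicted, relevant):
    """Return (precision, recall, f1) for one query.

    predicted: identifiers of images the matcher returned.
    relevant:  identifiers of images that are true matches.
    """
    predicted, relevant = set(predicted), set(relevant)
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical query: 4 images returned, 3 true matches, 2 in common.
p, r, f1 = matching_metrics(["a", "b", "c", "d"], ["a", "b", "e"])
```

Here precision is 2/4, recall is 2/3, and F1, their harmonic mean, is 4/7.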