Transfer Learning-Based Crack Detection by Autonomous UAVs
Unmanned Aerial Vehicles (UAVs) have recently shown great performance in
collecting visual data through autonomous exploration and mapping in building
inspection. Yet few studies address the post-processing of these data and its
integration with autonomous UAVs, both of which would be major steps toward
full automation of building inspection. In this regard, this work presents a
decision-making tool for revisiting tasks in
visual building inspection by autonomous UAVs. The tool is an implementation of
fine-tuning a pretrained Convolutional Neural Network (CNN) for surface crack
detection. It offers an optional mechanism for planning revisits to
pinpointed locations during inspection. It is integrated into a quadrotor UAV
system that can autonomously navigate in GPS-denied environments. The UAV is
equipped with onboard sensors and computers for autonomous localization,
mapping and motion planning. The integrated system is tested through
simulations and real-world experiments. The results show that the system
achieves crack detection and autonomous navigation in GPS-denied environments
for building inspection.
Revisiting Visual Question Answering Baselines
Visual question answering (VQA) is an interesting learning setting for
evaluating the abilities and shortcomings of current systems for image
understanding. Many of the recently proposed VQA systems include attention or
memory mechanisms designed to support "reasoning". For multiple-choice VQA,
nearly all of these systems train a multi-class classifier on image and
question features to predict an answer. This paper questions the value of these
common practices and develops a simple alternative model based on binary
classification. Instead of treating answers as competing choices, our model
receives the answer as input and predicts whether or not an
image-question-answer triplet is correct. We evaluate our model on the Visual7W
Telling and the VQA Real Multiple Choice tasks, and find that even simple
versions of our model perform competitively. Our best model achieves
state-of-the-art performance on the Visual7W Telling task and compares
surprisingly well with the most complex systems proposed for the VQA Real
Multiple Choice task. We explore variants of the model and study its
transferability between both datasets. We also present an error analysis of our
model that suggests a key problem of current VQA systems lies in the lack of
visual grounding of concepts that occur in the questions and answers. Overall,
our results suggest that the performance of current VQA systems is not
significantly better than that of systems designed to exploit dataset biases.
Comment: European Conference on Computer Visio
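The binary formulation described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the one-hidden-layer network, the function names `score_triplet` and `pick_answer`, and all weights are assumptions made for the example. The answer is fed in as an input alongside the image and question features, the model outputs the probability that the triplet is correct, and a multiple-choice question is answered by scoring each candidate separately.

```python
import math


def score_triplet(image_feat, question_feat, answer_feat, w1, b1, w2, b2):
    """Return P(image-question-answer triplet is correct).

    Hypothetical one-hidden-layer network over the concatenated features:
    hidden = relu(w1 @ x + b1), output = sigmoid(w2 . hidden + b2).
    """
    x = list(image_feat) + list(question_feat) + list(answer_feat)
    hidden = []
    for row, bias in zip(w1, b1):
        if len(row) != len(x):
            raise ValueError("each row of w1 must match the input length")
        pre = sum(w * v for w, v in zip(row, x)) + bias
        hidden.append(max(0.0, pre))  # ReLU
    z = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid


def pick_answer(image_feat, question_feat, candidates, w1, b1, w2, b2):
    """Score every candidate answer independently and return (best_index, scores)."""
    scores = [
        score_triplet(image_feat, question_feat, ans, w1, b1, w2, b2)
        for ans in candidates
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores
```

Because each candidate is scored on its own, the set of answers does not have to be fixed in advance, unlike a multi-class classifier with one output per answer.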
On human motion prediction using recurrent neural networks
Human motion modelling is a classical problem at the intersection of graphics
and computer vision, with applications spanning human-computer interaction,
motion synthesis, and motion prediction for virtual and augmented reality.
Following the success of deep learning methods in several computer vision
tasks, recent work has focused on using deep recurrent neural networks (RNNs)
to model human motion, with the goal of learning time-dependent representations
that perform tasks such as short-term motion prediction and long-term human
motion synthesis. We examine recent work, with a focus on the evaluation
methodologies commonly used in the literature, and show that, surprisingly,
state-of-the-art performance can be achieved by a simple baseline that does not
attempt to model motion at all. We investigate this result, and analyze recent
RNN methods by looking at the architectures, loss functions, and training
procedures used in state-of-the-art approaches. We propose three changes to the
standard RNN models typically used for human motion, which result in a simple
and scalable RNN architecture that obtains state-of-the-art performance on
human motion prediction.
Comment: Accepted at CVPR 1
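The abstract does not say which baseline "does not attempt to model motion at all". The sketch below assumes one natural candidate: repeat the last observed pose for every future frame (a constant-pose, or zero-velocity, forecast). The function names and the flat list-of-floats pose representation are hypothetical choices for illustration.

```python
import math


def constant_pose_forecast(observed_poses, horizon):
    """Predict `horizon` future frames by copying the last observed pose."""
    if not observed_poses:
        raise ValueError("need at least one observed pose")
    last = observed_poses[-1]
    return [list(last) for _ in range(horizon)]


def mean_l2_error(predicted, target):
    """Average Euclidean distance between predicted and target poses per frame."""
    if len(predicted) != len(target):
        raise ValueError("sequences must have the same number of frames")
    total = 0.0
    for p, t in zip(predicted, target):
        total += math.sqrt(sum((a - b) ** 2 for a, b in zip(p, t)))
    return total / len(predicted)
```

A baseline of this kind is a useful sanity check: a learned model that does not beat it on short horizons may not have learned meaningful motion dynamics.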
Semantic bottleneck for computer vision tasks
This paper introduces a novel method for the representation of images that is
semantic by nature, addressing the question of computation intelligibility in
computer vision tasks. More specifically, our proposition is to introduce what
we call a semantic bottleneck in the processing pipeline, which is a crossing
point in which the representation of the image is entirely expressed with
natural language, while retaining the efficiency of numerical representations.
We show that our approach is able to generate semantic representations that
give state-of-the-art results on semantic content-based image retrieval and
also perform very well on image classification tasks. Intelligibility is
evaluated through user-centered experiments for failure detection.
Adding Cues to Binary Feature Descriptors for Visual Place Recognition
In this paper we propose an approach to embed continuous and selector cues in
binary feature descriptors used for visual place recognition. The embedding is
achieved by extending each feature descriptor with a binary string that encodes
a cue and supports the Hamming distance metric. Augmenting the descriptors in
such a way has the advantage of being transparent to the procedure used to
compare them. We present two concrete applications of our methodology,
demonstrating the two considered types of cues. In addition, we conducted a
broad quantitative and comparative evaluation of these applications, covering
five benchmark datasets and several state-of-the-art image retrieval
approaches in combination with various binary descriptor types.
Comment: 8 pages, 8 figures, source: www.gitlab.com/srrg-software/srrg_bench,
submitted to ICRA 201
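One way to embed a continuous cue so that it "supports the Hamming distance metric" is a unary (thermometer) code, sketched below. The abstract does not state which encoding is used, so this choice is an assumption, as are the function names and the normalization of the cue to [0, 1]. With a unary code, the Hamming distance between two cue strings equals the difference in their quantized values, so the augmented descriptors can be compared with an unmodified Hamming matcher.

```python
def unary_cue_bits(value, num_bits):
    """Encode a cue in [0, 1] as a thermometer code: k ones followed by zeros."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("cue value must be normalized to [0, 1]")
    ones = round(value * num_bits)
    return [1] * ones + [0] * (num_bits - ones)


def augment(descriptor_bits, cue_value, num_cue_bits):
    """Append the encoded cue to a binary descriptor given as a list of 0/1."""
    return list(descriptor_bits) + unary_cue_bits(cue_value, num_cue_bits)


def hamming(a, b):
    """Number of positions at which two equal-length bit lists differ."""
    if len(a) != len(b):
        raise ValueError("bit strings must have equal length")
    return sum(x != y for x, y in zip(a, b))
```

The number of cue bits acts as a weight: more bits make the cue contribute more to the total distance relative to the original descriptor.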