Underwater Multi-Robot Convoying using Visual Tracking by Detection
We present a robust multi-robot convoying approach that relies on visual
detection of the leading agent, thus enabling target following in unstructured
3-D environments. Our method is based on the idea of tracking-by-detection,
which interleaves efficient model-based object detection with temporal
filtering of image-based bounding box estimation. This approach has the
important advantage of mitigating tracking drift (i.e. drifting away from the
target object), which is a common symptom of model-free trackers and is
detrimental to sustained convoying in practice. To illustrate our solution, we
collected extensive footage of an underwater robot in ocean settings, and
hand-annotated its location in each frame. Based on this dataset, we present an
empirical comparison of multiple tracker variants, including the use of several
convolutional neural networks, both with and without recurrent connections, as
well as frequency-based model-free trackers. We also demonstrate the
practicality of this tracking-by-detection strategy in real-world scenarios by
successfully controlling a legged underwater robot in five degrees of freedom
to follow another robot's independent motion.
Comment: Accepted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2017.
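A minimal sketch of the tracking-by-detection loop described above: a per-frame detector interleaved with temporal filtering of the bounding-box estimates. The constant-velocity Kalman filter below is one plausible choice of filter, not necessarily the paper's; the synthetic detections stand in for a real detector's output.

```python
import numpy as np

class BBoxKalman:
    """Constant-velocity Kalman filter on the box center; width/height are
    exponentially smoothed. A simple way to damp per-frame detection jitter."""
    def __init__(self, q=1e-2, r=1e-1):
        self.x = None                                  # state: [cx, cy, vx, vy]
        self.P = np.eye(4)
        self.F = np.eye(4); self.F[0, 2] = self.F[1, 3] = 1.0   # dt = 1 frame
        self.H = np.eye(2, 4)                          # we observe (cx, cy) only
        self.Q, self.R = q * np.eye(4), r * np.eye(2)
        self.wh = None

    def update(self, box):
        cx, cy, w, h = box
        if self.x is None:                             # initialize on first detection
            self.x, self.wh = np.array([cx, cy, 0.0, 0.0]), np.array([w, h])
            return box
        self.x = self.F @ self.x                       # predict
        self.P = self.F @ self.P @ self.F.T + self.Q
        S = self.H @ self.P @ self.H.T + self.R        # correct with new detection
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (np.array([cx, cy]) - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P
        self.wh = 0.8 * self.wh + 0.2 * np.array([w, h])
        return (*self.x[:2], *self.wh)

# demo with synthetic noisy detections standing in for a detector's per-frame output
kf = BBoxKalman()
for t in range(50):
    detection = (100 + 2 * t + np.random.randn(), 80 + np.random.randn(), 40, 30)
    smoothed = kf.update(detection)                    # filtered box for the controller
```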
Visual Diver Recognition for Underwater Human-Robot Collaboration
This paper presents an approach for autonomous underwater robots to visually
detect and identify divers. The proposed approach enables an autonomous
underwater robot to detect multiple divers in a visual scene and distinguish
between them. Such methods are useful for robots to identify a human leader,
for example, in multi-human/robot teams where only designated individuals are
allowed to command or lead a team of robots. Initial diver identification is
performed using the Faster R-CNN algorithm with a region proposal network which
produces bounding boxes around the divers' locations. Subsequently, a suite of
spatial- and frequency-domain descriptors is extracted from the bounding boxes
to create a feature vector. A K-Means clustering algorithm, with k set to the
number of detected bounding boxes, thereafter identifies the detected divers
based on these feature vectors. We evaluate the performance of the proposed
approach on video footage of divers swimming in front of a mobile robot and
demonstrate its accuracy.
Comment: Submitted to ICRA 2019.
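The detect-then-cluster pipeline can be sketched as below. The descriptor is a toy stand-in for the paper's suite of spatial- and frequency-domain features, and scikit-learn's KMeans plays the role of the clustering step; the crops are assumed to come from the Faster R-CNN bounding boxes.

```python
import numpy as np
from sklearn.cluster import KMeans

def descriptor(crop):
    """Toy spatial + frequency descriptor for one diver crop (illustrative only):
    a coarse intensity histogram concatenated with low-frequency FFT magnitudes."""
    hist, _ = np.histogram(crop, bins=16, range=(0, 255), density=True)
    spectrum = np.abs(np.fft.fft2(crop.mean(axis=-1)))[:4, :4].ravel()
    return np.concatenate([hist, spectrum / (spectrum.max() + 1e-9)])

def identify_divers(crops_per_frame, n_divers):
    """Cluster per-detection descriptors so that each cluster corresponds to
    one diver identity across frames."""
    feats = [descriptor(c) for crops in crops_per_frame for c in crops]
    return KMeans(n_clusters=n_divers, n_init=10).fit_predict(np.array(feats))

# crops_per_frame: list of lists of HxWx3 uint8 arrays cut out by the detector
```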
Towards a Generic Diver-Following Algorithm: Balancing Robustness and Efficiency in Deep Visual Detection
This paper explores the design and development of a class of robust
diver-following algorithms for autonomous underwater robots. By considering the
operational challenges for underwater visual tracking in diverse real-world
settings, we formulate a set of desired features for a generic diver-following
algorithm. We attempt to accommodate these features and maximize general
tracking performance by exploiting state-of-the-art deep object detection
models. We fine-tune the building blocks of these models with the goal of
balancing the trade-off between robustness and efficiency in an onboard setting
under real-time constraints. Subsequently, we design an architecturally simple
Convolutional Neural Network (CNN)-based diver-detection model that is much
faster than the state-of-the-art deep models yet provides comparable detection
performance. In addition, we validate the performance and effectiveness of the
proposed diver-following modules through a number of field experiments in
closed-water and open-water environments.
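As a rough illustration of the efficiency-first design the abstract argues for, a deliberately shallow PyTorch detector might look like the sketch below; the architecture is hypothetical and far simpler than the models the paper actually evaluates.

```python
import torch
import torch.nn as nn

class LiteDiverDetector(nn.Module):
    """Shallow CNN regressing one diver box (cx, cy, w, h) plus an objectness
    score -- a sketch of the robustness/efficiency trade-off, not the paper's model."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, 5)       # (cx, cy, w, h, objectness)

    def forward(self, x):
        return self.head(self.backbone(x).flatten(1))

pred = LiteDiverDetector()(torch.randn(1, 3, 224, 224))   # one box per frame
```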
DeepURL: Deep Pose Estimation Framework for Underwater Relative Localization
In this paper, we propose a real-time deep learning approach for determining
the 6D relative pose of Autonomous Underwater Vehicles (AUVs) from a single
image. A team of autonomous robots localizing themselves in a
communication-constrained underwater environment is essential for many
applications such as underwater exploration, mapping, multi-robot convoying,
and other multi-robot tasks. Due to the profound difficulty of collecting
ground truth images with accurate 6D poses underwater, this work utilizes
rendered images from the Unreal Game Engine simulation for training. An
image-to-image translation network is employed to bridge the gap between
rendered and real images, producing the synthetic images used for training. The
proposed method predicts the 6D pose of an AUV from a single image as 2D image
keypoints representing 8 corners of the 3D model of the AUV, and then the 6D
pose in the camera coordinates is determined using RANSAC-based PnP.
Experimental results in real-world underwater environments (swimming pool and
ocean) with different cameras demonstrate the robustness and accuracy of the
proposed technique in terms of translation and orientation error compared to
state-of-the-art methods. The code is publicly available.
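The final geometric step, recovering the 6D pose from the eight predicted corner keypoints, can be sketched with OpenCV's RANSAC-based PnP solver. The 3D box dimensions below are placeholders, and corners_2d is assumed to be the network's keypoint output.

```python
import cv2
import numpy as np

# 8 corners of the AUV's 3D bounding box in its body frame (placeholder size, meters)
corners_3d = np.array([[x, y, z] for x in (-0.4, 0.4)
                                 for y in (-0.2, 0.2)
                                 for z in (-0.15, 0.15)], dtype=np.float32)

def pose_from_keypoints(corners_2d, K):
    """corners_2d: (8, 2) pixel keypoints predicted by the network;
    K: 3x3 camera intrinsic matrix. Returns rotation matrix and translation."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        corners_3d, corners_2d.astype(np.float32), K, None)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)             # 6D pose in the camera frame
    return R, tvec
```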
Person Following by Autonomous Robots: A Categorical Overview
A wide range of human-robot collaborative applications in diverse domains
such as manufacturing, health care, the entertainment industry, and social
interactions, require an autonomous robot to follow its human companion.
Different working environments and applications pose diverse challenges by
adding constraints on the choice of sensors, the degree of autonomy, and
dynamics of a person-following robot. Researchers have addressed these
challenges in many ways and contributed to the development of a large body of
literature. This paper provides a comprehensive overview of the literature by
categorizing different aspects of person-following by autonomous robots. Also,
the corresponding operational challenges are identified based on various design
choices for ground, underwater, and aerial scenarios. In addition,
state-of-the-art methods for perception, planning, control, and interaction are
elaborately discussed and their applicability in varied operational scenarios
is presented. Then, some of the prominent methods are qualitatively compared,
corresponding practicalities are illustrated, and their feasibility is analyzed
for various use-cases. Furthermore, several prospective application areas are
identified, and open problems are highlighted for future research.
Understanding Human Motion and Gestures for Underwater Human-Robot Collaboration
In this paper, we present a number of robust methodologies for an underwater
robot to visually detect, follow, and interact with a diver for collaborative
task execution. We design and develop two autonomous diver-following
algorithms, the first of which utilizes both spatial- and frequency-domain
features pertaining to human swimming patterns in order to visually track a
diver. The second algorithm uses a convolutional neural network-based model for
robust tracking-by-detection. In addition, we propose a hand gesture-based
human-robot communication framework that is syntactically simpler and
computationally more efficient than the existing grammar-based frameworks. In
the proposed interaction framework, deep visual detectors are used to provide
accurate hand gesture recognition; subsequently, a finite-state machine
performs robust and efficient gesture-to-instruction mapping. The
distinguishing feature of this framework is that it can be easily adopted by
divers for communicating with underwater robots without using artificial
markers or requiring memorization of complex language rules. Furthermore, we
validate the performance and effectiveness of the proposed methodologies
through extensive field experiments in closed- and open-water environments.
Finally, we perform a user interaction study to demonstrate the usability
benefits of our proposed interaction framework compared to existing methods.
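A minimal sketch of the gesture-to-instruction stage: the deep detector emits gesture tokens, and a small finite-state machine maps token sequences to instructions. The gesture names and instruction set here are illustrative stand-ins, not the paper's vocabulary.

```python
# (state, gesture) -> (next_state, instruction-or-None); illustrative transitions
TRANSITIONS = {
    ("IDLE", "palm_open"):        ("AWAIT_CMD", None),
    ("AWAIT_CMD", "thumb_up"):    ("IDLE", "FOLLOW_ME"),
    ("AWAIT_CMD", "fist"):        ("IDLE", "STOP"),
    ("AWAIT_CMD", "two_fingers"): ("IDLE", "TAKE_SNAPSHOT"),
}

def step(state, gesture):
    """Advance the FSM; unrecognized gestures leave the state unchanged,
    which keeps the mapping robust to spurious detections."""
    return TRANSITIONS.get((state, gesture), (state, None))

state = "IDLE"
for g in ["palm_open", "fist"]:            # token stream from the gesture detector
    state, instruction = step(state, g)
    if instruction:
        print("execute:", instruction)     # -> execute: STOP
```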
Semantic Segmentation of Underwater Imagery: Dataset and Benchmark
In this paper, we present the first large-scale dataset for semantic
Segmentation of Underwater IMagery (SUIM). It contains over 1500 images with
pixel annotations for eight object categories: fish (vertebrates), reefs
(invertebrates), aquatic plants, wrecks/ruins, human divers, robots, sea-floor,
and the waterbody background. The images have been rigorously collected during oceanic
explorations and human-robot collaborative experiments, and annotated by human
participants. We also present a benchmark evaluation of state-of-the-art
semantic segmentation approaches based on standard performance metrics. In
addition, we present SUIM-Net, a fully-convolutional encoder-decoder model that
balances the trade-off between performance and computational efficiency. It
offers competitive performance while ensuring fast end-to-end inference, which
is essential for its use in the autonomy pipeline of visually-guided underwater
robots. In particular, we demonstrate its usability benefits for visual
servoing, saliency prediction, and detailed scene understanding. With a variety
of use cases, the proposed model and benchmark dataset open up promising
opportunities for future research in underwater robot vision.
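To make the stated performance/efficiency trade-off concrete, a minimal fully-convolutional encoder-decoder for eight-class segmentation could look like the PyTorch sketch below; this is an illustrative baseline, not the SUIM-Net architecture.

```python
import torch
import torch.nn as nn

class TinySeg(nn.Module):
    """Minimal encoder-decoder producing per-pixel logits for 8 classes."""
    def __init__(self, n_classes=8):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, n_classes, 4, stride=2, padding=1),
        )

    def forward(self, x):
        return self.dec(self.enc(x))       # logits at the input resolution

logits = TinySeg()(torch.randn(1, 3, 256, 256))   # -> shape (1, 8, 256, 256)
```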
Physical Adversarial Textures that Fool Visual Object Tracking
We present a system for generating inconspicuous-looking textures that, when
displayed in the physical world as digital or printed posters, cause visual
object tracking systems to become confused. For instance, as a target being
tracked by a robot's camera moves in front of such a poster, our generated
texture makes the tracker lock onto it and allows the target to evade. This
work aims to fool seldom-targeted regression tasks, and in particular compares
diverse optimization strategies: non-targeted, targeted, and a new family of
guided adversarial losses. While we use the Expectation Over Transformation
(EOT) algorithm to generate physical adversaries that fool tracking models when
imaged under diverse conditions, we compare the impacts of different
conditioning variables, including viewpoint, lighting, and appearances, to find
practical attack setups with high resulting adversarial strength and
convergence speed. We further show that textures optimized solely in
simulated scenes can confuse real-world tracking systems.
Comment: Accepted to the International Conference on Computer Vision (ICCV), 2019.
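The Expectation Over Transformation step, averaging the adversarial loss over randomly drawn imaging conditions, can be sketched as follows. scene_render and tracker_loss are assumed hooks into a differentiable renderer and the victim tracker; the sampled conditioning variables mirror the ones the abstract compares.

```python
import torch

def eot_texture_loss(texture, scene_render, tracker_loss, n_samples=8):
    """Monte Carlo estimate of E[loss] over random viewpoint/lighting draws,
    so the optimized poster stays adversarial under real imaging conditions."""
    losses = []
    for _ in range(n_samples):
        params = {                                    # random imaging conditions
            "angle":      torch.empty(1).uniform_(-30.0, 30.0),
            "scale":      torch.empty(1).uniform_(0.5, 1.5),
            "brightness": torch.empty(1).uniform_(0.7, 1.3),
        }
        losses.append(tracker_loss(scene_render(texture, params)))
    return torch.stack(losses).mean()

# texture = torch.rand(3, 256, 256, requires_grad=True)
# eot_texture_loss(texture, scene_render, tracker_loss).backward()  # then update pixels
```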
SVAM: Saliency-guided Visual Attention Modeling by Autonomous Underwater Robots
This paper presents a holistic approach to saliency-guided visual attention
modeling (SVAM) for use by autonomous underwater robots. Our proposed model,
named SVAM-Net, integrates deep visual features at various scales and semantics
for effective salient object detection (SOD) in natural underwater images. The
SVAM-Net architecture is configured in a unique way to jointly accommodate
bottom-up and top-down learning within two separate branches of the network
while sharing the same encoding layers. We design dedicated spatial attention
modules (SAMs) along these learning pathways to exploit the coarse-level and
fine-level semantic features for SOD at four stages of abstraction. The
bottom-up branch performs a rough yet reasonably accurate saliency estimation
at a fast rate, whereas the deeper top-down branch incorporates a residual
refinement module (RRM) that provides fine-grained localization of the salient
objects. Extensive performance evaluation of SVAM-Net on benchmark datasets
clearly demonstrates its effectiveness for underwater SOD. We also validate its
generalization performance on data from several ocean trials, which include test
images of diverse underwater scenes and waterbodies as well as images with
unseen natural objects. Moreover, we analyze its computational feasibility for
robotic deployments and demonstrate its utility in several important use cases
of visual attention modeling.
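A generic spatial attention gate of the kind described above can be sketched in a few lines: squeeze the feature channels to a single-channel map and reweight the features with it. This is a common SAM pattern offered for illustration, not SVAM-Net's exact module.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Per-pixel gating: a 1-channel attention map in [0, 1] scales the
    features, emphasizing spatially salient locations."""
    def __init__(self, channels):
        super().__init__()
        self.att = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, feats):
        return feats * self.att(feats)

gated = SpatialAttention(64)(torch.randn(1, 64, 32, 32))   # same shape as input
```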
Morphology-Agnostic Visual Robotic Control
Existing approaches for visuomotor robotic control typically require
characterizing the robot in advance by calibrating the camera or performing
system identification. We propose MAVRIC, an approach that works with minimal
prior knowledge of the robot's morphology, and requires only a camera view
containing the robot and its environment and an unknown control interface.
MAVRIC revolves around a mutual information-based method for self-recognition,
which discovers visual "control points" on the robot body within a few seconds
of exploratory interaction; these control points are then used for
visual servoing. MAVRIC can control robots with imprecise actuation, no
proprioceptive feedback, unknown morphologies including novel tools, unknown
camera poses, and even unsteady handheld cameras. We demonstrate our method on
visually-guided 3D point reaching, trajectory following, and robot-to-robot
imitation.
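The self-recognition idea can be sketched as follows: during exploratory interaction, score each tracked visual point by the mutual information between the issued commands and that point's motion, and keep the highest-scoring points as control points. The histogram discretization is an assumed implementation detail.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def self_recognition_scores(commands, point_tracks, bins=8):
    """commands: (T,) exploratory control inputs; point_tracks: list of (T,)
    per-step motion magnitudes for each tracked point. Points on the robot
    body should show high command/motion mutual information."""
    cmd = np.digitize(commands, np.histogram_bin_edges(commands, bins))
    scores = []
    for track in point_tracks:
        mot = np.digitize(track, np.histogram_bin_edges(track, bins))
        scores.append(mutual_info_score(cmd, mot))
    return np.array(scores)            # argmax points become "control points"
```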