Evaluation of Object Detection Proposals Under Condition Variations
Object detection is a fundamental task in many computer vision applications,
therefore the importance of evaluating the quality of object detection is well
acknowledged in this domain. This process gives insight into the capabilities
of methods in handling environmental changes. In this paper, a new method for
object detection is introduced that combines the Selective Search and
EdgeBoxes. We tested these three methods under environmental variations. Our
experiments demonstrate that the combination method performs best under
illumination and viewpoint variations.
Comment: 2 pages, 6 figures, CVPR Workshop, 201
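The abstract does not say how the Selective Search and EdgeBoxes proposals are combined. One plausible reading is a union of the two proposal sets with near-duplicates dropped by an overlap test. The sketch below assumes that reading; the function names, the `(x1, y1, x2, y2)` box format, and the `dup_iou` threshold are illustrative assumptions, not details taken from the paper.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_proposals(first, second, dup_iou=0.9):
    """Union of two proposal lists; a box from `second` is skipped when it
    overlaps an already kept box by at least `dup_iou` (assumed threshold)."""
    merged = list(first)
    for box in second:
        if all(iou(box, kept) < dup_iou for kept in merged):
            merged.append(box)
    return merged


# Hypothetical proposals standing in for the two detectors' outputs.
selective_search_boxes = [(0, 0, 10, 10)]
edge_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
combined = merge_proposals(selective_search_boxes, edge_boxes)
```

Here the duplicated box is kept once and the non-overlapping box is added, so `combined` holds two proposals.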
Multi-Modal Trip Hazard Affordance Detection On Construction Sites
Trip hazards are a significant contributor to accidents on construction and
manufacturing sites, where over a third of Australian workplace injuries occur
[1]. Current safety inspections are labour intensive and limited by human
fallibility, making automation of trip hazard detection appealing from both a
safety and economic perspective. Trip hazards present an interesting challenge
to modern learning techniques because they are defined as much by affordance as
by object type; for example wires on a table are not a trip hazard, but can be
if lying on the ground. To address these challenges, we conduct a comprehensive
investigation into the performance characteristics of 11 different colour and
depth fusion approaches, including 4 fusion and one non-fusion approach, using
colour and two types of depth images. Trained and tested on over 600 labelled
trip hazards over 4 floors and 2000m in an active construction
site, this approach was able to differentiate between identical objects in
different physical configurations (see Figure 1). Outperforming a colour-only
detector, our multi-modal trip detector fuses colour and depth information to
achieve a 4% absolute improvement in F1-score. These investigative results and
the extensive publicly available dataset move us one step closer to assistive
or fully automated safety inspection systems on construction sites.
Comment: 9 Pages, 12 Figures, 2 Tables, Accepted to Robotics and Automation
Letters (RA-L
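For readers unfamiliar with the metric, the "4% absolute improvement in F1-score" refers to a difference of 0.04 between two F1 values, not a relative gain. A minimal sketch of the calculation follows; the detection counts are made up for illustration and are not results from the paper.

```python
def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall, from detection counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, chosen only to illustrate an absolute difference.
f1_colour_only = f1_score(tp=70, fp=30, fn=30)  # precision = recall = 0.70
f1_fused = f1_score(tp=74, fp=26, fn=26)        # precision = recall = 0.74
absolute_gain = f1_fused - f1_colour_only       # about 0.04, i.e. 4 points
```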
Scene signatures: localised and point-less features for localisation
This paper is about localising across extreme lighting and weather conditions. We depart from the traditional point-feature-based approach as matching under dramatic appearance changes is brittle and hard. Point feature detectors are fixed and rigid procedures which pass over an image examining small, low-level structure such as corners or blobs. They apply the same criteria to all images of all places. This paper takes a contrary view and asks what is possible if instead we learn a bespoke detector for every place. Our localisation task then turns into curating a large bank of spatially indexed detectors and we show that this yields vastly superior performance in terms of robustness in exchange for a reduced but tolerable metric precision. We present an unsupervised system that produces broad-region detectors for distinctive visual elements, called scene signatures, which can be associated across almost all appearance changes. We show, using 21km of data collected over a period of 3 months, that our system is capable of producing metric localisation estimates from night-to-day or summer-to-winter conditions
Place Categorization and Semantic Mapping on a Mobile Robot
In this paper we focus on the challenging problem of place categorization and
semantic mapping on a robot without environment-specific training. Motivated by
their ongoing success in various visual recognition tasks, we build our system
upon a state-of-the-art convolutional network. We overcome its closed-set
limitations by complementing the network with a series of one-vs-all
classifiers that can learn to recognize new semantic classes online. Prior
domain knowledge is incorporated by embedding the classification system into a
Bayesian filter framework that also ensures temporal coherence. We evaluate the
classification accuracy of the system on a robot that maps a variety of places
on our campus in real-time. We show how semantic information can boost robotic
object detection performance and how the semantic map can be used to modulate
the robot's behaviour during navigation tasks. The system is made available to
the community as a ROS module
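The role of the Bayesian filter in enforcing temporal coherence can be illustrated with a generic discrete Bayes filter over place classes. This is a textbook sketch, not the authors' implementation: the two-class setup, the transition matrix, and the per-frame likelihoods (standing in for classifier scores) are all assumed values.

```python
def bayes_filter_step(belief, transition, likelihood):
    """One predict/update cycle of a discrete Bayes filter.

    belief[i]        : prior probability of place class i
    transition[i][j] : P(class j at this step | class i at previous step)
    likelihood[j]    : P(current observation | class j), e.g. a classifier score
    """
    n = len(belief)
    # Predict: propagate the belief through the transition model.
    predicted = [sum(transition[i][j] * belief[i] for i in range(n))
                 for j in range(n)]
    # Update: weight by the observation likelihood and renormalise.
    unnormalised = [likelihood[j] * predicted[j] for j in range(n)]
    total = sum(unnormalised)
    if total == 0:
        return predicted  # observation carries no information
    return [u / total for u in unnormalised]


# Hypothetical classes: index 0 = "office", index 1 = "corridor".
belief = [0.5, 0.5]
transition = [[0.9, 0.1],   # places tend to persist between frames
              [0.1, 0.9]]
posterior = bayes_filter_step(belief, transition, likelihood=[0.8, 0.2])
```

A "sticky" transition matrix like the one assumed here damps single-frame misclassifications, which is one way a filter of this kind can keep labels consistent over time.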
Development of a dragline in-bucket bulk density monitor
This paper details the implementation and trialling of a prototype in-bucket bulk density monitor on a production dragline. Bulk density information can provide feedback to mine planning and scheduling to improve blasting and consequently facilitate optimal bucket sizing. The bulk density measurement builds upon outcomes presented in the AMTC2009 paper titled ‘Automatic In-Bucket Volume Estimation for Dragline Operations’ and utilises payload information from a commercial dragline monitor. While the previous paper explains the algorithms and theoretical basis for the system design and scaled model testing, this paper focuses on the full-scale implementation and the challenges involved
Unaided stereo vision based pose estimation
This paper presents the development of a low-cost sensor platform for use in ground-based visual pose estimation and scene mapping tasks. We seek to develop a technical solution using low-cost vision hardware that allows us to accurately estimate robot position for SLAM tasks. We present results from the application of a vision based pose estimation technique to simultaneously determine camera poses and scene structure. The results are generated from a dataset gathered traversing a local road at the St Lucia Campus of the University of Queensland. We show the accuracy of the pose estimation over a 1.6km trajectory in relation to GPS ground truth
Action Recognition: From Static Datasets to Moving Robots
Deep learning models have achieved state-of-the-art performance in
recognizing human activities, but often rely on utilizing background cues
present in typical computer vision datasets that predominantly have a
stationary camera. If these models are to be employed by autonomous robots in
real world environments, they must be adapted to perform independently of
background cues and camera motion effects. To address these challenges, we
propose a new method that firstly generates generic action region proposals
with good potential to locate one human action in unconstrained videos
regardless of camera motion and then uses action proposals to extract and
classify effective shape and motion features by a ConvNet framework. In a range
of experiments, we demonstrate that by actively proposing action regions during
both training and testing, state-of-the-art or better performance is achieved
on benchmarks. We show that our approach outperforms the
state-of-the-art on two new datasets: one emphasizes irrelevant background,
the other highlights the camera motion. We also validate our action recognition
method in an abnormal behavior detection scenario to improve workplace safety.
The results verify a higher success rate for our method due to the ability of
our system to recognize human actions regardless of environment and camera
motion
Deep Learning Features at Scale for Visual Place Recognition
The success of deep learning techniques in the computer vision domain has
triggered a range of initial investigations into their utility for visual place
recognition, all using generic features from networks that were trained for
other types of recognition tasks. In this paper, we train, at large scale, two
CNN architectures for the specific place recognition task and employ a
multi-scale feature encoding method to generate condition- and
viewpoint-invariant features. To enable this training to occur, we have
developed a massive Specific PlacEs Dataset (SPED) with hundreds of examples of
place appearance change at thousands of different places, as opposed to the
semantic place type datasets currently available. This new dataset enables us
to set up a training regime that interprets place recognition as a
classification problem. We comprehensively evaluate our trained networks on
several challenging benchmark place recognition datasets and demonstrate that
they achieve an average 10% increase in performance over other place
recognition algorithms and pre-trained CNNs. By analyzing the network responses
and their differences from pre-trained networks, we provide insights into what
a network learns when training for place recognition, and what these results
signify for future research in this area.
Comment: 8 pages, 10 figures. Accepted by International Conference on Robotics
and Automation (ICRA) 2017. This is the submitted version. The final
published version may be slightly differen
Robust Scale Initialization for Long-Range Stereo Visual Odometry
Achieving a robust, accurately scaled pose estimate in long-range stereo presents significant challenges. For large scene depths, triangulation from a single stereo pair is inadequate and noisy. Additionally, vibration and flexible rigs in airborne applications mean accurate calibrations are often compromised. This paper presents a technique for accurately initializing a long-range stereo VO algorithm at large scene depth, with accurate scale, without explicitly computing structure from rigidly fixed camera pairs. By performing a monocular pose estimate over a window of frames from a single camera, followed by adding the secondary camera frames in a modified bundle adjustment, an accurate, metrically scaled pose estimate can be found. To achieve this, the scale of the stereo pair is included in the optimization as an additional parameter. Results are presented on both simulated and field-gathered data from a fixed-wing UAV flying at significant altitude, where the epipolar geometry is inaccurate due to structural deformation and triangulation from a single pair is insufficient. Comparisons are made with more conventional VO techniques where the scale is not explicitly optimized, and demonstrated over repeated trials to indicate robustness
Towards vision-based deep reinforcement learning for robotic motion control
This paper introduces a machine learning based system for controlling a robotic manipulator with visual perception only. The capability to autonomously learn robot controllers solely from raw-pixel images and without any prior knowledge of configuration is shown for the first time. We build upon the success of recent deep reinforcement learning and develop a system for learning target reaching with a three-joint robot manipulator using external visual observation. A Deep Q Network (DQN) was demonstrated to perform target reaching after training in simulation. Transferring the network to real hardware and real observation in a naive approach failed, but experiments show that the network works when replacing camera images with synthetic images
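As background for the DQN mentioned above, the standard Q-learning regression target that such networks are trained towards can be sketched as follows. The abstract gives no training details, so the discount factor, the reward, and the Q-values here are illustrative assumptions only.

```python
def q_target(reward, next_q_values, gamma=0.99, terminal=False):
    """Standard Q-learning target: r + gamma * max_a' Q(s', a').

    next_q_values : the network's Q estimates for every action in the next state
    terminal      : if True the episode ended, so no future value is added
    """
    if terminal:
        return reward
    return reward + gamma * max(next_q_values)


# Hypothetical step: reward 1.0, two candidate actions in the next state.
target = q_target(reward=1.0, next_q_values=[0.5, 2.0], gamma=0.9)
```

During training, the network's prediction for the action actually taken is regressed towards this target.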