Deep learning based RGB-D vision tasks
Depth is an important source of information in computer vision, yet it is discarded in most vision tasks. In this thesis, we study the tasks of estimating depth from single monocular images and of incorporating depth into object detection and semantic segmentation. Recently, deep convolutional neural networks (CNNs) have brought a significant number of breakthroughs to the vision community. All of the algorithms in this thesis are built upon deep CNNs.
The first part of this thesis addresses the task of incorporating depth into object detection and semantic segmentation. The aim is to improve the performance of vision tasks that are based only on RGB data. Two approaches for object detection and two for semantic segmentation are presented, all built upon existing depth estimation, object detection, and semantic segmentation algorithms.
The second part of this thesis addresses depth estimation. Depth estimation is often formulated as a regression task because depth is continuous. Deep CNNs for depth estimation are trained by iteratively minimizing regression errors between predicted and ground-truth depths. A drawback of regression is that it predicts depths without confidence. In this thesis, we propose to formulate depth estimation as a classification task, which naturally predicts depths with confidence. The confidence can be used during training and post-processing. We also propose to exploit ordinal depth relationships from stereo videos to improve the performance of metric depth estimation. To this end, we introduce a Relative Depth in Stereo (RDIS) dataset that is densely annotated with relative depths.
Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 201
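The classification formulation described in this abstract can be sketched in a few lines. This is an illustrative toy, not the thesis's actual network: depth is discretized into bins, a softmax over per-pixel logits yields a distribution over the bins, and the peak probability serves as a confidence. The bin range, bin count, and random logits below are all made-up stand-ins for a real network's output.

```python
import numpy as np

# Toy sketch: depth estimation as classification over discrete depth bins.
# The depth range, bin count, and logits are hypothetical placeholders.
K = 10
bin_edges = np.logspace(np.log10(0.5), np.log10(10.0), K + 1)  # metres, log-spaced
bin_centers = np.sqrt(bin_edges[:-1] * bin_edges[1:])          # geometric mean per bin

rng = np.random.default_rng(0)
logits = rng.normal(size=K)            # stand-in for one pixel's network logits

# Softmax turns logits into a probability distribution over depth bins.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# Predicted depth: expectation over bin centers; confidence: peak probability.
depth = float(np.sum(probs * bin_centers))
confidence = float(probs.max())
```

A regression network would emit only `depth`; the distribution here additionally exposes `confidence`, which (as the abstract notes) can weight losses during training or guide post-processing.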
DEVELOPMENT OF AN INDUSTRIAL ROBOTIC ARM EDUCATION KIT BASED ON OBJECT RECOGNITION AND ROBOT KINEMATICS FOR ENGINEERS
Abstract: Robotic vision makes industrial systems more practical and flexible. For this reason, it is essential to provide training for the standard use of vision-based robotic systems on production lines. This article presents the design of a low-cost, computer-vision-based industrial robotic arm education kit with an eye-to-hand configuration. The kit classifies and stacks products placed at random locations in a short time, making them ready for industrial operations or logistics. During development, a motion simulation of the robotic arm was first performed; an experimental setup was then established, and the performance of the system was verified in experimental studies. The system, which operates with a high success rate, has been made available for educational use. For educational purposes, the kit supports theoretical lessons by reviewing object recognition (vision systems), forward and inverse kinematics, and trajectory planning (robot kinematics), and by running the system several times. Engineering students are thus expected to approach industry with greater awareness and to help advance it. The kit can also be used to train engineers at institutions where vision-based robotic systems are in use.
Keywords: Education Kit, Stereo Vision, Robotic Arm, Object Recognition and Classification, Pick-and-Place Tas
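As an illustration of the robot-kinematics topics such a kit reviews (not the article's own code), here is a minimal forward/inverse kinematics sketch for a 2-link planar arm. The link lengths and target point are hypothetical values chosen to be reachable.

```python
import math

def forward_kinematics(l1, l2, theta1, theta2):
    """End-effector (x, y) of a 2-link planar arm; angles in radians."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y

def inverse_kinematics(l1, l2, x, y):
    """One (elbow) solution of the planar 2-link inverse kinematics."""
    c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    theta2 = math.acos(max(-1.0, min(1.0, c2)))  # clamp for numerical safety
    theta1 = math.atan2(y, x) - math.atan2(l2 * math.sin(theta2),
                                           l1 + l2 * math.cos(theta2))
    return theta1, theta2

# Round trip: IK of a reachable point, then FK should recover that point.
t1, t2 = inverse_kinematics(1.0, 0.8, 1.2, 0.9)
x, y = forward_kinematics(1.0, 0.8, t1, t2)
```

This round-trip check (IK followed by FK) is exactly the kind of exercise the abstract describes for teaching forward and inverse kinematics before adding the vision and pick-and-place stages.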
Stereo Vision-based Semantic 3D Object and Ego-motion Tracking for Autonomous Driving
We propose a stereo vision-based approach for tracking the camera ego-motion
and 3D semantic objects in dynamic autonomous driving scenarios. Instead of
directly regressing the 3D bounding box using end-to-end approaches, we propose
to use easy-to-label 2D detection and discrete viewpoint classification
together with a lightweight semantic inference method to obtain rough 3D
object measurements. Building on object-aware camera pose tracking, which is
robust in dynamic environments, combined with our novel dynamic object
bundle adjustment (BA) approach that fuses temporal sparse feature
correspondences with the semantic 3D measurement model, we obtain 3D object
pose, velocity, and anchored dynamic point-cloud estimates with instance accuracy and temporal
consistency. The performance of our proposed method is demonstrated in diverse
scenarios. Both the ego-motion estimation and the object localization are compared
with state-of-the-art solutions.
Comment: 14 pages, 9 figures, eccv201
Deep learning in remote sensing: a review
Standing at the paradigm shift towards data-intensive science, machine
learning techniques are becoming increasingly important. In particular, as a
major breakthrough, deep learning has proven to be an extremely
powerful tool in many fields. Shall we embrace deep learning as the key to
everything, or should we resist it as a 'black-box' solution? Opinions differ
in the remote sensing community. In this article, we analyze the challenges of
using deep learning for remote sensing data analysis, review the recent
advances, and provide resources to make deep learning in remote sensing
ridiculously simple to start with. More importantly, we advocate that remote
sensing scientists bring their expertise into deep learning and use it as an
implicit general model to tackle unprecedented, large-scale, influential
challenges such as climate change and urbanization.
Comment: Accepted for publication in IEEE Geoscience and Remote Sensing Magazin
Track, then Decide: Category-Agnostic Vision-based Multi-Object Tracking
The most common paradigm for vision-based multi-object tracking is
tracking-by-detection, due to the availability of reliable detectors for
several important object categories such as cars and pedestrians. However,
future mobile systems will need a capability to cope with rich human-made
environments, in which obtaining detectors for every possible object category
would be infeasible. In this paper, we propose a model-free multi-object
tracking approach that uses a category-agnostic image segmentation method to
track objects. We present an efficient segmentation mask-based tracker which
associates pixel-precise masks reported by the segmentation. Our approach can
utilize semantic information whenever it is available for classifying objects
at the track level, while retaining the capability to track generic unknown
objects in the absence of such information. We demonstrate experimentally that
our approach achieves performance comparable to state-of-the-art
tracking-by-detection methods for popular object categories such as cars and
pedestrians. Additionally, we show that the proposed method can discover and
robustly track a large variety of other objects.
Comment: ICRA'18 submissio
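One plausible reading of the mask-based association step (an illustrative sketch, not the paper's actual tracker) is greedy one-to-one matching of segmentation masks by intersection-over-union (IoU). The masks, grid size, and threshold below are invented for the example.

```python
import numpy as np

def mask_iou(a, b):
    """IoU of two boolean masks of the same shape."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def associate(tracks, detections, thresh=0.5):
    """Greedy one-to-one matching; returns a list of (track_idx, det_idx)."""
    pairs = sorted(
        ((mask_iou(t, d), i, j)
         for i, t in enumerate(tracks)
         for j, d in enumerate(detections)),
        key=lambda p: p[0], reverse=True)
    used_t, used_d, matches = set(), set(), []
    for iou, i, j in pairs:
        if iou >= thresh and i not in used_t and j not in used_d:
            used_t.add(i)
            used_d.add(j)
            matches.append((i, j))
    return matches

# Toy example: one track, two detections; only the overlapping one matches.
track = np.zeros((8, 8), bool); track[2:6, 2:6] = True
det_a = np.zeros((8, 8), bool); det_a[2:6, 3:7] = True   # overlaps the track
det_b = np.zeros((8, 8), bool); det_b[0:2, 6:8] = True   # disjoint
matches = associate([track], [det_a, det_b])
```

Because matching operates on pixel-precise masks rather than class labels, the scheme is category-agnostic, which is the property the abstract emphasizes.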
Event-based Vision: A Survey
Event cameras are bio-inspired sensors that differ from conventional frame
cameras: Instead of capturing images at a fixed rate, they asynchronously
measure per-pixel brightness changes, and output a stream of events that encode
the time, location and sign of the brightness changes. Event cameras offer
attractive properties compared to traditional cameras: high temporal resolution
(on the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low
power consumption, and high pixel bandwidth (on the order of kHz) resulting in
reduced motion blur. Hence, event cameras have a large potential for robotics
and computer vision in challenging scenarios for traditional cameras, such as
low latency, high speed, and high dynamic range. However, novel methods are
required to process the unconventional output of these sensors in order to
unlock their potential. This paper provides a comprehensive overview of the
emerging field of event-based vision, with a focus on the applications and the
algorithms developed to unlock the outstanding properties of event cameras. We
present event cameras, from their working principle and the actual sensors that
are available to the tasks they have been used for, from low-level vision
(feature detection and tracking, optic flow, etc.) to high-level vision
(reconstruction, segmentation, recognition). We also discuss the techniques
developed to process events, including learning-based techniques, as well as
specialized processors for these novel sensors, such as spiking neural
networks. Additionally, we highlight the challenges that remain to be tackled
and the opportunities that lie ahead in the search for a more efficient,
bio-inspired way for machines to perceive and interact with the world.
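The event-generation model described at the start of this abstract can be sketched directly: an event fires wherever the per-pixel brightness change since the last reference exceeds a contrast threshold, and the event records time, location, and sign. The threshold and frames below are made up for illustration; real sensors operate asynchronously in log-intensity rather than on frame pairs.

```python
import numpy as np

C = 0.2  # hypothetical contrast threshold (log-intensity units)

def events_between(log_prev, log_curr, t):
    """Events (t, x, y, sign) triggered by the change from log_prev to log_curr."""
    diff = log_curr - log_prev
    ys, xs = np.nonzero(np.abs(diff) >= C)
    return [(t, int(x), int(y), 1 if diff[y, x] > 0 else -1)
            for y, x in zip(ys, xs)]

prev = np.zeros((4, 4))   # uniform scene (log intensity)
curr = prev.copy()
curr[1, 2] += 0.5         # one pixel brightens past the threshold
curr[3, 0] -= 0.3         # one pixel darkens past the threshold
evts = events_between(prev, curr, t=0.001)
```

Only the two changed pixels produce events; all static pixels stay silent, which is what gives event cameras their sparse output and low power consumption.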