Real-time Visual Flow Algorithms for Robotic Applications
Vision offers important sensor cues to modern robotic platforms.
Applications such as control of aerial vehicles, visual servoing,
simultaneous localization and mapping, navigation and, more
recently, learning are examples where visual information is
fundamental to accomplish tasks. However, the use of computer
vision algorithms carries the computational cost of extracting
useful information from the stream of raw pixel data. The most
sophisticated algorithms use complex mathematical formulations
leading typically to computationally expensive, and consequently,
slow implementations. Even with modern computing resources,
high-speed and high-resolution video feeds can only be used for
basic image processing operations. For a vision algorithm to be
integrated on a robotic system, the output of the algorithm
should be provided in real time, that is, at least at the same
frequency as the control logic of the robot. With robotic
vehicles becoming more dynamic and ubiquitous, this places higher
demands on the vision processing pipeline.
This thesis addresses the problem of estimating dense visual flow
information in real time. The contributions of this work are
threefold. First, it introduces a new filtering algorithm for the
estimation of dense optical flow at frame rates as fast as 800 Hz
for 640x480 image resolution. The algorithm follows an
update-prediction architecture to estimate dense optical flow
fields incrementally over time. A fundamental component of the
algorithm is the modeling of the spatio-temporal evolution of the
optical flow field by means of partial differential equations.
Numerical predictors can implement such PDEs to propagate the current
flow estimate forward in time. Experimental validation of
the algorithm is provided using a high-speed ground-truth image
dataset as well as real-life video data at 300 Hz.
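The update-prediction loop described above can be sketched as follows. This is a minimal illustration and not the thesis implementation. It assumes a simple self-advection model for the flow field, d(flow)/dt + (flow . grad) flow = 0, discretized with first-order upwind differences and periodic borders. The function names `predict_flow` and `update_flow`, the `gain` parameter, and the choice of PDE are all assumptions made for this example.

```python
import numpy as np

def predict_flow(flow, dt=1.0, steps=4):
    """Propagate a dense flow field (H, W, 2) forward by dt.

    Assumed model (illustrative only): d(flow)/dt + (flow . grad) flow = 0,
    integrated with first-order upwind differences and periodic borders.
    For stability, |flow| * dt / steps should stay at or below one pixel.
    """
    f = flow.astype(np.float64).copy()
    h = dt / steps
    for _ in range(steps):
        u, v = f[..., 0], f[..., 1]
        out = np.empty_like(f)
        for c in range(2):
            q = f[..., c]
            # Backward and forward differences along x (axis 1) and y (axis 0).
            dx_b = q - np.roll(q, 1, axis=1)
            dx_f = np.roll(q, -1, axis=1) - q
            dy_b = q - np.roll(q, 1, axis=0)
            dy_f = np.roll(q, -1, axis=0) - q
            # Upwind choice: difference taken on the side the flow comes from.
            dx = np.where(u > 0, dx_b, dx_f)
            dy = np.where(v > 0, dy_b, dy_f)
            out[..., c] = q - h * (u * dx + v * dy)
        f = out
    return f

def update_flow(predicted, measured, gain=0.5):
    """Blend the prediction with a new per-pixel flow measurement.

    The constant gain is a stand-in for whatever update rule a real
    filter would use.
    """
    return predicted + gain * (measured - predicted)
```

In a filter loop, `predict_flow` would run once per frame on the previous estimate, and `update_flow` would then correct the result using flow measured from the new image pair.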
The second contribution is a new type of visual flow named
structure flow. Mathematically, structure flow is the
three-dimensional scene flow scaled by the inverse depth at each
pixel in the image. Intuitively, it is the complete velocity
field associated with image motion, including both optical flow
and scale-change or apparent divergence of the image. Analogously
to optic flow, structure flow provides a robotic vehicle with
perception of the motion of the environment as seen by the
camera. However, structure flow encodes the full 3D image motion
of the scene whereas optic flow only encodes the component on the
image plane. An algorithm to estimate structure flow from image
and depth measurements is proposed based on the same filtering
idea used to estimate optical flow.
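The definition above (scene flow scaled by the inverse depth at each pixel) can be written down directly. The sketch below assumes the scene flow and depth are already available as dense arrays; the function name `structure_flow`, the array layout, and the `eps` guard are assumptions for illustration. It does not reproduce the filtering algorithm the thesis proposes for estimating these quantities.

```python
import numpy as np

def structure_flow(scene_flow, depth, eps=1e-9):
    """Scale per-pixel 3D scene flow by inverse depth.

    scene_flow: array of shape (H, W, 3), 3D velocity at each pixel.
    depth:      array of shape (H, W), positive depth at each pixel.
    eps:        hypothetical guard against division by zero.
    """
    inv_depth = 1.0 / np.maximum(depth, eps)
    # Broadcast the (H, W) inverse depth over the three velocity components.
    return scene_flow * inv_depth[..., None]
```

For example, a point moving with scene flow (2, 4, 6) at depth 2 has structure flow (1, 2, 3).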
The final contribution is the spherepix data structure for
processing spherical images. This data structure is the numerical
back-end used for the real-time implementation of the structure
flow filter. It consists of a set of overlapping patches covering
the surface of the sphere. Each individual patch approximately
holds properties such as orthogonality and equidistance of
points, thus allowing efficient implementations of low-level
classical 2D convolution based image processing routines such as
Gaussian filters and numerical derivatives.
These algorithms are implemented on GPU hardware and can be
integrated into future Robotic Embedded Vision systems to provide
fast visual information to robotic vehicles.
Learning Blind Motion Deblurring
As handheld video cameras are now commonplace and available in every
smartphone, images and videos can be recorded almost anywhere at any time.
However, taking a quick shot frequently yields a blurry result due to unwanted
camera shake during recording or moving objects in the scene. Removing these
artifacts from the blurry recordings is a highly ill-posed problem as neither
the sharp image nor the motion blur kernel is known. Propagating information
between multiple consecutive blurry observations can help restore the desired
sharp image or video. Solutions for blind deconvolution based on neural
networks rely on a massive amount of ground-truth data which is hard to
acquire. In this work, we propose an efficient approach to produce a
significant amount of realistic training data and introduce a novel recurrent
network architecture to deblur frames taking temporal information into account,
which can efficiently handle arbitrary spatial and temporal input sizes. We
demonstrate the versatility of our approach in a comprehensive comparison on a
number of challenging real-world examples.
Comment: International Conference on Computer Vision (ICCV) (2017)
Data mining based learning algorithms for semi-supervised object identification and tracking
Sensor exploitation (SE) is the crucial step in surveillance applications such as airport security and search and rescue operations. It allows localization and identification of movement in urban settings and can significantly boost knowledge gathering, interpretation and action. Data mining techniques offer the promise of precise and accurate knowledge acquisition in high-dimensional data domains (while diminishing the “curse of dimensionality” prevalent in such datasets), coupled with algorithmic design in feature extraction, discriminative ranking, feature fusion and supervised learning (classification). Consequently, data mining techniques and algorithms can be used to refine and process captured data and to detect, recognize, classify, and track objects with predictably high degrees of specificity and sensitivity.
Automatic object detection and tracking algorithms face several obstacles, such as large and incomplete datasets, ill-defined regions of interest (ROIs), variable scalability, lack of compactness, angular regions, partial occlusions, environmental variables, and unknown potential object classes, which work against their ability to achieve accurate real-time results. Methods must produce fast and accurate results by streamlining image processing, data compression and reduction, feature extraction, classification, and tracking algorithms. Data mining techniques can sufficiently address these challenges by implementing efficient and accurate dimensionality reduction with feature extraction to refine incomplete (ill-partitioning) data-space and addressing challenges related to object classification, intra-class variability, and inter-class dependencies.
A series of methods have been developed to combat many of the challenges for the purpose of creating a sensor exploitation and tracking framework for real time image sensor inputs. The framework has been broken down into a series of sub-routines, which work in both series and parallel to accomplish tasks such as image pre-processing, data reduction, segmentation, object detection, tracking, and classification. These methods can be implemented either independently or together to form a synergistic solution to object detection and tracking.
The main contributions to the SE field include novel feature extraction methods for highly discriminative object detection, classification, and tracking. Also, a new supervised classification scheme is presented for detecting objects in urban environments. This scheme incorporates both novel features and non-maximal suppression to reduce false alarms, which can be abundant in cluttered environments such as cities. Lastly, a performance evaluation of Graphical Processing Unit (GPU) implementations of the subtask algorithms is presented, which provides insight into speed-up gains throughout the SE framework to improve design for real time applications.
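The abstract names non-maximal suppression as the step that reduces false alarms. As a rough illustration of the general technique, here is the standard greedy, IoU-based variant for axis-aligned boxes. The abstract does not say which variant the dissertation uses, so the box format `(x1, y1, x2, y2)`, the function names, and the 0.5 threshold are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first.

    A box is dropped when it overlaps an already kept, higher-scoring
    box by more than iou_threshold.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

With two heavily overlapping detections and one distant detection, only the stronger of the overlapping pair and the distant box survive.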
The overall framework provides a comprehensive SE system, which can be tailored for integration into a layered sensing scheme to provide the warfighter with automated assistance and support. As sensor technology and integration continue to advance, this SE framework can provide faster and more accurate decision support for both intelligence and civilian applications.
Theory, Design, and Implementation of Landmark Promotion Cooperative Simultaneous Localization and Mapping
Simultaneous Localization and Mapping (SLAM) is a challenging problem in practice, and the use of multiple robots and inexpensive sensors places even more demands on the designer. Cooperative SLAM poses specific challenges in the areas of computational efficiency, software/network performance, and robustness to errors. New methods in image processing, recursive filtering, and SLAM have been developed to implement practical algorithms for cooperative SLAM on a set of inexpensive robots.
The Consolidated Unscented Mixed Recursive Filter (CUMRF) is designed to handle non-linear systems with non-Gaussian noise. This is accomplished using the Unscented Transform combined with Gaussian Mixture Models. The Robust Kalman Filter is an extension of the Kalman Filter algorithm that improves the ability to remove erroneous observations using Principal Component Analysis (PCA) and the X84 outlier rejection rule. Forgetful SLAM is a local SLAM technique that runs in nearly constant time relative to the number of visible landmarks and compensates for poorly performing sensors through sensor fusion and outlier rejection. Forgetful SLAM correlates all measured observations, but stops the state from growing over time. Hierarchical Active Ripple SLAM (HAR-SLAM) is a new SLAM architecture that breaks the traditional state space of SLAM into a chain of smaller state spaces, allowing multiple robots, multiple sensors, and multiple updates to occur in linear time with linear storage with respect to the number of robots, landmarks, and robot poses. This dissertation presents explicit methods for closing the loop, joining multiple robots, and active updates. Landmark Promotion SLAM is a hierarchy of new SLAM methods, using the Robust Kalman Filter, Forgetful SLAM, and HAR-SLAM.
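For readers unfamiliar with the X84 outlier rejection rule mentioned above, here is a minimal univariate sketch. It keeps values within k median absolute deviations (MAD) of the median, with k = 5.2 as the commonly quoted default. The dissertation combines the rule with PCA inside a Kalman filter, and that part is not reproduced here. The function name `x84_inliers` and the application to scalar residuals are assumptions for illustration.

```python
import numpy as np

def x84_inliers(residuals, k=5.2):
    """Return a boolean mask of values accepted by the X84 rule.

    A value is accepted when it lies within k median absolute deviations
    (MAD) of the median. If the MAD is zero, only values equal to the
    median are accepted.
    """
    r = np.asarray(residuals, dtype=float)
    med = np.median(r)
    mad = np.median(np.abs(r - med))
    return np.abs(r - med) <= k * mad
```

Applied to the residuals `[0.1, -0.2, 0.0, 0.15, -0.1, 10.0]`, the median is 0.05 and the MAD is 0.125, so the threshold is 0.65 and only the value 10.0 is rejected.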
Practical aspects of SLAM are a focus of this dissertation. LK-SURF is a new image processing technique that combines Lucas-Kanade feature tracking with Speeded-Up Robust Features to perform spatial and temporal tracking. Typical stereo correspondence techniques fail at providing descriptors for features, or fail at temporal tracking. Several calibration and modeling techniques are also covered, including calibrating stereo cameras, aligning stereo cameras to an inertial system, and making neural net system models. These methods are important to improve the quality of the data and images acquired for the SLAM process.
SELF-ADAPTING PARALLEL FRAMEWORK FOR LONG-TERM OBJECT TRACKING
Object tracking is a crucial field in computer vision that has many uses in human-computer interaction, security and surveillance, video communication and compression, augmented reality, traffic control, etc. Many implementations have been introduced in practice, and recent methods emphasize tracking objects adaptively by learning the object’s perspectives and rediscovering it when it becomes untraceable, so that the object-absence problem (in case of occlusion, cluttering or blurring) is resolved. Most of these algorithms place a high computational burden on the computational units and need powerful CPUs to attain real-time tracking and high-bitrate video processing. These computational units may handle no more than a single video source, making them unsuitable for large-scale implementations such as multiple sources or higher-resolution videos. In this thesis, we choose one popular algorithm called TLD (Tracking-Learning-Detection), study the core components of the algorithm that impede its performance, and implement these components in a parallel computational environment such as multi-core CPUs, GPUs, etc., also known as heterogeneous computing. OpenCL is used as a development platform to produce parallel kernels for the algorithm. The goals are to create an acceptable heterogeneous computing environment through utilizing current computer technologies, to imbue real-time applications with an alternative implementation methodology, and to circumvent the upcoming limitations of hardware in terms of cost, power, and speedup. We are able to bring true parallel speedup to the existing implementations, which greatly improves the frame rate for long-term object tracking and, with some algorithm parameter modification, provides more accurate object tracking. According to the experiments, the developed kernels achieve a range of performance improvements. For reduction-based kernels, a maximum speedup of 78X is achieved, while for window-based kernels, speedups range from a couple of hundred to 2000X. For the optical flow tracking kernel, a maximum speedup of 5.7X is recorded. Global speedup is highly dependent on the hardware specifications, especially for memory transfers. With the use of a medium-sized input, the self-adapting parallel framework has successfully obtained a fast learning curve and converged to an average of 1.6X speedup compared to the original implementation. Lastly, for future programming convenience, an OpenCL-based library is built to facilitate the use of OpenCL programming on parallel hardware devices, hide the complexity of building and compiling OpenCL kernels, and provide a C-based latency measurement tool that is compatible with several operating systems.