Faster than FAST: GPU-Accelerated Frontend for High-Speed VIO
The recent introduction of powerful embedded graphics processing units (GPUs)
has allowed for unforeseen improvements in real-time computer vision
applications. It has enabled algorithms to run onboard, well above the standard
video rates, yielding not only higher information processing capability, but
also reduced latency. This work focuses on the applicability of efficient
low-level, GPU hardware-specific instructions to improve on existing computer
vision algorithms in the field of visual-inertial odometry (VIO). While most
steps of a VIO pipeline work on visual features, those features must first be
detected and tracked in the image data, and both steps are well suited for
parallelization. In particular, non-maxima suppression and the subsequent
feature selection are prominent contributors to the overall image processing
latency.
Our work first revisits the problem of non-maxima suppression for feature
detection specifically on GPUs, and proposes a solution that selects local
response maxima, imposes spatial feature distribution, and extracts features
simultaneously. Our second contribution introduces an enhanced FAST feature
detector that applies the aforementioned non-maxima suppression method.
Finally, we compare our method to other state-of-the-art CPU and GPU
implementations, consistently outperforming all of them in feature detection
and tracking, and reaching over 1000fps throughput on an embedded Jetson TX2
platform. Additionally, we demonstrate our work integrated in a VIO pipeline
achieving metric state estimation at ~200fps.
Comment: IEEE International Conference on Intelligent Robots and Systems
(IROS), 2020. Open-source implementation available at
https://github.com/uzh-rpg/vili
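For readers unfamiliar with the combined step, the sketch below is a minimal CPU analogue in NumPy of cell-grid non-maxima suppression over a precomputed corner-response map: keeping only the per-cell response maximum suppresses non-maxima, spreads features across the image, and selects them in a single pass. This is an illustration of the general technique, not the paper's GPU implementation; `cell` and `thresh` are illustrative parameters.

```python
# Minimal CPU sketch of cell-grid non-maxima suppression (not the paper's
# GPU code). `response` is a precomputed corner-response map, e.g. FAST
# scores; `cell` and `thresh` are illustrative parameters.
import numpy as np

def grid_nms(response: np.ndarray, cell: int = 32, thresh: float = 0.0):
    """Keep at most one feature per grid cell: the local response maximum.

    A single pass simultaneously suppresses non-maxima, enforces an even
    spatial feature distribution, and performs feature selection.
    """
    h, w = response.shape
    keypoints = []
    for y0 in range(0, h, cell):
        for x0 in range(0, w, cell):
            patch = response[y0:y0 + cell, x0:x0 + cell]
            dy, dx = np.unravel_index(np.argmax(patch), patch.shape)
            if patch[dy, dx] > thresh:
                keypoints.append((x0 + dx, y0 + dy, float(patch[dy, dx])))
    return keypoints
```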
Event-Based Noise Filtration with Point-of-Interest Detection and Tracking for Space Situational Awareness
This thesis explores an asynchronous noise-suppression technique to be used in conjunction with asynchronous Gaussian-blob tracking on dynamic vision sensor (DVS) data. This type of sensor is a member of a relatively new class of neuromorphic sensing devices that emulate the change-based detection properties of the human eye. By leveraging a biologically inspired mode of operation, these sensors can achieve significantly higher sampling rates than conventional cameras while also eliminating the redundant data generated by static backgrounds. The resulting high dynamic range and fast acquisition time of DVS recordings enable the imaging of high-velocity targets despite ordinarily problematic lighting conditions. The technique presented here treats each pixel of the sensor as a spiking cell that keeps track of its own activity over time; events can then be filtered out of the sensor stream by user-configurable threshold values that form a temporal bandpass filter. In addition, asynchronous blob tracking is supplemented with double-exponential smoothing prediction and Bézier curve fitting to smooth tracker movement and interpolate the target trajectory, respectively. The overall scheme is intended to achieve asynchronous point-source tracking with a DVS for space-based applications, particularly the tracking of distant, dim satellites. In the space environment, radiation effects are expected to introduce transient, and possibly persistent, noise into the asynchronous event stream of the DVS. Given the large distances between objects in space, targets of interest may be no larger than a single pixel and can therefore appear similar to such noise-induced events. In this thesis, the asynchronous approach is experimentally compared to a more traditional approach applied to reconstructed frame data, on both performance and accuracy metrics. The results show that the asynchronous approach can produce comparable or even better tracking accuracy while reducing the execution time of the process roughly sevenfold on average.
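A minimal sketch of the per-pixel "spiking cell" idea, assuming events arrive as (x, y, timestamp) tuples with timestamps in microseconds; the interval bounds stand in for the user-configurable thresholds of the temporal bandpass filter, and the thesis's exact activity model may differ. A one-step Holt (double-exponential) smoother of the kind used for tracker prediction is included for completeness.

```python
# Sketch of a per-pixel temporal bandpass filter for DVS events, plus one
# step of double-exponential (Holt) smoothing. Parameter values and the
# activity model are illustrative assumptions, not the thesis's exact design.
import numpy as np

class TemporalBandpassFilter:
    def __init__(self, width: int, height: int,
                 t_min: float = 1_000, t_max: float = 100_000):
        # Last event timestamp per pixel; -inf means "no prior activity".
        self.last = np.full((height, width), -np.inf)
        self.t_min, self.t_max = t_min, t_max

    def __call__(self, x: int, y: int, t: float) -> bool:
        """Pass an event only if this pixel's inter-event interval lies
        inside the band: too-rapid firing (hot pixels) and isolated spikes
        (transient, radiation-like noise) are both rejected."""
        dt = t - self.last[y, x]
        self.last[y, x] = t
        return self.t_min <= dt <= self.t_max

def holt_step(s_prev, b_prev, obs, alpha=0.5, beta=0.5):
    """One update of double-exponential smoothing; s + b predicts the
    next tracker position, which smooths blob movement between events."""
    s = alpha * obs + (1 - alpha) * (s_prev + b_prev)
    b = beta * (s - s_prev) + (1 - beta) * b_prev
    return s, b
```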
DeepProposals: Hunting Objects and Actions by Cascading Deep Convolutional Layers
In this paper, a new method for generating object and action proposals in
images and videos is proposed. It builds on activations of different
convolutional layers of a pretrained CNN, combining the localization accuracy
of the early layers with the high informativeness (and hence recall) of the
later layers. To this end, we build an inverse cascade that, going backward
from the later to the earlier convolutional layers of the CNN, selects the most
promising locations and refines them in a coarse-to-fine manner. The method is
efficient, because i) it re-uses the same features extracted for detection, ii)
it aggregates features using integral images, and iii) it avoids a dense
evaluation of the proposals thanks to the use of the inverse coarse-to-fine
cascade. The method is also accurate. We show that our DeepProposals outperform
most of the previously proposed object proposal and action proposal approaches
and, when plugged into a CNN-based object detector, produce state-of-the-art
detection performance.
Comment: 15 pages
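For point ii) above, a minimal NumPy sketch of feature aggregation with integral images: once the cumulative sums are built, the sum of activations inside any candidate box costs four lookups, independent of box size. Names and coordinates here are illustrative.

```python
# Sketch of integral-image feature aggregation for proposal scoring.
# `fmap` stands for a single-channel CNN activation map (an assumption
# for illustration; DeepProposals combines several layers).
import numpy as np

def integral_image(fmap: np.ndarray) -> np.ndarray:
    # Zero-pad so that box sums reduce to four table lookups.
    ii = np.zeros((fmap.shape[0] + 1, fmap.shape[1] + 1))
    ii[1:, 1:] = fmap.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_score(ii: np.ndarray, x0: int, y0: int, x1: int, y1: int) -> float:
    """Sum of activations inside [x0, x1) x [y0, y1) in O(1)."""
    return float(ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0])

# Usage: ii = integral_image(act); score = box_score(ii, 10, 10, 50, 50)
```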
Segmentation-assisted detection of dirt impairments in archived film sequences
A novel segmentation-assisted method for film dirt detection is proposed. We exploit the fact that film dirt manifests in the spatial domain as a cluster of connected pixels whose intensity differs substantially from that of its neighborhood, and we employ a segmentation-based approach to identify this type of structure. A key feature of our approach is the computation of a measure of confidence attached to each detected dirt region, which can be utilized for performance fine-tuning. Another important feature of our algorithm is that it avoids the computational complexity associated with motion estimation. Our experimental framework benefits from the availability of manually derived as well as objective ground-truth data obtained using infrared scanning. Our results demonstrate that the proposed method compares favorably with standard spatial, temporal and multistage median-filtering approaches and provides efficient and robust detection for a wide variety of test material.
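A minimal sketch of the segmentation-based idea, assuming grayscale frames: flag connected clusters of pixels that deviate strongly from their local neighborhood and attach a contrast-based confidence to each cluster. The neighborhood size, threshold, and confidence formula are illustrative assumptions, not the paper's exact design.

```python
# Sketch of segmentation-assisted dirt detection on a grayscale frame.
# `nbhd` and `thresh` are illustrative; the confidence formula is an
# assumption standing in for the paper's confidence measure.
import numpy as np
from scipy import ndimage

def detect_dirt(frame: np.ndarray, nbhd: int = 9, thresh: float = 30.0):
    local = ndimage.median_filter(frame.astype(float), size=nbhd)
    deviation = np.abs(frame - local)
    labels, n = ndimage.label(deviation > thresh)  # connected clusters
    regions = []
    for i in range(1, n + 1):
        mask = labels == i
        # Confidence grows with how far the cluster departs from its
        # neighborhood, allowing downstream performance fine-tuning.
        confidence = float(deviation[mask].mean() / 255.0)
        regions.append((mask, confidence))
    return regions
```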
Repulsion Loss: Detecting Pedestrians in a Crowd
Detecting individual pedestrians in a crowd remains a challenging problem
since the pedestrians often gather together and occlude each other in
real-world scenarios. In this paper, we first explore how a state-of-the-art
pedestrian detector is harmed by crowd occlusion via experimentation, providing
insights into the crowd occlusion problem. Then, we propose a novel bounding
box regression loss specifically designed for crowd scenes, termed repulsion
loss. This loss is driven by two motivations: attraction by the target, and
repulsion by other surrounding objects. The repulsion term prevents the
proposal from shifting to surrounding objects, thus leading to more
crowd-robust localization. Our detector trained with repulsion loss
outperforms all state-of-the-art methods, with a significant improvement in
occlusion cases.
Comment: Accepted to IEEE Conference on Computer Vision and Pattern
Recognition (CVPR) 2018
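To make the two-term structure concrete, here is a simplified PyTorch sketch: an attraction term pulling the proposal toward its assigned target, plus a repulsion term penalizing overlap with other surrounding ground-truth boxes. The paper formulates repulsion with smooth-ln and intersection-over-ground-truth terms; plain IoU and the weight `alpha` below are simplifying assumptions.

```python
# Simplified sketch of a repulsion-style loss in PyTorch; plain smooth-L1
# attraction and mean-IoU repulsion stand in for the paper's exact terms.
import torch

def iou(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # a, b: (..., 4) boxes as (x0, y0, x1, y1); broadcasts elementwise.
    lt = torch.max(a[..., :2], b[..., :2])
    rb = torch.min(a[..., 2:], b[..., 2:])
    inter = (rb - lt).clamp(min=0).prod(-1)
    area_a = (a[..., 2] - a[..., 0]) * (a[..., 3] - a[..., 1])
    area_b = (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area_a + area_b - inter + 1e-9)

def repulsion_loss(pred: torch.Tensor, target: torch.Tensor,
                   others: torch.Tensor, alpha: float = 0.5) -> torch.Tensor:
    """Attraction: smooth-L1 toward the assigned target box.
    Repulsion: penalize overlap with non-target boxes, pushing the
    proposal away from surrounding pedestrians."""
    attract = torch.nn.functional.smooth_l1_loss(pred, target)
    repel = (iou(pred.unsqueeze(0), others).mean()
             if others.numel() else pred.new_zeros(()))
    return attract + alpha * repel
```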