18,881 research outputs found
How to Train a CAT: Learning Canonical Appearance Transformations for Direct Visual Localization Under Illumination Change
Direct visual localization has recently enjoyed a resurgence in popularity
with the increasing availability of cheap mobile computing power. The
competitive accuracy and robustness of these algorithms compared to
state-of-the-art feature-based methods, as well as their natural ability to
yield dense maps, makes them an appealing choice for a variety of mobile
robotics applications. However, direct methods remain brittle in the face of
appearance change due to their underlying assumption of photometric
consistency, which is commonly violated in practice. In this paper, we propose
to mitigate this problem by training deep convolutional encoder-decoder models
to transform images of a scene such that they correspond to a previously-seen
canonical appearance. We validate our method in multiple environments and
illumination conditions using high-fidelity synthetic RGB-D datasets, and
integrate the trained models into a direct visual localization pipeline,
yielding improvements in visual odometry (VO) accuracy through time-varying
illumination conditions, as well as improved metric relocalization performance
under illumination change, where conventional methods normally fail. We further
provide a preliminary investigation of transfer learning from synthetic to real
environments in a localization context. An open-source implementation of our
method using PyTorch is available at https://github.com/utiasSTARS/cat-net.Comment: In IEEE Robotics and Automation Letters (RA-L) and presented at the
IEEE International Conference on Robotics and Automation (ICRA'18), Brisbane,
Australia, May 21-25, 201
Dynamic Vision Sensor integration on FPGA-based CNN accelerators for high-speed visual classification
Deep-learning is a cutting edge theory that is being applied to many fields.
For vision applications the Convolutional Neural Networks (CNN) are demanding
significant accuracy for classification tasks. Numerous hardware accelerators
have populated during the last years to improve CPU or GPU based solutions.
This technology is commonly prototyped and tested over FPGAs before being
considered for ASIC fabrication for mass production. The use of commercial
typical cameras (30fps) limits the capabilities of these systems for high speed
applications. The use of dynamic vision sensors (DVS) that emulate the behavior
of a biological retina is taking an incremental importance to improve this
applications due to its nature, where the information is represented by a
continuous stream of spikes and the frames to be processed by the CNN are
constructed collecting a fixed number of these spikes (called events). The
faster an object is, the more events are produced by DVS, so the higher is the
equivalent frame rate. Therefore, these DVS utilization allows to compute a
frame at the maximum speed a CNN accelerator can offer. In this paper we
present a VHDL/HLS description of a pipelined design for FPGA able to collect
events from an Address-Event-Representation (AER) DVS retina to obtain a
normalized histogram to be used by a particular CNN accelerator, called
NullHop. VHDL is used to describe the circuit, and HLS for computation blocks,
which are used to perform the normalization of a frame needed for the CNN.
Results outperform previous implementations of frames collection and
normalization using ARM processors running at 800MHz on a Zynq7100 in both
latency and power consumption. A measured 67% speedup factor is presented for a
Roshambo CNN real-time experiment running at 160fps peak rate.Comment: 7 page
Distributed OpenGL Rendering in Network Bandwidth Constrained Environments
Display walls made from multiple monitors are often used when very high resolution images are required. To utilise a display wall, rendering information must be sent to each computer that the monitors are connect to. The network is often the performance bottleneck for demanding applications, like high performance 3D animations. This paper introduces ClusterGL; a distribution library for OpenGL applications. ClusterGL reduces network traffic by using compression, frame differencing and multi-cast. Existing applications can use ClusterGL without recompilation. Benchmarks show that, for most applications, ClusterGL outperforms other systems that support unmodified OpenGL applications including Chromium and BroadcastGL. The difference is larger for more complex scene geometries and when there are more display machines. For example, when rendering OpenArena, ClusterGL outperforms Chromium by over 300% on the Symphony display wall at The University of Waikato, New Zealand. This display has 20 monitors supported by five computers connected by gigabit Ethernet, with a full resolution of over 35 megapixels. ClusterGL is freely available via Google Code
Optimizing Lossy Compression Rate-Distortion from Automatic Online Selection between SZ and ZFP
With ever-increasing volumes of scientific data produced by HPC applications,
significantly reducing data size is critical because of limited capacity of
storage space and potential bottlenecks on I/O or networks in writing/reading
or transferring data. SZ and ZFP are the two leading lossy compressors
available to compress scientific data sets. However, their performance is not
consistent across different data sets and across different fields of some data
sets: for some fields SZ provides better compression performance, while other
fields are better compressed with ZFP. This situation raises the need for an
automatic online (during compression) selection between SZ and ZFP, with a
minimal overhead. In this paper, the automatic selection optimizes the
rate-distortion, an important statistical quality metric based on the
signal-to-noise ratio. To optimize for rate-distortion, we investigate the
principles of SZ and ZFP. We then propose an efficient online, low-overhead
selection algorithm that predicts the compression quality accurately for two
compressors in early processing stages and selects the best-fit compressor for
each data field. We implement the selection algorithm into an open-source
library, and we evaluate the effectiveness of our proposed solution against
plain SZ and ZFP in a parallel environment with 1,024 cores. Evaluation results
on three data sets representing about 100 fields show that our selection
algorithm improves the compression ratio up to 70% with the same level of data
distortion because of very accurate selection (around 99%) of the best-fit
compressor, with little overhead (less than 7% in the experiments).Comment: 14 pages, 9 figures, first revisio
Approximation algorithms for wavelet transform coding of data streams
This paper addresses the problem of finding a B-term wavelet representation
of a given discrete function whose distance from f is
minimized. The problem is well understood when we seek to minimize the
Euclidean distance between f and its representation. The first known algorithms
for finding provably approximate representations minimizing general
distances (including ) under a wide variety of compactly supported
wavelet bases are presented in this paper. For the Haar basis, a polynomial
time approximation scheme is demonstrated. These algorithms are applicable in
the one-pass sublinear-space data stream model of computation. They generalize
naturally to multiple dimensions and weighted norms. A universal representation
that provides a provable approximation guarantee under all p-norms
simultaneously; and the first approximation algorithms for bit-budget versions
of the problem, known as adaptive quantization, are also presented. Further, it
is shown that the algorithms presented here can be used to select a basis from
a tree-structured dictionary of bases and find a B-term representation of the
given function that provably approximates its best dictionary-basis
representation.Comment: Added a universal representation that provides a provable
approximation guarantee under all p-norms simultaneousl
- …