DarSwin: Distortion Aware Radial Swin Transformer
Wide-angle lenses are commonly used in perception tasks requiring a large
field of view. Unfortunately, these lenses produce significant distortions
making conventional models that ignore the distortion effects unable to adapt
to wide-angle images. In this paper, we present a novel transformer-based model
that automatically adapts to the distortion produced by wide-angle lenses. We
leverage the physical characteristics of such lenses, which are analytically
defined by the radial distortion profile (assumed to be known), to develop a
distortion aware radial swin transformer (DarSwin). In contrast to conventional
transformer-based architectures, DarSwin comprises a radial patch partitioning,
a distortion-based sampling technique for creating token embeddings, and a
polar position encoding for radial patch merging. We validate our method on
classification tasks using synthetically distorted ImageNet data and show
through extensive experiments that DarSwin can perform zero-shot adaptation to
unseen distortions of different wide-angle lenses. Compared to other baselines,
DarSwin achieves the best results (in terms of Top-1 and Top-5 accuracy) when
tested on in-distribution data, with almost 2% (6%) gain in Top-1 accuracy
under medium (high) distortion levels, and performs comparably to the state of
the art under low and very low distortion levels (perspective-like images).
Comment: 8 pages, 8 figure
PDO-eCNNs: Partial Differential Operator Based Equivariant Spherical CNNs
Spherical signals exist in many applications, e.g., planetary data, LiDAR
scans and digitalization of 3D objects, calling for models that can process
spherical data effectively. Simply projecting spherical data onto the 2D plane
and then applying planar convolutional neural networks (CNNs) does not perform
well, because of the distortion from projection and ineffective translation
equivariance. Good principles for designing spherical CNNs are to avoid
distortions and to convert the shift equivariance property of planar CNNs into
rotation equivariance in the spherical domain. In this work, we
use partial differential operators (PDOs) to design a spherical equivariant
CNN, PDO-eCNN, which is exactly rotation equivariant in the
continuous domain. We then discretize PDO-eCNNs, and analyze
the equivariance error resulting from discretization. This is the first time
that the equivariance error is theoretically analyzed in the spherical domain.
In experiments, PDO-eCNNs show greater parameter efficiency
and outperform other spherical CNNs significantly on several tasks.
Comment: Accepted by AAAI202
Unsupervised Learning of Depth and Ego-Motion from Cylindrical Panoramic Video
We introduce a convolutional neural network model for unsupervised learning
of depth and ego-motion from cylindrical panoramic video. Panoramic depth
estimation is an important technology for applications such as virtual reality,
3D modeling, and autonomous robotic navigation. In contrast to previous
approaches for applying convolutional neural networks to panoramic imagery, we
use the cylindrical panoramic projection, which allows the use of
traditional CNN layers such as convolutional filters and max pooling without
modification. Our evaluation on synthetic and real data shows that unsupervised
learning of depth and ego-motion on cylindrical panoramic images can produce
high-quality depth maps and that an increased field-of-view improves ego-motion
estimation accuracy. We also introduce Headcam, a novel dataset of panoramic
video collected from a helmet-mounted camera while biking in an urban setting.
Comment: Accepted to IEEE AIVR 201