DarSwin: Distortion Aware Radial Swin Transformer
Wide-angle lenses are commonly used in perception tasks requiring a large
field of view. Unfortunately, these lenses produce significant distortions
making conventional models that ignore the distortion effects unable to adapt
to wide-angle images. In this paper, we present a novel transformer-based model
that automatically adapts to the distortion produced by wide-angle lenses. We
leverage the physical characteristics of such lenses, which are analytically
defined by the radial distortion profile (assumed to be known), to develop a
distortion aware radial swin transformer (DarSwin). In contrast to conventional
transformer-based architectures, DarSwin comprises a radial patch partitioning,
a distortion-based sampling technique for creating token embeddings, and a
polar position encoding for radial patch merging. We validate our method on
classification tasks using synthetically distorted ImageNet data and show
through extensive experiments that DarSwin can perform zero-shot adaptation to
unseen distortions of different wide-angle lenses. Compared to other baselines,
DarSwin achieves the best results (in terms of Top-1 and Top-5 accuracy) when
tested on in-distribution data, with a gain of almost 2% (6%) in Top-1 accuracy
under medium (high) distortion levels, and performs comparably to the state of the art
under low and very low distortion levels (perspective-like images).
Comment: 8 pages, 8 figures
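As a rough illustration of the radial patch partitioning and distortion-based sampling mentioned in the abstract, the sketch below places sample points on a polar grid whose radial positions follow a known lens profile. The equidistant profile r = f * theta, the function names, and the grid sizes are assumptions made for illustration; this is not the authors' implementation.

```python
import numpy as np

def equidistant_profile(theta, focal):
    """Hypothetical lens profile: image radius (pixels) for incident angle theta (radians)."""
    return focal * theta

def radial_sample_points(n_radial, n_azimuth, max_theta, focal, center,
                         profile=equidistant_profile):
    """Return an (n_radial, n_azimuth, 2) array of (x, y) sample points.

    The field of view is split evenly in incident angle, and each slice is
    mapped to an image radius through the lens profile, so the radial spacing
    of the samples follows the distortion curve.
    """
    # Midpoints of n_radial equal slices of the incident angle range.
    theta_edges = np.linspace(0.0, max_theta, n_radial + 1)
    theta_mid = 0.5 * (theta_edges[:-1] + theta_edges[1:])
    radii = profile(theta_mid, focal)

    # Midpoints of n_azimuth equal slices of the full circle.
    az_edges = np.linspace(0.0, 2.0 * np.pi, n_azimuth + 1)
    az_mid = 0.5 * (az_edges[:-1] + az_edges[1:])

    r, az = np.meshgrid(radii, az_mid, indexing="ij")
    cx, cy = center
    return np.stack([cx + r * np.cos(az), cy + r * np.sin(az)], axis=-1)

def sample_nearest(image, points):
    """Nearest-neighbour lookup of image values at the given (x, y) points."""
    h, w = image.shape[:2]
    xi = np.clip(np.rint(points[..., 0]).astype(int), 0, w - 1)
    yi = np.clip(np.rint(points[..., 1]).astype(int), 0, h - 1)
    return image[yi, xi]

# Usage: 4 radial by 8 azimuthal sample points on a 128x128 image.
image = np.arange(128 * 128).reshape(128, 128)
points = radial_sample_points(4, 8, np.pi / 2, 40.0, (64.0, 64.0))
tokens = sample_nearest(image, points)
```

Passing a different `profile` callable changes where the radial samples fall without touching the rest of the sketch, which loosely mirrors the idea of adapting to a known distortion curve.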
Equirectangular image construction method for standard CNNs for Semantic Segmentation
360° spherical images offer a wide field of view and are
typically projected onto a plane for processing; the result is known as an
equirectangular image. Object shapes in equirectangular images can be
distorted and lack translation invariance. In addition, there are few publicly
available datasets of labeled equirectangular images, which makes it difficult for
standard CNN models to process equirectangular images effectively. To tackle
this problem, we propose a methodology for converting a perspective image into
an equirectangular image. The inverse transformation of the spherical center
projection and the equidistant cylindrical projection are employed. This
enables standard CNNs to learn the distortion features at different
positions in the equirectangular image and thereby gain the ability to
semantically segment the equirectangular image. The parameter φ, which determines
the projection position of the perspective image, has been analyzed using
various datasets and models, such as UNet, UNet++, SegNet, PSPNet, and DeepLab
v3+. The experiments demonstrate that an optimal value of φ for effective
semantic segmentation of equirectangular images is 6π/16 for standard CNNs.
Compared with the other three types of methods (supervised learning,
unsupervised learning and data augmentation), the method proposed in this paper
has the best average IoU value of 43.76%. This value is 23.85%, 10.7% and
17.23% higher than those of the other three methods, respectively.
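The perspective-to-equirectangular conversion described above can be sketched as an inverse gnomonic (spherical center) projection: for every pixel of the equirectangular canvas, compute where its viewing direction hits the perspective image plane. In this sketch `phi0` is treated as the latitude of the point where the image plane touches the sphere, which is only an assumption about how the paper defines φ; the function name, the `fov` parameter, and nearest-neighbour sampling are likewise illustrative choices.

```python
import numpy as np

def perspective_to_equirect(persp, out_h, out_w, phi0, fov):
    """Paste a perspective image onto an equirectangular canvas.

    phi0: assumed latitude (radians) of the tangent point of the image plane.
    fov:  assumed horizontal field of view (radians) of the perspective image.
    Returns the canvas and a boolean mask of the pixels that were filled.
    """
    in_h, in_w = persp.shape[:2]

    # Longitude and latitude at the centre of every output pixel.
    lon = (np.arange(out_w) + 0.5) / out_w * 2.0 * np.pi - np.pi
    lat = np.pi / 2.0 - (np.arange(out_h) + 0.5) / out_h * np.pi
    lon, lat = np.meshgrid(lon, lat)  # both have shape (out_h, out_w)

    # Gnomonic projection onto the plane tangent at (lon=0, lat=phi0).
    cos_c = np.sin(phi0) * np.sin(lat) + np.cos(phi0) * np.cos(lat) * np.cos(lon)
    front = cos_c > 1e-6                   # only the hemisphere facing the plane
    safe = np.where(front, cos_c, 1.0)     # avoid division by zero elsewhere
    x = np.cos(lat) * np.sin(lon) / safe
    y = (np.cos(phi0) * np.sin(lat) - np.sin(phi0) * np.cos(lat) * np.cos(lon)) / safe

    # Plane coordinates to source pixel coordinates (square pixels assumed).
    half_x = np.tan(fov / 2.0)
    half_y = half_x * in_h / in_w
    u = (x / half_x + 1.0) / 2.0 * (in_w - 1)
    v = (1.0 - y / half_y) / 2.0 * (in_h - 1)

    inside = front & (u >= 0) & (u <= in_w - 1) & (v >= 0) & (v <= in_h - 1)
    out = np.zeros((out_h, out_w) + persp.shape[2:], dtype=persp.dtype)
    ui = np.rint(u[inside]).astype(int)
    vi = np.rint(v[inside]).astype(int)
    out[inside] = persp[vi, ui]
    return out, inside

# Usage: place a constant 32x32 image at two different latitudes.
persp = np.full((32, 32), 7, dtype=np.uint8)
out_eq, mask_eq = perspective_to_equirect(persp, 64, 128, 0.0, np.pi / 2)
out_up, mask_up = perspective_to_equirect(persp, 64, 128, np.pi / 4, np.pi / 2)
```

Moving `phi0` away from the equator shifts the pasted region toward a pole and stretches it, which is the kind of position-dependent distortion the abstract says the CNNs are meant to learn.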
A Review of Environmental Context Detection for Navigation Based on Multiple Sensors
Current navigation systems use multi-sensor data to improve localization accuracy, but often without certainty about the quality of those measurements in certain situations. Context detection will enable us to build an adaptive navigation system that improves the precision and robustness of its localization solution by anticipating possible degradation in sensor signal quality (for instance, GNSS in urban canyons or camera-based navigation in a non-textured environment). That is why context detection is considered the future of navigation systems. Thus, it is important first to define this concept of context for navigation and to find a way to extract it from available information. This paper reviews existing GNSS and on-board vision-based solutions for environmental context detection. The review shows that most state-of-the-art research focuses on only one type of data. It confirms that the main perspective for this problem is to combine different indicators from multiple sensors.