4 research outputs found

    DarSwin: Distortion Aware Radial Swin Transformer

    Wide-angle lenses are commonly used in perception tasks requiring a large field of view. Unfortunately, these lenses produce significant distortions, so conventional models that ignore distortion effects cannot adapt to wide-angle images. In this paper, we present a novel transformer-based model that automatically adapts to the distortion produced by wide-angle lenses. We leverage the physical characteristics of such lenses, which are analytically defined by the radial distortion profile (assumed to be known), to develop a distortion-aware radial Swin transformer (DarSwin). In contrast to conventional transformer-based architectures, DarSwin comprises a radial patch partitioning, a distortion-based sampling technique for creating token embeddings, and a polar position encoding for radial patch merging. We validate our method on classification tasks using synthetically distorted ImageNet data and show through extensive experiments that DarSwin can perform zero-shot adaptation to unseen distortions of different wide-angle lenses. Compared to other baselines, DarSwin achieves the best results (in terms of Top-1 and Top-5 accuracy) when tested on in-distribution data, with almost 2% (6%) gain in Top-1 accuracy under medium (high) distortion levels, and performs comparably to the state of the art under low and very low distortion levels (perspective-like images). Comment: 8 pages, 8 figures

    Equirectangular image construction method for standard CNNs for Semantic Segmentation

    360° spherical images offer a wide field of view and are typically projected onto a plane for processing; the result is known as an equirectangular image. Object shapes in equirectangular images can be distorted and lack translation invariance. In addition, there are few publicly available datasets of labeled equirectangular images, which makes it difficult for standard CNN models to process equirectangular images effectively. To tackle this problem, we propose a methodology for converting a perspective image into an equirectangular image. The inverse transformation of the spherical center projection and the equidistant cylindrical projection are employed. This enables standard CNNs to learn the distortion features at different positions in the equirectangular image and thereby gain the ability to semantically segment the equirectangular image. The parameter φ, which determines the projection position of the perspective image, has been analyzed using various datasets and models, such as UNet, UNet++, SegNet, PSPNet, and DeepLab v3+. The experiments demonstrate that the optimal value of φ for effective semantic segmentation of equirectangular images with standard CNNs is 6π/16. Compared with three other types of methods (supervised learning, unsupervised learning, and data augmentation), the proposed method achieves the best average IoU, 43.76%, which is 23.85%, 10.7%, and 17.23% higher than those of the three other methods, respectively.
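
    The perspective-to-equirectangular conversion described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the "spherical center projection" is the standard gnomonic projection, that φ is the latitude of the tangent point (named `lat0` here), and that the perspective image has a known horizontal field of view `fov`. The function name, parameters, and nearest-neighbour sampling are hypothetical choices made for illustration.

```python
import numpy as np

def perspective_to_equirect(persp, out_h, out_w, lat0, lon0=0.0, fov=np.pi / 2):
    """Paste a perspective image onto an equirectangular canvas.

    Assumptions (not from the paper): lat0/lon0 are the tangent point of the
    perspective image on the sphere, fov is its horizontal field of view,
    and sampling is nearest-neighbour.
    """
    h, w = persp.shape[:2]
    # Longitude and latitude at the centre of every output pixel.
    lon = (np.arange(out_w) + 0.5) / out_w * 2 * np.pi - np.pi
    lat = np.pi / 2 - (np.arange(out_h) + 0.5) / out_h * np.pi
    lon, lat = np.meshgrid(lon, lat)
    # Gnomonic projection onto the plane tangent at (lat0, lon0).
    cos_c = (np.sin(lat0) * np.sin(lat)
             + np.cos(lat0) * np.cos(lat) * np.cos(lon - lon0))
    visible = cos_c > 1e-6  # only the hemisphere facing the tangent plane
    safe = np.where(visible, cos_c, 1.0)  # avoid division by zero elsewhere
    x = np.cos(lat) * np.sin(lon - lon0) / safe
    y = (np.cos(lat0) * np.sin(lat)
         - np.sin(lat0) * np.cos(lat) * np.cos(lon - lon0)) / safe
    # Half-extent of the image on the tangent plane, from the assumed fov.
    half_w = np.tan(fov / 2)
    half_h = half_w * h / w
    col = np.floor((x / half_w + 1) / 2 * w).astype(int)
    row = np.floor((1 - y / half_h) / 2 * h).astype(int)
    inside = visible & (col >= 0) & (col < w) & (row >= 0) & (row < h)
    out = np.zeros((out_h, out_w) + persp.shape[2:], dtype=persp.dtype)
    out[inside] = persp[row[inside], col[inside]]
    return out, inside
```

    Calling `perspective_to_equirect(img, 512, 1024, lat0=6 * np.pi / 16)` would place the image near the top of the canvas, where equirectangular stretching is strongest. The returned mask marks which output pixels were filled, so the same mapping could be applied to a label map.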

    A Review of Environmental Context Detection for Navigation Based on Multiple Sensors

    Current navigation systems use multi-sensor data to improve localization accuracy, but often without certainty about the quality of those measurements in certain situations. Context detection will enable us to build an adaptive navigation system that improves the precision and robustness of its localization solution by anticipating possible degradation in sensor signal quality (for instance, GNSS in urban canyons or camera-based navigation in a non-textured environment). That is why context detection is considered the future of navigation systems. It is therefore important first to define this concept of context for navigation and to find a way to extract it from available information. This paper gives an overview of existing GNSS and on-board vision-based solutions for environmental context detection. The review shows that most state-of-the-art research focuses on only one type of data. It confirms that the main perspective for this problem is to combine different indicators from multiple sensors.