56 research outputs found
Panoramic Annular Localizer: Tackling the Variation Challenges of Outdoor Localization Using Panoramic Annular Images and Active Deep Descriptors
Visual localization estimates the camera location by matching a query image
against database images. It is a crucial task for applications such as
autonomous vehicles, assistive navigation and augmented reality. The challenge
lies in the appearance variations between query and database images, including
illumination variations, dynamic object variations and viewpoint variations.
To tackle these challenges, this paper proposes the Panoramic Annular
Localizer, which incorporates a panoramic annular lens and robust deep image
descriptors. The panoramic annular images captured by a single camera are
processed and fed into the NetVLAD network to form active deep descriptors,
and sequential matching is utilized to generate the localization result.
Experiments on public datasets and in the field validate the proposed system.
Comment: Accepted by ITSC 201
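As an illustration of the sequential-matching step, the sketch below scores database descriptor windows against a query sequence by summed cosine similarity. It assumes descriptors have already been extracted and L2-normalized by a NetVLAD-like network; the function name and the scoring scheme are illustrative assumptions, not the paper's exact implementation.

```python
# Hedged sketch: sequential matching over precomputed, unit-norm descriptors.
import numpy as np

def sequence_match(query_seq: np.ndarray, db_seq: np.ndarray) -> int:
    """Return the database start index whose window of consecutive frames
    best matches the query sequence under summed cosine similarity.

    query_seq: (L, D) descriptors of L consecutive query frames.
    db_seq:    (N, D) descriptors of N consecutive database frames, N >= L.
    """
    L = query_seq.shape[0]
    N = db_seq.shape[0]
    best_score, best_start = -np.inf, 0
    for start in range(N - L + 1):
        window = db_seq[start:start + L]            # (L, D)
        # Descriptors are unit-norm, so the dot product is cosine similarity.
        score = float(np.sum(query_seq * window))
        if score > best_score:
            best_score, best_start = score, start
    return best_start

# Toy usage with random unit-norm descriptors.
rng = np.random.default_rng(0)
db = rng.normal(size=(100, 4096))
db /= np.linalg.norm(db, axis=1, keepdims=True)
query = db[40:45] + 0.01 * rng.normal(size=(5, 4096))
query /= np.linalg.norm(query, axis=1, keepdims=True)
print(sequence_match(query, db))  # expected: 40
```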
Quantitative comparison of the distribution of densities in three Swedish cities
Typologies have long played a role in urban studies, but definitions are often rather abstract and ill-defined, and at worst end in fixed stereotypes that hide the underlying spatial complexity. Traditional typologies focus on separate elements, which allows crucial differences in one spatial feature to be understood in great detail, but they lack the capacity to capture the interrelations between elements. Further, they often focus on one scale level and therefore fail to account for interscalarity. Recent publications define morphological typologies based on quantitative variables, building on the seminal book 'Urban Space and Structures' by Martin and March, published in 1972, but using more advanced spatial analysis and statistics. These approaches contribute to the discussion of types in two ways: firstly, they define types in a precise and repeatable manner, allowing for city-scale comparisons; secondly, they acknowledge the cross-scale dynamics important for, e.g., living qualities and economic processes, where not only the local conditions matter but also the qualities in proximity. This paper compares building types in three Swedish cities, using the multi-variable and multi-scalar density definition. A statistical clustering method is used to classify cases according to their measured similarity across scales. The results show that working with types is a fruitful way to reveal the individual identity of these types, compare cities and highlight differences in the way the three cities are structured.
Berghauser Pont, M.; Stavroulaki, G.; Sun, K.; Abshirini, E.; Olsson, J.; Marcus, L. (2018). Quantitative comparison of the distribution of densities in three Swedish cities. In: 24th ISUF International Conference, Book of Papers. Editorial Universitat Politècnica de València, 1327-1336. https://doi.org/10.4995/ISUF2017.2017.5317
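For a sense of how such a typology can be derived, the sketch below clusters synthetic multi-scalar density profiles. The feature layout (FSI/GSI measured at three radii) and the choice of Ward hierarchical clustering are assumptions; the paper states only that a statistical clustering method is used.

```python
# Illustrative sketch of deriving density-based building types by clustering.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
# Each row: densities measured around one case at three scales,
# e.g. [FSI_500m, GSI_500m, FSI_1km, GSI_1km, FSI_2km, GSI_2km].
profiles = np.vstack([
    rng.normal([2.0, 0.35, 1.5, 0.30, 1.0, 0.25], 0.1, (50, 6)),  # dense core
    rng.normal([0.8, 0.20, 0.6, 0.18, 0.4, 0.15], 0.1, (50, 6)),  # mid-rise
    rng.normal([0.3, 0.10, 0.2, 0.08, 0.1, 0.05], 0.1, (50, 6)),  # low-rise
])

# Standardize so every scale contributes equally to the distance metric.
X = (profiles - profiles.mean(axis=0)) / profiles.std(axis=0)
Z = linkage(X, method="ward")                    # hierarchical Ward clustering
labels = fcluster(Z, t=3, criterion="maxclust")  # cut the tree into 3 types
print(np.bincount(labels)[1:])                   # roughly 50 cases per type
```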
Computational Optics Meet Domain Adaptation: Transferring Semantic Segmentation Beyond Aberrations
Semantic scene understanding with Minimalist Optical Systems (MOS) in mobile
and wearable applications remains a challenge due to the corrupted imaging
quality induced by optical aberrations. However, previous works focus only on
improving the subjective imaging quality through computational optics, i.e.,
Computational Imaging (CI) techniques, while ignoring the feasibility of
semantic segmentation. In this paper, we pioneer the investigation of Semantic
Segmentation under Optical Aberrations (SSOA) of MOS. To benchmark SSOA, we
construct Virtual Prototype Lens (VPL) groups through optical simulation,
generating the Cityscapes-ab and KITTI-360-ab datasets under different
behaviors and levels of aberrations. We study SSOA from an unsupervised domain
adaptation perspective to address the scarcity of labeled aberration data in
real-world scenarios. Further, we propose Computational Imaging Assisted Domain Adaptation
(CIADA) to leverage prior knowledge of CI for robust performance in SSOA. Based
on our benchmark, we conduct experiments on the robustness of state-of-the-art
segmenters against aberrations. In addition, extensive evaluations of possible
solutions to SSOA reveal that CIADA achieves superior performance under all
aberration distributions, paving the way for the applications of MOS in
semantic scene understanding. Code and dataset will be made publicly available
at https://github.com/zju-jiangqi/CIADA.
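The sketch below shows one simple way an aberration-degraded counterpart of a sharp image can be produced by PSF convolution, loosely in the spirit of the virtual-prototype-lens simulation. A real pipeline would ray-trace spatially varying PSFs from a lens prescription; the single disk-shaped defocus kernel here is a simplifying assumption for illustration.

```python
# Minimal sketch: simulate an aberration-degraded image by PSF convolution.
import numpy as np
from scipy.signal import fftconvolve

def defocus_psf(radius: float, size: int = 21) -> np.ndarray:
    """Disk-shaped kernel approximating defocus blur, normalized to sum 1."""
    y, x = np.mgrid[-(size // 2):size // 2 + 1, -(size // 2):size // 2 + 1]
    psf = (x**2 + y**2 <= radius**2).astype(np.float64)
    return psf / psf.sum()

def degrade(image: np.ndarray, psf: np.ndarray) -> np.ndarray:
    """Blur each channel of an HxWxC image in [0, 1] with the PSF."""
    out = np.stack([fftconvolve(image[..., c], psf, mode="same")
                    for c in range(image.shape[-1])], axis=-1)
    return np.clip(out, 0.0, 1.0)

sharp = np.random.default_rng(2).uniform(size=(128, 128, 3))
blurred = degrade(sharp, defocus_psf(radius=4.0))
print(blurred.shape)  # (128, 128, 3): degraded counterpart for adaptation
```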
Event-Based Fusion for Motion Deblurring with Cross-modal Attention
Traditional frame-based cameras inevitably suffer from motion blur due to long exposure times. As a bio-inspired sensor, an event camera records intensity changes asynchronously with high temporal resolution, providing valid image degradation information within the exposure time. In this paper, we rethink the event-based image deblurring problem and unfold it into an end-to-end two-stage image restoration network. To effectively fuse event and image features, we design an event-image cross-modal attention module, applied at multiple levels of our network, which allows the network to focus on relevant features from the event branch and filter out noise. We also introduce a novel symmetric cumulative event representation specifically for image deblurring, as well as an event mask gated connection between the two stages of our network that helps avoid information loss. At the dataset level, to foster event-based motion deblurring and to facilitate evaluation on challenging real-world images, we introduce the Real Event Blur (REBlur) dataset, captured with an event camera in an illumination-controlled optical laboratory. Our Event Fusion Network (EFNet) sets the new state of the art in motion deblurring, surpassing both the prior best-performing image-based method and all event-based methods with public implementations on the GoPro dataset (by up to 2.47 dB) and on our REBlur dataset, even under extreme blur. The code and our REBlur dataset will be made publicly available.
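To make the fusion idea concrete, here is a hedged sketch of event-image cross-modal attention in which image features act as queries and event features as keys and values, so the image branch can attend to relevant event information. EFNet's actual module differs in its details; this version simply reuses the stock torch.nn.MultiheadAttention, and all names and shapes are illustrative.

```python
# Hedged sketch of an event-image cross-modal attention block.
import torch
import torch.nn as nn

class EventImageCrossAttention(nn.Module):
    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.norm_img = nn.LayerNorm(channels)
        self.norm_evt = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, img_feat: torch.Tensor, evt_feat: torch.Tensor):
        # img_feat, evt_feat: (B, C, H, W) feature maps at the same resolution.
        B, C, H, W = img_feat.shape
        q = self.norm_img(img_feat.flatten(2).transpose(1, 2))   # (B, HW, C)
        kv = self.norm_evt(evt_feat.flatten(2).transpose(1, 2))  # (B, HW, C)
        fused, _ = self.attn(q, kv, kv)        # image queries attend to events
        fused = fused.transpose(1, 2).reshape(B, C, H, W)
        return img_feat + fused                # residual fusion

x = torch.randn(2, 64, 32, 32)   # image features
e = torch.randn(2, 64, 32, 32)   # event features
print(EventImageCrossAttention(64)(x, e).shape)  # torch.Size([2, 64, 32, 32])
```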
Minimalist and High-Quality Panoramic Imaging with PSF-aware Transformers
High-quality panoramic images with a 360-degree Field of View (FoV) are
essential for contemporary panoramic computer vision tasks. However,
conventional imaging systems come with sophisticated lens designs and heavy
optical components. This precludes their use in many mobile and wearable
applications where thin, portable, minimalist imaging systems are desired.
In this paper, we propose a Panoramic Computational Imaging Engine (PCIE) to
address minimalist and high-quality panoramic imaging. With fewer than three
spherical lenses, a Minimalist Panoramic Imaging Prototype (MPIP) is
constructed based on the design of the Panoramic Annular Lens (PAL), but it
yields low-quality imaging results due to aberrations and the small image plane size. We
propose two pipelines, i.e. Aberration Correction (AC) and Super-Resolution and
Aberration Correction (SR&AC), to solve the image quality problems of MPIP,
with imaging sensors of small and large pixel size, respectively. To provide a
universal network for the two pipelines, we leverage the information from the
Point Spread Function (PSF) of the optical system and design a PSF-aware
Aberration-image Recovery Transformer (PART), in which the self-attention
calculation and feature extraction are guided via PSF-aware mechanisms. We
train PART on synthetic image pairs from simulation and put forward the PALHQ
dataset to fill the gap of real-world high-quality PAL images for low-level
vision. A comprehensive set of experiments on synthetic and real-world
benchmarks demonstrates the impressive imaging results of PCIE and the
effectiveness of the plug-and-play PSF-aware mechanisms. We further deliver
heuristic experimental findings for minimalist and high-quality panoramic
imaging. Our dataset and code will be available at
https://github.com/zju-jiangqi/PCIE-PART.
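As a rough illustration of PSF-aware guidance, the sketch below embeds a PSF kernel and uses it to modulate feature maps FiLM-style (per-channel scale and shift). PART's actual PSF-aware self-attention and feature extraction are more involved; every name and shape here is an assumption for illustration, not the paper's design.

```python
# Loose sketch: condition restoration features on a PSF via FiLM modulation.
import torch
import torch.nn as nn

class PSFModulation(nn.Module):
    def __init__(self, psf_size: int, channels: int):
        super().__init__()
        self.embed = nn.Sequential(
            nn.Flatten(),
            nn.Linear(psf_size * psf_size, 2 * channels),  # -> (gamma, beta)
        )

    def forward(self, feat: torch.Tensor, psf: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) features; psf: (B, k, k) per-image PSF kernel.
        gamma, beta = self.embed(psf).chunk(2, dim=1)      # (B, C) each
        gamma = gamma[:, :, None, None]
        beta = beta[:, :, None, None]
        return feat * (1 + gamma) + beta                   # FiLM-style guidance

feat = torch.randn(2, 48, 64, 64)
psf = torch.rand(2, 11, 11)
print(PSFModulation(psf_size=11, channels=48)(feat, psf).shape)
```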