Multi-view Self-supervised Disentanglement for General Image Denoising
With its significant performance improvements, the deep learning paradigm has become a standard tool for modern image denoisers. While promising performance has been shown on seen noise distributions, existing approaches often fail to generalise to unseen noise types or to general and real noise. This is understandable, as such models are designed to learn a paired mapping (e.g. from a noisy image to its clean version). In this paper, we instead propose to learn to disentangle the noisy image, under the intuitive assumption that different corrupted versions of the same clean image share a common latent space. A self-supervised learning framework is proposed to achieve this goal without ever observing the latent clean image. By taking two different corrupted versions of the same image as input, the proposed Multi-view Self-supervised Disentanglement (MeD) approach learns to disentangle the latent clean features from the corruptions and consequently recover the clean image. Extensive experimental analysis on both synthetic and real noise shows the superiority of the proposed method over prior self-supervised approaches, especially on unseen novel noise types. On real noise, the proposed method even outperforms its supervised counterparts by over 3 dB. (International Conference on Computer Vision, ICCV 2023)
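As a rough illustration of the two-view disentanglement idea described in this abstract, the sketch below shows a minimal PyTorch-style training step: two corrupted views of the same (unseen) clean image are encoded into a shared scene latent and view-specific corruption latents, with a cross-reconstruction and a consistency loss. The module names, network depths, loss terms and weighting are assumptions made for illustration only, not the authors' released implementation.

```python
# Illustrative sketch of two-view self-supervised disentanglement.
# All module names and hyperparameters are assumptions, not MeD's official code.
import torch
import torch.nn as nn

class ConvBlock(nn.Sequential):
    def __init__(self, c_in, c_out):
        super().__init__(nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True))

class Encoder(nn.Module):
    """Maps a noisy image to a latent feature map."""
    def __init__(self, c_lat=64):
        super().__init__()
        self.net = nn.Sequential(ConvBlock(3, c_lat), ConvBlock(c_lat, c_lat))
    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    """Recombines a scene latent and a corruption latent into an image."""
    def __init__(self, c_lat=64):
        super().__init__()
        self.net = nn.Sequential(ConvBlock(2 * c_lat, c_lat),
                                 nn.Conv2d(c_lat, 3, 3, padding=1))
    def forward(self, z_scene, z_noise):
        return self.net(torch.cat([z_scene, z_noise], dim=1))

scene_enc, noise_enc, dec = Encoder(), Encoder(), Decoder()
params = list(scene_enc.parameters()) + list(noise_enc.parameters()) + list(dec.parameters())
opt = torch.optim.Adam(params, lr=1e-4)

def training_step(x1, x2):
    """x1, x2: two independently corrupted versions of the same clean image."""
    z1, z2 = scene_enc(x1), scene_enc(x2)   # shared scene (clean) latents
    n1, n2 = noise_enc(x1), noise_enc(x2)   # view-specific corruption latents
    # Cross reconstruction: scene content from one view plus corruption from the
    # other view should reproduce the other noisy observation.
    loss_cross = (nn.functional.l1_loss(dec(z1, n2), x2) +
                  nn.functional.l1_loss(dec(z2, n1), x1))
    # Both views are assumed to share a common clean latent space.
    loss_consist = nn.functional.l1_loss(z1, z2)
    loss = loss_cross + 0.1 * loss_consist  # 0.1 is an arbitrary illustrative weight
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```

At inference time, under this sketch, the denoised estimate would come from the scene encoder alone (e.g. decoding the scene latent with a zeroed corruption latent); note that no clean target is used anywhere in the training step above.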
360+x: A Panoptic Multi-modal Scene Understanding Dataset
Human perception of the world is shaped by a multitude of viewpoints and modalities. While many existing datasets focus on scene understanding from a certain perspective (e.g. egocentric or third-person views), our dataset offers a panoptic perspective (i.e. multiple viewpoints with multiple data modalities). Specifically, we encapsulate third-person panoramic and front views, as well as egocentric monocular/binocular views, with rich modalities including video, multi-channel audio, directional binaural delay, location data and textual scene descriptions within each captured scene, presenting a comprehensive observation of the world. To the best of our knowledge, this is the first database that covers multiple viewpoints with multiple data modalities to mimic how daily information is accessed in the real world. Through our benchmark analysis, we present 5 different scene understanding tasks on the proposed 360+x dataset to evaluate the impact and benefit of each data modality and perspective in panoptic scene understanding. We hope this unique dataset can broaden the scope of comprehensive scene understanding and encourage the community to approach these problems from more diverse perspectives.
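To make the multi-viewpoint, multi-modal structure of a captured scene more concrete, the following is a hypothetical Python sketch of how one recording and its temporal activity labels could be represented. All field names, file names and example values are illustrative assumptions and do not reflect the released file layout of the dataset.

```python
# Hypothetical representation of one 360+x recording; field names are assumptions.
from dataclasses import dataclass, field
from typing import List

@dataclass
class ActivityLabel:
    """One temporally localised action instance within a video."""
    action: str        # action class name
    start_sec: float   # start of the instance in seconds
    end_sec: float     # end of the instance in seconds

@dataclass
class SceneRecording:
    """One captured scene with its multiple viewpoints and modalities."""
    scene_category: str             # scene type, e.g. an artistic space or natural landscape
    city: str                       # capture location (city)
    panoramic_video: str            # third-person 360° video file
    front_view_video: str           # third-person front-view video file
    egocentric_videos: List[str]    # egocentric monocular/binocular views
    audio_channels: List[str]       # multi-channel audio tracks
    binaural_delay: str             # directional binaural delay data
    location: str                   # location metadata
    description: str                # textual scene description
    activities: List[ActivityLabel] = field(default_factory=list)

# Example (all values invented for illustration):
rec = SceneRecording(
    scene_category="natural landscape", city="Birmingham",
    panoramic_video="pano.mp4", front_view_video="front.mp4",
    egocentric_videos=["ego_left.mp4", "ego_right.mp4"],
    audio_channels=["ch0.wav", "ch1.wav"], binaural_delay="delay.npy",
    location="gps.json", description="a walk along a riverside path",
)
rec.activities.append(ActivityLabel(action="walking", start_sec=0.0, end_sec=12.5))
```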
360+x: A Panoptic Multi-modal Scene Understanding Dataset
The 360+x dataset introduces a unique panoptic perspective to scene understanding, differentiating itself from existing datasets by offering multiple viewpoints and modalities captured from a variety of scenes. Our dataset contains:
1. 2,152 multi-modal videos captured by 360° cameras and Spectacles cameras (8,579k frames in total).
2. Captures in 17 cities across 5 countries.
3. Captures in 28 scenes, from artistic spaces to natural landscapes.
4. Temporal activity localisation labels for 38 action instances for each video.
IMPORTANT NOTICE: Due to the large volume of the data files, they have been stored in the BEAR Research Data Store space. If you wish to access the data, please contact [email protected] to request and receive the relevant link to the data folder.