Noise transfer for unsupervised domain adaptation of retinal OCT images
Optical coherence tomography (OCT) images acquired with different camera
devices exhibit challenging domain shifts that can cause a severe drop in
accuracy for machine learning models. In this work, we introduce a minimal noise adaptation
method based on a singular value decomposition (SVDNA) to overcome the domain
gap between target domains from three different device manufacturers in retinal
OCT imaging. Our method utilizes the difference in noise structure to
successfully bridge the domain gap between different OCT devices and transfer
the style from unlabeled target domain images to source images for which manual
annotations are available. We demonstrate how this method, despite its
simplicity, matches or even outperforms state-of-the-art unsupervised domain
adaptation methods for semantic segmentation on a public OCT dataset. SVDNA can
be integrated into the augmentation pipeline of any network with just a few
lines of code, in contrast to many state-of-the-art domain adaptation methods,
which often need to change the underlying model architecture or train a
separate style transfer model. The full code implementation for SVDNA is
available at https://github.com/ValentinKoch/SVDNA.
Comment: published at MICCAI 202
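The noise-transfer step can be sketched in a few lines of NumPy. The function below is an illustrative approximation, not the paper's exact formulation: the truncation rank `k` and the clipping to [0, 1] are assumptions. It keeps the top-k singular components of the source image (content) and takes the remaining components from the target image (noise/style).

```python
import numpy as np

def svdna(source, target, k=30):
    """SVD-based noise transfer sketch: combine the high-energy
    (content) components of `source` with the low-energy (noise)
    components of `target`. Both inputs are 2D arrays in [0, 1]."""
    Us, Ss, Vs = np.linalg.svd(source, full_matrices=False)
    Ut, St, Vt = np.linalg.svd(target, full_matrices=False)
    # Top-k singular components of the source carry the anatomy/content.
    content = (Us[:, :k] * Ss[:k]) @ Vs[:k]
    # Remaining components of the target carry its device-specific noise.
    noise = (Ut[:, k:] * St[k:]) @ Vt[k:]
    return np.clip(content + noise, 0.0, 1.0)
```

Because it only touches the input images, such a transform can be dropped into any data augmentation pipeline without changing the model architecture.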
Learning Part Segmentation from Synthetic Animals
Semantic part segmentation provides an intricate and interpretable
understanding of an object, thereby benefiting numerous downstream tasks.
However, the need for exhaustive annotations impedes its usage across diverse
object types. This paper focuses on learning part segmentation from synthetic
animals, leveraging the Skinned Multi-Animal Linear (SMAL) models to scale up
existing synthetic data generated by computer-aided design (CAD) animal models.
Compared to CAD models, SMAL models generate data with a wider range of poses
observed in real-world scenarios. As a result, our first contribution is to
construct a synthetic animal dataset of tigers and horses with more pose
diversity, termed Synthetic Animal Parts (SAP). We then benchmark Syn-to-Real
animal part segmentation from SAP to PartImageNet, namely SynRealPart, with
existing semantic segmentation domain adaptation methods and further improve
them as our second contribution. Concretely, we examine three Syn-to-Real
adaptation methods but observe a relative performance drop due to the inherent
difference between the two tasks. To address this, we propose a simple yet
effective method called Class-Balanced Fourier Data Mixing (CB-FDM). Fourier
Data Mixing aligns the spectral amplitudes of synthetic images with those of
real images, giving the mixed images frequency content closer to that of real
images. We further use Class-Balanced Pseudo-Label Re-Weighting to
alleviate the imbalanced class distribution. We demonstrate the efficacy of
CB-FDM on SynRealPart over previous methods with significant performance
improvements. Remarkably, our third contribution is to reveal that the parts
learned from synthetic tigers and horses transfer across all quadrupeds in
PartImageNet, further underscoring the utility and potential applications of
animal part segmentation.
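The amplitude-alignment idea behind Fourier Data Mixing can be sketched as a classic low-frequency amplitude swap; the exact mixing used in CB-FDM may differ, and the band-size parameter `beta` here is an assumption for illustration. The synthetic image keeps its phase (content) while its low-frequency amplitude is replaced by that of a real image.

```python
import numpy as np

def fourier_amplitude_mix(syn, real, beta=0.1):
    """Replace the low-frequency amplitude of a synthetic image with
    that of a real image, keeping the synthetic phase. `syn` and
    `real` are 2D arrays of the same shape; `beta` sets the relative
    size of the swapped low-frequency band."""
    fs = np.fft.fft2(syn)
    fr = np.fft.fft2(real)
    amp_s, pha_s = np.abs(fs), np.angle(fs)
    amp_r = np.abs(fr)
    # Center the spectra so low frequencies sit in the middle.
    amp_s = np.fft.fftshift(amp_s)
    amp_r = np.fft.fftshift(amp_r)
    h, w = syn.shape
    bh, bw = int(h * beta), int(w * beta)
    ch, cw = h // 2, w // 2
    # Swap a centered low-frequency square of the amplitude spectrum.
    amp_s[ch - bh:ch + bh, cw - bw:cw + bw] = \
        amp_r[ch - bh:ch + bh, cw - bw:cw + bw]
    amp_s = np.fft.ifftshift(amp_s)
    mixed = np.fft.ifft2(amp_s * np.exp(1j * pha_s))
    return np.real(mixed)
```

Since only the amplitude spectrum is altered, the semantic layout of the synthetic image (and hence its part labels) is preserved.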
Unsupervised Model Adaptation for Continual Semantic Segmentation
We develop an algorithm for adapting a semantic segmentation model that is
trained using a labeled source domain to generalize well in an unlabeled target
domain. A similar problem has been studied extensively in the unsupervised
domain adaptation (UDA) literature, but existing UDA algorithms require access
to both the source domain labeled data and the target domain unlabeled data for
training a domain agnostic semantic segmentation model. Relaxing this
constraint enables a user to adapt pretrained models to generalize in a target
domain, without requiring access to source data. To this end, we learn a
prototypical distribution for the source domain in an intermediate embedding
space. This distribution encodes the abstract knowledge that is learned from
the source domain. We then use this distribution for aligning the target domain
distribution with the source domain distribution in the embedding space. We
provide theoretical analysis and explain conditions under which our algorithm
is effective. Experiments on benchmark adaptation tasks demonstrate that our
method achieves competitive performance even compared with joint UDA approaches.
Comment: 12 pages, 5 figures
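The prototypical-distribution idea can be sketched as follows; the function names and the per-class Gaussian estimator below are illustrative assumptions, not the paper's exact procedure. A mean and covariance are estimated per class in the embedding space, and sampling from these Gaussians later stands in for the source data, which is no longer accessible at adaptation time.

```python
import numpy as np

def class_prototypes(features, labels, num_classes):
    """Estimate a per-class Gaussian (mean, covariance) over source
    embeddings; this compact distribution encodes the source-domain
    knowledge needed for later alignment."""
    protos = []
    for c in range(num_classes):
        fc = features[labels == c]
        protos.append((fc.mean(axis=0), np.cov(fc, rowvar=False)))
    return protos

def sample_pseudo_source(protos, n_per_class, rng):
    """Draw labelled pseudo-source embeddings from the prototypes,
    to align target-domain features against in the embedding space."""
    feats, labs = [], []
    for c, (mu, cov) in enumerate(protos):
        feats.append(rng.multivariate_normal(mu, cov, size=n_per_class))
        labs.append(np.full(n_per_class, c))
    return np.concatenate(feats), np.concatenate(labs)
```

The key property is that after the prototypes are stored, the labeled source images themselves are never needed again: adaptation only matches target embeddings to samples from these distributions.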
Does Monocular Depth Estimation Provide Better Pre-training than Classification for Semantic Segmentation?
Training a deep neural network for semantic segmentation is labor-intensive,
so it is common to pre-train it for a different task, and then fine-tune it
with a small annotated dataset. State-of-the-art methods use image
classification for pre-training, which introduces uncontrolled biases. We test
the hypothesis that depth estimation from unlabeled videos may provide better
pre-training. Despite the absence of any semantic information, we argue that
estimating scene geometry is closer to the task of semantic segmentation than
classifying whole images into semantic classes. Since analytical validation is
intractable, we test the hypothesis empirically by introducing a pre-training
scheme that yields an improvement of 5.7% mIoU and 4.1% pixel accuracy over
classification-based pre-training. While annotation is not needed for
pre-training, it is needed for testing the hypothesis. We use the KITTI
(outdoor) and NYU-V2 (indoor) benchmarks to that end, and provide an extensive
discussion of the benefits and limitations of the proposed scheme in relation
to existing unsupervised, self-supervised, and semi-supervised pre-training
protocols.
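The transfer scheme itself reduces to reusing a depth-pretrained encoder and replacing only the task head; the toy model below is a minimal stand-in sketch (the layer sizes and dictionary layout are assumptions, not the paper's architecture).

```python
import numpy as np

rng = np.random.default_rng(0)

def init_model(in_dim, hidden, out_dim):
    """Toy one-hidden-layer model: a shared 'encoder' plus a task 'head'."""
    return {"enc": 0.1 * rng.standard_normal((in_dim, hidden)),
            "head": 0.1 * rng.standard_normal((hidden, out_dim))}

# Stand-in for an encoder pre-trained on monocular depth (1 output channel).
depth_model = init_model(in_dim=16, hidden=8, out_dim=1)

# For segmentation fine-tuning, keep the pre-trained encoder and replace
# only the task head (here, 5 semantic classes).
seg_model = init_model(in_dim=16, hidden=8, out_dim=5)
seg_model["enc"] = depth_model["enc"].copy()
```

Only the head is trained from scratch on the small annotated set; the encoder starts from the geometry-aware depth weights.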
Image Manipulation and Image Synthesis
Image manipulation is of historic importance. Ever since the advent of photography, pictures have been manipulated for various reasons. Historic rulers often used image manipulation techniques for self-portrayal or propaganda. In many cases, the goal is to manipulate human behaviour by spreading credible misinformation. Photographs, by their nature, portray the real world and as such are more credible to humans. However, image manipulation need not serve only malicious purposes. In this thesis, we propose and analyse methods for image manipulation that serve a positive purpose. Specifically, we treat image manipulation as a tool for solving other tasks. To this end, we model image manipulation as an image-to-image translation (I2I) task, i.e., a system that receives an image as input and outputs a manipulated version of that input. We propose multiple I2I-based methods. First, we demonstrate that I2I-based image manipulation can be used to reduce motion blur in videos. Second, we show that it can be used for domain adaptation and domain extension. Specifically, we present a method that significantly improves the learning of semantic segmentation from synthetic source data; the same technique can be applied to learning nighttime semantic segmentation from daylight images. Next, we show that I2I can be used to enable weakly supervised object segmentation.
We show that each individual task requires and allows for different levels of supervision during the training of deep models in order to achieve the best performance. We discuss the importance of maintaining control over the output of such methods and show that, with reduced levels of supervision, methods for maintaining training stability and for establishing control over a system's output become increasingly important. We propose multiple methods that solve the issues arising in such systems. Finally, we demonstrate that our proposed control mechanisms can be adapted to synthesise images from scratch.