Image denoisers based on deep neural networks outperform conventional
model-based methods by large margins. Recently,
self-supervised methods have attracted attention because constructing a large
real noise dataset for supervised training is an enormous burden. The most
representative self-supervised denoisers are based on blind-spot networks,
which exclude the receptive field's center pixel. However, excluding any input
pixel discards information, especially when the excluded pixel is the one at
the corresponding output position. In addition, a standard blind-spot
network fails to reduce real camera noise due to the pixel-wise correlation of
noise, though it successfully removes independently distributed synthetic
noise. Hence, to realize a more practical denoiser, we propose a novel
self-supervised training framework that can remove real noise. To this end, we
derive a theoretical upper bound on a supervised loss in which the network is
guided by the downsampled blinded output. We also design a conditional
blind-spot network (C-BSN), which selectively controls the blindness of the
network to use the center pixel information. Furthermore, we exploit a random
subsampler to decorrelate noise spatially, making the C-BSN free of the visual
artifacts often seen in downsampling-based methods. Extensive
experiments show that the proposed C-BSN achieves state-of-the-art performance
on real-world datasets as a self-supervised denoiser and produces visually
pleasing results without any post-processing or refinement.
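
As a rough illustration of the random subsampling idea mentioned above, the
following NumPy sketch picks one randomly chosen pixel from every
stride x stride cell of a noisy image. It is a minimal sketch under our own
assumptions, not the implementation used for C-BSN: the function name
`random_subsample`, the `stride` parameter, and the single-channel input are
all hypothetical choices made for the example.

```python
import numpy as np

def random_subsample(image, stride, rng):
    """Pick one randomly chosen pixel from every stride x stride cell.

    Hypothetical helper for illustration only; not the authors' code.
    image: array of shape (H, W) with H and W divisible by stride.
    Returns an array of shape (H // stride, W // stride).
    """
    h, w = image.shape
    if h % stride or w % stride:
        raise ValueError("image height and width must be divisible by stride")
    gh, gw = h // stride, w // stride
    # Reorder so the last axis holds the stride*stride pixels of one cell.
    cells = (image.reshape(gh, stride, gw, stride)
                  .transpose(0, 2, 1, 3)
                  .reshape(gh, gw, stride * stride))
    # Draw an independent random position inside every cell.
    choice = rng.integers(0, stride * stride, size=(gh, gw))
    return np.take_along_axis(cells, choice[..., None], axis=-1)[..., 0]

rng = np.random.default_rng(0)
noisy = np.arange(64, dtype=np.float32).reshape(8, 8)  # stand-in for a noisy image
sub = random_subsample(noisy, stride=2, rng=rng)
print(sub.shape)  # (4, 4)
```

Under this reading, neighboring samples in the output come from input pixels
in different cells, and the sampling position varies from cell to cell rather
than following a fixed grid as in regular downsampling.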