Reference-based line-art colorization is a challenging task in computer
vision. Color, texture, and shading must be rendered from an abstract
sketch, which relies heavily on precise long-range dependency modeling
between the sketch and the reference. Popular techniques for bridging the
cross-modal information and modeling the long-range dependency employ the
attention mechanism.
However, in the context of reference-based line-art colorization, several
techniques intensify the existing training difficulty of attention, for
instance, the self-supervised training protocol and GAN-based losses. To
understand the training instability, we inspect the gradient flow of attention
and observe gradient conflict among attention branches. This phenomenon
motivates us to alleviate the gradient issue by preserving the dominant gradient
branch while removing the conflicting ones. We propose a novel attention mechanism using
this training strategy, Stop-Gradient Attention (SGA), which outperforms the
attention baseline by a large margin and trains more stably. Compared
with state-of-the-art modules in line-art colorization, our approach
demonstrates significant improvements in Fréchet Inception Distance (FID, up
to 27.21%) and structural similarity index measure (SSIM, up to 25.67%) on
several benchmarks. The code of SGA is available at
https://github.com/kunkun0w0/SGA.

Comment: Accepted by ECCV202
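
The idea of preserving the dominant gradient branch while removing the conflicting ones can be illustrated with a toy sketch on plain vectors. This is not the SGA implementation. The function names are hypothetical, and two choices are assumptions made only for illustration: the "dominant" branch is taken to be the one with the largest L2 norm, and "conflict" is taken to mean negative cosine similarity with that branch.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length gradient vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero gradient as neither agreeing nor conflicting
    return dot / (norm_u * norm_v)


def combine_branch_gradients(branch_grads):
    """Sum the dominant branch with every branch that does not conflict with it.

    Assumptions (illustrative, not taken from the paper):
      - dominant branch = largest squared L2 norm
      - conflicting branch = negative cosine similarity with the dominant one
    """
    dominant = max(branch_grads, key=lambda g: sum(x * x for x in g))
    kept = [g for g in branch_grads if cosine(g, dominant) >= 0.0]
    # Element-wise sum over the branches that were kept.
    return [sum(vals) for vals in zip(*kept)]


# Three hypothetical gradient contributions reaching the same parameter.
grads = [
    [2.0, 1.0],    # largest norm, so treated as dominant
    [-1.0, -0.5],  # points the opposite way, so it is dropped
    [0.5, 0.5],    # roughly aligned, so it is kept
]
combined = combine_branch_gradients(grads)
```

In an automatic-differentiation framework, the same effect would more likely be obtained by applying a stop-gradient (detach) operation to the conflicting branch in the forward pass, so that no gradient flows through it, rather than by filtering gradients after the fact.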