U-Net and its variants have been widely used in medical image segmentation.
However, most current U-Net variants confine their improvement strategies to
building a more complex encoder, while leaving the decoder unchanged or adopting
a simple symmetric structure. These approaches overlook the true functionality
of the decoder: receiving low-resolution feature maps from the encoder and
restoring feature map resolution and lost information through upsampling. As a
result, the decoder, especially its upsampling component, plays a crucial role
in enhancing segmentation outcomes. However, in 3D medical image segmentation,
the commonly used transposed convolution can result in visual artifacts. This
issue stems from the absence of a direct relationship between adjacent pixels in
the output feature map. Furthermore, a plain encoder already possesses
sufficient feature extraction capability, because downsampling gradually
expands the receptive field, but the information lost during downsampling
cannot be ignored. To address this gap in existing
research, we extend our focus beyond the encoder and introduce neU-Net (i.e.,
not complex encoder U-Net), which incorporates a novel Sub-pixel Convolution
for upsampling to construct a powerful decoder. We also introduce a
multi-scale wavelet input module on the encoder side to provide additional
information. Our model surpasses other state-of-the-art methods on both the
Synapse and ACDC datasets.
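
To make the upsampling idea concrete, the following is a minimal sketch of the rearrangement step that sub-pixel convolution relies on, extended to three spatial dimensions. It is an illustration under stated assumptions, not the neU-Net implementation: the function name pixel_shuffle_3d, the channel ordering, and the use of NumPy are all hypothetical choices, and the convolution that would normally produce the C*r^3 input channels is omitted. The sketch only shows that each output voxel is copied from exactly one input channel, so no output position receives overlapping kernel contributions.

```python
import numpy as np

def pixel_shuffle_3d(x, r):
    """Rearrange an array of shape (C*r^3, D, H, W) into (C, D*r, H*r, W*r).

    Hypothetical helper for illustration; the channel ordering (depth offset,
    then height offset, then width offset) is an assumption of this sketch.
    """
    c_in, d, h, w = x.shape
    if c_in % (r ** 3) != 0:
        raise ValueError("channel count must be divisible by r**3")
    c = c_in // (r ** 3)
    # Split the channel axis into (C, r, r, r): one offset axis per spatial axis.
    x = x.reshape(c, r, r, r, d, h, w)
    # Place each offset axis right after its spatial axis: (C, D, r, H, r, W, r).
    x = x.transpose(0, 4, 1, 5, 2, 6, 3)
    # Merge each (spatial, offset) pair into one upsampled spatial axis.
    return x.reshape(c, d * r, h * r, w * r)

# Usage: 8 channels of a 2x2x2 volume become 1 channel of a 4x4x4 volume.
features = np.arange(8 * 2 * 2 * 2).reshape(8, 2, 2, 2)
upsampled = pixel_shuffle_3d(features, 2)
print(upsampled.shape)
```

In a decoder built this way, a regular convolution would first expand the channel count by a factor of r^3, and the rearrangement above would then trade those channels for spatial resolution.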