The overall architecture consists of three parts: initial feature extraction $F_{E}(\cdot)$, spatio-temporal attention modules $F_{STAM}(\cdot)$, and reconstruction $F_{R}(\cdot)$. The horizontal path, built on our MSBPN, explores the spatial information of the target slice. The vertical path computes residual features from each pair of target and neighboring slices to explore the temporal information. In each spatio-temporal attention module, the spatial and temporal information are combined and enhanced to recover the missing details.</p>
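The data flow described in the caption ($F_{E} \rightarrow$ stacked $F_{STAM} \rightarrow F_{R}$, with a spatial path on the target slice and a temporal path built from target/neighbor residuals) can be illustrated with the minimal NumPy sketch below. It is not the authors' implementation. Several assumptions are introduced for illustration only: the function names (`feature_extraction`, `stam`, `reconstruction`, `forward`) are hypothetical, all weights are random placeholders, the spatial path is a plain feature pass-through rather than MSBPN, the temporal residuals are averaged over neighbors, fusion is done by channel concatenation followed by a linear projection, and a global residual connection to the target slice is added at the end.

```python
import numpy as np

# Hypothetical sketch of the F_E -> F_STAM (x n) -> F_R data flow.
# Weights are random placeholders; MSBPN is NOT implemented here.

def feature_extraction(x, w):
    """F_E (assumed form): lift a (H, W) slice to (C, H, W) features via per-channel scaling + ReLU."""
    return np.maximum(0.0, w[:, None, None] * x[None, :, :])

def stam(spatial, target_feat, neighbor_feats, w_fuse):
    """One F_STAM block (assumed form).

    Temporal path: residual features (neighbor - target), averaged over all neighbors.
    Fusion: concatenate spatial and temporal features along channels (2C),
    project back to C channels, and add the result to the spatial features.
    """
    temporal = np.mean([n - target_feat for n in neighbor_feats], axis=0)  # (C, H, W)
    fused = np.concatenate([spatial, temporal], axis=0)                    # (2C, H, W)
    out = np.tensordot(w_fuse, fused, axes=([1], [0]))                     # (C, H, W)
    return spatial + np.maximum(0.0, out)  # residual enhancement of the spatial path

def reconstruction(feat, w):
    """F_R (assumed form): collapse C channels back to a single (H, W) slice."""
    return np.tensordot(w, feat, axes=([0], [0]))

def forward(target, neighbors, channels=8, n_blocks=3, seed=0):
    """Run the hypothetical pipeline on one target slice and its neighbor slices."""
    if not neighbors:
        raise ValueError("at least one neighbor slice is required")
    rng = np.random.default_rng(seed)
    w_e = rng.standard_normal(channels)
    target_feat = feature_extraction(target, w_e)
    neighbor_feats = [feature_extraction(n, w_e) for n in neighbors]

    spatial = target_feat  # placeholder for the MSBPN-based horizontal path
    for _ in range(n_blocks):
        w_fuse = 0.1 * rng.standard_normal((channels, 2 * channels))
        spatial = stam(spatial, target_feat, neighbor_feats, w_fuse)

    w_r = rng.standard_normal(channels) / channels
    return target + reconstruction(spatial, w_r)  # global residual (assumption)
```

For example, `forward(np.ones((6, 7)), [np.full((6, 7), 0.5), np.full((6, 7), 2.0)])` returns a `(6, 7)` array, the same shape as the target slice. Note that when a neighbor slice equals the target, its residual features are zero, so that neighbor adds no temporal signal; this mirrors the idea that the vertical path only carries what differs between slices.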