Although convolutional neural networks (CNNs) have been proposed to remove
adverse weather conditions from single images using a single set of pre-trained
weights, they fail to restore weather-degraded videos because they do not exploit
temporal information. Furthermore, existing methods for removing adverse weather
conditions (e.g., rain, fog, and snow) from videos can only handle one type of
adverse weather. In this work, we propose the first framework for restoring
videos from all adverse weather conditions by developing a video
adverse-weather-component suppression network (ViWS-Net). To achieve this, we
first devise a weather-agnostic video transformer encoder with multiple
transformer stages. Moreover, we design a long short-term temporal modeling
mechanism for the weather messenger that fuses adjacent input video frames early
and learns weather-specific information. We further introduce a weather
discriminator with gradient reversion that adversarially predicts weather types,
thereby maintaining the weather-invariant common information and suppressing the
weather-specific information in pixel features. Finally, we develop a messenger-driven
video transformer decoder to retrieve the residual weather-specific feature,
which is spatiotemporally aggregated with hierarchical pixel features and
refined to predict the clean target frame of input videos. Experimental
results on benchmark datasets and real-world weather videos demonstrate that
our ViWS-Net outperforms current state-of-the-art methods in restoring
videos degraded by any weather condition.
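
The gradient reversion used by the weather discriminator (commonly known as a
gradient reversal layer) can be illustrated with a minimal sketch. This is not
the ViWS-Net implementation: the class GradReverse, the toy linear
discriminator, the squared-error loss, and the strength parameter lam are all
assumptions made for illustration. The layer is an identity in the forward pass
and negates (and scales) the gradient in the backward pass, so the encoder is
pushed to make the weather type harder to predict.

```python
# Minimal, framework-free sketch of gradient reversal (all names hypothetical).

class GradReverse:
    """Identity in the forward pass; negates and scales gradients in the backward pass."""

    def __init__(self, lam=1.0):
        self.lam = lam  # assumed reversal strength, not a value from the paper

    def forward(self, features):
        # Features pass through unchanged on the way to the discriminator.
        return list(features)

    def backward(self, grad_output):
        # The gradient flowing back to the encoder has its sign flipped.
        return [-self.lam * g for g in grad_output]


def discriminator_grad(features, weights, target):
    """Gradient of 0.5 * (w . f - target)^2 with respect to the features f."""
    score = sum(w * f for w, f in zip(weights, features))
    return [(score - target) * w for w in weights]


grl = GradReverse(lam=0.5)
features = [1.0, 2.0]      # toy pixel features from the encoder
weights = [0.5, -1.0]      # toy linear weather discriminator

out = grl.forward(features)
grad_disc = discriminator_grad(out, weights, target=1.0)  # what the discriminator wants
grad_encoder = grl.backward(grad_disc)                    # what the encoder receives
```

Here grad_disc is the direction that would reduce the discriminator's loss,
while grad_encoder points the opposite way, which is what lets a single backward
pass train the discriminator and suppress weather-specific cues in the features.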