Video Prediction by Efficient Transformers
Video prediction is a challenging computer vision task that has a wide range
of applications. In this work, we present a new family of Transformer-based
models for video prediction. Firstly, an efficient local spatial-temporal
separation attention mechanism is proposed to reduce the complexity of standard
Transformers. Then, a full autoregressive model, a partial autoregressive model
and a non-autoregressive model are developed based on the new efficient
Transformer. The partial autoregressive model performs on par with the full
autoregressive model while offering faster inference. The non-autoregressive
model achieves faster inference still and also mitigates the quality
degradation of its autoregressive counterparts, at the cost of additional
parameters and an extra loss function during training. Using the same
attention mechanism, we conducted a comprehensive study comparing the three
proposed video prediction variants. Experiments show that the proposed
video prediction models are competitive with more complex state-of-the-art
convolutional-LSTM based models. The source code is available at
https://github.com/XiYe20/VPTR.
Comment: Accepted by Image and Vision Computing. arXiv admin note: text overlap with arXiv:2203.1583
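The core efficiency idea above, separating attention into a spatial pass within each frame and a temporal pass across frames, can be illustrated with a minimal NumPy sketch. This is not the authors' VPTR implementation; `factored_attention` and the tensor layout `(T, N, d)` are illustrative assumptions. Joint attention over all T*N tokens costs O((TN)^2 d), while the factorization costs O(TN(N+T)d).

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # scaled dot-product self-attention over the second-to-last axis
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def factored_attention(x):
    # x: (T, N, d) -- T frames, N spatial tokens per frame, d channels.
    # Illustrative spatial-temporal separation (not the paper's exact layers):
    # 1) spatial pass: tokens attend within their own frame (cost ~ T * N^2)
    x = attention(x, x, x)
    # 2) temporal pass: each spatial location attends across time (cost ~ N * T^2)
    xt = x.swapaxes(0, 1)        # (N, T, d)
    xt = attention(xt, xt, xt)
    return xt.swapaxes(0, 1)     # back to (T, N, d)
```

A real model would add learned query/key/value projections, multiple heads, and local windows on the spatial pass; the sketch only shows why the factorization shrinks the attention cost.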
Long-Term Video Prediction via Criticization and Retrospection
Video prediction refers to predicting and generating future video frames given a set of consecutive frames. Conventional video prediction methods usually criticize the discrepancy between the ground truth and the predictions frame by frame. As the prediction error accumulates recursively, these methods easily drift out of control and are often confined to the short-term horizon. In this paper, we introduce a retrospection process to rectify prediction errors beyond merely criticizing the future prediction. The retrospection process is designed to look back at what has been learned from the past and rectify prediction deficiencies. To this end, we build a retrospection network to reconstruct the past frames given the currently predicted frames. A retrospection loss is introduced to push the retrospected frames to be consistent with the observed frames, so that the prediction error is alleviated. In addition, an auxiliary route is built by reversing the flow of time and executing a similar retrospection. The two routes interact with each other to boost the performance of the retrospection network and enhance the understanding of dynamics across frames, especially over the long-term horizon. An adversarial loss is employed to generate more realistic results in both the prediction and retrospection processes. Moreover, the proposed method can be used to extend many state-of-the-art video prediction methods. Extensive experiments on natural video datasets demonstrate the advantage of introducing the retrospection process for long-term video prediction.
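The combination of frame-wise criticism and retrospection described above can be sketched as a training objective. This is a minimal NumPy sketch, not the authors' code: `retro_net` stands in for the retrospection network that maps predicted future frames back to past frames, and the weight `lam` is an illustrative assumption.

```python
import numpy as np

def retrospection_loss(observed_past, retro_past):
    # L1 consistency between the frames reconstructed by the retrospection
    # network and the actually observed past frames (illustrative choice of norm).
    return np.mean(np.abs(observed_past - retro_past))

def training_loss(observed_past, predicted_future, target_future, retro_net, lam=0.1):
    # Conventional frame-wise criticism of the future prediction...
    prediction_loss = np.mean((predicted_future - target_future) ** 2)
    # ...plus retrospection: reconstruct the past from the predictions and
    # penalize inconsistency with what was actually observed, so errors in
    # the prediction that corrupt the reconstructed past are discouraged.
    retro_past = retro_net(predicted_future)
    return prediction_loss + lam * retrospection_loss(observed_past, retro_past)
```

The paper additionally runs a time-reversed auxiliary route and an adversarial loss on both processes; the sketch covers only the central retrospection term.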