Video Prediction by Efficient Transformers
Video prediction is a challenging computer vision task that has a wide range
of applications. In this work, we present a new family of Transformer-based
models for video prediction. Firstly, an efficient local spatial-temporal
separation attention mechanism is proposed to reduce the complexity of standard
Transformers. Then, a full autoregressive model, a partial autoregressive model
and a non-autoregressive model are developed based on the new efficient
Transformer. The partial autoregressive model performs on par with the full
autoregressive model while offering faster inference. The non-autoregressive
model achieves faster inference still and also mitigates the quality
degradation of its autoregressive counterparts, at the cost of additional
parameters and an extra loss function during training. Using the same
attention mechanism, we conducted a comprehensive study comparing the three
proposed video prediction variants. Experiments show that the proposed
video prediction models are competitive with more complex state-of-the-art
convolutional-LSTM based models. The source code is available at
https://github.com/XiYe20/VPTR.
Comment: Accepted by Image and Vision Computing. arXiv admin note: text overlap with arXiv:2203.1583
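The core efficiency idea above, separating attention into a spatial pass within each frame and a temporal pass across frames, can be illustrated with a minimal NumPy sketch. This is not the authors' VPTR implementation; `factored_attention` and the tensor layout `(T, N, d)` are illustrative assumptions. Joint attention over all T*N tokens costs O((TN)^2 d), while the factorization costs O(TN(N+T)d).

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # scaled dot-product self-attention over the second-to-last axis
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def factored_attention(x):
    # x: (T, N, d) -- T frames, N spatial tokens per frame, d channels.
    # Illustrative spatial-temporal separation (not the paper's exact layers):
    # 1) spatial pass: tokens attend within their own frame (cost ~ T * N^2)
    x = attention(x, x, x)
    # 2) temporal pass: each spatial location attends across time (cost ~ N * T^2)
    xt = x.swapaxes(0, 1)        # (N, T, d)
    xt = attention(xt, xt, xt)
    return xt.swapaxes(0, 1)     # back to (T, N, d)
```

A real model would add learned query/key/value projections, multiple heads, and local windows on the spatial pass; the sketch only shows why the factorization shrinks the attention cost.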
Long-Term Video Prediction via Criticization and Retrospection
Video prediction refers to predicting and generating future video frames given a set of consecutive frames. Conventional video prediction methods usually criticize the discrepancy between the ground truth and the predictions frame by frame. As the prediction error accumulates recursively, these methods easily drift out of control and are often confined to the short-term horizon. In this paper, we introduce a retrospection process to rectify prediction errors beyond merely criticizing the future prediction. The retrospection process is designed to look back at what has been learned from the past and rectify prediction deficiencies. To this end, we build a retrospection network to reconstruct the past frames given the currently predicted frames. A retrospection loss is introduced to push the retrospected frames to be consistent with the observed frames, so that the prediction error is alleviated. In addition, an auxiliary route is built by reversing the flow of time and executing a similar retrospection. The two routes interact with each other to boost the performance of the retrospection network and enhance the understanding of dynamics across frames, especially over the long-term horizon. An adversarial loss is employed to generate more realistic results in both the prediction and retrospection processes. Moreover, the proposed method can be used to extend many state-of-the-art video prediction methods. Extensive experiments on natural video datasets demonstrate the advantage of introducing the retrospection process for long-term video prediction.
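The combination of frame-wise criticism and retrospection described above can be sketched as a training objective. This is a minimal NumPy sketch, not the authors' code: `retro_net` stands in for the retrospection network that maps predicted future frames back to past frames, and the weight `lam` is an illustrative assumption.

```python
import numpy as np

def retrospection_loss(observed_past, retro_past):
    # L1 consistency between the frames reconstructed by the retrospection
    # network and the actually observed past frames (illustrative choice of norm).
    return np.mean(np.abs(observed_past - retro_past))

def training_loss(observed_past, predicted_future, target_future, retro_net, lam=0.1):
    # Conventional frame-wise criticism of the future prediction...
    prediction_loss = np.mean((predicted_future - target_future) ** 2)
    # ...plus retrospection: reconstruct the past from the predictions and
    # penalize inconsistency with what was actually observed, so errors in
    # the prediction that corrupt the reconstructed past are discouraged.
    retro_past = retro_net(predicted_future)
    return prediction_loss + lam * retrospection_loss(observed_past, retro_past)
```

The paper additionally runs a time-reversed auxiliary route and an adversarial loss on both processes; the sketch covers only the central retrospection term.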