Dynamic vision sensors, or event cameras, provide rich complementary
information for video frame interpolation. Existing state-of-the-art methods
follow the paradigm of combining synthesis-based and warping networks.
However, few of these methods fully respect the intrinsic characteristics of
event streams. Given that event cameras encode only intensity changes and
polarity rather than color intensities, estimating optical flow from events is
arguably more difficult than from RGB information. We therefore propose to
incorporate RGB information into an event-guided optical flow refinement
strategy, sketched below.
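The following is a minimal PyTorch sketch of this data flow, assuming the events are binned into a voxel grid; `rgb_flow_net` and `event_refine_net` are hypothetical stand-ins for learned networks, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class FlowRefiner(nn.Module):
    """Toy illustration: RGB frames give a coarse flow; events refine it."""

    def __init__(self, event_bins: int = 5):
        super().__init__()
        # Hypothetical stand-in for a flow backbone operating on the
        # two RGB keyframes (6 input channels -> 2 flow channels).
        self.rgb_flow_net = nn.Conv2d(6, 2, kernel_size=3, padding=1)
        # Hypothetical refinement head: predicts a residual flow from
        # the event voxel grid conditioned on the coarse RGB flow.
        self.event_refine_net = nn.Conv2d(event_bins + 2, 2, kernel_size=3, padding=1)

    def forward(self, frame0, frame1, event_voxel):
        # Coarse flow from color intensities, which are easier to match
        # than sparse, polarity-only event data.
        coarse_flow = self.rgb_flow_net(torch.cat([frame0, frame1], dim=1))
        # Events contribute high-temporal-resolution motion cues as a
        # residual correction to the coarse flow.
        residual = self.event_refine_net(torch.cat([event_voxel, coarse_flow], dim=1))
        return coarse_flow + residual

# Usage with dummy (B, C, H, W) tensors:
f0, f1 = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
ev = torch.rand(1, 5, 64, 64)  # events binned into a 5-channel voxel grid
flow = FlowRefiner()(f0, f1, ev)  # (1, 2, 64, 64) flow field
```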
Moreover, in light of the quasi-continuous nature of the time signals
provided by event cameras, we propose a divide-and-conquer strategy in which
event-based intermediate frame synthesis proceeds incrementally through
multiple simplified stages rather than in a single, long stage; a second
sketch below illustrates this.
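The following minimal sketch illustrates the incremental scheme, assuming the event stream between the key frame and the target timestamp has been binned into a stack of event frames; `synthesize_step` is a hypothetical placeholder for a learned one-step synthesis network.

```python
import numpy as np

def synthesize_step(frame, event_slice):
    # Placeholder for a learned one-step synthesis network; a trivial
    # additive update keeps the sketch runnable end to end.
    return frame + 0.1 * event_slice.sum(axis=0)

def incremental_synthesis(key_frame, events, num_stages=4):
    # `events`: (N, H, W) stack of event frames covering the interval
    # between the key frame and the target timestamp.
    frame = key_frame
    # Divide the stream into contiguous slices and conquer each short,
    # simpler sub-interval, instead of one long synthesis stage.
    for event_slice in np.array_split(events, num_stages, axis=0):
        frame = synthesize_step(frame, event_slice)
    return frame

# Usage with dummy data:
key = np.zeros((64, 64))
ev = np.random.rand(20, 64, 64)
interpolated = incremental_synthesis(key, ev)
```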
Extensive experiments on both synthetic and real-world datasets show that
these modifications lead to more reliable and realistic intermediate frames
than previous video
frame interpolation methods. Our findings underline that a careful
consideration of event characteristics such as high temporal density and
elevated noise benefits interpolation accuracy.

Comment: Accepted by IROS 2023. Project site: https://jiabenchen.github.io/revisit_even