Event cameras respond to log-brightness changes within microseconds. Because
they produce responses only in regions where brightness changes, they are
particularly well suited to optical flow estimation. Despite this extremely
low response latency, existing event-camera datasets provide optical flow
ground truth only at low frame rates (e.g., 10 Hz), which greatly restricts
the potential of event-driven optical flow.
To address this challenge, we put forward a high-frame-rate,
low-latency event representation, the Unified Voxel Grid, which is fed into the
network sequentially, bin by bin. We then propose EVA-Flow, an EVent-based
Anytime Flow estimation network that produces high-frame-rate event optical
flow using only low-frame-rate optical flow ground truth for supervision. The
key component of
EVA-Flow is the stacked Spatiotemporal Motion Refinement (SMR) module,
which predicts temporally dense optical flow and improves its accuracy via
spatiotemporal motion refinement. The time-dense feature warping used in
the SMR module provides implicit supervision for the intermediate optical flow.
Additionally, we introduce the Rectified Flow Warp Loss (RFWL) for the
unsupervised evaluation of intermediate optical flow in the absence of ground
truth. This is, to the best of our knowledge, the first work focusing on
anytime optical flow estimation via event cameras. Extensive experiments on
MVSEC, DSEC, and our EVA-FlowSet demonstrate that EVA-Flow achieves
competitive performance, very low latency (5 ms), the fastest inference
(9.2 ms), time-dense motion estimation (200 Hz), and strong generalization. Our
code will be available at https://github.com/Yaozhuwa/EVA-Flow.
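
To make the bin-by-bin idea concrete, the following is a minimal sketch of a generic event voxel grid, not the Unified Voxel Grid itself, whose exact definition is not given here. The function name `events_to_voxel_bins`, the toy events, and the choice of 20 bins per 100 ms label interval are illustrative assumptions; that choice is simply what makes each bin span 5 ms, matching the 200 Hz and 10 Hz figures quoted above.

```python
import numpy as np

def events_to_voxel_bins(x, y, t, p, height, width, num_bins, t_start, t_end):
    """Accumulate signed event polarities into num_bins equal time slices.

    Hypothetical helper for illustration only; a plain hard-binned voxel
    grid, which may differ from the representation used in the paper.
    """
    voxel = np.zeros((num_bins, height, width), dtype=np.float32)
    # Map each timestamp to a bin index in [0, num_bins - 1].
    norm = (t - t_start) / (t_end - t_start)
    b = np.clip((norm * num_bins).astype(np.int64), 0, num_bins - 1)
    # np.add.at accumulates correctly even when several events share a cell.
    np.add.at(voxel, (b, y, x), p)
    return voxel

# Toy events: pixel coordinates, timestamps in seconds, polarity (+1 / -1).
x = np.array([0, 1, 1, 2])
y = np.array([0, 0, 1, 1])
t = np.array([0.000, 0.004, 0.051, 0.099])
p = np.array([1.0, -1.0, 1.0, 1.0], dtype=np.float32)

# Assumed setup: one 100 ms interval between 10 Hz labels, split into 20 bins.
t_start, t_end, num_bins = 0.0, 0.1, 20
voxel = events_to_voxel_bins(x, y, t, p, height=2, width=3,
                             num_bins=num_bins, t_start=t_start, t_end=t_end)
bin_duration_ms = (t_end - t_start) / num_bins * 1000.0

consumed = 0
for bin_slice in voxel:
    # A streaming network would consume one (height, width) slice per step.
    consumed += 1
```

Under these assumptions, each slice becomes available as soon as its 5 ms window closes, which is why feeding bins sequentially lets a network emit a flow estimate per bin instead of waiting for the full label interval.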