Key-point-based scene understanding is fundamental for autonomous driving
applications. At the same time, optical flow plays an important role in many
vision tasks. However, because they implicitly give equal attention to all
points, classic data-driven optical flow estimation methods yield less
satisfactory performance on key points, limiting their application in
key-point-critical, safety-relevant scenarios. To address this issue, we
introduce a points-based modeling method that requires the model to learn
key-point-related priors explicitly. Based on the modeling method, we present
FocusFlow, a framework consisting of 1) a mix loss function that combines a
classic photometric loss function with our proposed Conditional Point Control
Loss (CPCL) function for diverse point-wise supervision; and 2) a conditioned
controlling model that replaces the conventional feature encoder with our
proposed Condition Control Encoder (CCE). CCE incorporates a Frame Feature
Encoder (FFE) that extracts features from frames, a Condition Feature Encoder
(CFE) that learns to control the feature extraction behavior of FFE from input
masks containing information of key points, and fusion modules that transfer
the controlling information between FFE and CFE. Our FocusFlow framework shows
outstanding performance with up to +44.5% precision improvement on various key
points such as ORB, SIFT, and even learning-based SiLK, along with exceptional
scalability for most existing data-driven optical flow methods like PWC-Net,
RAFT, and FlowFormer. Notably, on the whole frame, FocusFlow yields performance
competitive with or superior to that of the original models. The source code
will be available at https://github.com/ZhonghuaYi/FocusFlow_official.
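
The abstract does not define CPCL, so the following is only a minimal sketch of the general idea of a mix loss: a whole-frame term plus an extra term restricted to key points by a binary mask. It is not the authors' formulation. For simplicity it uses a supervised end-point error against a reference flow as a stand-in for the photometric term, and the names `mixed_point_loss`, `key_mask`, and `key_weight` are hypothetical.

```python
import math


def endpoint_error(pred, target):
    """Euclidean distance between a predicted and a reference flow vector."""
    return math.hypot(pred[0] - target[0], pred[1] - target[1])


def mixed_point_loss(pred_flow, target_flow, key_mask, key_weight=1.0):
    """Whole-frame mean error plus a weighted mean error over key points only.

    pred_flow, target_flow: H x W nested lists of (u, v) tuples.
    key_mask: H x W nested list of 0/1 flags marking key points.
    key_weight: hypothetical balancing factor (an assumption, not from the paper).
    """
    all_errors, key_errors = [], []
    for pred_row, target_row, mask_row in zip(pred_flow, target_flow, key_mask):
        for pred, target, is_key in zip(pred_row, target_row, mask_row):
            err = endpoint_error(pred, target)
            all_errors.append(err)
            if is_key:
                key_errors.append(err)
    frame_term = sum(all_errors) / len(all_errors)
    # With no key points in the mask, the loss reduces to the whole-frame term.
    key_term = sum(key_errors) / len(key_errors) if key_errors else 0.0
    return frame_term + key_weight * key_term
```

In this sketch, an error at a masked pixel is counted in both terms, so a model trained with such a loss would be penalized more heavily for mistakes at key points than elsewhere, which is the behavior the point-wise supervision is meant to encourage.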