Reveal of Domain Effect: How Visual Restoration Contributes to Object Detection in Aquatic Scenes
Underwater robotic perception usually requires visual restoration and object
detection, both of which have been studied for many years. Meanwhile, the data
domain has a huge impact on modern data-driven learning processes. However, the
exact domain effect, i.e., the relation between restoration and detection,
remains unclear. In this paper, we systematically investigate the relation of
quality-diverse data domains to detection performance, and we unveil how visual
restoration contributes to object detection in real-world underwater scenes.
Based on our analysis, five key discoveries are reported:
1) domain quality has a negligible effect on within-domain convolutional
representation and detection accuracy; 2) a low-quality domain leads to higher
generalization ability in cross-domain detection; 3) a low-quality domain can
hardly be well learned in a domain-mixed learning process; 4) restoration
degrades recall efficiency and thus cannot improve within-domain detection
accuracy; 5) visual restoration benefits detection in the wild by reducing the
domain shift between training data and real-world scenes. Finally, as an
illustrative example, we successfully perform underwater object detection with
an aquatic robot.
Rethinking Temporal Object Detection from Robotic Perspectives
Video object detection (VID) has been vigorously studied for years, but almost
all of the literature adopts a static accuracy-based evaluation, i.e., average
precision (AP). From a robotic perspective, recall continuity and localization
stability are as important as accuracy, yet AP is insufficient to reflect a
detector's performance across time. In this paper,
non-reference assessments are proposed for continuity and stability based on
object tracklets. These temporal evaluations can serve as supplements to static
AP. Further, we develop an online tracklet refinement for improving detectors'
temporal performance through short tracklet suppression, fragment filling, and
temporal location fusion.
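The three refinement steps above can be sketched as follows. This is a minimal
illustrative sketch, not the authors' implementation: the function names, the
tracklet format (a dict mapping frame index to an [x1, y1, x2, y2] box), and
the thresholds are all assumptions made for the example.

```python
# Illustrative sketch of the three tracklet-refinement steps.
# Tracklet format (frame_idx -> [x1, y1, x2, y2]) and all thresholds
# are assumptions for illustration, not the paper's implementation.

def suppress_short_tracklets(tracklets, min_len=3):
    """Short tracklet suppression: drop tracklets with too few detections,
    since very short tracklets are likely false positives."""
    return [t for t in tracklets if len(t) >= min_len]

def fill_fragments(tracklet):
    """Fragment filling: linearly interpolate boxes for frames the detector
    missed inside a tracklet's lifespan (improves recall continuity)."""
    frames = sorted(tracklet)
    for a, b in zip(frames, frames[1:]):
        for f in range(a + 1, b):  # frames with no detection between a and b
            w = (f - a) / (b - a)
            tracklet[f] = [(1 - w) * ca + w * cb
                           for ca, cb in zip(tracklet[a], tracklet[b])]
    return tracklet

def fuse_locations(tracklet, alpha=0.5):
    """Temporal location fusion: exponentially smooth box coordinates over
    time to reduce jitter (improves localization stability)."""
    prev = None
    for f in sorted(tracklet):
        box = tracklet[f]
        if prev is not None:
            box = [alpha * c + (1 - alpha) * p for c, p in zip(box, prev)]
        tracklet[f] = box
        prev = box
    return tracklet
```

Applied online, each incoming frame's detections are first associated into
tracklets, after which the three steps prune, complete, and smooth them.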
In addition, we propose a small-overlap suppression that extends VID methods to
the single object tracking (SOT) task, forming a flexible SOT-by-detection
framework.
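The SOT-by-detection idea can be illustrated with a short sketch: among a
frame's detections, candidates whose overlap with the previous target box is
too small are suppressed, and the best-scoring survivor becomes the new target.
The IoU threshold, the fallback behavior, and the (box, score) format are
assumptions for this example, not the paper's exact method.

```python
# Minimal sketch of small-overlap suppression for SOT-by-detection.
# Threshold and (box, score) candidate format are illustrative assumptions.

def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def track_by_detection(prev_box, detections, min_iou=0.3):
    """Small-overlap suppression: drop detections overlapping the previous
    target too little, then keep the highest-scoring remainder; fall back
    to the previous box when nothing survives."""
    candidates = [(box, s) for box, s in detections
                  if iou(box, prev_box) >= min_iou]
    if not candidates:
        return prev_box
    return max(candidates, key=lambda c: c[1])[0]
```

Note that a distractor with a higher detection score but little overlap with
the tracked target is suppressed, which is what turns a generic detector into a
single-object tracker.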
Extensive experiments are conducted on the ImageNet VID dataset and in
real-world robotic tasks, where the superiority of our proposed approaches is
validated. Code will be made publicly available.