Event cameras, which asynchronously output low-latency event streams, provide
great opportunities for state estimation in challenging situations. Although
event-based visual odometry has been studied extensively in recent years, most
methods are monocular, and little research has addressed stereo event vision. In
this paper, we present ESVIO, the first event-based stereo visual-inertial
odometry, which leverages the complementary advantages of event streams,
standard images and inertial measurements. Our proposed pipeline achieves
temporal tracking and instantaneous matching between consecutive stereo event
streams, thereby obtaining robust state estimation. In addition, a motion
compensation method is designed to sharpen scene edges by warping each event
to reference times using the IMU and the ESVIO back-end. We validate that both
ESIO (purely event-based) and ESVIO (event- and image-aided) outperform
other image-based and event-based baseline methods on
public and self-collected datasets. Furthermore, we use our pipeline to perform
onboard quadrotor flights in low-light environments. A large-scale real-world
experiment is also conducted to demonstrate long-term effectiveness. We
highlight that this work is a real-time, accurate system aimed at robust state
estimation in challenging environments.
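The motion compensation idea (warping each event to a reference time using the IMU) can be illustrated with a minimal sketch. It is not ESVIO's implementation, which the abstract says also uses the back-end state. It assumes pure rotation at a constant angular velocity `omega` (rad/s, camera frame, e.g. from a gyroscope) over the event window, so translation is ignored, and a pinhole camera with known intrinsics `K_cam`. The helper names `so3_exp`, `warp_events`, and `accumulate` and the synthetic values are hypothetical. Each event is back-projected to a bearing ray, rotated into the camera frame at `t_ref`, and re-projected. Events from the same edge should then collapse onto fewer pixels, which sharpens the accumulated event image.

```python
import numpy as np


def so3_exp(phi):
    """Rodrigues' formula: rotation matrix for rotation vector phi (radians)."""
    theta = np.linalg.norm(phi)
    if theta < 1e-12:
        return np.eye(3)
    k = phi / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def warp_events(xs, ys, ts, t_ref, omega, K_cam):
    """Warp events (x, y, t) to time t_ref.

    Assumption (hypothetical sketch): pure rotation at constant angular
    velocity omega in the camera frame, so a ray seen at time t maps to the
    camera frame at t_ref via Exp(omega * (t - t_ref)).
    """
    K_inv = np.linalg.inv(K_cam)
    out = np.empty((len(xs), 2))
    for i, (x, y, t) in enumerate(zip(xs, ys, ts)):
        f = K_inv @ np.array([x, y, 1.0])            # back-project to a bearing ray
        f_ref = so3_exp(omega * (t - t_ref)) @ f     # rotate into the frame at t_ref
        p = K_cam @ f_ref                            # re-project with the pinhole model
        out[i] = p[:2] / p[2]
    return out


def accumulate(points, width, height):
    """Count events per pixel (nearest-pixel voting) into an event image."""
    img = np.zeros((height, width))
    cols = np.round(points[:, 0]).astype(int)
    rows = np.round(points[:, 1]).astype(int)
    keep = (cols >= 0) & (cols < width) & (rows >= 0) & (rows < height)
    np.add.at(img, (rows[keep], cols[keep]), 1)
    return img


# Synthetic check with hypothetical values: one scene point observed by a
# camera yawing at 0.5 rad/s. Its events smear across the raw image and
# collapse onto a single pixel after warping to t_ref = 0.
K_cam = np.array([[200.0, 0.0, 64.0], [0.0, 200.0, 48.0], [0.0, 0.0, 1.0]])
omega = np.array([0.0, 0.5, 0.0])
ts = np.linspace(0.0, 0.1, 50)
f0 = np.array([0.0, 0.0, 1.0])                       # bearing of the point at t_ref
raw = np.array([K_cam @ (so3_exp(-omega * t) @ f0) for t in ts])
raw = raw[:, :2] / raw[:, 2:3]                       # sub-pixel event locations
warped = warp_events(raw[:, 0], raw[:, 1], ts, 0.0, omega, K_cam)
raw_img = accumulate(raw, 128, 96)
warped_img = accumulate(warped, 128, 96)
```

In this toy setup the raw events spread over roughly ten pixels, while all 50 warped events land on pixel (64, 48). A real pipeline would also have to handle translation, IMU bias, and sensor noise.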