eWand: A calibration framework for wide baseline frame-based and event-based camera systems
Accurate calibration is crucial for using multiple cameras to triangulate the
position of objects precisely. However, it is also a time-consuming process
that needs to be repeated for every displacement of the cameras. The standard
approach is to use a printed pattern with known geometry to estimate the
intrinsic and extrinsic parameters of the cameras. The same idea can be applied
to event-based cameras, though it requires extra work: a printed pattern can be
detected by reconstructing frames from the events, or a blinking pattern can be
displayed on a screen so that it is detected directly from the events. Such
calibration methods can provide accurate
intrinsic calibration for both frame- and event-based cameras. However, using
2D patterns has several limitations for multi-camera extrinsic calibration when
the cameras have highly different viewpoints and a wide baseline.
The 2D pattern can only be detected from one direction and needs to be of
significant size to compensate for its distance to the camera. This makes the
extrinsic calibration time-consuming and cumbersome. To overcome these
limitations, we propose eWand, a new method that uses blinking LEDs inside
opaque spheres instead of a printed or displayed pattern. Our method provides a
faster, easier-to-use extrinsic calibration approach that maintains high
accuracy for both event- and frame-based cameras.
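As an illustration of the underlying idea of detecting a blinking active marker directly from an event stream, the sketch below counts events per pixel over a short window, keeps pixels whose event rate matches an assumed LED blink frequency, and clusters them into marker centers. This is a minimal sketch: the blink frequency, window length, tolerance, and event array layout are illustrative assumptions, not parameters of eWand.

```python
import numpy as np

def detect_blinking_markers(events, width, height,
                            blink_hz=500.0, window_s=0.05, tol=0.3):
    """Illustrative sketch: locate blinking LEDs in an event stream.

    events: structured array with fields x, y, t (seconds).
    A pixel viewing an LED blinking at blink_hz produces roughly
    2 * blink_hz * window_s events in the window (one ON and one OFF per cycle).
    """
    t0 = events["t"].min()
    win = events[events["t"] < t0 + window_s]

    # Per-pixel event count over the window.
    counts = np.zeros((height, width), dtype=np.int32)
    np.add.at(counts, (win["y"], win["x"]), 1)

    # Keep pixels whose rate is close to the expected blinking rate.
    expected = 2.0 * blink_hz * window_s
    ys, xs = np.nonzero(np.abs(counts - expected) < tol * expected)

    # Greedy clustering of nearby active pixels into marker centers.
    centers, used = [], np.zeros(len(xs), dtype=bool)
    for i in range(len(xs)):
        if used[i]:
            continue
        close = (np.abs(xs - xs[i]) < 10) & (np.abs(ys - ys[i]) < 10)
        used |= close
        centers.append((xs[close].mean(), ys[close].mean()))
    return centers
```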
Long-Lived Accurate Keypoints in Event Streams
We present a novel end-to-end approach to keypoint detection and tracking in
an event stream that provides better precision and much longer keypoint tracks
than previous methods. This is made possible by two contributions working
together.
First, we propose a simple procedure to generate stable keypoint labels,
which we use to train a recurrent architecture. This training data results in
detections that are very consistent over time.
Moreover, we observe that previous methods for keypoint detection work on a
representation (such as the time surface) that integrates events over a period
of time. Since this integration is required, we claim it is better to predict
the keypoints' trajectories for the time period rather than single locations,
as done in previous approaches. We predict these trajectories in the form of a
series of heatmaps for the integration time period. This improves the keypoint
localization.
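For context, a time surface of the kind mentioned above can be computed by storing, for each pixel, an exponentially decayed value of its most recent event timestamp, so the representation integrates activity over a period while emphasizing recent events. The decay constant and data layout below are illustrative assumptions, not values from this paper.

```python
import numpy as np

def time_surface(events, width, height, t_ref, tau=0.05):
    """Illustrative sketch of a time surface.

    events: structured array with fields x, y, t (seconds), sorted by t.
    Returns a (height, width) map in [0, 1], largest where events are recent.
    """
    last_t = np.full((height, width), -np.inf)
    # Later events overwrite earlier ones, so each pixel keeps its latest timestamp.
    last_t[events["y"], events["x"]] = events["t"]
    # Exponential decay relative to t_ref; pixels that never fired decay to 0.
    return np.exp((last_t - t_ref) / tau)
```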
Our architecture can also be kept very simple, which results in very fast
inference times. We demonstrate our approach on the HVGA ATIS Corner dataset as
well as "The Event-Camera Dataset and Simulator" dataset, and show it results
in keypoint tracks that are three times longer and nearly twice as accurate as
the best previous state-of-the-art methods. We believe our approach can be
generalized to other event-based camera problems, and we release our source
code to encourage other authors to explore it.
RGB-D-E: Event Camera Calibration for Fast 6-DOF Object Tracking
Augmented reality devices require multiple sensors to perform various tasks
such as localization and tracking. Currently, popular cameras are mostly
frame-based (e.g. RGB and depth), which impose high data bandwidth and power
usage. Given the need for low-power and more responsive augmented reality
systems, relying solely on frame-based sensors limits the algorithms that need
high-frequency data from the environment. As such, event-based sensors have
become increasingly popular due to their low power, bandwidth, and latency, as
well as their very high-frequency data acquisition
capabilities. In this paper, we propose, for the first time, to use an
event-based camera to increase the speed of 3D object tracking in 6 degrees of
freedom. This application requires handling very high object speeds to convey
compelling AR experiences. To this end, we propose a new system which combines
a recent RGB-D sensor (Kinect Azure) with an event camera (DAVIS346). We
develop a deep learning approach, which combines an existing RGB-D network
with a novel event-based network in a cascade fashion, and demonstrate
that our approach significantly improves the robustness of a state-of-the-art
frame-based 6-DOF object tracker using our RGB-D-E pipeline.
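To picture the cascade described above, one can think of a coarse-to-fine data flow: a frame-rate RGB-D stage produces a pose estimate, and a faster event-driven stage corrects it between frames. The sketch below is only a schematic of that flow under assumed interfaces; the module names and signatures are hypothetical and do not reflect the actual networks used in the paper.

```python
import numpy as np

class CascadedTracker:
    """Schematic of a cascaded frame-plus-event 6-DOF tracker (illustrative only).

    rgbd_net and event_net are placeholders for learned modules: the first maps
    an RGB-D frame to a pose update, the second maps events accumulated since
    the last frame to a small pose correction.
    """

    def __init__(self, rgbd_net, event_net, initial_pose):
        self.rgbd_net = rgbd_net
        self.event_net = event_net
        self.pose = initial_pose  # 4x4 homogeneous transform (numpy array)

    def on_frame(self, rgb, depth):
        # Slow path: full RGB-D update at the camera frame rate (~30 Hz).
        self.pose = self.pose @ self.rgbd_net(rgb, depth, self.pose)
        return self.pose

    def on_events(self, events):
        # Fast path: event-driven correction between frames (high-rate data).
        self.pose = self.pose @ self.event_net(events, self.pose)
        return self.pose
```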
An Asynchronous Kalman Filter for Hybrid Event Cameras
Event cameras are ideally suited to capture HDR visual information without
blur but perform poorly on static or slowly changing scenes. Conversely,
conventional image sensors measure absolute intensity of slowly changing scenes
effectively but do poorly on high dynamic range or quickly changing scenes. In
this paper, we present an event-based video reconstruction pipeline for High
Dynamic Range (HDR) scenarios. The proposed algorithm includes a frame
augmentation pre-processing step that deblurs and temporally interpolates frame
data using events. The augmented frame and event data are then fused using a
novel asynchronous Kalman filter under a unifying uncertainty model for both
sensors. We evaluate our method on publicly available datasets with
challenging lighting conditions and fast motion, as well as on our new dataset
with HDR reference. The proposed algorithm outperforms state-of-the-art
methods in both absolute intensity error (48% reduction) and image similarity
indexes (average 11% improvement).
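The fusion step can be illustrated with a standard scalar Kalman correction applied per pixel: the augmented frame provides one noisy measurement of (log) intensity and event data provide another, each weighted by its own uncertainty. The noise values below are generic placeholders for illustration, not the paper's unifying uncertainty model.

```python
def kalman_update(x, p, z, r):
    """One scalar Kalman correction: estimate x with variance p,
    measurement z with variance r."""
    k = p / (p + r)       # Kalman gain: trust the measurement more when r is small
    x = x + k * (z - x)   # corrected estimate
    p = (1.0 - k) * p     # corrected variance
    return x, p

# Per-pixel fusion sketch with assumed noise levels:
x, p = 0.0, 1.0                                 # prior log-intensity estimate and variance
x, p = kalman_update(x, p, z=0.42, r=0.10)      # measurement from the augmented frame
x, p = kalman_update(x, p, z=x + 0.20, r=0.05)  # event-derived measurement: a contrast step
```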
Distractor-aware Event-based Tracking
Event cameras, or dynamic vision sensors, have recently achieved success in
tasks ranging from fundamental vision to high-level vision research. Owing to
their ability to asynchronously capture changes in light intensity, event
cameras have an inherent advantage in capturing moving objects in challenging
scenarios such as low light, high dynamic range, or fast motion, which makes
them a natural fit for visual object tracking. However, current event-based
trackers derived from RGB trackers simply replace the input images with event
frames and still follow the conventional tracking pipeline, which mainly relies
on object texture to distinguish the target. As a result, these trackers may
not be robust in challenging scenarios such as moving cameras and cluttered
foregrounds. In this paper, we propose a distractor-aware event-based tracker,
named DANet, that introduces transformer modules into a Siamese network
architecture. Specifically, our model is mainly composed of a motion-aware
network and a target-aware network, which jointly exploit motion cues and
object contours from event data to discover moving objects and identify the
target by removing dynamic distractors. Our DANet can be trained in
an end-to-end manner without any post-processing and can run at over 80 FPS on
a single V100. We conduct comprehensive experiments on two large event tracking
datasets to validate the proposed model. We demonstrate that our tracker
achieves superior performance over state-of-the-art trackers in terms of both
accuracy and efficiency.
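As background for the Siamese backbone mentioned above, the core operation is to correlate a template embedding of the target with an embedding of the search region, yielding a response map whose peak locates the target. The sketch below shows only that correlation step; DANet's motion-aware and target-aware branches and its transformer modules are not modeled here, and the function shown is an assumption-level illustration.

```python
import torch
import torch.nn.functional as F

def siamese_response(template_feat: torch.Tensor, search_feat: torch.Tensor) -> torch.Tensor:
    """Cross-correlate template features with search-region features.

    template_feat: (C, h, w) embedding of the target template.
    search_feat:   (C, H, W) embedding of the search region, with H >= h, W >= w.
    Returns a (H - h + 1, W - w + 1) response map; its argmax gives the
    predicted target location in feature coordinates.
    """
    # Treat the template as a single convolution kernel over the search features.
    response = F.conv2d(search_feat.unsqueeze(0),    # input:  (1, C, H, W)
                        template_feat.unsqueeze(0))  # kernel: (1, C, h, w)
    return response[0, 0]
```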