Collaboration Helps Camera Overtake LiDAR in 3D Detection
Compared to LiDAR-based detection systems, camera-only 3D detection provides
an economical solution with a simple configuration for localizing objects in
3D space. However, a major challenge lies in precise depth estimation
due to the lack of direct 3D measurements in the input. Many previous methods
attempt to improve depth estimation through network designs, e.g., deformable
layers and larger receptive fields. This work proposes an orthogonal direction:
improving camera-only 3D detection by introducing multi-agent
collaboration. Our proposed collaborative camera-only 3D detection (CoCa3D)
enables agents to share complementary information with each other through
communication. Meanwhile, we optimize communication efficiency by selecting the
most informative cues. The shared messages from multiple viewpoints
disambiguate the single-agent estimated depth and complement the occluded and
long-range regions in the single-agent view. We evaluate CoCa3D on one
real-world dataset and two new simulation datasets. Results show that CoCa3D
improves the previous SOTA performance by 44.21% on DAIR-V2X, 30.60% on OPV2V+,
and 12.59% on CoPerception-UAVs+ for AP@70. Our preliminary results show the
potential that, with sufficient collaboration, cameras might overtake LiDAR
in some practical scenarios. We release the dataset and code at
https://siheng-chen.github.io/dataset/CoPerception+ and
https://github.com/MediaBrain-SJTU/CoCa3D.
Comment: Accepted by CVPR 2023
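To make the collaboration idea concrete, here is a minimal numpy sketch of selecting the most confident per-pixel depth cues to transmit and fusing them with another agent's depth distribution. The entropy-based selection rule and all function names are illustrative assumptions, not the paper's exact formulation (which also warps messages across viewpoints):

```python
import numpy as np

def depth_entropy(prob):
    """Per-pixel entropy of a depth distribution; prob has shape (H, W, D)."""
    return -np.sum(prob * np.log(prob + 1e-8), axis=-1)

def select_messages(prob, budget):
    """Pick the `budget` most confident pixels (lowest entropy) to transmit."""
    ent = depth_entropy(prob)
    flat = np.argsort(ent.ravel())[:budget]
    return np.unravel_index(flat, ent.shape)

def fuse(prob_ego, prob_other, pixels):
    """Multiply the two depth distributions at shared pixels and renormalize."""
    fused = prob_ego.copy()
    ys, xs = pixels
    fused[ys, xs] *= prob_other[ys, xs]
    fused[ys, xs] /= fused[ys, xs].sum(axis=-1, keepdims=True)
    return fused

# Toy usage: two agents, each holding a (H, W, D) depth probability volume.
rng = np.random.default_rng(0)
p1 = rng.random((32, 32, 64)); p1 /= p1.sum(-1, keepdims=True)
p2 = rng.random((32, 32, 64)); p2 /= p2.sum(-1, keepdims=True)
fused = fuse(p1, p2, select_messages(p2, budget=128))
```

Multiplying distributions acts as a Bayesian update: a second viewpoint with a peaked depth belief sharpens the ego agent's ambiguous estimate at the shared pixels.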
Enabling monocular depth perception at the very edge
Depth estimation is crucial in several computer vision applications, and a recent trend aims at inferring such a cue from a single camera through computationally demanding CNNs, precluding their practical deployment in application contexts characterized by low-power constraints. To this end, we develop a tiny network tailored to microcontrollers, processing low-resolution images to obtain a coarse depth map of the observed scene. Our solution enables depth perception with minimal power requirements (a few hundred milliwatts) and is accurate enough to pave the way to several high-level applications at the edge.
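As an illustration of the "tiny network" idea only (not the authors' actual architecture), a minimal PyTorch encoder-decoder mapping a low-resolution frame to a coarse depth map might look like the sketch below; for microcontroller deployment such a model would additionally be quantized (e.g., to int8):

```python
import torch
import torch.nn as nn

class TinyDepthNet(nn.Module):
    """Minimal sketch: low-res RGB in, coarse normalized depth map out."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU(),    # 1/2 res
            nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU(),   # 1/4 res
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # 1/8 res
        )
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="nearest"),
            nn.Conv2d(32, 16, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2, mode="nearest"),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),          # 1/2 res map
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Toy usage on a small 48x48 input frame.
depth = TinyDepthNet()(torch.rand(1, 3, 48, 48))  # shape (1, 1, 24, 24)
```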
Preliminary Survey of Multiview Synthesis Technology
With the maturity of digital camera technology, it is feasible to build an array of cameras. The major use of a camera array is to acquire different views of a scene in one shot. The captured data can be used to estimate the depths of objects in the scene. Once we have a 3D model, we can synthesize virtual views, relight the scene, and so on; the potential applications include virtual and augmented reality. To investigate multiview technology, we studied the fundamental concepts, including the single-lens camera, the eccentric-lens camera, the plenoptic camera, and the multiview camera. We also discussed a few application examples to illustrate practical usage. The study showed that depth estimation technology has become important for providing a photorealistic, natural feel to viewers. Application areas have also extended to entertainment and to critical tasks such as medical operations and combat missions.
Comment: International conference, 15-17 October 2010, Darmstadt, Germany
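Since the survey centers on recovering depth from multiple views, the standard disparity-to-depth relation for a rectified camera pair is the building block behind array-based depth estimation; the numeric values below are made up for illustration:

```python
# Depth from disparity for a rectified pair: Z = f * B / d, with focal
# length f in pixels, baseline B in meters, and disparity d in pixels.
def depth_from_disparity(f_px, baseline_m, disparity_px):
    return f_px * baseline_m / disparity_px

z = depth_from_disparity(f_px=700.0, baseline_m=0.1, disparity_px=14.0)  # 5.0 m
```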
Practical Auto-Calibration for Spatial Scene-Understanding from Crowdsourced Dashcamera Videos
Spatial scene-understanding, including dense depth and ego-motion estimation,
is an important problem in computer vision for autonomous vehicles and advanced
driver assistance systems. Thus, it is beneficial to design perception modules
that can utilize crowdsourced videos collected from arbitrary vehicular onboard
or dashboard cameras. However, the intrinsic parameters corresponding to such
cameras are often unknown or change over time. Typical manual calibration
approaches require objects such as a chessboard or additional scene-specific
information. On the other hand, automatic camera calibration does not have such
requirements. Yet, the automatic calibration of dashboard cameras is
challenging as forward and planar navigation results in critical motion
sequences with reconstruction ambiguities. Structure reconstruction of complete
visual sequences that may contain tens of thousands of images is also
computationally untenable. Here, we propose a system for practical monocular
onboard camera auto-calibration from crowdsourced videos. We show the
effectiveness of our proposed system on the KITTI raw, Oxford RobotCar, and the
crowdsourced D-City datasets in varying conditions. Finally, we demonstrate
its application for accurate monocular dense depth and ego-motion estimation on
uncalibrated videos.
Comment: Accepted at the 16th International Conference on Computer Vision Theory and Applications (VISAPP 2021)
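To see why the intrinsics matter downstream, the sketch below builds a pinhole intrinsic matrix K (the quantity auto-calibration recovers) and back-projects a pixel with known depth into the camera frame, the basic operation underlying dense depth and ego-motion estimation. The numeric values are KITTI-like and purely illustrative:

```python
import numpy as np

def intrinsics(fx, fy, cx, cy):
    """Pinhole intrinsic matrix K: focal lengths and principal point."""
    return np.array([[fx, 0.0, cx],
                     [0.0, fy, cy],
                     [0.0, 0.0, 1.0]])

def backproject(K, u, v, depth):
    """Lift pixel (u, v) with metric depth to a 3D point in the camera frame."""
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    return depth * ray

K = intrinsics(fx=718.9, fy=718.9, cx=607.2, cy=185.2)  # KITTI-like values
point = backproject(K, u=640.0, v=200.0, depth=12.0)    # 3D point in meters
```

With wrong intrinsics, the back-projected ray points in the wrong direction, which is why depth and ego-motion estimates degrade on uncalibrated crowdsourced footage.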
Self-Supervised Deep Visual Odometry with Online Adaptation
Self-supervised VO methods have shown great success in jointly estimating
camera pose and depth from videos. However, like most data-driven methods,
existing VO networks suffer from a notable decrease in performance when
confronted with scenes different from the training data, which makes them
unsuitable for practical applications. In this paper, we propose an online
meta-learning algorithm to enable VO networks to continuously adapt to new
environments in a self-supervised manner. The proposed method utilizes
convolutional long short-term memory (convLSTM) to aggregate rich
spatial-temporal information in the past. The network is able to memorize and
learn from its past experience for better estimation and fast adaptation to the
current frame. When running VO in the open world, in order to deal with
changing environments, we propose an online feature alignment method that
aligns feature distributions across time. Our VO network is able to seamlessly
adapt to different environments. Extensive experiments on unseen outdoor
scenes, virtual-to-real-world, and outdoor-to-indoor environments demonstrate
that our method considerably outperforms state-of-the-art self-supervised VO
baselines.
Comment: Accepted by CVPR 2020 (oral)
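One common way to realize "aligning feature distributions across time" is moment matching: renormalizing the current layer's activations toward reference statistics recorded earlier. The sketch below is a hypothetical minimal version in PyTorch; the paper's actual alignment objective may differ:

```python
import torch

def align_features(feat, ref_mean, ref_std, eps=1e-5):
    """Moment-matching sketch: shift activations (N, C, H, W) so their
    per-channel statistics match reference statistics from an earlier time."""
    mean = feat.mean(dim=(0, 2, 3), keepdim=True)
    std = feat.std(dim=(0, 2, 3), keepdim=True)
    return (feat - mean) / (std + eps) * ref_std + ref_mean

# Toy usage with hypothetical reference statistics of shape (1, C, 1, 1).
feat = torch.randn(4, 64, 30, 40)
ref_mean, ref_std = torch.zeros(1, 64, 1, 1), torch.ones(1, 64, 1, 1)
aligned = align_features(feat, ref_mean, ref_std)
```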