11 research outputs found

    DevNet: Self-supervised Monocular Depth Learning via Density Volume Construction

    Full text link
    Self-supervised depth learning from monocular images normally relies on the 2D pixel-wise photometric relation between temporally adjacent image frames. However, such methods neither fully exploit 3D point-wise geometric correspondences nor effectively tackle the ambiguities in photometric warping caused by occlusions or illumination inconsistency. To address these problems, this work proposes the Density Volume Construction Network (DevNet), a novel self-supervised monocular depth learning framework that considers 3D spatial information and exploits stronger geometric constraints among adjacent camera frustums. Instead of directly regressing the pixel value from a single image, our DevNet divides the camera frustum into multiple parallel planes and predicts the point-wise occlusion probability density on each plane. The final depth map is generated by integrating the density along the corresponding rays. During training, novel regularization strategies and loss functions are introduced to mitigate photometric ambiguities and overfitting. Without noticeably enlarging model size or running time, DevNet outperforms several representative baselines on both the KITTI-2015 outdoor dataset and the NYU-V2 indoor dataset. In particular, the root-mean-square deviation in depth estimation is reduced by around 4% with DevNet on both KITTI-2015 and NYU-V2. Code is available at https://github.com/gitkaichenzhou/DevNet. Comment: Accepted by the European Conference on Computer Vision 2022 (ECCV 2022).
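
    The depth-from-density integration the abstract describes can be illustrated with a short volume-rendering-style sketch. This is only one reading of the abstract, not the released DevNet code; the tensor shapes, the exponential opacity model, and the function name are assumptions.

```python
import torch

def render_depth_from_density(density, plane_depths):
    """
    Hedged sketch (not the authors' code): turn per-plane occlusion densities
    into a depth map by integrating along each camera ray.

    density:      (B, N, H, W) non-negative density predicted on N frustum planes
    plane_depths: (N,) depth of each parallel plane, sorted near-to-far
    returns:      (B, H, W) expected depth per pixel
    """
    # Spacing between consecutive planes (repeat the last interval).
    deltas = plane_depths[1:] - plane_depths[:-1]
    deltas = torch.cat([deltas, deltas[-1:]]).view(1, -1, 1, 1)

    # Opacity of each plane interval: alpha_i = 1 - exp(-sigma_i * delta_i).
    alpha = 1.0 - torch.exp(-density * deltas)

    # Transmittance T_i = prod_{j<i} (1 - alpha_j): probability the ray
    # reaches plane i without being occluded earlier.
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alpha[:, :1]), 1.0 - alpha[:, :-1]], dim=1),
        dim=1,
    )
    weights = trans * alpha

    # Expected depth per pixel: sum_i w_i * d_i.
    return (weights * plane_depths.view(1, -1, 1, 1)).sum(dim=1)
```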

    PARTNER: Level up the Polar Representation for LiDAR 3D Object Detection

    Full text link
    Recently, polar-based representation has shown promising properties in perceptual tasks. Compared with Cartesian-based approaches, which separate point clouds unevenly, representing point clouds as polar grids has been recognized as an alternative due to (1) its robust performance under different resolutions and (2) its superiority in streaming-based approaches. However, state-of-the-art polar-based detection methods inevitably suffer from feature distortion because of the non-uniform division of the polar representation, resulting in a non-negligible performance gap compared to Cartesian-based approaches. To tackle this issue, we present PARTNER, a novel 3D object detector in polar coordinates. PARTNER alleviates the dilemma of feature distortion with global representation re-alignment and facilitates regression by introducing instance-level geometric information into the detection head. Extensive experiments show clear advantages in streaming-based detection and under different resolutions. Furthermore, our method outperforms previous polar-based works by remarkable margins of 3.68% and 9.15% on the Waymo and ONCE validation sets, respectively, achieving competitive results against state-of-the-art methods. Comment: ICCV 2023.
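
    The polar-grid representation the abstract builds on can be sketched as a simple coordinate conversion from Cartesian points to (range, azimuth, height) cells. This is an illustrative sketch, not PARTNER's implementation; the grid resolution and bounds are placeholder values.

```python
import numpy as np

def polar_grid_indices(points, r_max=75.0, num_r=128, num_theta=360,
                       z_min=-3.0, z_max=3.0, num_z=32):
    """Assign LiDAR points (N, 3) to a polar grid. Illustrative only."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.hypot(x, y)            # range from the sensor
    theta = np.arctan2(y, x)      # azimuth in [-pi, pi)

    # Azimuthal cells grow wider with range, i.e. the non-uniform division
    # that the paper identifies as the source of feature distortion.
    r_idx = np.clip((r / r_max * num_r).astype(int), 0, num_r - 1)
    t_idx = np.clip(((theta + np.pi) / (2 * np.pi) * num_theta).astype(int),
                    0, num_theta - 1)
    z_idx = np.clip(((z - z_min) / (z_max - z_min) * num_z).astype(int),
                    0, num_z - 1)
    return np.stack([r_idx, t_idx, z_idx], axis=1)
```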

    CLIP^2: Contrastive Language-Image-Point Pretraining from Real-World Point Cloud Data

    Full text link
    Contrastive Language-Image Pre-training, benefiting from large-scale unlabeled text-image pairs, has demonstrated great performance in open-world vision understanding tasks. However, due to the limited availability of Text-3D data pairs, adapting the success of 2D Vision-Language Models (VLM) to the 3D space remains an open problem. Existing works that leverage VLMs for 3D understanding generally resort to constructing intermediate 2D representations for the 3D data, but at the cost of losing 3D geometry information. To take a step toward open-world 3D vision understanding, we propose Contrastive Language-Image-Point Cloud Pretraining (CLIP^2) to directly learn transferable 3D point cloud representations in realistic scenarios with a novel proxy alignment mechanism. Specifically, we exploit naturally existing correspondences in 2D and 3D scenarios and build well-aligned, instance-based text-image-point proxies from these complex scenarios. On top of that, we propose a cross-modal contrastive objective to learn semantic- and instance-level aligned point cloud representations. Experimental results on both indoor and outdoor scenarios show that our learned 3D representation has great transfer ability in downstream tasks, including zero-shot and few-shot 3D recognition, boosting state-of-the-art methods by large margins. Furthermore, we provide analyses of the capability of different representations in real scenarios and present an optional ensemble scheme. Comment: To appear at CVPR 2023.
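
    The cross-modal contrastive objective over aligned text-image-point proxies can be sketched as a symmetric InfoNCE loss that pulls each point cloud embedding toward its paired image and text embeddings. This is a generic sketch of such an objective, not the authors' exact loss; the loss combination and temperature are assumptions.

```python
import torch
import torch.nn.functional as F

def triplet_contrastive_loss(point_emb, image_emb, text_emb, temperature=0.07):
    """Illustrative cross-modal contrastive loss; row i of each tensor is
    assumed to come from the same text-image-point proxy."""
    p = F.normalize(point_emb, dim=-1)   # (B, D)
    v = F.normalize(image_emb, dim=-1)   # (B, D)
    t = F.normalize(text_emb, dim=-1)    # (B, D)
    labels = torch.arange(p.size(0), device=p.device)

    def info_nce(a, b):
        # Symmetric InfoNCE: matching pairs lie on the diagonal of the
        # similarity matrix.
        logits = a @ b.T / temperature
        return 0.5 * (F.cross_entropy(logits, labels)
                      + F.cross_entropy(logits.T, labels))

    # Align point features with both their paired image and text features.
    return info_nce(p, v) + info_nce(p, t)
```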

    CODA: A Real-World Road Corner Case Dataset for Object Detection in Autonomous Driving

    Full text link
    Contemporary deep-learning object detection methods for autonomous driving usually assume fixed, predefined categories of common traffic participants, such as pedestrians and cars. Most existing detectors are unable to detect uncommon objects and corner cases (e.g., a dog crossing a street), which may lead to severe accidents in some situations, making the timeline for real-world deployment of reliable autonomous driving uncertain. One main reason that impedes the development of truly reliable self-driving systems is the lack of public datasets for evaluating the performance of object detectors on corner cases. Hence, we introduce a challenging dataset named CODA that exposes this critical problem for vision-based detectors. The dataset consists of 1500 carefully selected real-world driving scenes, each containing four object-level corner cases on average, spanning more than 30 object categories. On CODA, the performance of standard object detectors trained on large-scale autonomous driving datasets drops significantly, to no more than 12.8% in mAR. Moreover, we experiment with a state-of-the-art open-world object detector and find that it also fails to reliably identify the novel objects in CODA, suggesting that a robust perception system for autonomous driving is probably still far from reach. We expect our CODA dataset to facilitate further research in reliable detection for real-world autonomous driving. Our dataset will be released at https://coda-dataset.github.io. Comment: ECCV 2022.
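
    The recall-style measurement behind an mAR figure like the one quoted above can be illustrated with a minimal IoU-matching sketch. This is not CODA's official evaluation code; the single IoU threshold and the greedy matching rule are simplifications.

```python
def recall_at_iou(pred_boxes, gt_boxes, iou_thr=0.5):
    """Fraction of ground-truth boxes matched by a prediction with IoU >= iou_thr.
    Boxes are (x1, y1, x2, y2). Illustrative only."""
    if not gt_boxes:
        return 1.0
    matched, used = 0, set()
    for gt in gt_boxes:
        best_iou, best_j = 0.0, -1
        for j, pb in enumerate(pred_boxes):
            if j in used:
                continue
            # Intersection-over-union of the two boxes.
            ix1, iy1 = max(gt[0], pb[0]), max(gt[1], pb[1])
            ix2, iy2 = min(gt[2], pb[2]), min(gt[3], pb[3])
            inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
            union = ((gt[2] - gt[0]) * (gt[3] - gt[1])
                     + (pb[2] - pb[0]) * (pb[3] - pb[1]) - inter)
            iou = inter / union if union > 0 else 0.0
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_iou >= iou_thr:
            matched += 1
            used.add(best_j)
    return matched / len(gt_boxes)
```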

    ONCE-3DLanes: Building Monocular 3D Lane Detection

    Full text link
    We present ONCE-3DLanes, a real-world autonomous driving dataset with lane layout annotations in 3D space. Conventional 2D lane detection from a monocular image yields poor performance in subsequent planning and control tasks in autonomous driving due to uneven roads. Predicting the 3D lane layout is thus necessary and enables effective and safe driving. However, existing 3D lane detection datasets are either unpublished or synthesized from a simulated environment, severely hampering the development of this field. In this paper, we take steps toward addressing these issues. By exploiting the explicit relationship between point clouds and image pixels, a dataset annotation pipeline is designed to automatically generate high-quality 3D lane locations from 2D lane annotations in 211K road scenes. In addition, we present an extrinsic-free, anchor-free method called SALAD, which regresses the 3D coordinates of lanes in image view without converting the feature map into the bird's-eye view (BEV). To facilitate future research on 3D lane detection, we benchmark the dataset and provide a novel evaluation metric, performing extensive experiments on both existing approaches and our proposed method. The aim of our work is to revive interest in 3D lane detection in real-world scenarios. We believe our work can lead to expected and unexpected innovations in both academia and industry. Comment: CVPR 2022. Project page at https://once-3dlanes.github.io.
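
    The annotation idea of lifting 2D lane labels to 3D via the pixel-to-point-cloud correspondence can be sketched as a projection-and-nearest-match step. This is only an illustrative sketch, not the official ONCE-3DLanes pipeline; `T_cam_from_lidar`, `K`, and the pixel tolerance are placeholder assumptions.

```python
import numpy as np

def lift_lane_to_3d(lane_pixels, lidar_points, T_cam_from_lidar, K, pixel_tol=2.0):
    """For each annotated 2D lane pixel, pick the LiDAR point that projects
    closest to it and use that point's 3D camera-frame position. Illustrative only."""
    # Transform LiDAR points (N, 3) into the camera frame; keep points in front.
    pts_h = np.hstack([lidar_points, np.ones((len(lidar_points), 1))])
    cam = (T_cam_from_lidar @ pts_h.T).T[:, :3]
    cam = cam[cam[:, 2] > 0.1]

    # Project surviving points to pixel coordinates with the intrinsics K (3x3).
    uv = (K @ cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]

    # Match each 2D lane pixel to its nearest projected point within a tolerance.
    lane_3d = []
    for px in lane_pixels:  # px = (u, v)
        d = np.linalg.norm(uv - px, axis=1)
        j = int(np.argmin(d))
        if d[j] < pixel_tol:
            lane_3d.append(cam[j])
    return np.asarray(lane_3d)
```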

    Aligned nanofibrous collagen membranes from fish swim bladder as a tough and acid-resistant suture for pH-regulated stomach perforation and tendon rupture

    No full text
    Abstract Background Wound closure in the complex body environment places high requirements on a suture's mechanical and biological performance. Under frequent mechanical gastric motility and extremely low pH, single-function sutures are limited in dealing with bleeding stomach trauma, where normal healing deteriorates in acid. This necessitates an advanced suture that can regulate wounds, resist acid, and intelligently sense stomach pH. Methods A double-stranded drug-loaded suture was fabricated from fish swim bladder. Its cytotoxicity, histocompatibility, mechanical properties, acid resistance, and multiple functions were verified. The suture's performance in closing gastric wounds and the Achilles tendon was also verified in an in vivo model. Results Investigation of the swim bladder's multi-scale structure shows that the aligned, tough collagen fibrous membrane can resist high hydrostatic pressure. The multi-functional sutures, made of twisted and aligned collagen fibers, show acid resistance and a low tissue reaction. Working with an implantable "capsule robot", the smart suture can inhibit gastric acid secretion, curb prolonged stomach bleeding, and monitor pH changes in real time in rabbits and pigs. The suture promotes stomach healing and is strong enough to stitch a fractured Achilles tendon. Conclusions As a drug-loaded absorbable suture, it shows excellent performance and good prospects for clinical application.

    In Vivo Anchoring Bis‐Pyrene Probe for Molecular Imaging of Early Gastric Cancer by Endoscopic Techniques

    No full text
    Abstract With the development of the blue laser endoscopy (BLE) technique, early gastric cancer (EGC) is often diagnosed from the morphological changes of blood vessels observed through BLE. However, EGC remains difficult to identify, resulting in a high rate of missed diagnoses. Molecular imaging can reveal changes in early tumors at the molecular level, offering a possible route to diagnosing EGC. Developing a probe that visually monitors the blood vessels of EGC under BLE is therefore particularly necessary. Herein, a bis‐pyrene (BP) based nanoprobe (BP‐FFVLK‐(PEG)‐RGD, M1) is designed, which can target angiogenesis and self‐assemble into fibers in situ, resulting in stable, long‐term retention in the tumor. Moreover, the M1 probe emits yellow‐green fluorescence for imaging under BLE. The M1 probe is confirmed to remain stably in the tumor for up to 96 hours in subcutaneously transplanted mice. In addition, the M1 probe is able to target angiogenesis for molecular imaging of isolated human gastric cancer tissue under BLE. Finally, the M1 probe, intravenously injected into rabbits modeling primary gastric cancer, successfully highlighted the tumor site under BLE, as confirmed by pathological analysis. This is the first probe developed for diagnosing EGC by visualizing angiogenesis under BLE, and it has great clinical significance.

    Additional file 1 of Aligned nanofibrous collagen membranes from fish swim bladder as a tough and acid-resistant suture for pH-regulated stomach perforation and tendon rupture

    No full text
    Additional file 1: Supplementary Fig. S1. PGA with a highly complex structure under SEM. Supplementary Fig. S2. Two other crosslinking methods. Supplementary Fig. S3. The detailed process of fabricating the DCDS suture with standardization. Supplementary Fig. S4. The tensile strength of double-layer swim bladder with and without crosslinking.