
    Multi-View 3D Object Detection Network for Autonomous Driving

    This paper aims at high-accuracy 3D object detection in autonomous driving scenarios. We propose Multi-View 3D networks (MV3D), a sensory-fusion framework that takes both a LIDAR point cloud and RGB images as input and predicts oriented 3D bounding boxes. We encode the sparse 3D point cloud with a compact multi-view representation. The network is composed of two subnetworks: one for 3D object proposal generation and another for multi-view feature fusion. The proposal network generates 3D candidate boxes efficiently from the bird's eye view representation of the 3D point cloud. We design a deep fusion scheme to combine region-wise features from multiple views and enable interactions between intermediate layers of different paths. Experiments on the challenging KITTI benchmark show that our approach outperforms the state-of-the-art by around 25% and 30% AP on the tasks of 3D localization and 3D detection. In addition, for 2D detection, our approach obtains 10.3% higher AP than the state-of-the-art among LIDAR-based methods on the hard data.
    Comment: To appear in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017
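
    A minimal sketch of what such a deep fusion scheme could look like in PyTorch, assuming three views, element-wise mean fusion, and simple per-path transformations between fusion steps (all names and sizes here are illustrative assumptions, not the authors' exact architecture):

        import torch
        import torch.nn as nn

        class DeepFusion(nn.Module):
            """Sketch: hierarchically fuse region-wise features from multiple
            views by element-wise mean, with a per-view transformation between
            fusion steps so intermediate layers of different paths interact."""

            def __init__(self, channels: int, num_views: int = 3, depth: int = 2):
                super().__init__()
                # One small transformation per view and per fusion stage (assumed sizes).
                self.stages = nn.ModuleList([
                    nn.ModuleList([
                        nn.Sequential(nn.Linear(channels, channels), nn.ReLU())
                        for _ in range(num_views)
                    ])
                    for _ in range(depth)
                ])

            def forward(self, views: list) -> torch.Tensor:
                # views: one region-wise feature tensor per view, each of shape (N, C).
                for stage in self.stages:
                    fused = torch.stack(views, dim=0).mean(dim=0)  # element-wise mean fusion
                    views = [f(fused) for f in stage]              # per-path transform of fused features
                return torch.stack(views, dim=0).mean(dim=0)

        # Usage: fuse 256-d region features from three views (e.g., bird's eye view, front view, image).
        views = [torch.randn(8, 256) for _ in range(3)]
        fused = DeepFusion(256)(views)  # shape (8, 256)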

    Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving

    Multi-view depth estimation has achieved impressive performance on various benchmarks. However, almost all current multi-view systems rely on given, ideal camera poses, which are unavailable in many real-world scenarios, such as autonomous driving. In this work, we propose a new robustness benchmark to evaluate depth estimation systems under various noisy pose settings. Surprisingly, we find that current multi-view depth estimation methods, as well as single-view/multi-view fusion methods, fail under noisy poses. To address this challenge, we propose a fused single-view and multi-view depth estimation system that adaptively integrates high-confidence multi-view and single-view results for robust and accurate depth estimation. The adaptive fusion module performs fusion by dynamically selecting high-confidence regions between the two branches based on a warping confidence map. Thus, the system tends to choose the more reliable branch when facing textureless scenes, inaccurate calibration, dynamic objects, and other degraded or challenging conditions. Our method outperforms state-of-the-art multi-view and fusion methods under robustness testing. Furthermore, we achieve state-of-the-art performance on challenging benchmarks (KITTI and DDAD) when accurate poses are given. Project website: https://github.com/Junda24/AFNet/.
    Comment: Accepted to CVPR 2024
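
    As a rough sketch of the kind of confidence-guided fusion described above, one could blend the two branches per pixel with a confidence map in [0, 1] (the function name, tensor shapes, and the soft-blend form are assumptions for illustration; the paper's module may select regions differently):

        import torch

        def fuse_depths(depth_mv: torch.Tensor,
                        depth_sv: torch.Tensor,
                        confidence: torch.Tensor) -> torch.Tensor:
            """Sketch: per-pixel fusion of multi-view and single-view depth.

            confidence is an (N, 1, H, W) map in [0, 1], e.g., derived from view
            warping consistency; high values favor the multi-view branch, low
            values fall back to the single-view branch in textureless or
            degraded regions."""
            return confidence * depth_mv + (1.0 - confidence) * depth_sv

        # Usage with dummy tensors (N=1, H=W=4, depths in meters).
        d_mv = torch.rand(1, 1, 4, 4) * 80.0
        d_sv = torch.rand(1, 1, 4, 4) * 80.0
        conf = torch.rand(1, 1, 4, 4)
        fused = fuse_depths(d_mv, d_sv, conf)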

    READIN: A Chinese Multi-Task Benchmark with Realistic and Diverse Input Noises

    For many real-world applications, user-generated inputs usually contain various noises due to speech recognition errors caused by linguistic variations or typographical errors (typos). It is therefore crucial to test model performance on data with realistic input noises to ensure robustness and fairness. However, little work has been done to construct such benchmarks for Chinese, where a variety of language-specific input noises occur in the real world. To fill this important gap, we construct READIN: a Chinese multi-task benchmark with REalistic And Diverse Input Noises. READIN contains four diverse tasks and asks annotators to re-enter the original test data with two commonly used Chinese input methods: Pinyin input and speech input. We designed our annotation pipeline to maximize diversity, for example by instructing annotators to use diverse input method editors (IMEs) for keyboard noises and recruiting speakers from diverse dialect groups for speech noises. We experiment with a series of strong pretrained language models as well as robust training methods, and find that these models often suffer significant performance drops on READIN even with robustness methods such as data augmentation. As the first large-scale attempt at creating a benchmark with noises geared towards user-generated inputs, we believe READIN serves as an important complement to existing Chinese NLP benchmarks. The source code and dataset can be obtained from https://github.com/thunlp/READIN.
    Comment: Preprint
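
    A hedged sketch of the kind of robustness comparison such a benchmark enables, measuring the relative performance drop from clean to noisy inputs (the model interface, accuracy metric, and toy data are placeholders, not READIN's actual evaluation code):

        def robustness_drop(model, clean, noisy):
            """Sketch: relative accuracy drop from clean to noisy test inputs.

            clean and noisy are lists of (input, gold_label) pairs, where each
            noisy input is a re-entered version of the corresponding clean one
            (e.g., typed via Pinyin or dictated via speech input)."""
            def accuracy(pairs):
                return sum(model(x) == y for x, y in pairs) / len(pairs)
            acc_clean, acc_noisy = accuracy(clean), accuracy(noisy)
            return (acc_clean - acc_noisy) / acc_clean

        # Usage with a trivial constant placeholder model.
        model = lambda x: "positive"
        clean = [("很好", "positive"), ("不好", "negative")]
        noisy = [("狠好", "positive"), ("部好", "negative")]
        print(robustness_drop(model, clean, noisy))  # 0.0: a constant model drops nothing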

    Characterization of Soybean Protein Adhesives Modified by Xanthan Gum

    The aim of this study was to provide a basis for the preparation of medical adhesives from soybean protein sources. Soybean protein (SP) adhesives mixed with different concentrations of xanthan gum (XG) were prepared, and their adhesive features were evaluated by physicochemical parameters and an in vitro bone adhesion assay. The results showed that the maximal adhesion strength was achieved in 5% SP adhesive with 0.5% XG addition, which was 2.6-fold higher than that of SP alone. The addition of XG significantly increased hydrogen bonding and viscosity, and increased the β-sheet content while decreasing the α-helix content in the protein's secondary structure. X-ray diffraction data showed significant interactions between SP molecules and XG. Scanning electron microscopy observations showed that the surface of the XG-modified SP adhesive was more viscous and compact, which was favorable for adhesion between the adhesive and bone. In summary, XG modification increased the hydrogen bonding and zero-shear viscosity of SP adhesives, leading to a significant increase in the bond strength of SP adhesives on porcine bones.

    Learning to Fuse Monocular and Multi-view Cues for Multi-frame Depth Estimation in Dynamic Scenes

    Multi-frame depth estimation generally achieves high accuracy by relying on multi-view geometric consistency. When applied in dynamic scenes, e.g., autonomous driving, this consistency is usually violated in dynamic areas, leading to corrupted estimations. Many multi-frame methods handle dynamic areas by identifying them with explicit masks and compensating the multi-view cues with monocular cues represented as local monocular depth or features. The improvements are limited due to the uncontrolled quality of the masks and the underutilized benefits of fusing the two types of cues. In this paper, we propose a novel method that learns to fuse the multi-view and monocular cues encoded as volumes, without needing heuristically crafted masks. As unveiled in our analyses, the multi-view cues capture more accurate geometric information in static areas, while the monocular cues capture more useful contexts in dynamic areas. To let the geometric perception learned from multi-view cues in static areas propagate to the monocular representation in dynamic areas, and to let the monocular cues enhance the representation of the multi-view cost volume, we propose a cross-cue fusion (CCF) module. It includes cross-cue attention (CCA), which encodes the spatially non-local relative intra-relations of each source to enhance the representation of the other. Experiments on real-world datasets demonstrate the significant effectiveness and generalization ability of the proposed method.
    Comment: Accepted by CVPR 2023. Code and models are available at: https://github.com/ruili3/dynamic-multiframe-depth
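
    A minimal sketch of cross-cue attention, where queries come from one cue volume and keys/values from the other so that non-local relations in the source cue enhance the target cue's representation (the single-head form, 1x1 convolutions, and residual connection are illustrative assumptions, not the paper's exact CCA design):

        import torch
        import torch.nn as nn

        class CrossCueAttention(nn.Module):
            """Sketch: enhance one cue volume (target) with non-local
            relations drawn from the other cue volume (source)."""

            def __init__(self, channels: int):
                super().__init__()
                self.q = nn.Conv2d(channels, channels, 1)  # queries from the target cue
                self.k = nn.Conv2d(channels, channels, 1)  # keys from the source cue
                self.v = nn.Conv2d(channels, channels, 1)  # values from the source cue

            def forward(self, target: torch.Tensor, source: torch.Tensor) -> torch.Tensor:
                # target, source: (N, C, H, W) cue volumes (e.g., monocular and multi-view).
                n, c, h, w = target.shape
                q = self.q(target).flatten(2).transpose(1, 2)         # (N, HW, C)
                k = self.k(source).flatten(2)                         # (N, C, HW)
                v = self.v(source).flatten(2).transpose(1, 2)         # (N, HW, C)
                attn = torch.softmax(q @ k / c ** 0.5, dim=-1)        # (N, HW, HW)
                out = (attn @ v).transpose(1, 2).reshape(n, c, h, w)  # back to (N, C, H, W)
                return target + out                                   # residual enhancement

        # Usage: enhance each cue volume with the other.
        mono, mv = torch.randn(1, 32, 16, 16), torch.randn(1, 32, 16, 16)
        cca = CrossCueAttention(32)
        mono_enhanced = cca(mono, mv)
        mv_enhanced = cca(mv, mono)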

    Corticosteroid Activation of Atlantic Sea Lamprey Corticoid Receptor: Allosteric Regulation by the N-terminal Domain

    Lampreys are jawless fish that evolved about 550 million years ago at the base of the vertebrate line. Modern lampreys contain a corticoid receptor (CR), the common ancestor of the glucocorticoid receptor (GR) and mineralocorticoid receptor (MR), which first appear in cartilaginous fish such as sharks. Until recently, 344 amino acids at the amino terminus of the adult lamprey CR were not present in the lamprey CR sequence in GenBank. A search of the recently sequenced lamprey germline genome identified two CR sequences, CR1 and CR2, containing the 344 previously unidentified amino acids at the amino terminus. CR1 also contains a novel four-amino-acid insertion in the DNA-binding domain (DBD). We studied corticosteroid activation of CR1 and CR2 and found that their strongest response was to 11-deoxycorticosterone and 11-deoxycortisol, the two circulating corticosteroids in lamprey. Based on steroid specificity, both CRs are close to elephant shark MR and distant from elephant shark GR. HEK293 cells transfected with full-length CR1 or CR2 and the MMTV promoter show about 3-fold higher steroid-mediated activation than HEK293 cells transfected with these CRs and the TAT3 promoter. Deletion of the amino-terminal domain (NTD) of lamprey CR1 and CR2 to form truncated CRs decreased transcriptional activation by about 70% in HEK293 cells transfected with MMTV, but increased transcription by about 6-fold in cells transfected with TAT3, indicating that the promoter has an important effect on NTD regulation of CR transcription by corticosteroids.
    Comment: 27 pages, 6 figures