23 research outputs found

    EA-BEV: Edge-aware Bird's-Eye-View Projector for 3D Object Detection

    In recent years, great progress has been made in Lift-Splat-Shoot-based (LSS-based) 3D object detection methods, which convert features from the 2D camera view and the 3D LiDAR view to the Bird's-Eye-View (BEV) for feature fusion. However, inaccurate depth estimation (e.g., the 'depth jump' problem) remains an obstacle to developing LSS-based methods. To alleviate the 'depth jump' problem, we propose the Edge-Aware Bird's-Eye-View (EA-BEV) projector. By coupling the proposed edge-aware depth fusion module with the depth estimation module, the EA-BEV projector addresses the problem and enforces refined supervision on depth. In addition, we propose sparse depth supervision and gradient edge depth supervision to constrain learning of global depth and local marginal depth information. The EA-BEV projector is a plug-and-play module for any LSS-based 3D object detection model and effectively improves baseline performance. We demonstrate its effectiveness on the nuScenes benchmark: on the nuScenes 3D object detection validation set, the proposed EA-BEV projector boosts several state-of-the-art LSS-based baselines on the nuScenes 3D object detection benchmark and the nuScenes BEV map segmentation benchmark with a negligible increase in inference time.
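    The sparse and gradient-edge depth supervision terms mentioned above can be pictured as two masked regression losses on the predicted depth map. The snippet below is a minimal sketch of that idea, assuming a dense predicted depth supervised by sparse projected LiDAR depth; all tensor names, shapes, and the edge threshold are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of sparse and gradient-edge depth supervision.
import torch

def sparse_depth_loss(pred_depth, lidar_depth, valid_mask):
    """Supervise predicted depth only where projected LiDAR depth exists."""
    # pred_depth, lidar_depth: (B, H, W) float; valid_mask: (B, H, W) bool
    diff = (pred_depth - lidar_depth).abs()
    return (diff * valid_mask).sum() / valid_mask.sum().clamp(min=1)

def gradient_edge_depth_loss(pred_depth, lidar_depth, valid_mask, thresh=1.0):
    """Extra penalty at depth edges, approximated by large LiDAR depth gradients."""
    # Horizontal and vertical gradients of the (sparse) ground-truth depth.
    gx = (lidar_depth[:, :, 1:] - lidar_depth[:, :, :-1]).abs()
    gy = (lidar_depth[:, 1:, :] - lidar_depth[:, :-1, :]).abs()
    edge = torch.zeros_like(lidar_depth, dtype=torch.bool)
    edge[:, :, 1:] |= gx > thresh   # mark pixels next to a depth jump
    edge[:, 1:, :] |= gy > thresh
    edge &= valid_mask              # only where ground truth exists
    diff = (pred_depth - lidar_depth).abs()
    return (diff * edge).sum() / edge.sum().clamp(min=1)
```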

    Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learning

    Self-supervised audio-visual source localization aims to locate sound-source objects in video frames without extra annotations. Recent methods often approach this goal with the help of contrastive learning, which assumes that only the audio and visual contents from the same video are positive samples for each other. However, this assumption suffers from false negative samples in real-world training. For example, for an audio sample, treating frames from the same audio class as negative samples may mislead the model and therefore harm the learned representations (e.g., the audio of a siren wailing may reasonably correspond to the ambulances in multiple images). Based on this observation, we propose a new learning strategy named False Negative Aware Contrastive learning (FNAC) to mitigate the problem of misleading the training with such false negative samples. Specifically, we utilize intra-modal similarities to identify potentially similar samples and construct corresponding adjacency matrices to guide contrastive learning. Further, we propose to strengthen the role of true negative samples by explicitly leveraging the visual features of sound sources to facilitate the differentiation of authentic sounding-source regions. FNAC achieves state-of-the-art performance on Flickr-SoundNet, VGG-Sound, and AVSBench, which demonstrates the effectiveness of our method in mitigating the false negative issue. The code is available at https://github.com/OpenNLPLab/FNAC_AVL. Comment: CVPR202
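    The core idea above, using intra-modal similarity to down-weight likely false negatives in an audio-visual contrastive loss, can be sketched as a weighted InfoNCE objective. The snippet below is only an illustrative approximation under assumed embedding shapes and an assumed temperature; the actual FNAC adjacency construction and regularizers differ and live in the linked repository.

```python
# Illustrative false-negative-aware contrastive loss (not the FNAC code).
import torch
import torch.nn.functional as F

def fn_aware_contrastive_loss(audio_emb, visual_emb, tau=0.07):
    """InfoNCE-style audio-visual loss whose negatives are down-weighted by
    intra-modal audio similarity, a rough stand-in for the adjacency matrices
    described above."""
    a = F.normalize(audio_emb, dim=-1)   # (N, D) audio embeddings
    v = F.normalize(visual_emb, dim=-1)  # (N, D) visual embeddings
    logits = a @ v.t() / tau             # cross-modal similarities

    # Intra-modal audio similarity: large off-diagonal values flag potential
    # false negatives (same sound class, different video).
    intra = (a @ a.t()).clamp(min=0)
    weights = 1.0 - intra                # similar pair -> weak negative
    weights.fill_diagonal_(1.0)          # keep the positive pair at full weight

    exp_logits = torch.exp(logits) * weights
    log_prob = logits.diag() - torch.log(exp_logits.sum(dim=1))
    return -log_prob.mean()
```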

    Baichuan 2: Open Large-scale Language Models

    Large language models (LLMs) have demonstrated remarkable performance on a variety of natural language tasks based on just a few examples of natural language instructions, reducing the need for extensive feature engineering. However, most powerful LLMs are closed-source or limited in their capability for languages other than English. In this technical report, we present Baichuan 2, a series of large-scale multilingual language models containing 7 billion and 13 billion parameters, trained from scratch on 2.6 trillion tokens. Baichuan 2 matches or outperforms other open-source models of similar size on public benchmarks such as MMLU, CMMLU, GSM8K, and HumanEval. Furthermore, Baichuan 2 excels in vertical domains such as medicine and law. We will release all pre-training model checkpoints to benefit the research community in better understanding the training dynamics of Baichuan 2. Comment: Baichuan 2 technical report. Github: https://github.com/baichuan-inc/Baichuan
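    For readers who want to try the released checkpoints, a minimal loading sketch with Hugging Face Transformers is given below. The repository id, prompt, and generation settings are assumptions for illustration and are not taken from the report itself.

```python
# Minimal sketch of loading an open checkpoint with Hugging Face Transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "baichuan-inc/Baichuan2-7B-Base"  # assumed Hub id for the 7B base model
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Simple greedy continuation of an illustrative prompt.
inputs = tokenizer("Large language models are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```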

    Automated Defect Analysis System for Industrial Computerized Tomography Images of Solid Rocket Motor Grains Based on YOLO-V4 Model

    As industrial computerized tomography (ICT) is widely used in the non-destructive testing of solid rocket motors (SRMs), the problem of how to automatically discriminate defect types and measure defect sizes with high accuracy in ICT images of SRM grains urgently needs to be solved. To address the problems of low manual recognition efficiency and low data utilization in the ICT image analysis of SRM grains, we propose an automated defect analysis (ADA) system for ICT images of SRM grains based on the YOLO-V4 model. Using the region proposals of the YOLO-V4 model, a region growing algorithm with automatic selection of seed points is proposed to segment the defect areas of the ICT images of grains. Defect sizes are then measured automatically based on the defect types determined by the YOLO-V4 model. In this paper, the image recognition performance of the YOLO-V4, YOLO-V3, and Faster R-CNN models is compared. The results show that the mean average precision (mAP) of the YOLO-V4 model is more than 15% higher than that of the YOLO-V3 and Faster R-CNN models, the F1-score is 0.970, and the detection time per image is 0.152 s. The ADA system can measure defect sizes with an error of less than 10%. Tests show that the system proposed in this paper can automatically analyze the defects in ICT images of SRM grains and has practical application value.
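    The two-stage idea, in which a detector proposes a defect box and type and a region-growing pass then segments the defect inside that box from an automatically chosen seed so its size can be measured, can be sketched as follows. The seed rule, growing tolerance, and pixel spacing are illustrative assumptions, not the published algorithm.

```python
# Hypothetical region growing inside a detected defect box.
import numpy as np
from collections import deque

def grow_defect_region(image, box, tol=12):
    """Segment a defect inside a detected box by region growing.

    The seed is picked automatically as the darkest pixel in the box (a simple
    stand-in for the seed-selection rule described above); neighbours are added
    while their intensity stays within `tol` of the running region mean.
    """
    x0, y0, x1, y1 = box
    roi = image[y0:y1, x0:x1].astype(np.float64)
    seed = np.unravel_index(np.argmin(roi), roi.shape)
    mask = np.zeros(roi.shape, dtype=bool)
    mask[seed] = True
    queue = deque([seed])
    total, count = roi[seed], 1
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < roi.shape[0] and 0 <= nc < roi.shape[1] and not mask[nr, nc]:
                if abs(roi[nr, nc] - total / count) <= tol:
                    mask[nr, nc] = True
                    total += roi[nr, nc]
                    count += 1
                    queue.append((nr, nc))
    return mask  # boolean map of defect pixels inside the box

def defect_area_mm2(mask, pixel_spacing_mm=0.2):
    """Convert the segmented pixel count to a physical area (spacing is illustrative)."""
    return mask.sum() * pixel_spacing_mm ** 2
```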

    Refining deep convolutional features for improving fine-grained image recognition

    Fine-grained image recognition, a computer vision task filled with challenges due to its imperceptible inter-class variance and large intra-class variance, has been drawing increasing attention. While manual annotation can effectively enhance performance on this task, it is extremely time-consuming and expensive. Recently, Convolutional Neural Networks (CNNs) have achieved state-of-the-art performance in image classification. We propose a fine-grained image recognition framework that exploits a CNN as the raw feature extractor together with several effective methods, including a feature encoding method, a feature weighting method, and a strategy to better incorporate information from multi-scale images to further improve recognition ability. In addition, we investigate two dimension reduction methods and successfully merge them into our framework to compact the final image representation. Based on this discriminative and compact framework, we achieve state-of-the-art classification accuracy on several fine-grained image recognition benchmarks under weak supervision.
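    The overall pipeline described above, pooling convolutional feature maps into weighted descriptors, compressing them with a dimension reduction step, and training a linear classifier, can be sketched as below. The specific encoding and weighting schemes here (activation-energy weighting, PCA, linear SVM) are stand-in assumptions, since the abstract does not name the exact methods.

```python
# Illustrative feature-encoding / weighting / dimension-reduction pipeline.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import LinearSVC

def encode_images(conv_maps):
    """Turn per-image convolutional feature maps into fixed-length vectors.

    conv_maps: list of arrays shaped (C, H, W). Spatial positions are weighted
    by their activation energy before pooling -- a crude stand-in for the
    feature-weighting step described above.
    """
    feats = []
    for fmap in conv_maps:
        c, h, w = fmap.shape
        descriptors = fmap.reshape(c, h * w).T            # (H*W, C) local descriptors
        weights = np.linalg.norm(descriptors, axis=1)     # activation-energy weights
        weights = weights / (weights.sum() + 1e-8)
        feats.append((descriptors * weights[:, None]).sum(axis=0))
    return np.stack(feats)

def train_classifier(conv_maps, labels, n_components=256):
    """Encode, compress with PCA, and fit a linear SVM (all assumed choices)."""
    X = encode_images(conv_maps)
    pca = PCA(n_components=n_components, whiten=True).fit(X)
    clf = LinearSVC().fit(pca.transform(X), labels)
    return pca, clf
```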

    Rapid Characterization of Fatty Acids in Oleaginous Microalgae by Near-Infrared Spectroscopy

    The key properties of microalgal biodiesel are largely determined by the composition of its fatty acid methyl esters (FAMEs). Gas chromatography (GC) based techniques for fatty acid analysis involve energy-intensive and time-consuming procedures and are thus less suitable for high-throughput screening applications. In the present study, a novel quantification method for microalgal fatty acids was established based on the near-infrared spectroscopy (NIRS) technique. Lyophilized cells of oleaginous Chlorella containing different contents of lipids were scanned by NIRS, and their fatty acid profiles were determined by GC-MS. NIRS models were developed based on the chemometric correlation of the near-infrared spectra with fatty acid profiles in algal biomass. The optimized NIRS models showed excellent performance for predicting the contents of total fatty acids, C16:0, C18:0, C18:1 and C18:3, with coefficients of determination (R²) of 0.998, 0.997, 0.989, 0.991 and 0.997, respectively. Taken together, the NIRS method established here bypasses the procedures of cell disruption, oil extraction and transesterification; it is rapid, reliable, and of great potential for high-throughput applications, and will facilitate the screening of microalgal mutants and the optimization of their growth conditions for biodiesel production.
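    The chemometric correlation described above is commonly implemented as a partial least squares (PLS) regression from NIR spectra to GC-measured fatty acid contents. The sketch below shows that generic recipe with scikit-learn; the use of PLS, the number of components, and the train/test split are assumptions rather than details from the study.

```python
# Generic PLS calibration of fatty acid content from NIR spectra.
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

def build_nirs_model(X, y, n_components=8):
    """Fit a PLS model and report its held-out R².

    X: NIR absorbance spectra, shape (n_samples, n_wavelengths).
    y: fatty acid content from GC-MS (e.g. total FAMEs, % of dry weight).
    Both arrays are placeholders for real measurements.
    """
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
    pls = PLSRegression(n_components=n_components).fit(X_tr, y_tr)
    r2 = r2_score(y_te, pls.predict(X_te).ravel())  # R² quantifies prediction quality
    return pls, r2
```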