23 research outputs found

    Frozen CLIP Model is An Efficient Point Cloud Backbone

    The pretraining-finetuning paradigm has demonstrated great success in NLP and 2D image fields because of the high-quality representation ability and transferability of their pretrained models. However, pretraining such a strong model is difficult in the 3D point cloud field since the training data is limited and point cloud collection is expensive. This paper introduces Efficient Point Cloud Learning (EPCL), an effective and efficient point cloud learner for directly training high-quality point cloud models with a frozen CLIP model. Our EPCL connects the 2D and 3D modalities by semantically aligning the 2D features and point cloud features without paired 2D-3D data. Specifically, the input point cloud is divided into a sequence of tokens and directly fed into the frozen CLIP model to learn point cloud representation. Furthermore, we design a task token to narrow the gap between 2D images and 3D point clouds. Comprehensive experiments on 3D detection, semantic segmentation, classification and few-shot learning demonstrate that the 2D CLIP model can be an efficient point cloud backbone and our method achieves state-of-the-art accuracy on both real-world and synthetic downstream tasks. Code will be available. Comment: Technical report

    Experts Weights Averaging: A New General Training Scheme for Vision Transformers

    Structural re-parameterization is a general training scheme for Convolutional Neural Networks (CNNs), which achieves performance improvement without increasing inference cost. As Vision Transformers (ViTs) are gradually surpassing CNNs in various visual tasks, one may ask whether a training scheme specifically for ViTs exists that can also achieve performance improvement without increasing inference cost. Recently, Mixture-of-Experts (MoE) has attracted increasing attention, as it can efficiently scale up the capacity of Transformers at a fixed cost through sparsely activated experts. Considering that MoE can also be viewed as a multi-branch structure, can we utilize MoE to implement a ViT training scheme similar to structural re-parameterization? In this paper, we affirmatively answer these questions, with a new general training strategy for ViTs. Specifically, we decouple the training and inference phases of ViTs. During training, we replace some Feed-Forward Networks (FFNs) of the ViT with specially designed, more efficient MoEs that assign tokens to experts by random uniform partition, and perform Experts Weights Averaging (EWA) on these MoEs at the end of each iteration. After training, we convert each MoE into an FFN by averaging the experts, transforming the model back into the original ViT for inference. We further provide a theoretical analysis to show why and how it works. Comprehensive experiments across various 2D and 3D visual tasks, ViT architectures, and datasets validate the effectiveness and generalizability of the proposed training scheme. Besides, our training scheme can also be applied to improve performance when fine-tuning ViTs. Last but equally important, the proposed EWA technique can significantly improve the effectiveness of naive MoE on various small 2D visual datasets and 3D visual tasks. Comment: 12 pages, 2 figures
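
    The two mechanical steps named in this abstract, random uniform partition of tokens and conversion of an MoE into a single FFN by averaging its experts, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each expert is a single weight matrix stored as a list of rows (a real FFN has two linear layers with biases), the function names `partition_tokens` and `average_experts` are hypothetical, and the per-iteration EWA update is not shown because the abstract does not specify it.

```python
import random


def partition_tokens(num_tokens, num_experts, rng):
    """Randomly split token indices into near-equal groups, one per expert."""
    indices = list(range(num_tokens))
    rng.shuffle(indices)
    # Expert e takes every num_experts-th index of the shuffled order,
    # so group sizes differ by at most one.
    return [indices[e::num_experts] for e in range(num_experts)]


def average_experts(expert_weights):
    """Element-wise mean of same-shaped weight matrices (lists of rows)."""
    n = len(expert_weights)
    rows = len(expert_weights[0])
    cols = len(expert_weights[0][0])
    return [
        [sum(w[r][c] for w in expert_weights) / n for c in range(cols)]
        for r in range(rows)
    ]


# Hypothetical toy values, chosen only for illustration.
rng = random.Random(0)
groups = partition_tokens(num_tokens=8, num_experts=4, rng=rng)

experts = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[3.0, 2.0], [1.0, 0.0]],
]
ffn = average_experts(experts)
print(groups)
print(ffn)
```

    Because the averaged result has the same shape as one expert, the model used at inference has the same size as the original ViT, which is the property the abstract emphasizes.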

    LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark

    Large language models have become a potential pathway toward achieving artificial general intelligence. Recent works on multi-modal large language models (MLLMs) have demonstrated their effectiveness in handling visual modalities. In this work, we extend the research of MLLMs to point clouds and present the LAMM-Dataset and LAMM-Benchmark for 2D image and 3D point cloud understanding. We also establish an extensible framework to facilitate the extension of MLLMs to additional modalities. Our main contributions are three-fold: 1) We present the LAMM-Dataset and LAMM-Benchmark, which cover almost all high-level vision tasks for 2D and 3D vision. Extensive experiments validate the effectiveness of our dataset and benchmark. 2) We demonstrate the detailed methods of constructing instruction-tuning datasets and benchmarks for MLLMs, which will enable future research on MLLMs to scale up and extend to other domains, tasks, and modalities faster. 3) We provide a primary but potential MLLM training framework optimized for modalities' extension. We also provide baseline models, comprehensive experimental observations, and analysis to accelerate future research. Codes and datasets are now available at https://github.com/OpenLAMM/LAMM. Comment: 37 pages, 33 figures. Code available at https://github.com/OpenLAMM/LAMM ; Project page: https://openlamm.github.io

    The state subdivision of public traffic vehicles based on K-means algorithm

    No full text

    Multi-index cutting parameters optimization for surface quality and cutting energy consumption of boring

    No full text
    Saving energy is one of the ways to achieve sustainable development. As important manufacturing equipment, machine tools are characterized by high energy consumption and high emissions. To reduce energy consumption and carbon emissions without reducing processing quality, the search for optimal cutting parameters must balance the conflict between machining quality and cutting energy consumption, so that the cutting parameters both reduce energy consumption and ensure processing quality. This plays an important role in achieving energy saving and emission reduction. In this paper, the processing quality (residual stress, surface roughness) and cutting energy consumption are selected as the optimized multiple indicators, and the selected optimization indicators are analyzed. Weighted grey correlation analysis is used to obtain the multi-index grey correlation degree value, and the multi-index weight coefficient is determined. Based on weighted grey correlation analysis and the multi-index orthogonal optimization method, the cutting parameters of the boring process are optimized, and the optimal parameter combination is a cutting depth of 0.05 mm, a cutting speed of 120 m/min, and a feed rate of 80 mm/min.
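
    For readers unfamiliar with weighted grey correlation (grey relational) analysis, the following is a minimal sketch of the common textbook form of the calculation, not necessarily the exact procedure used in the paper. It assumes every indicator is smaller-the-better, that the reference sequence is the ideal value 1.0 after normalization, and that the distinguishing coefficient is 0.5. The trial values, the weights, and the function names are hypothetical and are not taken from the paper.

```python
def normalize_smaller_better(column):
    """Scale one indicator to [0, 1] so that the smallest raw value maps to 1.

    Assumes the column is not constant (max > min).
    """
    lo, hi = min(column), max(column)
    return [(hi - x) / (hi - lo) for x in column]


def grey_relational_grades(columns, weights, rho=0.5):
    """Return one weighted grey relational grade per trial.

    columns: one list of raw values per indicator, aligned by trial.
    weights: one weight per indicator, expected to sum to 1.
    rho: distinguishing coefficient (0.5 is a common default).
    """
    normalized = [normalize_smaller_better(col) for col in columns]
    # Deviation of each normalized value from the ideal reference 1.0.
    deviations = [[1.0 - v for v in col] for col in normalized]
    flat = [d for col in deviations for d in col]
    d_min, d_max = min(flat), max(flat)
    # Grey relational coefficient for every indicator and trial.
    coeffs = [
        [(d_min + rho * d_max) / (d + rho * d_max) for d in col]
        for col in deviations
    ]
    n_trials = len(columns[0])
    return [
        sum(w * col[i] for w, col in zip(weights, coeffs))
        for i in range(n_trials)
    ]


# Hypothetical data for three trials (illustration only).
residual_stress = [300.0, 200.0, 250.0]
surface_roughness = [0.8, 0.4, 0.6]
cutting_energy = [10.0, 14.0, 12.0]
weights = [0.3, 0.3, 0.4]

grades = grey_relational_grades(
    [residual_stress, surface_roughness, cutting_energy], weights
)
best_trial = max(range(len(grades)), key=lambda i: grades[i])
print(grades, best_trial)
```

    The trial with the highest grade is the best compromise across the weighted indicators; in an orthogonal experiment, grades are then averaged per factor level to pick the preferred level of each cutting parameter.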

    Mudrocks Lithofacies Characteristics and North-South Hydrocarbon Generation Difference of the Shahejie Formation in the Dongpu Sag

    No full text
    Lacustrine mudrocks are composed of minerals and organic matter (OM). The origin and preservation of OM are two controlling factors of the hydrocarbon generation capacity of mudrocks. Studying the deposition process from the perspective of the OM and the sedimentary environment is a key method in source rock research. Following this idea, the reason for the discrepancy in hydrocarbon production between the northern and the southern part of Dongpu Sag is analyzed and discussed. The lacustrine mudrocks of the Shahejie Formation in Dongpu Sag are sampled and analyzed for information about mineralogy, microstructure, elemental geochemistry, and OM characteristics. The mudrocks are then divided into three lithofacies: silt-rich massive mudstone, homogeneous massive mudstone, and laminated mudstone. Each lithofacies shows distinct characteristics, and their hydrocarbon generation ability increases in sequence. Further discussion suggests that the differences in hydrocarbon generation are caused by the sedimentary environment. The water depth, salinity, and reducibility of the sedimentary environments of these three lithofacies increase in sequence as well. The correlation analysis indicates that it is the environment that controls the origin, accumulation, and preservation of OM in each lithofacies and then causes the great differences in hydrocarbon generation capacity. In Dongpu Sag, the proportion of laminated mudstone is much higher in the northern part, which leads to greater oil/gas production than in the southern part. In research on source rocks, both the lithofacies characteristics and the sedimentary environments that control the characteristics should be studied.

    Effectiveness of enteral feeding protocol on clinical outcomes in critically ill patients: A before and after study.

    No full text
    Enteral nutrition (EN) feeding protocol was proposed to have a positive impact on critically ill patients. However, current studies showed conflicting results. The present study aimed to investigate whether an enteral feeding protocol was able to improve clinical outcomes in critically ill patients. A before (stage 1) and after (stage 2) interventional study was performed in 10 tertiary care hospitals. All patients expected to stay in the intensive care unit (ICU) for over three days were potentially eligible. Clinical outcomes such as 28-day mortality, ICU length of stay, duration of mechanical ventilation (MV), and nosocomial infection were compared between the two stages. A total of 410 patients were enrolled during the study period, including 236 in stage 1 and 174 in stage 2. EN feeding protocol was able to increase the proportion of EN on day 2 (41.8±22.3 vs. 50.0±28.3%; p = 0.006) and day 6 (70.3±25.2 vs. 77.6±25.8%; p = 0.006). EN percentages tended to be higher in stage 1 than that in stage 2 on other days, but statistical significance was not reached. There was no difference in 28-day mortality between stage 1 and 2 (0.14 vs. 0.14; p = 0.984). Implementation of EN feeding protocol marginally reduced ICU length of stay (19.44±18.48 vs. 16.29±16.19 days; p = 0.077). There was no difference in the duration of MV between stage 1 and stage 2 (14.24±14.49 vs. 14.51±17.55 days; p = 0.877). The study found that the EN feeding protocol was able to increase the proportion of EN feeding, but failed to reduce 28-day mortality, incidence of nosocomial infection or duration of MV.