9 research outputs found

    Frozen Transformers in Language Models Are Effective Visual Encoder Layers

    This paper reveals that large language models (LLMs), despite being trained solely on textual data, are surprisingly strong encoders for purely visual tasks in the absence of language. Even more intriguingly, this can be achieved by a simple yet previously overlooked strategy -- employing a frozen transformer block from pre-trained LLMs as a constituent encoder layer to directly process visual tokens. Our work pushes the boundaries of leveraging LLMs for computer vision tasks, significantly departing from conventional practices that typically necessitate a multi-modal vision-language setup with associated language prompts, inputs, or outputs. We demonstrate that our approach consistently enhances performance across a diverse range of tasks, encompassing pure 2D and 3D visual recognition tasks (e.g., image and point cloud classification), temporal modeling tasks (e.g., action recognition), non-semantic tasks (e.g., motion forecasting), and multi-modal tasks (e.g., 2D/3D visual question answering and image-text retrieval). Such improvements are a general phenomenon, applicable to various types of LLMs (e.g., LLaMA and OPT) and different LLM transformer blocks. We additionally propose the information filtering hypothesis to explain the effectiveness of pre-trained LLMs in visual encoding -- the pre-trained LLM transformer blocks discern informative visual tokens and further amplify their effect. This hypothesis is empirically supported by the observation that the feature activation, after training with LLM transformer blocks, exhibits a stronger focus on relevant regions. We hope that our work inspires new perspectives on utilizing LLMs and deepening our understanding of their underlying mechanisms. Code is available at https://github.com/ziqipang/LM4VisualEncoding.
    Comment: 23 pages, 13 figures. Code at https://github.com/ziqipang/LM4VisualEncoding
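    The core strategy described above -- a frozen pre-trained transformer block sandwiched between trainable linear adapters that map visual tokens into and out of the LLM's hidden dimension -- can be sketched as follows. This is a minimal illustration, not the paper's implementation: the block here is a toy single-head transformer layer with randomly initialized (and then frozen) weights standing in for real pre-trained LLM weights, and all dimensions are made up for the example.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    D_VIS, D_LLM, N = 64, 128, 16  # hypothetical token dim, LLM hidden dim, token count

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    class FrozenBlock:
        """Stand-in for a pre-trained LLM transformer block; weights are never updated."""
        def __init__(self, d):
            self.Wq, self.Wk, self.Wv, self.Wo = (
                rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4))
            self.W1 = rng.standard_normal((d, 4 * d)) / np.sqrt(d)
            self.W2 = rng.standard_normal((4 * d, d)) / np.sqrt(4 * d)

        def __call__(self, x):
            # Single-head self-attention over the visual tokens
            q, k, v = x @ self.Wq, x @ self.Wk, x @ self.Wv
            attn = softmax(q @ k.T / np.sqrt(x.shape[-1])) @ v
            x = x + attn @ self.Wo                          # residual around attention
            x = x + np.maximum(x @ self.W1, 0) @ self.W2    # residual around ReLU MLP
            return x

    # Trainable linear adapters around the frozen block (the only new parameters)
    W_in = rng.standard_normal((D_VIS, D_LLM)) / np.sqrt(D_VIS)
    W_out = rng.standard_normal((D_LLM, D_VIS)) / np.sqrt(D_LLM)
    block = FrozenBlock(D_LLM)

    tokens = rng.standard_normal((N, D_VIS))  # visual tokens, e.g. from a ViT encoder
    out = block(tokens @ W_in) @ W_out        # project in, run the frozen block, project out
    print(out.shape)  # (16, 64)
    ```

    In the actual method the frozen weights would come from a real LLM (e.g., a LLaMA or OPT layer), and only the two adapters plus the visual backbone are trained.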

    DualCross: Cross-Modality Cross-Domain Adaptation for Monocular BEV Perception

    Closing the domain gap between training and deployment and incorporating multiple sensor modalities are two challenging yet critical topics for self-driving. Existing work focuses on only one of these topics, overlooking the simultaneous domain and modality shift that pervasively exists in real-world scenarios. For example, a model trained with multi-sensor data collected in Europe may need to run in Asia with only a subset of input sensors available. In this work, we propose DualCross, a cross-modality cross-domain adaptation framework that facilitates the learning of a more robust monocular bird's-eye-view (BEV) perception model, transferring point cloud knowledge from a LiDAR sensor in one domain during the training phase to a camera-only testing scenario in a different domain. This work results in the first open analysis of cross-domain cross-sensor perception and adaptation for monocular 3D tasks in the wild. We benchmark our approach on large-scale datasets under a wide range of domain shifts and show state-of-the-art results against various baselines.
    Comment: Preprint. Project website: https://yunzeman.github.io/DualCros
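    One common way to transfer LiDAR knowledge to a camera-only branch at training time is feature imitation: the camera branch's BEV features are pushed toward those of a LiDAR-supervised teacher. The sketch below illustrates that general idea with a plain L2 imitation loss; it is an assumption for illustration, not DualCross's actual loss, and the feature shapes are invented.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def feature_distill_loss(student_bev, teacher_bev):
        """L2 feature-imitation loss: pull camera BEV features toward LiDAR BEV features."""
        return float(np.mean((student_bev - teacher_bev) ** 2))

    # Hypothetical BEV feature maps: (channels, height, width) over the BEV grid
    teacher_bev = rng.standard_normal((8, 200, 200))                       # LiDAR branch (training only)
    student_bev = teacher_bev + 0.1 * rng.standard_normal((8, 200, 200))   # camera-only branch
    loss = feature_distill_loss(student_bev, teacher_bev)
    print(loss)  # close to 0.1**2 = 0.01 for this noise level
    ```

    At test time only the camera branch runs, so the LiDAR teacher imposes no deployment cost.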

    Deep Q Learning Driven CT Pancreas Segmentation With Geometry-Aware U-Net


    Numerical Investigation on Heat-Transfer and Hydromechanical Performance inside Contaminant-Insensitive Sublimators under a Vacuum Environment for Spacecraft Applications

    The contaminant-insensitive sublimator (CIS) is a novel water sublimator in development that uses two porous substrates to separate the sublimation point from the pressure-control point, providing long-life effective cooling for spacecraft. Many essential studies remain to be carried out in this field. To overcome reliability issues such as ice breakthrough caused by large temperature or pressure differences, a CIS development unit model, mathematical models of heat and mass transfer, and an evaluation coefficient have been established. Numerical investigations have examined the impacts of the physical properties of the porous substrate, the physical properties of the working fluid, orifice layouts, and orifice-structure parameters on the characteristics of the flow and temperature fields. The numerical investigation yields valuable conclusions; for example, the temperature uniformity coefficient at the bottom surface of the large-pore substrate is 0.997669, and the pressure uniformity coefficient at the same surface is 0.85361267. These numerical results can provide structural and data references for CIS designs for lunar probes or spacesuits.
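    The abstract reports uniformity coefficients without defining them. A common definition for such a coefficient is one minus the coefficient of variation (standard deviation over mean), which equals 1 for a perfectly uniform field; the sketch below uses that assumed definition with invented toy data, purely to make the reported numbers interpretable.

    ```python
    import numpy as np

    def uniformity_coefficient(field):
        """Assumed definition: 1 - (std / mean). Equals 1.0 for a perfectly uniform field."""
        field = np.asarray(field, dtype=float)
        return 1.0 - field.std() / field.mean()

    # Toy bottom-surface temperature samples (K): a nearly uniform field scores close to 1
    temps = np.array([272.1, 272.3, 272.2, 272.4, 272.2])
    print(uniformity_coefficient(temps))
    ```

    Under this definition, a temperature uniformity coefficient of 0.997669 would correspond to a standard deviation of roughly 0.23% of the mean field value.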