Frozen Transformers in Language Models Are Effective Visual Encoder Layers
This paper reveals that large language models (LLMs), despite being trained
solely on textual data, are surprisingly strong encoders for purely visual
tasks in the absence of language. Even more intriguingly, this can be achieved
by a simple yet previously overlooked strategy -- employing a frozen
transformer block from pre-trained LLMs as a constituent encoder layer to
directly process visual tokens. Our work pushes the boundaries of leveraging
LLMs for computer vision tasks, significantly departing from conventional
practices that typically necessitate a multi-modal vision-language setup with
associated language prompts, inputs, or outputs. We demonstrate that our
approach consistently enhances performance across a diverse range of tasks,
encompassing pure 2D and 3D visual recognition tasks (e.g., image and point
cloud classification), temporal modeling tasks (e.g., action recognition),
non-semantic tasks (e.g., motion forecasting), and multi-modal tasks (e.g.,
2D/3D visual question answering and image-text retrieval). Such improvements
are a general phenomenon, applicable to various types of LLMs (e.g., LLaMA and
OPT) and different LLM transformer blocks. We additionally propose the
information filtering hypothesis to explain the effectiveness of pre-trained
LLMs in visual encoding -- the pre-trained LLM transformer blocks discern
informative visual tokens and further amplify their effect. This hypothesis is
empirically supported by the observation that the feature activation, after
training with LLM transformer blocks, exhibits a stronger focus on relevant
regions. We hope that our work inspires new perspectives on utilizing LLMs and
deepening our understanding of their underlying mechanisms. Code is available
at https://github.com/ziqipang/LM4VisualEncoding.
Comment: 23 pages, 13 figures.
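The strategy described in this abstract, placing a frozen transformer block inside a visual encoder, might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `FrozenBlockEncoder`, the two trainable linear projections around the block, and all dimensions are hypothetical, and a randomly initialised `nn.TransformerEncoderLayer` stands in for a pre-trained LLM block such as one from LLaMA or OPT.

```python
import torch
from torch import nn


class FrozenBlockEncoder(nn.Module):
    """Hypothetical wrapper: trainable projection -> frozen block -> trainable projection."""

    def __init__(self, visual_dim: int, block_dim: int, frozen_block: nn.Module):
        super().__init__()
        # Assumption: linear layers align the visual width with the block width.
        self.proj_in = nn.Linear(visual_dim, block_dim)
        self.frozen_block = frozen_block
        self.proj_out = nn.Linear(block_dim, visual_dim)
        # Freeze the block: its weights receive no gradient updates.
        for param in self.frozen_block.parameters():
            param.requires_grad = False

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens has shape (batch, num_tokens, visual_dim).
        return self.proj_out(self.frozen_block(self.proj_in(tokens)))


# Stand-in for a pre-trained LLM transformer block (random weights here).
stand_in_block = nn.TransformerEncoderLayer(
    d_model=64, nhead=4, dim_feedforward=128, batch_first=True
)
encoder = FrozenBlockEncoder(visual_dim=32, block_dim=64, frozen_block=stand_in_block)

visual_tokens = torch.randn(2, 16, 32)  # e.g. 16 patch tokens per image
output = encoder(visual_tokens)
output.sum().backward()

# Only the two projections remain trainable.
trainable = [name for name, p in encoder.named_parameters() if p.requires_grad]
```

In this sketch gradients still flow through the frozen block to `proj_in`, so the layers before it can adapt to the block even though the block itself is never updated.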
DualCross: Cross-Modality Cross-Domain Adaptation for Monocular BEV Perception
Closing the domain gap between training and deployment and incorporating
multiple sensor modalities are two challenging yet critical topics for
self-driving. Existing work focuses on only one of the above topics,
overlooking the simultaneous domain and modality shift which pervasively exists
in real-world scenarios. A model trained with multi-sensor data collected in
Europe may need to run in Asia with a subset of input sensors available. In
this work, we propose DualCross, a cross-modality cross-domain adaptation
framework to facilitate the learning of a more robust monocular bird's-eye-view
(BEV) perception model, which transfers the point cloud knowledge from a LiDAR
sensor in one domain during the training phase to the camera-only testing
scenario in a different domain. This work results in the first open analysis of
cross-domain cross-sensor perception and adaptation for monocular 3D tasks in
the wild. We benchmark our approach on large-scale datasets under a wide range
of domain shifts and show state-of-the-art results against various baselines.
Comment: Preprint. Project website: https://yunzeman.github.io/DualCros
Numerical Investigation on Heat-Transfer and Hydromechanical Performance inside Contaminant-Insensitive Sublimators under a Vacuum Environment for Spacecraft Applications
The contaminant-insensitive sublimator (CIS) is a novel water sublimator under development that uses two porous substrates to separate the sublimation point from the pressure-control point, providing long-life, effective cooling for spacecraft. Many essential studies remain to be carried out in this field. To address reliability issues such as ice breakthrough caused by large temperature or pressure differences, a CIS development unit model, mathematical models of heat and mass transfer, and an evaluation coefficient have been established. Numerical investigations examine how the physical properties of the porous substrate, the physical properties of the working fluid, the orifice layout, and the orifice-structure parameters affect the flow field and the temperature field. The results yield several useful conclusions; for example, the temperature uniformity coefficient at the bottom surface of the large-pore substrate is 0.997669, and the pressure uniformity coefficient at the same surface is 0.85361267. These numerical results can provide structural and data references for the CIS design of lunar probes or spacesuits.