9 research outputs found

    Frozen Transformers in Language Models Are Effective Visual Encoder Layers

    This paper reveals that large language models (LLMs), despite being trained solely on textual data, are surprisingly strong encoders for purely visual tasks in the absence of language. Even more intriguingly, this can be achieved by a simple yet previously overlooked strategy -- employing a frozen transformer block from pre-trained LLMs as a constituent encoder layer to directly process visual tokens. Our work pushes the boundaries of leveraging LLMs for computer vision tasks, significantly departing from conventional practices that typically necessitate a multi-modal vision-language setup with associated language prompts, inputs, or outputs. We demonstrate that our approach consistently enhances performance across a diverse range of tasks, encompassing pure 2D and 3D visual recognition tasks (e.g., image and point cloud classification), temporal modeling tasks (e.g., action recognition), non-semantic tasks (e.g., motion forecasting), and multi-modal tasks (e.g., 2D/3D visual question answering and image-text retrieval). Such improvements are a general phenomenon, applicable to various types of LLMs (e.g., LLaMA and OPT) and different LLM transformer blocks. We additionally propose the information filtering hypothesis to explain the effectiveness of pre-trained LLMs in visual encoding -- the pre-trained LLM transformer blocks discern informative visual tokens and further amplify their effect. This hypothesis is empirically supported by the observation that the feature activation, after training with LLM transformer blocks, exhibits a stronger focus on relevant regions. We hope that our work inspires new perspectives on utilizing LLMs and deepening our understanding of their underlying mechanisms. Code is available at https://github.com/ziqipang/LM4VisualEncoding.
    Comment: 23 pages, 13 figures. Code at https://github.com/ziqipang/LM4VisualEncoding
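    The core strategy described above -- a frozen pre-trained transformer block sandwiched between trainable linear adapters that map visual tokens into and out of the LLM's hidden dimension -- can be sketched as follows. This is a minimal illustration, not the paper's implementation: the block here is a toy single-head transformer layer with randomly initialized (and then frozen) weights standing in for real pre-trained LLM weights, and all dimensions are made up for the example.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    D_VIS, D_LLM, N = 64, 128, 16  # hypothetical token dim, LLM hidden dim, token count

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    class FrozenBlock:
        """Stand-in for a pre-trained LLM transformer block; weights are never updated."""
        def __init__(self, d):
            self.Wq, self.Wk, self.Wv, self.Wo = (
                rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4))
            self.W1 = rng.standard_normal((d, 4 * d)) / np.sqrt(d)
            self.W2 = rng.standard_normal((4 * d, d)) / np.sqrt(4 * d)

        def __call__(self, x):
            # Single-head self-attention over the visual tokens
            q, k, v = x @ self.Wq, x @ self.Wk, x @ self.Wv
            attn = softmax(q @ k.T / np.sqrt(x.shape[-1])) @ v
            x = x + attn @ self.Wo                          # residual around attention
            x = x + np.maximum(x @ self.W1, 0) @ self.W2    # residual around ReLU MLP
            return x

    # Trainable linear adapters around the frozen block (the only new parameters)
    W_in = rng.standard_normal((D_VIS, D_LLM)) / np.sqrt(D_VIS)
    W_out = rng.standard_normal((D_LLM, D_VIS)) / np.sqrt(D_LLM)
    block = FrozenBlock(D_LLM)

    tokens = rng.standard_normal((N, D_VIS))  # visual tokens, e.g. from a ViT encoder
    out = block(tokens @ W_in) @ W_out        # project in, run the frozen block, project out
    print(out.shape)  # (16, 64)
    ```

    In the actual method the frozen weights would come from a real LLM (e.g., a LLaMA or OPT layer), and only the two adapters plus the visual backbone are trained.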

    DualCross: Cross-Modality Cross-Domain Adaptation for Monocular BEV Perception

    Closing the domain gap between training and deployment and incorporating multiple sensor modalities are two challenging yet critical topics for self-driving. Existing work focuses on only one of these topics, overlooking the simultaneous domain and modality shift that pervasively exists in real-world scenarios. For example, a model trained with multi-sensor data collected in Europe may need to run in Asia with only a subset of input sensors available. In this work, we propose DualCross, a cross-modality cross-domain adaptation framework that facilitates the learning of a more robust monocular bird's-eye-view (BEV) perception model, transferring point cloud knowledge from a LiDAR sensor in one domain during the training phase to a camera-only testing scenario in a different domain. This work results in the first open analysis of cross-domain cross-sensor perception and adaptation for monocular 3D tasks in the wild. We benchmark our approach on large-scale datasets under a wide range of domain shifts and show state-of-the-art results against various baselines.
    Comment: Preprint. Project website: https://yunzeman.github.io/DualCros
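    One common way to transfer LiDAR knowledge to a camera-only branch at training time is feature imitation: the camera branch's BEV features are pushed toward those of a LiDAR-supervised teacher. The sketch below illustrates that general idea with a plain L2 imitation loss; it is an assumption for illustration, not DualCross's actual loss, and the feature shapes are invented.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def feature_distill_loss(student_bev, teacher_bev):
        """L2 feature-imitation loss: pull camera BEV features toward LiDAR BEV features."""
        return float(np.mean((student_bev - teacher_bev) ** 2))

    # Hypothetical BEV feature maps: (channels, height, width) over the BEV grid
    teacher_bev = rng.standard_normal((8, 200, 200))                       # LiDAR branch (training only)
    student_bev = teacher_bev + 0.1 * rng.standard_normal((8, 200, 200))   # camera-only branch
    loss = feature_distill_loss(student_bev, teacher_bev)
    print(loss)  # close to 0.1**2 = 0.01 for this noise level
    ```

    At test time only the camera branch runs, so the LiDAR teacher imposes no deployment cost.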

    Deep Q Learning Driven CT Pancreas Segmentation With Geometry-Aware U-Net


    Numerical Investigation on Heat-Transfer and Hydromechanical Performance inside Contaminant-Insensitive Sublimators under a Vacuum Environment for Spacecraft Applications

    The contaminant-insensitive sublimator (CIS) is a novel water sublimator in development that uses two porous substrates to separate the sublimation point from the pressure-control point, providing long-life effective cooling for spacecraft. Many essential studies remain to be carried out in this field. To overcome reliability issues such as ice breakthrough caused by large temperature or pressure differences, a CIS development unit model, mathematical models of heat and mass transfer, and an evaluation coefficient have been established. Numerical investigations have examined the impacts of the physical properties of the porous substrate, the physical properties of the working fluid, orifice layouts, and orifice-structure parameters on the characteristics of the flow and temperature fields. The numerical investigation yields valuable conclusions; for example, the temperature uniformity coefficient at the bottom surface of the large-pore substrate is 0.997669, and the pressure uniformity coefficient at the same surface is 0.85361267. These numerical results can provide structural and data references for CIS designs for lunar probes or spacesuits.
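    The abstract reports uniformity coefficients without defining them. A common definition for such a coefficient is one minus the coefficient of variation (standard deviation over mean), which equals 1 for a perfectly uniform field; the sketch below uses that assumed definition with invented toy data, purely to make the reported numbers interpretable.

    ```python
    import numpy as np

    def uniformity_coefficient(field):
        """Assumed definition: 1 - (std / mean). Equals 1.0 for a perfectly uniform field."""
        field = np.asarray(field, dtype=float)
        return 1.0 - field.std() / field.mean()

    # Toy bottom-surface temperature samples (K): a nearly uniform field scores close to 1
    temps = np.array([272.1, 272.3, 272.2, 272.4, 272.2])
    print(uniformity_coefficient(temps))
    ```

    Under this definition, a temperature uniformity coefficient of 0.997669 would correspond to a standard deviation of roughly 0.23% of the mean field value.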