
    MV-Map: Offboard HD-Map Generation with Multi-view Consistency

    While bird's-eye-view (BEV) perception models can be useful for building high-definition maps (HD-Maps) with less human labor, their results are often unreliable and demonstrate noticeable inconsistencies in the predicted HD-Maps from different viewpoints. This is because BEV perception is typically set up in an 'onboard' manner, which restricts the computation and consequently prevents algorithms from reasoning over multiple views simultaneously. This paper overcomes these limitations and advocates a more practical 'offboard' HD-Map generation setup that removes the computation constraints, based on the fact that HD-Maps are commonly reusable infrastructures built offline in data centers. To this end, we propose a novel offboard pipeline called MV-Map that capitalizes on multi-view consistency and can handle an arbitrary number of frames with the key design of a 'region-centric' framework. In MV-Map, the target HD-Maps are created by aggregating all the frames of onboard predictions, weighted by the confidence scores assigned by an 'uncertainty network'. To further enhance multi-view consistency, we augment the uncertainty network with the global 3D structure optimized by a voxelized neural radiance field (Voxel-NeRF). Extensive experiments on nuScenes show that our MV-Map significantly improves the quality of HD-Maps, further highlighting the importance of offboard methods for HD-Map generation.
    Comment: ICCV 2023
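    The confidence-weighted aggregation the abstract describes can be illustrated with a toy sketch. This is not the authors' code: the function name, shapes, and the idea of passing per-cell confidences directly as an array are illustrative assumptions (in MV-Map the confidences come from a learned uncertainty network).

```python
import numpy as np

def aggregate_hd_map(frame_preds, confidences, eps=1e-8):
    """Fuse per-frame onboard BEV map predictions into one HD-Map.

    frame_preds: (T, H, W) per-frame map probabilities.
    confidences: (T, H, W) per-cell confidence scores, standing in for
                 the output of an uncertainty network.
    Returns the confidence-weighted average over all T frames.
    """
    frame_preds = np.asarray(frame_preds, dtype=float)
    confidences = np.asarray(confidences, dtype=float)
    weighted = (confidences * frame_preds).sum(axis=0)
    total = confidences.sum(axis=0) + eps  # eps avoids division by zero
    return weighted / total
```

    A frame the uncertainty network trusts more dominates the fused map for the cells it covers, which is how multi-view disagreements get resolved offboard.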

    Frozen Transformers in Language Models Are Effective Visual Encoder Layers

    This paper reveals that large language models (LLMs), despite being trained solely on textual data, are surprisingly strong encoders for purely visual tasks in the absence of language. Even more intriguingly, this can be achieved by a simple yet previously overlooked strategy -- employing a frozen transformer block from pre-trained LLMs as a constituent encoder layer to directly process visual tokens. Our work pushes the boundaries of leveraging LLMs for computer vision tasks, significantly departing from conventional practices that typically necessitate a multi-modal vision-language setup with associated language prompts, inputs, or outputs. We demonstrate that our approach consistently enhances performance across a diverse range of tasks, encompassing pure 2D and 3D visual recognition tasks (e.g., image and point cloud classification), temporal modeling tasks (e.g., action recognition), non-semantic tasks (e.g., motion forecasting), and multi-modal tasks (e.g., 2D/3D visual question answering and image-text retrieval). Such improvements are a general phenomenon, applicable to various types of LLMs (e.g., LLaMA and OPT) and different LLM transformer blocks. We additionally propose the information filtering hypothesis to explain the effectiveness of pre-trained LLMs in visual encoding -- the pre-trained LLM transformer blocks discern informative visual tokens and further amplify their effect. This hypothesis is empirically supported by the observation that the feature activation, after training with LLM transformer blocks, exhibits a stronger focus on relevant regions. We hope that our work inspires new perspectives on utilizing LLMs and deepens our understanding of their underlying mechanisms. Code is available at https://github.com/ziqipang/LM4VisualEncoding.
    Comment: 23 pages, 13 figures. Code at https://github.com/ziqipang/LM4VisualEncoding
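    The wiring the abstract describes -- project visual tokens into the LLM's width, run them through a frozen transformer block, project back -- can be sketched as follows. This is a minimal numpy stand-in, not the paper's implementation: the single-head block with random fixed weights substitutes for a real pre-trained LLaMA/OPT block, and all dimensions are made up.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class FrozenBlock:
    """Stand-in for a frozen pre-trained LLM transformer block
    (single-head self-attention with a residual; weights never updated)."""
    def __init__(self, d, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(d)
        self.Wq, self.Wk, self.Wv, self.Wo = (
            rng.normal(0, s, (d, d)) for _ in range(4))

    def __call__(self, x):  # x: (n_tokens, d)
        q, k, v = x @ self.Wq, x @ self.Wk, x @ self.Wv
        attn = softmax(q @ k.T / np.sqrt(x.shape[1]))
        return x + attn @ v @ self.Wo  # residual connection

# Visual tokens are linearly projected into the block's width,
# processed by the frozen block, then projected back. In the paper,
# only the two projections are trainable; the block stays frozen.
d_vis, d_llm, n_tokens = 16, 32, 8
rng = np.random.default_rng(1)
proj_in = rng.normal(0, 0.1, (d_vis, d_llm))
proj_out = rng.normal(0, 0.1, (d_llm, d_vis))
block = FrozenBlock(d_llm)

tokens = rng.normal(size=(n_tokens, d_vis))  # e.g. image patch features
out = block(tokens @ proj_in) @ proj_out     # same shape as the input
```

    The point of the sketch is the plumbing: the frozen block is shape-preserving, so it can be dropped between existing visual encoder layers without changing the rest of the architecture.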

    BigIssue: A Realistic Bug Localization Benchmark

    As machine learning tools progress, the inevitable question arises: How can machine learning help us write better code? With significant progress being achieved in natural language processing with models like GPT-3 and BERT, the applications of natural language processing techniques to code are starting to be explored. Most of the research has been focused on automatic program repair (APR), and while the results on synthetic or highly filtered datasets are promising, such models are hard to apply in real-world scenarios because of inadequate bug localization. We propose BigIssue: a benchmark for realistic bug localization. The goal of the benchmark is two-fold. We provide (1) a general benchmark with a diversity of real and synthetic Java bugs and (2) a motivation to improve the bug localization capabilities of models through attention to the full repository context. With the introduction of BigIssue, we hope to advance the state of the art in bug localization, in turn improving APR performance and increasing its applicability to the modern development cycle.

    Enhancement of low-$m_t$ kaons in AGS heavy-ion collisions

    In the relativistic transport model, we show that the recently observed enhancement of low-$m_t$ kaons ($K^+$ and $K^-$) in Si+Pb collisions at AGS can be explained if a density isomer is introduced in the nuclear equation of state.
    Comment: 12 pages, RevTeX, 6 figs on request to [email protected]

    Standing Between Past and Future: Spatio-Temporal Modeling for Multi-Camera 3D Multi-Object Tracking

    This work proposes an end-to-end multi-camera 3D multi-object tracking (MOT) framework. It emphasizes spatio-temporal continuity and integrates both past and future reasoning for tracked objects. Thus, we name it "Past-and-Future reasoning for Tracking" (PF-Track). Specifically, our method adapts the "tracking by attention" framework and represents tracked instances coherently over time with object queries. To explicitly use historical cues, our "Past Reasoning" module learns to refine the tracks and enhance the object features by cross-attending to queries from previous frames and other objects. The "Future Reasoning" module digests historical information and predicts robust future trajectories. In the case of long-term occlusions, our method maintains the object positions and enables re-association by integrating motion predictions. On the nuScenes dataset, our method improves AMOTA by a large margin and reduces ID-Switches by 90% (an order of magnitude) compared to prior approaches. The code and models are made available at https://github.com/TRI-ML/PF-Track.
    Comment: CVPR 2023 Camera Ready, 15 pages, 8 figures
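    The occlusion-handling idea -- keep a track alive by extrapolating its motion until it can be re-associated -- can be sketched with a constant-velocity model. This is a deliberate simplification: PF-Track's "Future Reasoning" module is a learned trajectory predictor, and the function below is only an illustrative stand-in.

```python
import numpy as np

def propagate_track(history, n_future):
    """Extrapolate an occluded track with a constant-velocity model
    (a simplified stand-in for learned future reasoning).

    history:  (T, 2) past x/y positions of the tracked object, T >= 2.
    n_future: number of future steps to predict.
    Returns an (n_future, 2) array of predicted positions.
    """
    history = np.asarray(history, dtype=float)
    velocity = history[-1] - history[-2]         # last observed step
    steps = np.arange(1, n_future + 1)[:, None]  # 1, 2, ..., n_future
    return history[-1] + steps * velocity
```

    While the object is invisible, the predicted positions keep its query plausible, so that when a detection reappears near the extrapolated trajectory the track can be re-associated instead of spawning a new identity (an ID-Switch).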

    BEL1-like Homeodomain Protein BLH6a Is a Negative Regulator of CAld5H2 in Sinapyl Alcohol Monolignol Biosynthesis in Poplar

    Lignin is one of the major components of xylem cell walls in tree stems. The lignin in the wood of most flowering plants (dicotyledonous angiosperms) is typically polymerized from three monolignol precursors, coniferyl alcohol, sinapyl alcohol, and p-coumaroyl alcohol, resulting in guaiacyl (G), syringyl (S), and hydroxyphenyl (H) subunits, respectively. In this study, we focus on the transcriptional regulation of a coniferaldehyde 5-hydroxylase (CAld5H2) gene, which encodes a key enzyme for sinapyl alcohol biosynthesis. We carried out a yeast one-hybrid (Y1H) screen to identify candidate upstream transcription factors (TFs) regulating CAld5H2. We obtained 12 upstream TFs as potential regulators of CAld5H2. One of these TF genes, BLH6a, encodes a BEL1-like homeodomain (BLH) protein and negatively regulated the CAld5H2 promoter activity. The direct regulation of the CAld5H2 promoter by BLH6a was supported by chromatin immunoprecipitation–quantitative polymerase chain reaction (ChIP–qPCR) and dominant repression of BLH6a in transgenic plants. Luciferase complementation imaging analyses showed extensive protein–protein interactions among these 12 TFs. We propose that BLH6a is a negative regulator of CAld5H2, which acts through combinatorial regulation of multiple TFs for sinapyl alcohol (S monolignol) biosynthesis in poplar.