MV-Map: Offboard HD-Map Generation with Multi-view Consistency
While bird's-eye-view (BEV) perception models can be useful for building
high-definition maps (HD-Maps) with less human labor, their results are often
unreliable and demonstrate noticeable inconsistencies in the predicted HD-Maps
from different viewpoints. This is because BEV perception is typically set up
in an 'onboard' manner, which restricts the available computation and
consequently prevents algorithms from reasoning about multiple views
simultaneously. This paper
overcomes these limitations and advocates a more practical 'offboard' HD-Map
generation setup that removes the computation constraints, based on the fact
that HD-Maps are commonly reusable infrastructures built offline in data
centers. To this end, we propose a novel offboard pipeline called MV-Map that
capitalizes on multi-view consistency and can handle an arbitrary number of frames
with the key design of a 'region-centric' framework. In MV-Map, the target
HD-Maps are created by aggregating all the frames of onboard predictions,
weighted by the confidence scores assigned by an 'uncertainty network'. To
further enhance multi-view consistency, we augment the uncertainty network with
the global 3D structure optimized by a voxelized neural radiance field
(Voxel-NeRF). Extensive experiments on nuScenes show that our MV-Map
significantly improves the quality of HD-Maps, further highlighting the
importance of offboard methods for HD-Map generation.
Comment: ICCV 202
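The core aggregation step described above — fusing many frames of onboard predictions, weighted by per-cell confidence scores from an uncertainty network — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the assumption that predictions are already warped into a shared global grid, and the simple normalized weighted average are all simplifications for clarity.

```python
import numpy as np

def aggregate_frames(frame_preds, frame_confs):
    """Fuse per-frame onboard map predictions into one global map.

    frame_preds: list of (H, W) arrays of per-cell map scores,
        assumed already warped into a shared global BEV grid.
    frame_confs: list of (H, W) arrays of per-cell confidence
        weights, e.g. produced by an uncertainty network.
    Returns the confidence-weighted average map, shape (H, W).
    """
    preds = np.stack(frame_preds)        # (T, H, W)
    confs = np.stack(frame_confs)        # (T, H, W)
    weighted = (preds * confs).sum(axis=0)
    norm = confs.sum(axis=0) + 1e-8      # avoid division by zero
    return weighted / norm
```

Because every frame contributes through its confidence weights, low-quality single-view predictions are down-weighted rather than discarded, which is what makes the offboard, multi-frame setting pay off.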
Frozen Transformers in Language Models Are Effective Visual Encoder Layers
This paper reveals that large language models (LLMs), despite being trained
solely on textual data, are surprisingly strong encoders for purely visual
tasks in the absence of language. Even more intriguingly, this can be achieved
by a simple yet previously overlooked strategy -- employing a frozen
transformer block from pre-trained LLMs as a constituent encoder layer to
directly process visual tokens. Our work pushes the boundaries of leveraging
LLMs for computer vision tasks, significantly departing from conventional
practices that typically necessitate a multi-modal vision-language setup with
associated language prompts, inputs, or outputs. We demonstrate that our
approach consistently enhances performance across a diverse range of tasks,
encompassing pure 2D and 3D visual recognition tasks (e.g., image and point
cloud classification), temporal modeling tasks (e.g., action recognition),
non-semantic tasks (e.g., motion forecasting), and multi-modal tasks (e.g.,
2D/3D visual question answering and image-text retrieval). Such improvements
are a general phenomenon, applicable to various types of LLMs (e.g., LLaMA and
OPT) and different LLM transformer blocks. We additionally propose the
information filtering hypothesis to explain the effectiveness of pre-trained
LLMs in visual encoding -- the pre-trained LLM transformer blocks discern
informative visual tokens and further amplify their effect. This hypothesis is
empirically supported by the observation that the feature activation, after
training with LLM transformer blocks, exhibits a stronger focus on relevant
regions. We hope that our work inspires new perspectives on utilizing LLMs and
deepening our understanding of their underlying mechanisms. Code is available
at https://github.com/ziqipang/LM4VisualEncoding.
Comment: 23 pages, 13 figures. Code at https://github.com/ziqipang/LM4VisualEncoding
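The strategy of inserting a frozen LLM transformer block between trainable projection layers of a visual encoder can be illustrated with a toy, dependency-free sketch. Everything here is hypothetical scaffolding: the `FrozenBlock` class stands in for a real pre-trained LLM block (its weights are random here, fixed after construction), and the single-head attention and dimension sizes are simplifications of the paper's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class FrozenBlock:
    """Stand-in for a pre-trained LLM transformer block; weights are
    created once and never updated (i.e. kept frozen)."""
    def __init__(self, d):
        self.wq, self.wk, self.wv, self.wo = (
            rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4)
        )

    def __call__(self, x):                      # x: (tokens, d)
        q, k, v = x @ self.wq, x @ self.wk, x @ self.wv
        attn = softmax(q @ k.T / np.sqrt(x.shape[1]))
        return x + attn @ v @ self.wo           # residual connection

d_vis, d_llm = 16, 32
w_in = rng.standard_normal((d_vis, d_llm)) / np.sqrt(d_vis)   # trainable
w_out = rng.standard_normal((d_llm, d_vis)) / np.sqrt(d_llm)  # trainable
llm_block = FrozenBlock(d_llm)                                # frozen

tokens = rng.standard_normal((8, d_vis))      # 8 visual tokens
out = llm_block(tokens @ w_in) @ w_out        # shape (8, d_vis)
```

Only the input/output projections (`w_in`, `w_out`) would be trained; the frozen block processes visual tokens directly, with no language prompts or text inputs involved, mirroring the paper's language-free setup.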
BigIssue: A Realistic Bug Localization Benchmark
As machine learning tools progress, the inevitable question arises: How can
machine learning help us write better code? With significant progress being
achieved in natural language processing with models like GPT-3 and BERT, the
applications of natural language processing techniques to code are starting to
be explored. Most of the research has been focused on automatic program repair
(APR), and while the results on synthetic or highly filtered datasets are
promising, such models are hard to apply in real-world scenarios because of
inadequate bug localization. We propose BigIssue: a benchmark for realistic bug
localization. The goal of the benchmark is two-fold. We provide (1) a general
benchmark with a diversity of real and synthetic Java bugs and (2) a motivation
to improve bug localization capabilities of models through attention to the
full repository context. With the introduction of BigIssue, we hope to advance
the state of the art in bug localization, in turn improving APR performance and
increasing its applicability to the modern development cycle.
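A bug-localization benchmark like the one described is typically scored by whether the truly buggy files appear near the top of a model's ranked file list. The sketch below shows one such metric, top-k localization accuracy; the function name and signature are illustrative, not BigIssue's actual evaluation code.

```python
def topk_localization_acc(ranked_files, buggy_files, k=5):
    """Fraction of issues for which at least one truly buggy file
    appears in the model's top-k ranked files.

    ranked_files: list (one entry per issue) of file paths, ordered
        from most to least suspicious by the model.
    buggy_files: list (one entry per issue) of ground-truth buggy files.
    """
    hits = sum(
        any(f in set(buggy) for f in ranked[:k])
        for ranked, buggy in zip(ranked_files, buggy_files)
    )
    return hits / len(ranked_files)
```

Evaluating over whole repositories (thousands of candidate files per issue) rather than pre-filtered snippets is what makes this kind of metric reflect the realistic setting the benchmark targets.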
Enhancement of low-$p_T$ kaons in AGS heavy-ion collisions
In the relativistic transport model, we show that the recently observed
enhancement of low-$p_T$ kaons ($K^+$ and $K^-$) in Si+Pb collisions at AGS can
be explained if a density isomer is introduced in the nuclear
equation of state.
Comment: 12 pages, RevTex, 6 figs on request to [email protected]
Standing Between Past and Future: Spatio-Temporal Modeling for Multi-Camera 3D Multi-Object Tracking
This work proposes an end-to-end multi-camera 3D multi-object tracking (MOT)
framework. It emphasizes spatio-temporal continuity and integrates both past
and future reasoning for tracked objects. Thus, we name it "Past-and-Future
reasoning for Tracking" (PF-Track). Specifically, our method adapts the
"tracking by attention" framework and represents tracked instances coherently
over time with object queries. To explicitly use historical cues, our "Past
Reasoning" module learns to refine the tracks and enhance the object features
by cross-attending to queries from previous frames and other objects. The
"Future Reasoning" module digests historical information and predicts robust
future trajectories. In the case of long-term occlusions, our method maintains
the object positions and enables re-association by integrating motion
predictions. On the nuScenes dataset, our method improves AMOTA by a large
margin and reduces ID-Switches by 90% relative to prior approaches, an order
of magnitude fewer. The code and models are made available at
https://github.com/TRI-ML/PF-Track.
Comment: CVPR 2023 Camera Ready, 15 pages, 8 figures
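The occlusion-handling behavior described above — coasting a track on its predicted motion and re-associating when a matching detection reappears — can be sketched in a few lines. This is a simplified stand-in for the paper's learned "Future Reasoning" module: the constant-velocity prediction, the Euclidean distance gate, and all names (`propagate_or_match`, `gate`) are illustrative assumptions.

```python
import numpy as np

def propagate_or_match(track_pos, track_vel, detections, dt=0.5, gate=2.0):
    """Advance an occluded track by its motion prediction; re-associate
    it with a new detection once one falls inside a distance gate.

    track_pos, track_vel: (2,) arrays, current position and velocity.
    detections: list of (x, y) candidate detections this frame.
    Returns (updated_position, matched_index_or_None).
    """
    pred = track_pos + track_vel * dt          # constant-velocity forecast
    if len(detections) == 0:
        return pred, None                      # keep coasting on prediction
    dists = np.linalg.norm(np.asarray(detections) - pred, axis=1)
    j = int(dists.argmin())
    if dists[j] < gate:
        return np.asarray(detections[j]), j    # re-associate, ID preserved
    return pred, None                          # nothing close enough yet
```

Because the track's position keeps advancing during the occlusion, the object's identity can be recovered without spawning a new track, which is exactly the mechanism that drives down ID-Switches.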
BEL1-like Homeodomain Protein BLH6a Is a Negative Regulator of CAld5H2 in Sinapyl Alcohol Monolignol Biosynthesis in Poplar
Lignin is one of the major components of xylem cell walls in tree stems. The lignin in the wood of most flowering plants (dicotyledonous angiosperms) is typically polymerized from three monolignol precursors, coniferyl alcohol, sinapyl alcohol, and p-coumaryl alcohol, resulting in guaiacyl (G), syringyl (S), and hydroxyphenyl (H) subunits, respectively. In this study, we focus on the transcriptional regulation of a coniferaldehyde 5-hydroxylase (CAld5H2) gene, which encodes a key enzyme for sinapyl alcohol biosynthesis. We carried out a yeast one-hybrid (Y1H) screen to identify candidate upstream transcription factors (TFs) regulating CAld5H2. We obtained 12 upstream TFs as potential regulators of CAld5H2. One of these TF genes, BLH6a, encodes a BEL1-like homeodomain (BLH) protein and negatively regulated the CAld5H2 promoter activity. The direct regulation of the CAld5H2 promoter by BLH6a was supported by chromatin immunoprecipitation–quantitative polymerase chain reaction (ChIP–qPCR) and dominant repression of BLH6a in transgenic plants. Luciferase complementation imaging analyses showed extensive protein–protein interactions among these 12 TFs. We propose that BLH6a is a negative regulator of CAld5H2, which acts through combinatorial regulation of multiple TFs for sinapyl alcohol (S monolignol) biosynthesis in poplar.