Sparse4D v3: Advancing End-to-End 3D Detection and Tracking
In autonomous driving perception systems, 3D detection and tracking are two
fundamental tasks. This paper delves deeper into this field, building upon
the Sparse4D framework. We introduce two auxiliary training tasks (Temporal
Instance Denoising and Quality Estimation) and propose decoupled attention to
make structural improvements, leading to significant enhancements in detection
performance. Additionally, we extend the detector into a tracker using a
straightforward approach that assigns instance IDs during inference, further
highlighting the advantages of query-based algorithms. Extensive experiments
conducted on the nuScenes benchmark validate the effectiveness of the proposed
improvements. With ResNet50 as the backbone, we observed improvements of
3.0\%, 2.2\%, and 7.6\% in mAP, NDS, and AMOTA, reaching 46.9\%, 56.1\%, and
49.0\%, respectively. Our best model achieved 71.9\% NDS and 67.7\% AMOTA on
the nuScenes test set. Code will be released at
\url{https://github.com/linxuewu/Sparse4D}
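The extension from detector to tracker can be illustrated with a minimal sketch. The abstract describes assigning instance IDs during inference, without a dedicated association module, by exploiting queries that persist across frames. Assuming the detector exposes, per frame, the indices of the query slots it kept (a simplification; `QueryTracker`, `update`, and `kept_slots` are illustrative names, not the paper's API), ID assignment reduces to:

```python
import itertools

class QueryTracker:
    """Sketch: in a query-based detector that propagates instance queries
    across frames, tracking reduces to giving each query an ID the first
    time it is kept and reusing that ID while the query survives."""

    def __init__(self):
        self._next_id = itertools.count()
        self.ids = {}  # query slot index -> persistent instance ID

    def update(self, kept_slots):
        """kept_slots: indices of queries kept after this frame's decoding.
        Slots seen in the previous frame keep their ID; new slots get a
        fresh one. Returns the IDs in the same order as kept_slots."""
        new_ids = {}
        for slot in kept_slots:
            if slot in self.ids:
                new_ids[slot] = self.ids[slot]   # surviving instance
            else:
                new_ids[slot] = next(self._next_id)  # newborn instance
        self.ids = new_ids
        return [new_ids[s] for s in kept_slots]
```

The point of the sketch is that no motion model or bipartite matching appears anywhere: identity is carried implicitly by query propagation.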
Sparse4D: Multi-view 3D Object Detection with Sparse Spatial-Temporal Fusion
Bird's-eye-view (BEV) based methods have recently made great progress in the
multi-view 3D detection task. Compared with BEV-based methods, sparse-based
methods lag behind in performance but still have non-negligible merits. To
push sparse 3D detection further, in this work we introduce a novel method,
named Sparse4D, which iteratively refines anchor boxes by sparsely sampling
and fusing spatial-temporal features. (1) Sparse 4D Sampling:
for each 3D anchor, we assign multiple 4D keypoints, which are then projected
to multi-view/scale/timestamp image features to sample corresponding features;
(2) Hierarchical Feature Fusion: we hierarchically fuse the sampled features
across different views/scales, timestamps, and keypoints to generate
high-quality instance features. In this way, Sparse4D achieves 3D detection
efficiently and effectively without relying on dense view transformation or
global attention, and is friendlier to deployment on edge devices.
Furthermore, we introduce an instance-level depth-reweighting module to
alleviate the ill-posed nature of 3D-to-2D projection. In experiments, our
method outperforms all sparse-based methods and most BEV-based methods on the
detection task of the nuScenes dataset.
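The core sampling step can be sketched for a single camera and timestamp: project a keypoint into the image with a pinhole model, then bilinearly sample the feature map at the resulting sub-pixel location. This is a generic sketch, not the paper's implementation; `project_points` and `bilinear_sample` are illustrative names, and the hierarchical fusion across views, scales, and timestamps is omitted.

```python
import numpy as np

def project_points(points_3d, K):
    """Pinhole projection of Nx3 camera-frame points to Nx2 pixel coords."""
    uvw = points_3d @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

def bilinear_sample(feat, uv):
    """Sample a CxHxW feature map at continuous pixel locations uv (Nx2).
    Returns a CxN array of interpolated features."""
    C, H, W = feat.shape
    u = np.clip(uv[:, 0], 0, W - 1)
    v = np.clip(uv[:, 1], 0, H - 1)
    u0, v0 = np.floor(u).astype(int), np.floor(v).astype(int)
    u1, v1 = np.minimum(u0 + 1, W - 1), np.minimum(v0 + 1, H - 1)
    du, dv = u - u0, v - v0
    top = feat[:, v0, u0] * (1 - du) + feat[:, v0, u1] * du
    bot = feat[:, v1, u0] * (1 - du) + feat[:, v1, u1] * du
    return top * (1 - dv) + bot * dv
```

In the full method, each anchor carries multiple such keypoints, and the per-keypoint samples from every view, scale, and timestamp are fused (rather than simply averaged) into one instance feature.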
TransformCode: A Contrastive Learning Framework for Code Embedding via Subtree transformation
Large-scale language models have made great progress in the field of software
engineering in recent years. They can be used for many code-related tasks such
as code clone detection, code-to-code search, and method name prediction.
However, these large-scale language models, which operate on individual code
tokens, have several drawbacks: they are usually large in scale, heavily
dependent on labels, and require substantial computing power and time to
fine-tune on new datasets. Furthermore, code embedding should be performed on
the entire code snippet rather than by encoding each code token. The main
reason for this is that
encoding each code token would cause model parameter inflation, resulting in a
lot of parameters storing information that we are not very concerned about. In
this paper, we propose a novel framework, called TransformCode, that learns
about code embeddings in a contrastive learning manner. The framework uses the
Transformer encoder as an integral part of the model. We also introduce a novel
data augmentation technique called abstract syntax tree transformation: This
technique applies syntactic and semantic transformations to the original code
snippets to generate more diverse and robust anchor samples. Our proposed
framework is both flexible and adaptable: It can be easily extended to other
downstream tasks that require code representation such as code clone detection
and classification. The framework is also very efficient and scalable: It does
not require a large model or a large amount of training data, and can support
any programming language. Finally, our framework is not limited to unsupervised
learning, but can also be applied to some supervised learning tasks by
incorporating task-specific labels or objectives. To explore the effectiveness
of our framework, we conducted extensive experiments on different software
engineering tasks using different programming languages and multiple datasets.
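The contrastive objective can be sketched as a standard in-batch InfoNCE loss over pairs of embeddings, where each anchor's positive is the transformed version of the same snippet and the other positives in the batch act as negatives. This is a generic formulation, assuming embeddings are already computed; the paper's exact loss, encoder, and AST transformations are not reproduced here.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """In-batch InfoNCE loss: row i of `anchors` embeds a code snippet,
    row i of `positives` embeds its AST-transformed variant; all other
    rows of `positives` serve as negatives for anchor i."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                 # cosine similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))             # -log softmax on matches
```

The loss is small when each snippet sits closest to its own transformed variant, which is exactly the invariance the syntactic and semantic transformations are meant to induce.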
Latent Exploration for Reinforcement Learning
In Reinforcement Learning, agents learn policies by exploring and interacting
with the environment. Due to the curse of dimensionality, learning policies
that map high-dimensional sensory input to motor output is particularly
challenging. During training, state-of-the-art methods (SAC, PPO, etc.) explore
the environment by perturbing the actuation with independent Gaussian noise.
While this unstructured exploration has proven successful in numerous tasks, it
can be suboptimal for overactuated systems. When multiple actuators, such
as motors or muscles, drive behavior, uncorrelated perturbations risk
diminishing each other's effect, or modifying the behavior in a task-irrelevant
way. While solutions to introduce time correlation across action perturbations
exist, introducing correlation across actuators has been largely ignored. Here,
we propose LATent TIme-Correlated Exploration (Lattice), a method to inject
temporally-correlated noise into the latent state of the policy network, which
can be seamlessly integrated with on- and off-policy algorithms. We demonstrate
that the noisy actions generated by perturbing the network's activations can be
modeled as a multivariate Gaussian distribution with a full covariance matrix.
In the PyBullet locomotion tasks, Lattice-SAC achieves state-of-the-art
results, and reaches 18% higher reward than unstructured exploration in the
Humanoid environment. In the musculoskeletal control environments of MyoSuite,
Lattice-PPO achieves higher reward in most reaching and object manipulation
tasks, while also finding more energy-efficient policies with reductions of
20-60%. Overall, we demonstrate the effectiveness of structured action noise in
time and actuator space for complex motor control tasks.
Comment: Code available at https://github.com/amathislab/lattic
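The claim that perturbing the network's activations yields actions distributed as a multivariate Gaussian with full covariance can be checked numerically in the simplest case of a linear readout: if the action is a = W(z + eps) with isotropic latent noise eps, the action covariance is sigma^2 W W^T, which is generally non-diagonal. All names and shapes below are illustrative, not the paper's implementation; the temporal correlation uses a simple AR(1) process as a stand-in.

```python
import numpy as np

rng = np.random.default_rng(0)

latent_dim, action_dim = 8, 3
W = rng.normal(size=(action_dim, latent_dim))  # linear readout of the policy
z = rng.normal(size=latent_dim)                # latent state for a fixed obs
sigma = 0.3                                    # latent noise scale
alpha = 0.9                                    # temporal correlation factor

# Temporally correlated latent noise (AR(1)), scaled so its stationary
# per-dimension variance is sigma^2; the action is a linear function of
# the perturbed latent, mirroring Lattice's idea of injecting noise into
# the latent state rather than the actions directly.
eps = np.zeros(latent_dim)
actions = []
for _ in range(50_000):
    eps = alpha * eps + np.sqrt(1 - alpha**2) * sigma * rng.normal(size=latent_dim)
    actions.append(W @ (z + eps))
actions = np.array(actions)

# Empirical action covariance vs. the analytic full covariance sigma^2 W W^T:
emp_cov = np.cov(actions.T)
theory_cov = sigma**2 * W @ W.T
```

Even though each latent unit is perturbed independently, the actions are correlated across actuators through W, which is the structured actuator-space noise the abstract refers to.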
Coherence-protected Quantum Gate by Continuous Dynamical Decoupling in Diamond
To implement reliable quantum information processing, quantum gates have to
be protected together with the qubits from decoherence. Here we demonstrate
experimentally on a nitrogen-vacancy system that, by using a continuous-wave
dynamical decoupling method, not only is the coherence time prolonged by about
20 times, but the quantum gate is also protected for the duration of the
control time. This protocol combines the merit of a prolonged coherence time
with easy integration into quantum logic tasks. It is expected to be useful in
tasks where the duration of quantum control far exceeds the dephasing time.
Comment: 5 pages, 4 figures
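The protective effect of continuous driving can be illustrated with a toy simulation: a qubit prepared along x dephases under quasi-static detuning noise, while a strong continuous drive along x pins the Bloch vector and suppresses the decay. This is a textbook spin-locking model with illustrative parameters, not a simulation of the paper's NV experiment.

```python
import numpy as np

rng = np.random.default_rng(1)
sx = np.array([[0, 1], [1, 0]], complex)
sz = np.array([[1, 0], [0, -1]], complex)

def avg_sx(t, omega, sigma=1.0, n=2000):
    """Average <sigma_x> at time t for the state |+x>, under quasi-static
    detuning noise delta ~ N(0, sigma) and a continuous drive of Rabi
    frequency omega: H = (delta/2) sz + (omega/2) sx."""
    plus = np.array([1, 1], complex) / np.sqrt(2)
    vals = []
    for delta in rng.normal(0.0, sigma, n):
        H = 0.5 * delta * sz + 0.5 * omega * sx
        # exact 2x2 evolution via eigendecomposition of the Hermitian H
        w, V = np.linalg.eigh(H)
        U = V @ np.diag(np.exp(-1j * w * t)) @ V.conj().T
        psi = U @ plus
        vals.append(np.real(psi.conj() @ sx @ psi))
    return float(np.mean(vals))
```

Without the drive (`omega = 0`), the averaged coherence decays as exp(-sigma^2 t^2 / 2); with a drive much stronger than the noise, the residual error is only of order (sigma/omega)^2, which is the sense in which continuous dynamical decoupling protects both the coherence and any gate implemented by the drive itself.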