GameFormer: Game-theoretic Modeling and Learning of Transformer-based Interactive Prediction and Planning for Autonomous Driving
Autonomous vehicles operating in complex real-world environments require
accurate predictions of interactive behaviors between traffic participants.
While existing works focus on modeling agent interactions based on their past
trajectories, future interactions between agents are often ignored. This paper addresses
the interaction prediction problem by formulating it with hierarchical game
theory and proposing the GameFormer framework to implement it. Specifically, we
present a novel Transformer decoder structure that uses the prediction results
from the previous level together with the common environment background to
iteratively refine the interaction process. Moreover, we propose a learning
process that regulates an agent's behavior at the current level to respond to
other agents' behaviors from the preceding level. Through experiments on a
large-scale real-world driving dataset, we demonstrate that our model can
achieve state-of-the-art prediction accuracy on the interaction prediction
task. We also validate the model's capability to jointly reason about the ego
agent's motion plans and other agents' behaviors in both open-loop and
closed-loop planning tests, outperforming a variety of baseline methods.
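The hierarchical game formulation above can be sketched outside the Transformer machinery: at each level k, every agent's prediction is refined as a response to all other agents' level-(k-1) predictions. The `refine_fn` interface and the toy repulsion rule below are illustrative assumptions, not GameFormer's actual decoder.

```python
import numpy as np

def level_k_refine(init_preds, refine_fn, num_levels=3):
    """Iteratively refine each agent's predicted trajectory.

    At level k, agent i's prediction is updated as a response to all
    other agents' level-(k-1) predictions (the hierarchical game idea
    from the abstract; `refine_fn` is a hypothetical stand-in for one
    Transformer decoder level).
    """
    preds = [p.copy() for p in init_preds]
    for _ in range(num_levels):
        new_preds = []
        for i, p in enumerate(preds):
            others = [q for j, q in enumerate(preds) if j != i]
            new_preds.append(refine_fn(p, others))
        preds = new_preds
    return preds

def toy_refine(pred, others):
    """Toy response rule: nudge a trajectory away from the mean of the
    other agents' trajectories (stands in for a learned refinement)."""
    repulsion = pred - np.mean(others, axis=0)
    return pred + 0.1 * repulsion
```

In the real model each level would also condition on the shared environment context; here only the agent-to-agent response structure is kept.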
Augmenting Reinforcement Learning with Transformer-based Scene Representation Learning for Decision-making of Autonomous Driving
Decision-making for urban autonomous driving is challenging due to the
stochastic nature of interactive traffic participants and the complexity of
road structures. Although reinforcement learning (RL)-based decision-making
schemes are promising for handling urban driving scenarios, they suffer from low
sample efficiency and poor adaptability. In this paper, we propose Scene-Rep
Transformer to improve the RL decision-making capabilities with better scene
representation encoding and sequential predictive latent distillation.
Specifically, a multi-stage Transformer (MST) encoder is constructed to model
not only interaction awareness between the ego vehicle and its neighbors
but also intention awareness between agents and their candidate routes. A
sequential latent Transformer (SLT) with self-supervised learning objectives is
employed to distill the future predictive information into the latent scene
representation, in order to reduce the exploration space and speed up training.
The final decision-making module based on soft actor-critic (SAC) takes as
input the refined latent scene representation from the Scene-Rep Transformer
and outputs driving actions. The framework is validated in five challenging
simulated urban scenarios with dense traffic, yielding substantial quantitative
improvements in data efficiency as well as in success rate, safety, and driving
efficiency. The qualitative results reveal that our framework extracts the
intentions of neighboring agents to inform its decisions and delivers more
diversified driving behaviors.
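The sequential predictive distillation can be sketched in isolation: roll the current latent forward with a transition model and match it to a target encoder's future latents. The linear transition and cosine-similarity objective below are hypothetical stand-ins for the SLT and its self-supervised loss.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two flat latent vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def latent_distill_loss(online_latents, target_latents, transition):
    """Self-supervised predictive loss: predict future latents by
    repeatedly applying a (hypothetical) linear transition model to the
    first online latent, and score each prediction against the target
    encoder's latent at that step. Returns the mean negative cosine
    similarity over the horizon (lower is better, minimum is -1)."""
    z = online_latents[0]
    losses = []
    for t in range(1, len(target_latents)):
        z = transition @ z              # predict the next latent state
        losses.append(-cosine_sim(z, target_latents[t]))
    return float(np.mean(losses))
```

A perfect transition model on self-consistent latents drives the loss to -1; during training the gradient of this loss would shape the scene representation fed to the SAC policy.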
Dynamics Study of the OH + O3 Atmospheric Reaction with Both Reactants Vibrationally Excited
The dynamics of the title five-atom atmospheric reaction is studied by the quasiclassical trajectory method for vibrational states of OH over the range 2 ≤ v ≤ 9 and initial vibrational energies of O3 between 9 and 21 kcal mol⁻¹, using a previously reported double many-body expansion potential energy surface for HO4(²A). The results show that the reaction is controlled by both capture- and barrier-type mechanisms, with the rate constants depending strongly on the reactants' internal energy content. The magnitude of the calculated rate coefficients also suggests that the title processes may not be negligible when studying the stratospheric ozone budget.
MGMAE: Motion Guided Masking for Video Masked Autoencoding
Masked autoencoding has shown excellent performance on self-supervised video
representation learning. Temporal redundancy has led to a high masking ratio
and customized masking strategy in VideoMAE. In this paper, we aim to further
improve the performance of video masked autoencoding by introducing a motion
guided masking strategy. Our key insight is that motion is a general and unique
prior in video, which should be taken into account during masked pre-training.
Our motion guided masking explicitly incorporates motion information to build a
temporally consistent masking volume. Based on this masking volume, we can track
the unmasked tokens in time and sample a set of temporally consistent cubes from
videos. These temporally aligned unmasked tokens further relieve the
information leakage issue in time and encourage MGMAE to learn more useful
structure information. We implement our MGMAE with an online efficient optical
flow estimator and backward masking map warping strategy. We perform
experiments on the datasets of Something-Something V2 and Kinetics-400,
demonstrating the superior performance of our MGMAE to the original VideoMAE.
In addition, we provide a visualization analysis illustrating that our MGMAE
can sample temporally consistent cubes in a motion-adaptive manner for more
effective video pre-training.
Comment: ICCV 2023 camera-ready version
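The core of motion guided masking is propagating a masking map through time with optical flow so the same content stays unmasked across frames. The sketch below uses integer per-pixel displacements and nearest-pixel warping as a simplified stand-in for MGMAE's backward masking-map warping.

```python
import numpy as np

def propagate_mask(mask0, flows):
    """Propagate an initial binary mask through time by shifting each
    masked pixel with per-frame integer optical flow (a simplified
    stand-in for MGMAE's masking-map warping), yielding a temporally
    consistent masking volume of shape (T+1, H, W)."""
    H, W = mask0.shape
    volume = [mask0]
    for flow in flows:  # flow: (H, W, 2) integer (dy, dx) displacements
        prev = volume[-1]
        nxt = np.zeros_like(prev)
        ys, xs = np.nonzero(prev)
        for y, x in zip(ys, xs):
            dy, dx = flow[y, x]
            ny, nx_ = y + dy, x + dx
            if 0 <= ny < H and 0 <= nx_ < W:  # drop pixels leaving the frame
                nxt[ny, nx_] = 1
        volume.append(nxt)
    return np.stack(volume)
```

A real implementation would warp a continuous masking map with estimated (sub-pixel) flow and then re-sample token cubes from it; the toy version keeps only the tracking idea.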
DTPP: Differentiable Joint Conditional Prediction and Cost Evaluation for Tree Policy Planning in Autonomous Driving
Motion prediction and cost evaluation are vital components in the
decision-making system of autonomous vehicles. However, existing methods often
ignore the importance of cost learning and treat the two as separate modules. In
this study, we employ a tree-structured policy planner and propose a
differentiable joint training framework for both ego-conditioned prediction and
cost models, resulting in a direct improvement of the final planning
performance. For conditional prediction, we introduce a query-centric
Transformer model that performs efficient ego-conditioned motion prediction.
For planning cost, we propose a learnable context-aware cost function with
latent interaction features, facilitating differentiable joint learning. We
validate our proposed approach using the real-world nuPlan dataset and its
associated planning test platform. Our framework not only matches
state-of-the-art planning methods but also outperforms other learning-based
methods in planning quality, while operating more efficiently at runtime. We
show that joint training delivers significantly better performance than
separate training of the two modules. Additionally, we find that
tree-structured policy planning outperforms the conventional single-stage
planning approach.
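The tree-structured planning loop can be illustrated with a brute-force sketch: enumerate candidate ego action sequences to a fixed depth, roll each branch forward, and keep the cheapest. The `step` dynamics and `cost_fn` below are hypothetical toy stand-ins for the ego-conditioned prediction rollout and the learned context-aware cost.

```python
from itertools import product

def tree_policy_plan(x0, actions, horizon, step, cost_fn):
    """Enumerate a depth-`horizon` tree of candidate ego action
    sequences, roll each branch forward with `step`, accumulate the
    per-step cost, and return the minimum-cost sequence and its cost.
    In DTPP the cost would be a learned, differentiable function of
    ego-conditioned predictions; here it is any Python callable."""
    best_seq, best_cost = None, float("inf")
    for seq in product(actions, repeat=horizon):  # all branches of the tree
        x, total = x0, 0.0
        for a in seq:
            x = step(x, a)          # roll the branch one step forward
            total += cost_fn(x, a)  # accumulate the branch cost
        if total < best_cost:
            best_seq, best_cost = seq, total
    return best_seq, best_cost
```

Because the cost is a plain callable, swapping in a differentiable learned cost (as the paper proposes) changes only `cost_fn`, not the tree search itself; the exhaustive enumeration here is exponential in the horizon and only for illustration.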