Detrive: Imitation Learning with Transformer Detection for End-to-End Autonomous Driving
This paper proposes a novel Transformer-based end-to-end autonomous driving
model named Detrive. The model addresses a limitation of past end-to-end
models, which cannot detect the position and size of traffic participants. Detrive
uses an end-to-end Transformer-based detection model as its perception module;
a multi-layer perceptron as its feature fusion network; a recurrent neural
network with gated recurrent units for path planning; and two controllers for the
vehicle's forward speed and turning angle. The model is trained with an online
imitation learning method. To obtain a better training set, a
reinforcement learning agent that obtains a ground-truth bird's-eye-view map
directly from the Carla simulator as its perceptual output is used as the teacher
for imitation learning. The trained model is tested on the Carla
autonomous driving benchmark. The results show that the Transformer-detector-based
end-to-end model has clear advantages in dynamic obstacle avoidance
over the traditional classifier-based end-to-end model.
Comment: 7 pages, 5 figures, DISA 202
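The abstract names a GRU-based recurrent network that turns fused perception features into a planned path. As a rough sketch of that idea only (not the authors' implementation; every dimension, weight, and function name below is illustrative), a GRU cell can be unrolled autoregressively to emit waypoints from a fused feature vector:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Minimal GRU cell; the hidden state carries the planned-path context."""
    def __init__(self, in_dim, hid_dim, seed=0):
        rng = np.random.default_rng(seed)
        s = 0.1  # small random weights stand in for trained parameters
        self.Wz = rng.normal(0, s, (hid_dim, in_dim + hid_dim))  # update gate
        self.Wr = rng.normal(0, s, (hid_dim, in_dim + hid_dim))  # reset gate
        self.Wh = rng.normal(0, s, (hid_dim, in_dim + hid_dim))  # candidate

    def step(self, x, h):
        xh = np.concatenate([x, h])
        z = sigmoid(self.Wz @ xh)                 # how much to update
        r = sigmoid(self.Wr @ xh)                 # how much history to keep
        h_tilde = np.tanh(self.Wh @ np.concatenate([x, r * h]))
        return (1 - z) * h + z * h_tilde

def plan_waypoints(fused_feature, n_waypoints=4, hid_dim=16, seed=0):
    """Unroll the GRU autoregressively to emit (x, y) waypoints."""
    rng = np.random.default_rng(seed)
    cell = GRUCell(len(fused_feature), hid_dim, seed)
    W_out = rng.normal(0, 0.1, (2, hid_dim))      # hidden state -> (x, y)
    h = np.zeros(hid_dim)
    waypoints = []
    for _ in range(n_waypoints):
        h = cell.step(fused_feature, h)
        waypoints.append(W_out @ h)
    return np.stack(waypoints)

wps = plan_waypoints(np.ones(8))
print(wps.shape)  # (4, 2): four 2-D waypoints
```

In a trained system the waypoints would then be consumed by the two low-level controllers (speed and steering) mentioned in the abstract.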
Early Lane Change Prediction for Automated Driving Systems Using Multi-Task Attention-based Convolutional Neural Networks
Lane change (LC) is one of the safety-critical manoeuvres in highway driving
according to various road accident records. Thus, reliably predicting such
manoeuvres in advance is critical for the safe and comfortable operation of
automated driving systems. The majority of previous studies rely on detecting a
manoeuvre that has already started, rather than predicting the manoeuvre
in advance. Furthermore, most previous works do not estimate the key
timings of the manoeuvre (e.g., crossing time), which would yield more
useful information for decision-making in the ego vehicle. To address these
shortcomings, this paper proposes a novel multi-task model to simultaneously
estimate the likelihood of LC manoeuvres and the time-to-lane-change (TTLC). In
both tasks, an attention-based convolutional neural network (CNN) is used as a
shared feature extractor from a bird's eye view representation of the driving
environment. The spatial attention used in the CNN model improves the feature
extraction process by focusing on the most relevant areas of the surrounding
environment. In addition, two novel curriculum learning schemes are employed to
train the proposed approach. An extensive evaluation and comparative analysis
of the proposed method on existing benchmark datasets show that the proposed
method outperforms state-of-the-art LC prediction models, particularly
in long-term prediction performance.
Comment: 13 pages, 11 figures
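As an illustration of the multi-task structure described above (attention-pooled shared features feeding both a lane-change classifier and a TTLC regressor), here is a minimal NumPy sketch; the attention score, head weights, and output classes are random stand-ins, not the paper's trained model:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def spatial_attention_pool(feature_map):
    """Weight each spatial cell by a saliency score (here simply the feature
    norm) so the pooled descriptor emphasises the most relevant regions."""
    h, w, c = feature_map.shape
    scores = np.linalg.norm(feature_map, axis=-1).reshape(-1)   # (h*w,)
    weights = softmax(scores)                                   # attention map
    return weights @ feature_map.reshape(h * w, c)              # pooled (c,)

def multi_task_heads(shared, seed=0):
    """Shared feature -> (LC likelihood over {left, keep, right}, TTLC secs)."""
    rng = np.random.default_rng(seed)
    W_cls = rng.normal(0, 0.1, (3, shared.size))   # classification head
    W_reg = rng.normal(0, 0.1, (1, shared.size))   # regression head
    lc_probs = softmax(W_cls @ shared)
    ttlc = max(float((W_reg @ shared)[0]), 0.0)    # time-to-lane-change >= 0
    return lc_probs, ttlc

fmap = np.random.default_rng(1).normal(size=(8, 8, 16))  # fake BEV features
probs, ttlc = multi_task_heads(spatial_attention_pool(fmap))
```

The key design choice sketched here is that both tasks read one shared representation, so gradients from the timing task can also improve the manoeuvre classifier during training.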
GoRela: Go Relative for Viewpoint-Invariant Motion Forecasting
The task of motion forecasting is critical for self-driving vehicles (SDVs)
to be able to plan a safe maneuver. Towards this goal, modern approaches reason
about the map, the agents' past trajectories and their interactions in order to
produce accurate forecasts. The predominant approach has been to encode the map
and other agents in the reference frame of each target agent. However, this
approach is computationally expensive for multi-agent prediction as inference
needs to be run for each agent. To tackle the scaling challenge, the solution
thus far has been to encode all agents and the map in a shared coordinate frame
(e.g., the SDV frame). However, this is sample-inefficient and vulnerable to
domain shift (e.g., when the SDV visits uncommon states). In contrast, in this
paper, we propose an efficient shared encoding for all agents and the map
without sacrificing accuracy or generalization. Towards this goal, we leverage
pair-wise relative positional encodings to represent geometric relationships
between the agents and the map elements in a heterogeneous spatial graph. This
parameterization allows us to be invariant to the scene viewpoint and to save
online computation by reusing map embeddings computed offline. Our decoder is also
viewpoint agnostic, predicting agent goals on the lane graph to enable diverse
and context-aware multimodal prediction. We demonstrate the effectiveness of
our approach on the urban Argoverse 2 benchmark as well as a novel highway
dataset.
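The viewpoint invariance claimed for pair-wise relative positional encodings can be checked in a few lines: encoding a neighbour's position in each agent's own frame is unchanged by any global rigid transform of the scene. The sketch below is illustrative, not the paper's encoder:

```python
import numpy as np

def relative_encoding(pos_i, heading_i, pos_j):
    """Encode agent j's position in agent i's local frame; the result does
    not depend on the global (shared) coordinate frame."""
    d = pos_j - pos_i
    c, s = np.cos(-heading_i), np.sin(-heading_i)
    R = np.array([[c, -s], [s, c]])   # rotate displacement into i's frame
    return R @ d

def transform_scene(pos, heading, theta, t):
    """Apply a global rigid transform (a viewpoint change) to one element."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    return R @ pos + t, heading + theta

p_i, h_i = np.array([1.0, 2.0]), 0.3
p_j = np.array([4.0, 6.0])
e1 = relative_encoding(p_i, h_i, p_j)

# Re-express the whole scene from a different viewpoint:
theta, t = 1.1, np.array([5.0, -3.0])
q_i, g_i = transform_scene(p_i, h_i, theta, t)
q_j, _ = transform_scene(p_j, 0.0, theta, t)
e2 = relative_encoding(q_i, g_i, q_j)
print(np.allclose(e1, e2))  # True: the pairwise encoding is viewpoint-invariant
```

Because the encoding depends only on relative geometry, embeddings of static map elements can be computed once offline and reused, which is the online-computation saving the abstract refers to.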
Machine Learning for Microcontroller-Class Hardware -- A Review
Advances in machine learning have opened a new opportunity to bring
intelligence to low-end Internet-of-Things nodes such as microcontrollers.
Conventional machine learning deployments have a high memory and compute
footprint, hindering direct deployment on ultra-resource-constrained
microcontrollers. This paper highlights the unique requirements of enabling
onboard machine learning for microcontroller class devices. Researchers use a
specialized model development workflow for resource-limited applications to
ensure the compute and latency budget is within the device limits while still
maintaining the desired performance. We characterize a widely applicable
closed-loop workflow of machine learning model development for
microcontroller-class devices and show that several classes of applications adopt a specific
instance of it. We present both qualitative and numerical insights into
different stages of model development by showcasing several use cases. Finally,
we identify the open research challenges and unsolved questions demanding
careful consideration moving forward.
Comment: Accepted for publication at IEEE Sensors Journal
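One recurring stage of the model-development workflow discussed above is shrinking the memory footprint to fit microcontroller flash and RAM budgets. A common technique (used here purely as an illustrative example, not one prescribed by the review) is symmetric int8 post-training quantization:

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric post-training quantization: map float32 weights to int8
    plus a single scale factor, cutting storage by roughly 4x."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights for accuracy checks on the host."""
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).normal(0, 0.2, size=1024).astype(np.float32)
q, scale = quantize_int8(w)
err = np.abs(dequantize(q, scale) - w).max()
print(q.nbytes, w.nbytes)  # 1024 4096: the int8 copy is 4x smaller
```

The per-tensor worst-case rounding error is bounded by half the scale, which is the kind of accuracy-versus-footprint trade-off the workflow's evaluation loop has to validate on-device.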
Towards better traffic volume estimation: Tackling both underdetermined and non-equilibrium problems via a correlation-adaptive graph convolution network
Traffic volume is an indispensable ingredient to provide fine-grained
information for traffic management and control. However, due to limited
deployment of traffic sensors, obtaining full-scale volume information is far
from easy. Existing works on this topic primarily focus on improving the
overall estimation accuracy of a particular method and ignore the underlying
challenges of volume estimation, and therefore perform poorly on some
critical tasks. This paper studies two key problems in traffic
volume estimation: (1) underdetermined traffic flows caused by undetected
movements, and (2) non-equilibrium traffic flows arising from congestion
propagation. Here we demonstrate a graph-based deep learning method that can
offer a data-driven, model-free and correlation-adaptive approach to tackle the
above issues and perform accurate network-wide traffic volume estimation.
In particular, to quantify the dynamic and nonlinear relationships
between traffic speed and volume for the estimation of underdetermined flows, a
speed-pattern-adaptive adjacency matrix based on graph attention is developed and
integrated into the graph convolution process, to capture non-local
correlations between sensors. To measure the impacts of non-equilibrium flows,
a masked and clipped temporal attention, combined with a gated temporal
convolution layer, is customized to capture time-asynchronous correlations
between upstream and downstream sensors. We then evaluate our model on a
real-world highway traffic volume dataset and compare it with several benchmark
models. It is demonstrated that the proposed model achieves high estimation
accuracy even under a 20% sensor coverage rate and outperforms other baselines
significantly, especially on underdetermined and non-equilibrium flow
locations. Furthermore, comprehensive quantitative model analyses are also
carried out to justify the model designs.
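The correlation-adaptive idea described above can be sketched as follows: build an attention-style adjacency matrix from each sensor's speed time series, then use it in a graph-convolution step for volume estimation. This is a hypothetical simplification of the paper's design; the scoring and normalization below are stand-ins:

```python
import numpy as np

def softmax_rows(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def speed_adaptive_adjacency(speeds):
    """Attention over pairwise speed-pattern similarity, so sensors with
    similar congestion dynamics are linked even when not physically
    adjacent (capturing non-local correlations)."""
    n, t = speeds.shape
    z = (speeds - speeds.mean(axis=1, keepdims=True)) \
        / (speeds.std(axis=1, keepdims=True) + 1e-8)   # normalise each series
    scores = (z @ z.T) / np.sqrt(t)                    # attention-style scores
    return softmax_rows(scores)                        # each row sums to 1

def graph_conv(adj, volumes, W):
    """One adjacency-weighted graph-convolution step on sensor features."""
    return np.tanh(adj @ volumes @ W)

rng = np.random.default_rng(0)
speeds = rng.normal(60, 10, size=(5, 24))   # 5 sensors, 24 speed readings
A = speed_adaptive_adjacency(speeds)
est = graph_conv(A, rng.normal(size=(5, 3)), rng.normal(size=(3, 3)))
```

Because the adjacency is recomputed from observed speeds, the graph adapts as congestion propagates, which is what distinguishes this from a fixed, distance-based adjacency matrix.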