CRITERIA: a New Benchmarking Paradigm for Evaluating Trajectory Prediction Models for Autonomous Driving
Benchmarking is a common method for evaluating trajectory prediction models
for autonomous driving. Existing benchmarks rely on datasets that are biased
towards more common scenarios, such as cruising, and on distance-based metrics
computed by averaging over all scenarios. Following such a regimen provides
little insight into the properties of the models, both in terms of how well
they handle different scenarios and in terms of how admissible and diverse
their outputs are. A number of complementary metrics have been designed to
measure the admissibility and diversity of trajectories; however, they suffer
from biases, such as sensitivity to trajectory length.
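For concreteness, the following is a minimal Python/NumPy sketch of the
distance-based metrics this critique refers to, average and final displacement
error (ADE/FDE), aggregated by averaging over all scenarios; the function names
and array shapes are illustrative, not tied to any particular benchmark
implementation.

```python
import numpy as np

def ade_fde(pred, gt):
    """ADE and FDE for one scenario.

    pred, gt: arrays of shape (T, 2) holding predicted and
    ground-truth (x, y) positions over T timesteps.
    """
    disp = np.linalg.norm(pred - gt, axis=-1)  # per-step L2 distance
    return disp.mean(), disp[-1]               # ADE, FDE

def benchmark_score(preds, gts):
    """Benchmark-style aggregation: a single number averaged over all
    scenarios, which hides how a model behaves on rare maneuvers
    versus common cruising."""
    ades, fdes = zip(*(ade_fde(p, g) for p, g in zip(preds, gts)))
    return float(np.mean(ades)), float(np.mean(fdes))
```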
In this paper, we propose a new benChmarking paRadIgm for evaluaTing
trajEctoRy predIction Approaches (CRITERIA). In particular, we propose: 1) a
method for extracting driving scenarios at varying levels of specificity,
according to road structure, model performance, and data properties, for
fine-grained ranking of prediction models; 2) a set of new bias-free metrics
that measure diversity by incorporating the characteristics of a given
scenario, and admissibility by considering road structure and kinematic
compliance, motivated by real-world driving constraints; and 3) extensive
experiments, conducted with the proposed benchmark, on a representative set of
prediction models using the large-scale Argoverse dataset. We show that the
proposed benchmark produces a more accurate ranking of the models and serves
as a means of characterizing their behavior. We further present ablation
studies that highlight the contributions of the different elements used to
compute the proposed metrics.
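As a purely illustrative sketch of the idea behind scenario-aware, bias-free
diversity measurement (not the paper's exact formulation), one could normalize
the pairwise spread of the predicted modes by a scenario-specific scale, such
as the ground-truth path length, so that longer trajectories do not
automatically score as more diverse:

```python
import numpy as np

def scenario_normalized_diversity(preds, gt):
    """Pairwise spread of K >= 2 predicted modes, normalized by the
    ground-truth path length of the scenario.

    preds: (K, T, 2) predicted trajectories; gt: (T, 2).
    """
    K = preds.shape[0]
    # Scenario scale: total distance travelled in the ground truth.
    scale = np.linalg.norm(np.diff(gt, axis=0), axis=-1).sum() + 1e-6
    pair_dists = [
        np.linalg.norm(preds[i] - preds[j], axis=-1).mean()
        for i in range(K) for j in range(i + 1, K)
    ]
    return float(np.mean(pair_dists) / scale)
```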
Stacked Cross-modal Feature Consolidation Attention Networks for Image Captioning
Recently, the attention-enriched encoder-decoder framework has attracted great
interest in image captioning due to its remarkable progress. Many visual
attention models directly leverage meaningful regions to generate image
descriptions. However, a direct transition from visual space to text is not
enough to generate fine-grained captions. This paper exploits a
feature-compounding approach that brings together high-level semantic concepts
and visual information about the contextual environment in a fully end-to-end
manner. To this end, we propose a stacked cross-modal feature consolidation
(SCFC) attention network for image captioning, in which we simultaneously
consolidate cross-modal features through a novel compounding function in a
multi-step reasoning fashion. In addition, we jointly employ spatial
information and context-aware attributes (CAA) as the principal components of
our compounding function, where the CAA provides a concise, context-sensitive
semantic representation. To better exploit the potential of the consolidated
features, we further propose an SCFC-LSTM as the caption generator, which
leverages discriminative semantic information throughout the caption
generation process. Experimental results on the MSCOCO and Flickr30K datasets
indicate that our proposed SCFC outperforms various state-of-the-art image
captioning models on popular metrics.
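The abstract does not spell out the compounding function itself; the following
PyTorch sketch shows one plausible form of such a cross-modal consolidation
step, gating pooled visual features against CAA embeddings using the decoder
state. The class name `CompoundingCell` and the gating choice are assumptions
for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class CompoundingCell(nn.Module):
    """Illustrative cross-modal compounding step: fuses spatial visual
    features with context-aware attribute (CAA) embeddings, gated by
    the current decoder hidden state."""

    def __init__(self, d_vis, d_attr, d_hid):
        super().__init__()
        self.proj_v = nn.Linear(d_vis, d_hid)
        self.proj_a = nn.Linear(d_attr, d_hid)
        self.gate = nn.Linear(d_hid * 3, d_hid)

    def forward(self, v, a, h):
        # v: (B, d_vis) pooled spatial features
        # a: (B, d_attr) context-aware attribute embedding
        # h: (B, d_hid) decoder (LSTM) hidden state
        v, a = torch.relu(self.proj_v(v)), torch.relu(self.proj_a(a))
        g = torch.sigmoid(self.gate(torch.cat([v, a, h], dim=-1)))
        return g * v + (1 - g) * a  # consolidated cross-modal feature
```

Stacking such cells, each conditioned on the previous consolidated output,
would give the multi-step reasoning fashion the abstract describes.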
Standard SPECT myocardial perfusion estimation from half-time acquisitions using deep convolutional residual neural networks
The purpose of this work was to assess the feasibility of reducing acquisition time in MPI-SPECT imaging using deep learning techniques, through two main approaches: reducing the acquisition time per projection and reducing the number of angular projections.
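The abstract does not detail the network architecture; as a hedged
illustration, a deep convolutional residual network for this task might map a
low-count (half-time) projection to an estimate of the standard-time
acquisition, with a global skip connection so the model learns only the
correction. All layer sizes and the class names below are assumptions.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Plain two-convolution residual block (illustrative)."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return torch.relu(x + self.body(x))

class HalfTimeToFullTime(nn.Module):
    """Maps a noisy half-time projection (one-channel image) to an
    estimate of the standard full-time acquisition."""
    def __init__(self, ch=64, n_blocks=6):
        super().__init__()
        self.head = nn.Conv2d(1, ch, 3, padding=1)
        self.blocks = nn.Sequential(*[ResBlock(ch) for _ in range(n_blocks)])
        self.tail = nn.Conv2d(ch, 1, 3, padding=1)

    def forward(self, x):
        # Global residual: predict the correction on top of the
        # low-count input rather than the full image from scratch.
        return x + self.tail(self.blocks(self.head(x)))
```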