Forecasting People Trajectories and Head Poses by Jointly Reasoning on Tracklets and Vislets
In this work, we explore the correlation between people trajectories and
their head orientations. We argue that people trajectory and head pose
forecasting can be modelled as a joint problem. Recent approaches to trajectory
forecasting leverage short-term trajectories (aka tracklets) of pedestrians to
predict their future paths. In addition, sociological cues, such as expected
destination or pedestrian interaction, are often combined with tracklets. In
this paper, we propose MiXing-LSTM (MX-LSTM) to capture the interplay between
positions and head orientations (vislets) thanks to a joint unconstrained
optimization of full covariance matrices during the LSTM backpropagation. We
additionally exploit the head orientations as a proxy for the visual attention,
when modeling social interactions. MX-LSTM predicts pedestrians' future locations
and head poses, extending the capabilities of current approaches
on long-term trajectory forecasting. Compared to the state of the art, our
approach shows better performance on an extensive set of public benchmarks.
MX-LSTM is particularly effective when people move slowly, i.e. the most
challenging scenario for all other models. The proposed approach also allows
for accurate predictions on a longer time horizon.
Comment: Accepted at IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE
INTELLIGENCE 2019. arXiv admin note: text overlap with arXiv:1805.0065
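The "unconstrained optimization of full covariance matrices" mentioned in the MX-LSTM abstract can be illustrated with a small sketch. The abstract does not say which parameterization the authors use; the log-Cholesky style mapping below is one common choice and is an assumption here, as are the function names and the 2x2 size. The point is only that any real-valued network outputs map to a valid (symmetric, positive definite) covariance, so no constraint is needed during backpropagation.

```python
import math

def covariance_from_unconstrained(a, b, c):
    """Map any real triple (a, b, c) to a valid 2x2 covariance matrix.

    Assumed parameterization (not taken from the abstract):
    L = [[exp(a), 0], [b, exp(c)]] is lower triangular with a positive
    diagonal, so sigma = L * L^T is symmetric positive definite.
    """
    l11, l21, l22 = math.exp(a), b, math.exp(c)
    return [
        [l11 * l11, l11 * l21],
        [l11 * l21, l21 * l21 + l22 * l22],
    ]

def is_positive_definite_2x2(m):
    """Sylvester's criterion for a symmetric 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return m[0][0] > 0 and det > 0

# Arbitrary real inputs, as a network head might emit them.
sigma = covariance_from_unconstrained(0.3, -1.2, -0.5)
print(sigma, is_positive_definite_2x2(sigma))
```

The off-diagonal entry is free to take either sign, which is what lets such a model express correlation between coordinates instead of assuming a diagonal covariance.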
COMIC: Towards A Compact Image Captioning Model with Attention
Recent works in image captioning have shown very promising raw performance.
However, we observe that most of these encoder-decoder style networks with
attention do not scale naturally to large vocabulary sizes, making them
difficult to deploy on embedded systems with limited hardware resources.
This is because the sizes of the word and output embedding matrices grow
proportionally with the size of the vocabulary, adversely affecting the compactness
of these networks. To address this limitation, this paper tackles the
compactness of image captioning models, a problem that is hitherto unexplored. We show
that our proposed model, named COMIC for COMpact Image Captioning, achieves
comparable results in five common evaluation metrics with state-of-the-art
approaches on both MS-COCO and InstaPIC-1.1M datasets despite having an
embedding vocabulary size that is 39x - 99x smaller. The source code and models
are available at:
https://github.com/jiahuei/COMIC-Compact-Image-Captioning-with-Attention
Comment: Added source code link and new results in Table
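The scaling argument in the COMIC abstract (embedding matrices grow in proportion to vocabulary size) can be checked with simple arithmetic. The sketch below is not the COMIC model; the layer sizes and vocabulary sizes are hypothetical values chosen only for illustration, and biases are ignored.

```python
def embedding_params(vocab_size, embed_dim, hidden_dim):
    """Count the weights in the input word embedding and the output projection."""
    word_embedding = vocab_size * embed_dim      # one row per token
    output_projection = hidden_dim * vocab_size  # one logit column per token
    return word_embedding + output_projection

# Hypothetical sizes, not taken from the paper.
full = embedding_params(vocab_size=10_000, embed_dim=512, hidden_dim=512)
small = embedding_params(vocab_size=250, embed_dim=512, hidden_dim=512)
print(full, small, full // small)
```

With the hidden and embedding widths held fixed, shrinking the vocabulary by a factor of 40 shrinks these two matrices by the same factor, which is the proportionality the abstract refers to.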
Logsig-RNN: a novel network for robust and efficient skeleton-based action recognition
This paper contributes to the challenge of skeleton-based human action recognition in
videos. The key step is to develop a generic network architecture to extract discriminative
features for the spatio-temporal skeleton data. In this paper, we propose a novel module,
namely Logsig-RNN, which is the combination of the log-signature layer and recurrent
type neural networks (RNNs). The former one comes from the mathematically principled
technology of signatures and log-signatures as representations for streamed data, which
can manage high sample rate streams, non-uniform sampling and time series of variable
length. It serves as an enhancement of the recurrent layer, which can be conveniently
plugged into neural networks. In addition, we propose two path transformation layers to
significantly reduce path dimension while retaining the essential information fed into
the Logsig-RNN module. (The network architecture is illustrated in Figure 1 (Right).)
Finally, numerical results demonstrate that replacing the RNN module by the Logsig-RNN
module in SOTA networks consistently improves the performance on both Chalearn
gesture data and NTU RGB+D 120 action data in terms of accuracy and robustness.
In particular, we achieve state-of-the-art accuracy on Chalearn2013 gesture data by
combining simple path transformation layers with the Logsig-RNN.
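To make the log-signature idea concrete, here is a minimal sketch that computes the log-signature of a piecewise-linear path truncated at depth 2: the total increment plus the Lévy area. It is a plain-Python illustration under that truncation, not the authors' layer; the function name is hypothetical, and a real implementation would use higher depths and a tensor library.

```python
def logsignature_depth2(path):
    """Return (increment, levy_area) for a piecewise-linear path.

    path: list of points, each a list of d floats.
    increment[i] is the level-1 term; levy_area[i][j] is the level-2 term.
    """
    d = len(path[0])
    s1 = [0.0] * d                       # level-1 signature (running increment)
    s2 = [[0.0] * d for _ in range(d)]   # level-2 signature (iterated integrals)
    for prev, curr in zip(path, path[1:]):
        delta = [c - p for p, c in zip(prev, curr)]
        # Chen's identity for appending one straight segment.
        for i in range(d):
            for j in range(d):
                s2[i][j] += s1[i] * delta[j] + 0.5 * delta[i] * delta[j]
        for i in range(d):
            s1[i] += delta[i]
    # At depth 2 the log-signature keeps only the antisymmetric part of s2.
    levy = [[0.5 * (s2[i][j] - s2[j][i]) for j in range(d)] for i in range(d)]
    return s1, levy

coarse = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
dense = [[0.0, 0.0], [0.5, 0.0], [1.0, 0.0], [1.0, 1.0]]  # same path, extra sample
print(logsignature_depth2(coarse))
print(logsignature_depth2(dense))
```

Both calls describe the same geometric path sampled at different rates and yield the same features, which hints at why such representations cope with high sample rates and non-uniform sampling.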
Response Characterization for Auditing Cell Dynamics in Long Short-term Memory Networks
In this paper, we introduce a novel method to interpret recurrent neural
networks (RNNs), particularly long short-term memory networks (LSTMs) at the
cellular level. We propose a systematic pipeline for interpreting individual
hidden state dynamics within the network using response characterization
methods. The ranked contribution of individual cells to the network's output is
computed by analyzing a set of interpretable metrics of their decoupled step
and sinusoidal responses. As a result, our method is able to uniquely identify
neurons with insightful dynamics, quantify relationships between dynamical
properties and test accuracy through ablation analysis, and interpret the
impact of network capacity on a network's dynamical distribution. Finally, we
demonstrate the generalizability and scalability of our method by evaluating it on a
series of different benchmark sequential datasets.
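As a rough illustration of response characterization, the sketch below extracts two standard control-theory metrics from a unit-step response: overshoot and settling time. The abstract does not list the metrics the authors use, so these are assumptions, and the leaky unit is a hypothetical stand-in for a single cell's trace, not an LSTM.

```python
def step_response_metrics(response, tolerance=0.02):
    """Summarize a unit-step response given as a list of samples.

    Returns the final value, the fractional overshoot above it, and the
    first index from which the signal stays within the tolerance band.
    """
    final = response[-1]
    peak = max(response)
    overshoot = max(0.0, (peak - final) / abs(final)) if final != 0 else 0.0
    band = tolerance * abs(final)
    settling_index = 0
    for k, value in enumerate(response):
        if abs(value - final) > band:
            settling_index = k + 1   # last out-of-band sample seen so far
    return {"final": final, "overshoot": overshoot, "settling_index": settling_index}

def simulate_leaky_unit(decay, steps):
    """Hypothetical stand-in for one cell: h[t] = decay * h[t-1] + (1 - decay)."""
    h, trace = 0.0, []
    for _ in range(steps):
        h = decay * h + (1.0 - decay)
        trace.append(h)
    return trace

fast = step_response_metrics(simulate_leaky_unit(decay=0.5, steps=200))
slow = step_response_metrics(simulate_leaky_unit(decay=0.9, steps=200))
print(fast["settling_index"], slow["settling_index"])
```

Units could then be ranked or grouped by such metrics, for example separating fast-settling units from slow, memory-like ones.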
Learning stochastic differential equations using RNN with log signature features
This paper contributes to the challenge of learning a function on streamed
multimodal data through evaluation. The core of the result of our paper is the
combination of two quite different approaches to this problem. One comes from
the mathematically principled technology of signatures and log-signatures as
representations for streamed data, while the other draws on the techniques of
recurrent neural networks (RNN). The ability of the former to manage high
sample rate streams and the latter to manage large scale nonlinear interactions
allows hybrid algorithms that are easy to code, quicker to train, and of lower
complexity for a given accuracy.
We illustrate the approach by approximating the unknown functional as a
controlled differential equation. Linear functionals on solutions of controlled
differential equations are the natural universal class of functions on data
streams. Following this approach, we propose a hybrid Logsig-RNN algorithm that
learns functionals on streamed data. Tested on various datasets, namely
synthetic data, NTU RGB+D 120 skeletal action data, and Chalearn2013 gesture
data, our algorithm achieves outstanding accuracy with superior efficiency
and robustness.
Deep Learning Approaches to Goal Recognition
Recognising the goal of an agent from a trace of observations is an important task with many applications. In the literature, many approaches to goal recognition (GR) rely on the application of automated planning techniques, which require a model of the domain actions and of the initial domain state (written, e.g., in PDDL).
We study three alternative approaches (GRNet, Fast and Slow Goal Recognition and a BERT-based approach) where Goal Recognition is formulated as a classification task addressed by machine learning.
All these approaches are primarily aimed at solving GR instances in a given domain, which is specified by a set of propositions and a set of action names. In GRNet, the goal classification instances in the domain are solved by an LSTM network. The only information required as input of the trained network is a trace of action names, each one indicating just the name of an observed action. A run of the LSTM processes a trace of observed actions to compute how likely it is that each domain proposition is part of the agent's goal.
Fast and Slow Goal Recognition, inspired by the "Thinking Fast and Slow" framework, is a dual-process model which integrates the use of the aforementioned LSTM with automated planning techniques.
This architecture can exploit both the fast, experience-based goal recognition provided by the network, and slow, deliberate analysis provided by the planning techniques.
Finally, we study how a BERT model trained on plans is able to understand how a domain works, its actions and how they are related to each other. This model is then fine-tuned in order to classify goal recognition instances.
Experimental analyses confirm that the presented architectures achieve good performance in terms of both goal classification accuracy and runtime, often obtaining better results than a state-of-the-art GR system on the considered benchmarks.
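The GRNet description says the network outputs, for each domain proposition, how likely it is to be part of the goal. The abstract does not state how those scores are turned into a goal decision; the sketch below assumes one plausible rule (rank candidate goals by the mean score of their propositions). The function name, the proposition names, and the scores are all hypothetical.

```python
def rank_goals(proposition_scores, candidate_goals):
    """Rank candidate goals by the mean score of their propositions.

    proposition_scores: dict mapping proposition name -> score in [0, 1],
        standing in for the per-proposition outputs of a trained network.
    candidate_goals: dict mapping goal name -> non-empty set of proposition names.
    Aggregating by the mean is an assumption, not the method from the abstract.
    """
    ranked = []
    for name, props in candidate_goals.items():
        mean = sum(proposition_scores.get(p, 0.0) for p in props) / len(props)
        ranked.append((mean, name))
    ranked.sort(reverse=True)
    return [name for _, name in ranked]

# Hypothetical scores for a toy blocks-style domain.
scores = {"on_a_b": 0.9, "on_b_c": 0.8, "on_c_a": 0.1}
goals = {"stack_abc": {"on_a_b", "on_b_c"}, "stack_ca": {"on_c_a"}}
print(rank_goals(scores, goals))
```

Because the network is queried once per observed trace and the ranking is a cheap aggregation, this kind of pipeline fits the "fast, experience-based" side of the dual-process design described above.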