17 research outputs found
Transformer Networks for Trajectory Forecasting
Most recent successes in forecasting people's motion are based on LSTM models, and most recent progress has been achieved by modelling the social interaction among people and the interaction of people with the scene. We question the use of LSTM models and propose the novel use of Transformer Networks for trajectory forecasting. This is a fundamental switch from the sequential step-by-step processing of LSTMs to the attention-only memory mechanisms of Transformers. In particular, we consider both the original Transformer Network (TF) and the larger Bidirectional Transformer (BERT), state-of-the-art on all natural language processing tasks. Our proposed Transformers predict the trajectories of the individual people in the scene. These are "simple" models because each person is modelled separately, without any complex human-human or scene interaction terms. In particular, the TF model without bells and whistles yields the best score on TrajNet, the largest and most challenging trajectory forecasting benchmark. Additionally, its extension, which predicts multiple plausible future trajectories, performs on par with more engineered techniques on the 5 datasets of ETH + UCY. Finally, we show that Transformers can deal with missing observations, as may be the case with real sensor data. Code is available at https://github.com/FGiuliari/Trajectory-Transformer.
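To make the switch from recurrence to attention concrete, here is a minimal sketch of an attention-only trajectory predictor in the spirit of the TF model. The hyperparameters, the displacement encoding, and the teacher-forced decoding are illustrative assumptions, not the authors' implementation (see the linked repository for that); positional encodings are omitted for brevity.

```python
# A minimal sketch of attention-only trajectory forecasting.
# NOTE: positional encodings are omitted for brevity; a real model needs them.
import torch
import torch.nn as nn

class TrajectoryTransformer(nn.Module):
    def __init__(self, d_model=128, nhead=8, num_layers=3):
        super().__init__()
        self.embed = nn.Linear(2, d_model)   # (x, y) displacement -> embedding
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            batch_first=True,
        )
        self.head = nn.Linear(d_model, 2)    # embedding -> predicted displacement

    def forward(self, observed, future_in):
        # observed:  (batch, T_obs, 2) past displacements
        # future_in: (batch, T_fut, 2) shifted targets (teacher forcing)
        mask = self.transformer.generate_square_subsequent_mask(future_in.size(1))
        h = self.transformer(self.embed(observed), self.embed(future_in),
                             tgt_mask=mask)
        return self.head(h)

model = TrajectoryTransformer()
past = torch.randn(4, 8, 2)     # 8 observed steps for 4 pedestrians
future = torch.randn(4, 12, 2)  # 12 future steps
pred = model(past, future)      # (4, 12, 2)
```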
Leveraging commonsense for object localisation in partial scenes
We propose an end-to-end solution to address the problem of object localisation in partial scenes, where we aim to estimate the position of an object in an unknown area given only a partial 3D scan of the scene. We propose a novel scene representation to facilitate geometric reasoning, the Directed Spatial Commonsense Graph (D-SCG), a spatial scene graph enriched with additional concept nodes from a commonsense knowledge base. Specifically, the nodes of D-SCG represent the scene objects and the edges are their relative positions. Each object node is then connected via different commonsense relationships to a set of concept nodes. With the proposed graph-based scene representation, we estimate the unknown position of the target object using a Graph Neural Network that implements a novel attentional message passing mechanism. The network first predicts the relative positions between the target object and each visible object, learning a rich representation of the objects by aggregating both the object nodes and the concept nodes in D-SCG. These relative positions are then merged to obtain the final position. We evaluate our method on Partial ScanNet, improving the state of the art by 5.9% in localisation accuracy with an 8x faster training speed.
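The final merging step lends itself to a compact illustration: each visible object casts one estimate of the target position (its own position plus the predicted relative offset), and the estimates are fused into a single position. The plain average below is an assumption made for clarity; the paper's network may combine the contributions differently.

```python
# Illustrative merging of per-object relative-position predictions.
import numpy as np

def merge_relative_positions(visible_pos, rel_to_target):
    """visible_pos: (N, 3) positions of observed objects.
    rel_to_target: (N, 3) predicted offsets from each object to the target."""
    estimates = visible_pos + rel_to_target  # one target estimate per object
    return estimates.mean(axis=0)            # fuse into a single position

visible = np.array([[0.0, 0.0, 0.0], [2.0, 1.0, 0.0]])
offsets = np.array([[1.0, 1.0, 0.5], [-1.1, 0.1, 0.4]])
print(merge_relative_positions(visible, offsets))  # ~[0.95, 1.05, 0.45]
```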
POMP++: Pomcp-based Active Visual Search in unknown indoor environments
In this paper we focus on the problem of learning online an optimal policy for Active Visual Search (AVS) of objects in unknown indoor environments. We propose POMP++, a planning strategy that introduces a novel formulation on top of the classic Partially Observable Monte Carlo Planning (POMCP) framework to allow training-free online policy learning in unknown environments. We present a new belief reinvigoration strategy that allows POMCP to be used with a dynamically growing state space, addressing the online generation of the floor map. We evaluate our method on two public benchmark datasets, AVD, acquired by real robotic platforms, and Habitat ObjectNav, rendered from real 3D scene scans, achieving the best success rate with an improvement of >10% over the state-of-the-art methods.
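The belief reinvigoration idea can be sketched as follows: the belief is a particle set over candidate target locations, and cells newly added to the online floor map are injected as fresh particles so the state space can grow during the search. The function name, the uniform injection rule, and the replacement fraction below are assumptions for illustration, not the POMP++ implementation.

```python
# Illustrative belief reinvigoration over a dynamically growing map.
import random

def reinvigorate_belief(particles, newly_mapped_cells, fraction=0.2):
    """Replace a fraction of the particle set with samples drawn from cells
    that were added to the floor map in the last step."""
    if not newly_mapped_cells:
        return particles
    n_new = max(1, int(fraction * len(particles)))
    kept = random.sample(particles, len(particles) - n_new)
    injected = [random.choice(newly_mapped_cells) for _ in range(n_new)]
    return kept + injected

belief = [(1, 1), (1, 2), (2, 2), (3, 1), (3, 3)]  # candidate target cells
frontier = [(4, 1), (4, 2)]                        # cells mapped this step
belief = reinvigorate_belief(belief, frontier)
```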
POMP: Pomcp-based Online Motion Planning for active visual search in indoor environments
In this paper we focus on the problem of learning an optimal policy for Active Visual Search (AVS) of objects in known indoor environments with an online setup. Our POMP method uses as input the current pose of an agent (e.g. a robot) and an RGB-D frame. The task is to plan the next move that brings the agent closer to the target object. We model this problem as a Partially Observable Markov Decision Process solved by a Monte-Carlo planning approach. This allows us to decide the next move by iterating over the known scenario at hand, exploring the environment and searching for the object at the same time. Unlike the current state of the art in Reinforcement Learning, POMP does not require extensive and expensive (in time and computation) labelled data, making it very agile in solving AVS in small and medium real scenarios. We only require the floor map of the environment, information that is usually available or can easily be extracted from a single a priori exploration run. We validate our method on the publicly available AVD benchmark, achieving an average success rate of 0.76 with an average path length of 17.1, performing close to the state of the art but without any training needed. Additionally, we show experimentally the robustness of our method when the quality of the object detection degrades from ideal to faulty.
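For readers unfamiliar with POMCP, the following is a minimal sketch of the Monte-Carlo planning loop it builds on: sample a state hypothesis from the current belief, simulate candidate moves, and select among them with a UCB1 rule. The belief, simulator, and reward are placeholders standing in for the paper's AVS formulation, not the actual POMP code.

```python
# A minimal flat Monte-Carlo planning loop with UCB1 action selection.
import math
import random

def ucb1(values, counts, total, c=1.4):
    """Pick the action index maximising mean value plus exploration bonus."""
    return max(range(len(values)),
               key=lambda a: float('inf') if counts[a] == 0
               else values[a] / counts[a] + c * math.sqrt(math.log(total) / counts[a]))

def plan(belief, actions, simulate, n_sims=1000):
    """belief: list of state hypotheses; simulate(state, action) -> return."""
    values = [0.0] * len(actions)
    counts = [0] * len(actions)
    for t in range(1, n_sims + 1):
        state = random.choice(belief)             # sample a state hypothesis
        a = ucb1(values, counts, t)
        values[a] += simulate(state, actions[a])  # accumulate simulated return
        counts[a] += 1
    return actions[max(range(len(actions)), key=lambda a: counts[a])]
```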
Spatial Commonsense Graph for Object Localisation in Partial Scenes
We solve object localisation in partial scenes, a new problem of estimating the unknown position of an object (e.g. where is the bag?) given a partial 3D scan of a scene. The proposed solution is based on a novel scene graph model, the Spatial Commonsense Graph (SCG), where objects are the nodes and edges define pairwise distances between them, enriched by concept nodes and relationships from a commonsense knowledge base. This allows SCG to better generalise its spatial inference to unknown 3D scenes. The SCG is used to estimate the unknown position of the target object in two steps: first, we feed the SCG into a novel Proximity Prediction Network, a graph neural network that uses attention to perform distance prediction between the node representing the target object and the nodes representing the observed objects in the SCG; second, we propose a Localisation Module based on circular intersection to estimate the object position using all the predicted pairwise distances in order to be independent of any reference system. We create a new dataset of partially reconstructed scenes to benchmark our method and baselines for object localisation in partial scenes, where our proposed method achieves the best localisation performance. Code and Dataset are available here: https://github.com/IIT-PAVIS/SpatialCommonsenseGrap
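The localisation-by-circular-intersection step can be illustrated with a standard least-squares solve: given the observed objects' positions and the predicted target distances, linearise the circle equations against a reference anchor and recover the point whose distances best match. This direct solve is an illustrative stand-in for the paper's Localisation Module, whose exact formulation may differ.

```python
# Least-squares intersection of circles (2D for brevity).
import numpy as np

def localise(anchors, dists):
    """anchors: (N, 2) object positions; dists: (N,) predicted distances.
    Linearise ||x - a_i||^2 = d_i^2 against the first anchor and solve."""
    a0, d0 = anchors[0], dists[0]
    A = 2 * (anchors[1:] - a0)
    b = (d0**2 - dists[1:]**2
         + np.sum(anchors[1:]**2, axis=1) - np.sum(a0**2))
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x

anchors = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
true_target = np.array([1.0, 2.0])
dists = np.linalg.norm(anchors - true_target, axis=1)
print(localise(anchors, dists))  # ~[1.0, 2.0]
```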
Blood toluene as a biological index of environmental toluene exposure in the "normal" population and in occupationally exposed workers immediately after exposure and 16 hours later.
Blood toluene was measured in a group of 100 workers occupationally exposed to a mean 8-h environmental toluene concentration of 128 μg/l (34 ppm), and in a group of 269 "normal" subjects without occupational exposure to toluene. The mean blood toluene of the workers at the end of the shift and the following morning, after 16 h, was 457 and 38 μg/l, respectively. The normal subjects had a blood toluene level of 1.1 μg/l. On the basis of the highly significant correlation between blood toluene and occupational exposure, it can be calculated that environmental toluene exposures of 188 and 377 μg/l (50 and 100 ppm) give end-of-shift blood toluene levels of 690 and 1390 μg/l, respectively. The corresponding blood toluene levels on the following morning are 50 and 100 μg/l, respectively.
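As a quick arithmetic check of the reported relation between airborne exposure and end-of-shift blood toluene, a zero-intercept linear model calibrated on the measured group is assumed here purely for illustration:

```python
# Illustrative check only: assume blood toluene scales linearly with
# airborne exposure through the origin, calibrated on the measured group.
slope = 457 / 128                # 128 ug/l air -> 457 ug/l blood at shift end
for air_ug_per_l in (188, 377):  # i.e. 50 and 100 ppm
    print(air_ug_per_l, round(slope * air_ug_per_l))
# Prints ~671 and ~1346, close to the quoted 690 and 1390; the paper's
# regression evidently includes a non-zero intercept.
```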