Multi-label affordance mapping from egocentric vision
Accurate affordance detection and segmentation with pixel precision is an
important component of many interaction-based systems, such as robots and
assistive devices. We present a new approach to affordance perception which
enables accurate multi-label segmentation. Our approach can be used to
automatically extract grounded affordances from first-person videos of
interactions, using a 3D map of the environment to provide pixel-level
precision for affordance locations. We use this method to build the largest and most
complete dataset of affordances, EPIC-Aff, which is built on the EPIC-Kitchens
dataset and provides interaction-grounded, multi-label, metric and spatial
affordance annotations. Then, we propose a new approach to affordance
segmentation based on multi-label detection, which enables multiple affordances
to co-exist in the same space, for example when they are associated with the
same object. We present several strategies for multi-label detection using
different segmentation architectures. The experimental results highlight the importance of
multi-label detection. Finally, we show how our metric representation can be
exploited to build a map of interaction hotspots in spatial action-centric
zones, and how that representation can be used to perform task-oriented navigation.
Comment: International Conference on Computer Vision (ICCV) 202
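
To illustrate the multi-label idea, here is a minimal sketch, not the paper's implementation: independent per-class sigmoids trained with binary cross-entropy let several affordance labels be active at the same pixel, whereas a softmax head forces exactly one. The backbone is assumed generic, and the name MultiLabelSegHead, the channel counts, and the 0.5 threshold are all illustrative assumptions.

# Minimal multi-label segmentation head: independent sigmoids per class,
# so a pixel can carry several affordances at once (e.g. "cut" and "hold").
# MultiLabelSegHead is a hypothetical name, not from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiLabelSegHead(nn.Module):
    def __init__(self, in_channels: int, num_affordances: int):
        super().__init__()
        # 1x1 convolution maps backbone features to per-affordance logits.
        self.classifier = nn.Conv2d(in_channels, num_affordances, kernel_size=1)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (B, C, H, W) from any segmentation backbone.
        return self.classifier(features)  # raw logits, no softmax

head = MultiLabelSegHead(in_channels=256, num_affordances=20)
features = torch.randn(2, 256, 64, 64)
logits = head(features)  # (2, 20, 64, 64)

# Binary cross-entropy per class replaces categorical cross-entropy,
# so the classes are not mutually exclusive.
targets = torch.randint(0, 2, logits.shape).float()
loss = F.binary_cross_entropy_with_logits(logits, targets)

# At inference each class is thresholded independently; overlapping
# affordances simply yield several active channels at the same pixel.
predictions = torch.sigmoid(logits) > 0.5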
Robust Fusion for Bayesian Semantic Mapping
The integration of semantic information in a map allows robots to better
understand their environment and make high-level decisions. In the last few
years, neural networks have shown enormous progress in their perception
capabilities. However, when fusing multiple observations from a neural network
into a semantic map, the network's inherent overconfidence on unknown data
gives too much weight to outliers and decreases the robustness of the
resulting map. In this work, we
propose a novel robust fusion method to combine multiple Bayesian semantic
predictions. Our method uses the uncertainty estimation provided by a Bayesian
neural network to calibrate the way in which the measurements are fused. This
is done by regularizing the observations to mitigate the problem of
overconfident outlier predictions and using the epistemic uncertainty to weigh
their influence in the fusion, resulting in a different formulation of the
probability distributions. We validate our robust fusion strategy by performing
experiments on photo-realistic simulated environments and real scenes. In both
cases, we use a network trained on different data to expose the model to
varying data distributions. The results show that considering the model's
uncertainty and regularizing the probability distributions of the observations
results in better semantic segmentation performance and more robustness to
outliers, compared with other methods.
Comment: 7 pages, 7 figures, under review at IEEE IROS 202
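
As an illustration of this kind of uncertainty-aware fusion, the sketch below shows one plausible formulation, not the paper's exact one: each observation is a per-pixel categorical distribution, a small uniform mixture regularizes overconfident outliers, and the epistemic uncertainty tempers the likelihood before the Bayesian update. The exponential weighting and the mixing coefficient eps are assumptions.

# Sketch of uncertainty-weighted Bayesian fusion of semantic observations.
# The uniform-mixture regularizer and exponential weighting are illustrative
# assumptions, not the paper's exact formulation.
import numpy as np

def regularize(p: np.ndarray, eps: float = 0.1) -> np.ndarray:
    """Blend with a uniform distribution to soften overconfident predictions."""
    num_classes = p.shape[-1]
    return (1.0 - eps) * p + eps / num_classes

def fuse(prior: np.ndarray, observation: np.ndarray,
         epistemic_unc: float) -> np.ndarray:
    """Bayesian update in which high epistemic uncertainty flattens the likelihood."""
    weight = np.exp(-epistemic_unc)                 # in (0, 1]: uncertain -> small
    likelihood = regularize(observation) ** weight  # tempered likelihood
    posterior = prior * likelihood
    return posterior / posterior.sum(axis=-1, keepdims=True)

# Three classes: a confident inlier followed by an overconfident outlier.
belief = np.full(3, 1.0 / 3.0)                                  # uniform prior
belief = fuse(belief, np.array([0.80, 0.10, 0.10]), epistemic_unc=0.1)
belief = fuse(belief, np.array([0.01, 0.98, 0.01]), epistemic_unc=2.0)
print(belief)  # the high-uncertainty outlier barely shifts the belief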