Learning Deep NBNN Representations for Robust Place Categorization
This paper presents an approach for semantic place categorization using data
obtained from RGB cameras. Previous studies on visual place recognition and
classification have shown that, by considering features derived from
pre-trained Convolutional Neural Networks (CNNs) in combination with part-based
classification models, high recognition accuracy can be achieved, even in the
presence of occlusions and severe viewpoint changes. Inspired by these works,
we propose to exploit local deep representations, representing images as sets of
regions and applying a Naïve Bayes Nearest Neighbor (NBNN) model for image
classification. As opposed to previous methods where CNNs are merely used as
feature extractors, our approach seamlessly integrates the NBNN model into a
fully-convolutional neural network. Experimental results show that the proposed
algorithm outperforms previous methods based on pre-trained CNN models and
that, when employed in challenging robot place recognition tasks, it is robust
to occlusions and to environmental and sensor changes.
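For illustration, a minimal sketch of the NBNN decision rule over local descriptors referenced above; the function and variable names are assumptions for this sketch, not the paper's implementation (which integrates the rule into a fully-convolutional network):

```python
import numpy as np

def nbnn_classify(query_descriptors, class_descriptors):
    """Naive Bayes Nearest Neighbor (NBNN) decision rule.

    query_descriptors: (n, d) local descriptors of the test image
    class_descriptors: dict mapping class label -> (m_c, d) array of
                       descriptors pooled from that class's training images
    Returns the label minimizing the image-to-class distance.
    """
    scores = {}
    for label, refs in class_descriptors.items():
        # squared Euclidean distance from every query descriptor to every
        # reference descriptor of this class: shape (n, m_c)
        d2 = ((query_descriptors[:, None, :] - refs[None, :, :]) ** 2).sum(-1)
        # image-to-class distance: sum of nearest-neighbor distances
        scores[label] = d2.min(axis=1).sum()
    return min(scores, key=scores.get)
```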
Robust Place Categorization With Deep Domain Generalization
Traditional place categorization approaches in robot vision assume that training and test images have similar visual appearance. Therefore, any seasonal, illumination, and environmental changes typically lead to severe degradation in performance. To cope with this problem, recent works have proposed adopting domain adaptation techniques. While effective, these methods assume that some prior information about the scenario where the robot will operate is available at training time. Unfortunately, in many cases, this assumption does not hold, as we often do not know where a robot will be deployed. To overcome this issue, in this paper, we present an approach that aims at learning classification models able to generalize to unseen scenarios. Specifically, we propose a novel deep learning framework for domain generalization. Our method develops from the intuition that, given a set of different classification models associated with known domains (e.g., corresponding to multiple environments, robots), the best model for a new sample in the novel domain can be computed directly at test time by optimally combining the known models. To implement our idea, we exploit recent advances in deep domain adaptation and design a convolutional neural network architecture with novel layers performing a weighted version of batch normalization. Our experiments, conducted on three common datasets for robot place categorization, confirm the validity of our contribution.
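As a rough illustration of the weighted batch normalization idea, a minimal PyTorch-style sketch assuming per-domain statistics have already been estimated and the domain weights for a sample are given; the names and the exact placement of the weighting are assumptions for this sketch, not the authors' implementation:

```python
import torch

def weighted_batch_norm(x, domain_means, domain_vars, weights,
                        gamma, beta, eps=1e-5):
    """Normalize x with a convex combination of per-domain statistics.

    x:            (batch, channels) activations
    domain_means: (num_domains, channels) per-domain running means
    domain_vars:  (num_domains, channels) per-domain running variances
    weights:      (num_domains,) non-negative weights summing to one,
                  e.g. the estimated probability that x comes from each domain
    gamma, beta:  (channels,) learned affine parameters
    """
    # blend the known domains' statistics according to the weights
    mean = (weights[:, None] * domain_means).sum(0)
    var = (weights[:, None] * domain_vars).sum(0)
    return gamma * (x - mean) / torch.sqrt(var + eps) + beta
```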
Efficient semantic place categorization by a robot through active line-of-sight selection
In this paper, we present an attention mechanism for mobile robots to address the problem of place categorization. Our approach, which is based on active perception, aims to capture images with characteristic or distinctive details of the environment that can be exploited to improve the efficiency (quickness and accuracy) of place categorization. To do so, at each time instant, our proposal selects the most informative view by controlling the line-of-sight of the robot’s camera through a pan-only unit. We root our proposal in an information maximization scheme, formalized as a next-best-view problem through a Markov Decision Process (MDP) model. The latter exploits the short-time estimated navigation path of the robot to anticipate the robot’s next movements and make consistent decisions. We demonstrate over two datasets, with simulated and real data, that our proposal generalizes well for the two main paradigms of place categorization (object-based and image-based), outperforming typical camera configurations (fixed and continuously rotating) and a pure-exploratory approach, both in quickness and accuracy. This work was supported by the research projects WISER (DPI2017-84827-R) and ARPEGGIO (PID2020-117057), as well as by the Spanish grant program FPU19/00704. Funding for open access charge: Universidad de Málaga / CBUA
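A minimal sketch of a one-step, information-maximization view selection in the spirit of the approach above; it is not the paper's full MDP formulation, and `predict_belief` is a hypothetical stand-in for the observation model:

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a discrete category distribution."""
    p = np.clip(p, 1e-12, 1.0)
    return -(p * np.log(p)).sum()

def select_next_pan_angle(candidate_angles, current_belief, predict_belief):
    """Greedy next-best-view: pick the pan angle whose predicted observation
    most reduces the entropy of the place-category belief.

    candidate_angles: iterable of feasible pan angles
    current_belief:   (num_categories,) current category probabilities
    predict_belief:   callable (angle, belief) -> predicted posterior belief
                      (illustrative placeholder for the observation model)
    """
    gains = {a: entropy(current_belief) - entropy(predict_belief(a, current_belief))
             for a in candidate_angles}
    return max(gains, key=gains.get)
```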
On the Challenges of Open World Recognition under Shifting Visual Domains
Robotic visual systems operating in the wild must act in unconstrained
scenarios, under different environmental conditions while facing a variety of
semantic concepts, including unknown ones. To this end, recent works tried to
empower visual object recognition methods with the capability to i) detect
unseen concepts and ii) extend their knowledge over time, as images of new
semantic classes arrive. This setting, called Open World Recognition (OWR), has
the goal of producing systems capable of breaking the semantic limits present in
the initial training set. However, this training set imposes on the system not
only its own semantic limits, but also environmental ones, due to its bias
toward certain acquisition conditions that do not necessarily reflect the high
variability of the real world. This discrepancy between training and test
distribution is called domain-shift. This work investigates whether OWR
algorithms are effective under domain-shift, presenting the first benchmark
setup for fairly assessing the performance of OWR algorithms, with and without
domain-shift. We then use this benchmark to conduct analyses in various
scenarios, showing how existing OWR algorithms indeed suffer a severe
performance degradation when training and test distributions differ. Our analysis
shows that this degradation is only slightly mitigated by coupling OWR with
domain generalization techniques, indicating that the mere plug-and-play of
existing algorithms is not enough to recognize new and unknown categories in
unseen domains. Our results clearly point toward open issues and future
research directions that need to be investigated for building robot visual
systems able to function reliably under these challenging yet very real
conditions. Code available at
https://github.com/DarioFontanel/OWR-VisualDomains (RAL/ICRA 2021).
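As background on the OWR setting described above, a minimal sketch of one common building block, nearest-class-mean classification with a rejection option for unknown categories; the threshold and names are illustrative assumptions, not taken from the benchmarked algorithms:

```python
import numpy as np

def owr_predict(feature, class_means, reject_threshold):
    """Nearest-class-mean prediction with an 'unknown' option.

    feature:          (d,) embedding of the test image
    class_means:      dict label -> (d,) mean embedding of a known class
    reject_threshold: distance above which the sample is declared unknown
    """
    dists = {label: np.linalg.norm(feature - mu)
             for label, mu in class_means.items()}
    best = min(dists, key=dists.get)
    # reject samples far from every known class as a new/unknown concept
    return best if dists[best] <= reject_threshold else "unknown"
```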
Visual Feature Learning
Categorization is a fundamental problem of many computer vision applications, e.g., image
classification, pedestrian detection and face recognition. The robustness of a categorization
system heavily relies on the quality of the features by which data are represented. Prior
work on feature extraction can be grouped into different levels, which, in bottom-up order,
are low-level features (e.g., pixels and gradients) and middle/high-level features (e.g., the
BoW model and sparse coding). Low-level features can be directly extracted from images
or videos, while middle/high-level features are constructed upon low-level features and are
designed to enhance the capability of categorization systems based on different considerations
(e.g., guaranteeing domain invariance and improving discriminative power).
This thesis focuses on the study of visual feature learning. The challenges that remain in designing
visual features lie in intra-class variation, occlusions, illumination and viewpoint
changes, and insufficient prior knowledge. To address these challenges, I present several
visual feature learning methods, which cover the following sub-topics: (i)
I start by introducing a segmentation-based object recognition system. (ii) When training
data are insufficient, I seek data from other sources, including images or videos from a
different domain, actions captured from a different viewpoint, and information in a different
media form. In order to appropriately transfer such resources into the target categorization
system, four transfer-learning-based feature learning methods are presented in this section,
where cross-view, cross-domain, and cross-modality scenarios are addressed.
(iii) Finally, I present a random-forest-based feature fusion method for multi-view
action recognition.
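As a small illustration of the middle-level features mentioned above, a minimal bag-of-words (BoW) sketch that clusters low-level local descriptors into a visual vocabulary and represents an image as a histogram of visual words; the vocabulary size and clustering setup are assumptions for this sketch:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(all_descriptors, num_words=256, seed=0):
    """Learn a visual vocabulary by clustering low-level local descriptors."""
    return KMeans(n_clusters=num_words, random_state=seed, n_init=10).fit(all_descriptors)

def bow_histogram(image_descriptors, codebook):
    """Represent an image as a normalized histogram of visual-word assignments."""
    words = codebook.predict(image_descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)
```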