15 research outputs found
Robust Place Categorization With Deep Domain Generalization
Traditional place categorization approaches in robot vision assume that training and test images have similar visual appearance. Therefore, any seasonal, illumination, and environmental changes typically lead to severe degradation in performance. To cope with this problem, recent works have been proposed to adopt domain adaptation techniques. While effective, these methods assume that some prior information about the scenario where the robot will operate is available at training time. Unfortunately, in many cases, this assumption does not hold, as we often do not know where a robot will be deployed. To overcome this issue, in this paper, we present an approach that aims at learning classification models able to generalize to unseen scenarios. Specifically, we propose a novel deep learning framework for domain generalization. Our method develops from the intuition that, given a set of different classification models associated to known domains (e.g., corresponding to multiple environments, robots), the best model for a new sample in the novel domain can be computed directly at test time by optimally combining the known models. To implement our idea, we exploit recent advances in deep domain adaptation and design a convolutional neural network architecture with novel layers performing a weighted version of batch normalization. Our experiments, conducted on three common datasets for robot place categorization, confirm the validity of our contribution
Kitting in the Wild through Online Domain Adaptation
Technological developments call for increasing perception and action capabilities of robots. Among other skills, vision systems that can adapt to any possible change in the working conditions are needed. Since these conditions are unpredictable, we need benchmarks which allow to assess the generalization and robustness capabilities of our visual recognition algorithms. In this work we focus on robotic kitting in unconstrained scenarios. As a first contribution, we present a new visual dataset for the kitting task. Differently from standard object recognition datasets, we provide images of the same objects acquired under various conditions where camera, illumination and background are changed. This novel dataset allows for testing the robustness of robot visual recognition algorithms to a series of different domain shifts both in isolation and unified. Our second contribution is a novel online adaptation algorithm for deep models, based on batch-normalization layers, which allows to continuously adapt a model to the current working conditions. Differently from standard domain adaptation algorithms, it does not require any image from the target domain at training time. We benchmark the performance of the algorithm on the proposed dataset, showing its capability to fill the gap between the performances of a standard architecture and its counterpart adapted offline to the given target domain
Boosting Deep Open World Recognition by Clustering
While convolutional neural networks have brought significant advances in
robot vision, their ability is often limited to closed world scenarios, where
the number of semantic concepts to be recognized is determined by the available
training set. Since it is practically impossible to capture all possible
semantic concepts present in the real world in a single training set, we need
to break the closed world assumption, equipping our robot with the capability
to act in an open world. To provide such ability, a robot vision system should
be able to (i) identify whether an instance does not belong to the set of known
categories (i.e. open set recognition), and (ii) extend its knowledge to learn
new classes over time (i.e. incremental learning). In this work, we show how we
can boost the performance of deep open world recognition algorithms by means of
a new loss formulation enforcing a global to local clustering of class-specific
features. In particular, a first loss term, i.e. global clustering, forces the
network to map samples closer to the class centroid they belong to while the
second one, local clustering, shapes the representation space in such a way
that samples of the same class get closer in the representation space while
pushing away neighbours belonging to other classes. Moreover, we propose a
strategy to learn class-specific rejection thresholds, instead of heuristically
estimating a single global threshold, as in previous works. Experiments on
RGB-D Object and Core50 datasets show the effectiveness of our approach.Comment: IROS/RAL 202
On the Challenges of Open World Recognitionunder Shifting Visual Domains
Robotic visual systems operating in the wild must act in unconstrained
scenarios, under different environmental conditions while facing a variety of
semantic concepts, including unknown ones. To this end, recent works tried to
empower visual object recognition methods with the capability to i) detect
unseen concepts and ii) extended their knowledge over time, as images of new
semantic classes arrive. This setting, called Open World Recognition (OWR), has
the goal to produce systems capable of breaking the semantic limits present in
the initial training set. However, this training set imposes to the system not
only its own semantic limits, but also environmental ones, due to its bias
toward certain acquisition conditions that do not necessarily reflect the high
variability of the real-world. This discrepancy between training and test
distribution is called domain-shift. This work investigates whether OWR
algorithms are effective under domain-shift, presenting the first benchmark
setup for assessing fairly the performances of OWR algorithms, with and without
domain-shift. We then use this benchmark to conduct analyses in various
scenarios, showing how existing OWR algorithms indeed suffer a severe
performance degradation when train and test distributions differ. Our analysis
shows that this degradation is only slightly mitigated by coupling OWR with
domain generalization techniques, indicating that the mere plug-and-play of
existing algorithms is not enough to recognize new and unknown categories in
unseen domains. Our results clearly point toward open issues and future
research directions, that need to be investigated for building robot visual
systems able to function reliably under these challenging yet very real
conditions. Code available at
https://github.com/DarioFontanel/OWR-VisualDomainsComment: RAL/ICRA 202
Hallucinating Agnostic Images to Generalize Across Domains
The ability to generalize across visual domains is crucial for the robustness
of artificial recognition systems. Although many training sources may be
available in real contexts, the access to even unlabeled target samples cannot
be taken for granted, which makes standard unsupervised domain adaptation
methods inapplicable in the wild. In this work we investigate how to exploit
multiple sources by hallucinating a deep visual domain composed of images,
possibly unrealistic, able to maintain categorical knowledge while discarding
specific source styles. The produced agnostic images are the result of a deep
architecture that applies pixel adaptation on the original source data guided
by two adversarial domain classifier branches at image and feature level. Our
approach is conceived to learn only from source data, but it seamlessly extends
to the use of unlabeled target samples. Remarkable results for both
multi-source domain adaptation and domain generalization support the power of
hallucinating agnostic images in this framework
Generalizing to Unseen Domains via Adversarial Data Augmentation
We are concerned with learning models that generalize well to different
\emph{unseen} domains. We consider a worst-case formulation over data
distributions that are near the source domain in the feature space. Only using
training data from a single source distribution, we propose an iterative
procedure that augments the dataset with examples from a fictitious target
domain that is "hard" under the current model. We show that our iterative
scheme is an adaptive data augmentation method where we append adversarial
examples at each iteration. For softmax losses, we show that our method is a
data-dependent regularization scheme that behaves differently from classical
regularizers that regularize towards zero (e.g., ridge or lasso). On digit
recognition and semantic segmentation tasks, our method learns models improve
performance across a range of a priori unknown target domains.Comment: Accepted to NIPS 2018 (camera ready