The RGB-D Triathlon: Towards Agile Visual Toolboxes for Robots
Deep networks have brought significant advances in robot perception, improving the capabilities of robots across several visual tasks, from object detection and recognition to pose estimation, semantic scene segmentation, and many others. Still, most approaches address visual tasks in isolation, resulting in overspecialized models that achieve strong performance in specific applications but work poorly in other (often related) tasks. This is clearly sub-optimal for a robot, which is often required to perform multiple visual recognition tasks simultaneously in order to properly act and interact with its environment. The problem is exacerbated by the limited computational and memory resources typically available onboard a robotic platform. The problem of learning flexible models that can handle multiple tasks in a lightweight manner has recently gained attention in the computer vision community, and benchmarks supporting this research have been proposed. In this work we study this problem in the robot vision context, proposing a new benchmark, the RGB-D Triathlon, and evaluating state-of-the-art algorithms in this novel, challenging scenario. We also define a new evaluation protocol, better suited to the robot vision setting. Results shed light on the strengths and weaknesses of existing approaches and on open issues, suggesting directions for future research.
Comment: This work has been submitted to IROS/RAL 201
Best Sources Forward: Domain Generalization through Source-Specific Nets
A long-standing problem in visual object categorization is the ability of algorithms to generalize across different testing conditions. The problem has been formalized as a covariate shift between the probability distributions generating the training data (source) and the test data (target), and several domain adaptation methods have been proposed to address this issue. While these approaches have considered the single source-single target scenario, it is plausible to have multiple sources and to require adaptation to any possible target domain. This last scenario, named Domain Generalization (DG), is the focus of our work. Differently from previous DG methods, which learn domain-invariant representations from source data, we design a deep network with multiple domain-specific classifiers, each associated with a source domain. At test time we estimate the probabilities that a target sample belongs to each source domain and exploit them to optimally fuse the classifiers' predictions. To further improve the generalization ability of our model, we also introduce a domain-agnostic component supporting the final classifier. Experiments on two public benchmarks demonstrate the power of our approach.
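A minimal sketch of the fusion step described in this abstract, assuming per-source classifier heads on top of a shared feature extractor and a small domain-prediction branch; all module and variable names are illustrative, not the paper's released code:

```python
import torch
import torch.nn as nn

class SourceFusionHead(nn.Module):
    """Fuses per-source classifiers weighted by estimated domain membership.

    Illustrative sketch: `num_sources` classifier heads share a feature
    extractor; a small domain branch estimates how likely the sample is to
    come from each source, and those probabilities weight the predictions.
    """

    def __init__(self, feat_dim, num_classes, num_sources):
        super().__init__()
        self.class_heads = nn.ModuleList(
            [nn.Linear(feat_dim, num_classes) for _ in range(num_sources)]
        )
        self.domain_head = nn.Linear(feat_dim, num_sources)

    def forward(self, feats):
        # Probability that the sample belongs to each source domain.
        domain_w = torch.softmax(self.domain_head(feats), dim=1)  # (B, S)
        # Per-source class predictions, stacked along a "source" axis.
        per_source = torch.stack(
            [torch.softmax(h(feats), dim=1) for h in self.class_heads], dim=1
        )  # (B, S, C)
        # Weighted fusion of the per-source predictions.
        return (domain_w.unsqueeze(-1) * per_source).sum(dim=1)  # (B, C)


# Usage with random features standing in for a CNN backbone's output.
head = SourceFusionHead(feat_dim=256, num_classes=7, num_sources=3)
fused = head(torch.randn(4, 256))
print(fused.shape)  # torch.Size([4, 7])
```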
AdaGraph: Unifying Predictive and Continuous Domain Adaptation through Graphs
The ability to categorize is a cornerstone of visual intelligence, and a key functionality for artificial, autonomous visual machines. This problem will never be solved without algorithms able to adapt and generalize across visual domains. Within the context of domain adaptation and generalization, this paper focuses on the predictive domain adaptation scenario, namely the case where no target data are available and the system has to learn to generalize from annotated source images plus unlabeled samples with associated metadata from auxiliary domains. Our contribution is the first deep architecture that tackles predictive domain adaptation, able to leverage the information brought by the auxiliary domains through a graph. Moreover, we present a simple yet effective strategy that allows us to take advantage of the incoming target data at test time, in a continuous domain adaptation scenario. Experiments on three benchmark databases support the value of our approach.
Comment: CVPR 2019 (oral
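One way to picture the graph-based prediction of target-domain parameters is to hold per-domain batch-normalization statistics at each metadata node and blend them with edge weights derived from metadata similarity. The Gaussian kernel and all names below are assumptions made for illustration, not the paper's implementation:

```python
import numpy as np

def predict_target_bn_stats(node_metadata, node_means, node_vars,
                            target_metadata, bandwidth=1.0):
    """Blend per-node BN statistics using metadata similarity as edge weights.

    node_metadata: (N, D) metadata vectors of the auxiliary-domain nodes.
    node_means, node_vars: (N, C) per-node batch-norm running statistics.
    target_metadata: (D,) metadata of the (unseen) target domain.
    """
    # Gaussian kernel on metadata distance as a simple edge-weight choice.
    dists = np.linalg.norm(node_metadata - target_metadata, axis=1)
    weights = np.exp(-(dists ** 2) / (2 * bandwidth ** 2))
    weights = weights / weights.sum()
    # Convex combination of the per-node statistics.
    return weights @ node_means, weights @ node_vars


# Toy example: 5 auxiliary nodes, 2-D metadata, 8 BN channels.
rng = np.random.default_rng(0)
mean, var = predict_target_bn_stats(
    rng.normal(size=(5, 2)), rng.normal(size=(5, 8)),
    rng.uniform(0.5, 1.5, size=(5, 8)), np.array([0.2, -0.1]))
print(mean.shape, var.shape)  # (8,) (8,)
```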
Robust Place Categorization With Deep Domain Generalization
Traditional place categorization approaches in robot vision assume that training and test images have similar visual appearance. Therefore, any seasonal, illumination, or environmental change typically leads to a severe degradation in performance. To cope with this problem, recent works have proposed adopting domain adaptation techniques. While effective, these methods assume that some prior information about the scenario where the robot will operate is available at training time. Unfortunately, in many cases this assumption does not hold, as we often do not know where a robot will be deployed. To overcome this issue, in this paper we present an approach that aims at learning classification models able to generalize to unseen scenarios. Specifically, we propose a novel deep learning framework for domain generalization. Our method develops from the intuition that, given a set of different classification models associated with known domains (e.g., corresponding to multiple environments or robots), the best model for a new sample from a novel domain can be computed directly at test time by optimally combining the known models. To implement our idea, we exploit recent advances in deep domain adaptation and design a convolutional neural network architecture with novel layers performing a weighted version of batch normalization. Our experiments, conducted on three common datasets for robot place categorization, confirm the validity of our contribution.
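A minimal PyTorch-style sketch of the "weighted batch normalization" idea mentioned above, assuming per-domain statistics are already available and each sample carries a vector of domain weights; the layer name and the simplified normalization details are illustrative, not the paper's exact layer:

```python
import torch
import torch.nn as nn

class WeightedBatchNorm1d(nn.Module):
    """Normalizes features with a convex combination of per-domain statistics."""

    def __init__(self, num_features, num_domains, eps=1e-5):
        super().__init__()
        self.eps = eps
        # Per-domain running statistics (fixed buffers here for simplicity).
        self.register_buffer("means", torch.zeros(num_domains, num_features))
        self.register_buffer("vars", torch.ones(num_domains, num_features))
        # Shared affine parameters, as in standard batch normalization.
        self.weight = nn.Parameter(torch.ones(num_features))
        self.bias = nn.Parameter(torch.zeros(num_features))

    def forward(self, x, domain_weights):
        # domain_weights: (B, num_domains), rows summing to 1.
        mean = domain_weights @ self.means   # (B, num_features)
        var = domain_weights @ self.vars     # (B, num_features)
        x_hat = (x - mean) / torch.sqrt(var + self.eps)
        return x_hat * self.weight + self.bias


wbn = WeightedBatchNorm1d(num_features=16, num_domains=3)
x = torch.randn(4, 16)
w = torch.softmax(torch.randn(4, 3), dim=1)
print(wbn(x, w).shape)  # torch.Size([4, 16])
```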
Learning Deep NBNN Representations for Robust Place Categorization
This paper presents an approach for semantic place categorization using data obtained from RGB cameras. Previous studies on visual place recognition and classification have shown that, by considering features derived from pre-trained Convolutional Neural Networks (CNNs) in combination with part-based classification models, high recognition accuracy can be achieved, even in the presence of occlusions and severe viewpoint changes. Inspired by these works, we propose to exploit local deep representations, representing images as sets of regions and applying a Naïve Bayes Nearest Neighbor (NBNN) model for image classification. As opposed to previous methods, where CNNs are merely used as feature extractors, our approach seamlessly integrates the NBNN model into a fully-convolutional neural network. Experimental results show that the proposed algorithm outperforms previous methods based on pre-trained CNN models and that, when employed in challenging robot place recognition tasks, it is robust to occlusions and to environmental and sensor changes.
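The Naïve Bayes Nearest Neighbor decision rule the abstract builds on is simple to state: classify an image by summing, over its local descriptors, the squared distance to the nearest training descriptor of each class, and pick the class with the smallest total. A sketch of that rule, with random vectors standing in for the local CNN features (the labels and data are purely illustrative):

```python
import numpy as np

def nbnn_classify(image_descriptors, class_descriptors):
    """Naive Bayes Nearest Neighbor: sum per-descriptor nearest-class distances.

    image_descriptors: (M, D) local descriptors of the test image.
    class_descriptors: dict mapping label -> (N_c, D) training descriptors.
    Returns the label minimizing the total image-to-class distance.
    """
    totals = {}
    for label, refs in class_descriptors.items():
        # Squared Euclidean distance from every image descriptor to every
        # reference descriptor of this class, then nearest neighbor per row.
        d2 = ((image_descriptors[:, None, :] - refs[None, :, :]) ** 2).sum(-1)
        totals[label] = d2.min(axis=1).sum()
    return min(totals, key=totals.get)


# Toy usage: two classes, 64-D descriptors standing in for CNN local features.
rng = np.random.default_rng(1)
classes = {"kitchen": rng.normal(0.0, 1.0, (50, 64)),
           "office": rng.normal(0.5, 1.0, (50, 64))}
query = rng.normal(0.0, 1.0, (20, 64))
print(nbnn_classify(query, classes))  # likely "kitchen"
```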
Differences in Early Occupational Earnings of UK Male Graduates by Degree Subject: Evidence from the 1980-1993 USR
This paper investigates the differences in early occupational earnings of UK male graduates by degree subject during the period 1980-1993. We match administrative student-level data from the Universities' Statistical Record (USR) with occupational earnings information from the New Earnings Survey (NES). The paper estimates relative earnings premia by degree subject using three alternative modelling approaches to control for student self-selection into university courses: i) a proxying and matching method, ii) a propensity score matching method, and iii) a simultaneous equations model of subject choice and earnings determination. Our analysis shows that there is a substantial amount of sample selection originating from unobservable student characteristics. The ranking of university subjects based on relative earnings premia is sensitive not only to the modelling approach but also to time, showing that analyses focusing on single-year data may not generalise to other periods.
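As an illustration of the second approach named above, propensity score matching, a minimal sketch on synthetic data: estimate the probability of choosing a given degree subject from observed covariates, match each "treated" graduate to the control with the closest score, and read the earnings premium off the matched differences. The covariates, the synthetic data, and the single-covariate setup are assumptions made for illustration only:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
n = 2000
ability = rng.normal(size=n)                            # observed covariate
subject = rng.binomial(1, 1 / (1 + np.exp(-ability)))   # 1 = "treated" subject
earnings = 10.0 + 0.3 * subject + 0.5 * ability + rng.normal(scale=0.5, size=n)

# Step 1: propensity score = P(subject = 1 | covariates).
ps = LogisticRegression().fit(ability.reshape(-1, 1), subject)
score = ps.predict_proba(ability.reshape(-1, 1))[:, 1]

# Step 2: match each treated graduate to the control with the closest score.
treated, control = np.where(subject == 1)[0], np.where(subject == 0)[0]
nn = NearestNeighbors(n_neighbors=1).fit(score[control].reshape(-1, 1))
_, idx = nn.kneighbors(score[treated].reshape(-1, 1))
matched = control[idx.ravel()]

# Step 3: earnings premium as the mean matched difference.
print("estimated premium:", (earnings[treated] - earnings[matched]).mean())
```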
Embedding Words and Senses Together via Joint Knowledge-Enhanced Training
Word embeddings are widely used in Natural Language Processing, mainly due to their success in capturing semantic information from massive corpora. However, their creation process does not allow the different meanings of a word to be automatically separated, as it conflates them into a single vector. We address this issue by proposing a new model which learns word and sense embeddings jointly. Our model exploits large corpora and knowledge from semantic networks in order to produce a unified vector space of word and sense embeddings. We evaluate the main features of our approach both qualitatively and quantitatively in a variety of tasks, highlighting the advantages of the proposed method in comparison to state-of-the-art word- and sense-based models.
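A rough way to picture a "unified vector space of word and sense embeddings" is to train a single skip-gram model over a corpus in which sense-annotated tokens appear alongside plain words, so words and senses end up as vectors in the same space. The gensim call, the toy corpus, and the sense-tag convention below are illustrative assumptions, not the paper's training procedure:

```python
from gensim.models import Word2Vec

# Toy corpus where some occurrences carry a sense tag (e.g. taken from a
# sense-annotated resource), so word and sense tokens co-occur in context.
corpus = [
    ["the", "bank_FINANCE", "approved", "the", "loan"],
    ["we", "walked", "along", "the", "bank_RIVER", "of", "the", "river"],
    ["the", "bank", "raised", "interest", "rates"],
    ["fish", "swim", "near", "the", "bank", "of", "the", "stream"],
]

# One model, one vector space: words and sense tokens are embedded jointly.
model = Word2Vec(corpus, vector_size=32, window=3, min_count=1, epochs=50, seed=0)
print(model.wv.similarity("bank", "bank_FINANCE"))
print(model.wv.similarity("bank", "bank_RIVER"))
```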
Kitting in the Wild through Online Domain Adaptation
Technological developments call for increasing perception and action capabilities of robots. Among other skills, vision systems that can adapt to any possible change in the working conditions are needed. Since these conditions are unpredictable, we need benchmarks which allow us to assess the generalization and robustness capabilities of our visual recognition algorithms. In this work we focus on robotic kitting in unconstrained scenarios. As a first contribution, we present a new visual dataset for the kitting task. Differently from standard object recognition datasets, we provide images of the same objects acquired under various conditions where camera, illumination and background are changed. This novel dataset allows for testing the robustness of robot visual recognition algorithms to a series of different domain shifts, both in isolation and combined. Our second contribution is a novel online adaptation algorithm for deep models, based on batch-normalization layers, which allows a model to be continuously adapted to the current working conditions. Differently from standard domain adaptation algorithms, it does not require any image from the target domain at training time. We benchmark the performance of the algorithm on the proposed dataset, showing its capability to close the gap between the performance of a standard architecture and that of its counterpart adapted offline to the given target domain.
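A sketch of batch-normalization-based online adaptation in its simplest form, assuming the variant in which all learned weights stay frozen and only the BN running statistics track the incoming unlabeled target stream; function and variable names are illustrative, not the paper's algorithm verbatim:

```python
import torch
import torch.nn as nn

def adapt_bn_online(model, target_batches, momentum=0.1):
    """Continuously adapt BatchNorm running statistics to incoming target data.

    All learned weights are frozen; only the BN mean/variance estimates are
    updated, one unlabeled target batch at a time.
    """
    for p in model.parameters():
        p.requires_grad_(False)
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.momentum = momentum
            m.train()  # train mode so forward passes update running stats
    with torch.no_grad():
        for batch in target_batches:
            model(batch)
    model.eval()
    return model


# Toy usage: a small conv net and a stream of random "target domain" batches.
net = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 5))
stream = (torch.randn(16, 3, 32, 32) for _ in range(10))
adapt_bn_online(net, stream)
```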
Iterative Superquadric Recomposition of 3D Objects from Multiple Views
Humans are good at recomposing novel objects, i.e., they can identify commonalities between unknown objects from general structure down to finer detail, an ability that is difficult to replicate in machines. We propose a framework, ISCO, to recompose an object using 3D superquadrics as semantic parts directly from 2D views, without training a model that uses 3D supervision. To achieve this, we optimize the superquadric parameters that compose a specific instance of the object by comparing its rendered 3D view with the 2D image silhouette. Our ISCO framework iteratively adds new superquadrics wherever the reconstruction error is high, abstracting first coarse regions and then finer details of the target object. With this simple coarse-to-fine inductive bias, ISCO provides consistent superquadrics for related object parts, despite not having any semantic supervision. Since ISCO does not train any neural network, it is also inherently robust to out-of-distribution objects. Experiments show that, compared to recent single-instance superquadric reconstruction approaches, ISCO provides consistently more accurate 3D reconstructions, even from images in the wild. Code available at https://github.com/ExplainableML/ISCO .
Comment: Accepted at ICCV 202
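To make the core fitting step concrete, a heavily simplified 2-D analogue: fit the parameters of a single superellipse (the 2-D counterpart of a superquadric) by gradient descent on the error between its differentiable silhouette and a target silhouette. This shows only the silhouette-comparison idea, not the iterative part-adding or the 3-D rendering; all names and the target shape are illustrative:

```python
import torch

def superellipse_occupancy(xy, center, size, eps, sharpness=8.0):
    """Soft 2-D silhouette of a superellipse (2-D stand-in for a superquadric).

    Inside-outside function F = |x/a|^(2/eps) + |y/b|^(2/eps); points with
    F <= 1 are inside. A sigmoid turns this into a differentiable occupancy.
    """
    rel = (xy - center).abs() / size.clamp(min=1e-3)
    f = (rel ** (2.0 / eps.clamp(min=0.1))).sum(dim=-1)
    return torch.sigmoid(sharpness * (1.0 - f))


# Target silhouette: a box-like shape on a 64x64 grid (illustrative only).
g = torch.linspace(-1, 1, 64)
grid = torch.stack(torch.meshgrid(g, g, indexing="ij"), dim=-1)
target = (grid.abs() < torch.tensor([0.6, 0.3])).all(dim=-1).float()

# Fit one superellipse by gradient descent on the silhouette error.
center = torch.zeros(2, requires_grad=True)
size = torch.full((2,), 0.5, requires_grad=True)
eps = torch.tensor(1.0, requires_grad=True)
opt = torch.optim.Adam([center, size, eps], lr=0.05)
for step in range(300):
    opt.zero_grad()
    loss = ((superellipse_occupancy(grid, center, size, eps) - target) ** 2).mean()
    loss.backward()
    opt.step()
print(size.detach(), eps.detach())  # roughly the target half-extents, small eps
```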