82 research outputs found

    Teaching iCub to recognize objects using deep convolutional neural networks

    Providing robots with accurate and robust visual recognition capabilities in the real world is today a challenge which prevents the use of autonomous agents in concrete applications. Indeed, the majority of tasks, such as manipulation and interaction with other agents, critically depends on the ability to visually recognize the entities involved in a scene. At the same time, computer vision systems based on deep Convolutional Neural Networks (CNNs) are marking a breakthrough in fields such as large-scale image classification and retrieval. In this work we investigate how the latest results on deep learning can advance the visual recognition capabilities of a robotic platform (the iCub humanoid robot) in a real-world scenario. We benchmark the performance of the resulting system on a new dataset of images depicting 28 objects, named iCubWorld28, which we plan to release. In the spirit of the iCubWorld dataset series, it has been collected in a framework reflecting the iCub's typical daily visual experience. Moreover, this release provides four different acquisition sessions, to test incremental learning capabilities over multiple days. Our study addresses the question: how many objects can the iCub recognize today?
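As a minimal, hypothetical sketch of the transfer approach this line of work builds on (the data here is synthetic; the 128-dimensional features, the 28-class setup, and the `NearestClassMean` classifier are illustrative stand-ins, not the paper's actual pipeline): features produced by a pretrained CNN are often separable enough that a very simple classifier on top of them already recognizes objects well.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for features a pretrained CNN would produce for each image.
# In practice these would come from a network's penultimate layer; here
# they are synthetic 128-dim vectors clustered around one center per class.
def make_features(n_classes=28, n_per_class=20, dim=128):
    centers = rng.normal(size=(n_classes, dim))
    X = np.vstack([c + 0.1 * rng.normal(size=(n_per_class, dim)) for c in centers])
    y = np.repeat(np.arange(n_classes), n_per_class)
    return X, y

class NearestClassMean:
    """Classify a feature vector by the closest class centroid."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.means_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self
    def predict(self, X):
        # Distance from every sample to every class centroid.
        d = np.linalg.norm(X[:, None, :] - self.means_[None, :, :], axis=2)
        return self.classes_[d.argmin(axis=1)]

X, y = make_features()
clf = NearestClassMean().fit(X, y)
acc = float((clf.predict(X) == y).mean())
```

On such well-clustered synthetic features the centroid classifier is near-perfect; the interesting question the abstract raises is how far real robot imagery departs from this idealized setting.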

    Is Deep Learning Safe for Robot Vision? Adversarial Examples against the iCub Humanoid

    Deep neural networks have been widely adopted in recent years, exhibiting impressive performance in several application domains. It has however been shown that they can be fooled by adversarial examples, i.e., images altered by a barely-perceivable adversarial noise, carefully crafted to mislead classification. In this work, we aim to evaluate the extent to which robot-vision systems embodying deep-learning algorithms are vulnerable to adversarial examples, and propose a computationally efficient countermeasure to mitigate this threat, based on rejecting classification of anomalous inputs. We then provide a clearer understanding of the safety properties of deep networks through an intuitive empirical analysis, showing that the mapping learned by such networks essentially violates the smoothness assumption of learning algorithms. We finally discuss the main limitations of this work, including the creation of real-world adversarial examples, and sketch promising research directions.
    Comment: Accepted for publication at the ICCV 2017 Workshop on Vision in Practice on Autonomous Robots (ViPAR)
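A hedged sketch of the two ideas in this abstract, on a toy linear model in plain NumPy rather than a deep network (the weights, dimensions, epsilon, and threshold are all illustrative assumptions): a fast-gradient-sign-style perturbation nudges each input dimension in the direction that increases the loss, and a simple confidence threshold rejects anomalous, low-confidence inputs instead of classifying them.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy binary classifier: logistic regression with fixed, known weights.
w = rng.normal(size=64)
b = 0.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(x):
    """Probability of the positive class for one input vector."""
    return sigmoid(x @ w + b)

def fgsm(x, y_true, eps=0.2):
    """Fast-gradient-sign-style attack: step each input dimension in the
    direction that increases the logistic loss for the true label."""
    p = predict_proba(x)
    grad = (p - y_true) * w  # d(loss)/dx for logistic loss
    return x + eps * np.sign(grad)

def classify_with_reject(x, threshold=0.8):
    """Countermeasure sketch: refuse to classify low-confidence inputs."""
    p = predict_proba(x)
    confidence = max(p, 1.0 - p)
    return None if confidence < threshold else int(p >= 0.5)

x = rng.normal(size=64)
y = int(predict_proba(x) >= 0.5)   # take the clean prediction as the label
x_adv = fgsm(x, y_true=y)          # perturbation pushes toward the other class
```

The attack moves the probability monotonically toward the opposite class; the rejection option trades a bit of coverage for robustness, which is the computationally cheap countermeasure the abstract proposes.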

    Adaptive Deep Learning through Visual Domain Localization

    A commercial robot, trained by its manufacturer to recognize a predefined number and type of objects, might be used in many settings that will in general differ in their illumination conditions, background, type and degree of clutter, and so on. Recent computer vision works tackle this generalization issue through domain adaptation methods, taking as source the visual domain where the system is trained and as target the domain of deployment. All such approaches assume access to images from all classes of the target during training, an unrealistic condition in robotics applications. We address this issue by proposing an algorithm that takes into account the specific needs of robot vision. Our intuition is that the domain shift experienced in robotics is mostly local in nature. We exploit this by learning maps that spatially ground the domain and quantify the degree of shift, embedded into an end-to-end deep domain adaptation architecture. By explicitly localizing the roots of the domain shift we significantly reduce the number of parameters of the architecture to tune, gain the flexibility necessary to deal with a subset of categories in the target domain at training time, and provide clear feedback on the rationale behind any classification decision, which can be exploited in human-robot interaction. Experiments on two different settings of the iCub World database confirm the suitability of our method for robot vision.

    Recognizing Objects In-the-wild: Where Do We Stand?

    The ability to recognize objects is an essential skill for a robotic system acting in human-populated environments. Despite decades of effort from the robotics and vision research communities, robots still lack good visual perceptual systems, preventing the use of autonomous agents in real-world applications. Progress is slowed by the lack of a testbed able to accurately represent the world perceived by the robot in-the-wild. In order to fill this gap, we introduce a large-scale, multi-view object dataset collected with an RGB-D camera mounted on a mobile robot. The dataset embeds the challenges faced by a robot in a real-life application and provides a useful tool for validating object recognition algorithms. Besides describing the characteristics of the dataset, the paper evaluates the performance of a collection of well-established deep convolutional networks on the new dataset and analyzes the transferability of deep representations from Web images to robotic data. Despite the promising results obtained with such representations, the experiments demonstrate that object classification with real-life robotic data is far from solved. Finally, we provide a comparative study to analyze and highlight the open challenges in robot vision, explaining the discrepancies in performance.

    Developing the Knowledge of Number Digits in a Child-like Robot

    Number knowledge can be boosted initially by embodied strategies such as the use of fingers. This article explores the perceptual process of grounding number symbols in artificial agents, particularly the iCub robot, a child-like humanoid with fully functional, five-fingered hands. It studies the application of convolutional neural network models in the context of cognitive developmental robotics, where the training information is likely to be gradually acquired while operating, rather than being abundant and fully available as in many machine learning scenarios. The experimental analyses show increased training efficiency and similarities with studies in developmental psychology. Indeed, the proprioceptive information from the robot's hands can improve accuracy in the recognition of spoken digits by supporting a quicker creation of a uniform number line. In conclusion, these findings reveal a novel way of humanizing artificial training strategies, where embodiment can make the robot's learning more efficient and understandable for humans.

    CORe50: a New Dataset and Benchmark for Continuous Object Recognition

    Continuous/lifelong learning of high-dimensional data streams is a challenging research problem. In fact, fully retraining models each time new data become available is infeasible, due to computational and storage issues, while naïve incremental strategies have been shown to suffer from catastrophic forgetting. In the context of real-world object recognition applications (e.g., robotic vision), where continuous learning is crucial, very few datasets and benchmarks are available to evaluate and compare emerging techniques. In this work we propose a new dataset and benchmark, CORe50, specifically designed for continuous object recognition, and introduce baseline approaches for different continuous learning scenarios.
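A toy illustration of the catastrophic forgetting the abstract mentions, under stated assumptions (plain NumPy, a linear model, and two artificially conflicting tasks; this is not CORe50's protocol): naively continuing gradient training on a second task overwrites what the model learned on the first.

```python
import numpy as np

rng = np.random.default_rng(2)

def make_task(flip_labels):
    """A toy binary task: label = sign of the first feature (flipped for task B)."""
    X = rng.normal(size=(200, 20))
    y = (X[:, 0] > 0).astype(float)
    return X, (1.0 - y) if flip_labels else y

def train(w, X, y, lr=0.1, epochs=100):
    """Plain full-batch logistic-regression gradient descent."""
    w = w.copy()
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def accuracy(w, X, y):
    return float((((X @ w) > 0).astype(float) == y).mean())

# Session 1: learn task A from scratch.
XA, yA = make_task(flip_labels=False)
w = train(np.zeros(20), XA, yA)
acc_A_before = accuracy(w, XA, yA)

# Session 2: naively keep training on task B only, with no replay of old data.
XB, yB = make_task(flip_labels=True)
w_naive = train(w, XB, yB)
acc_A_after = accuracy(w_naive, XA, yA)
# acc_A_after collapses: the model has catastrophically forgotten task A.
```

Benchmarks like the one the abstract proposes exist precisely to measure this effect, and to compare strategies (replay buffers, regularization, architectural isolation) that mitigate it.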

    On the Illumination Influence for Object Learning on Robot Companions
