140 research outputs found
Deep learning for object detection in robotic grasping contexts
In the last decade, deep convolutional neural networks have become the standard for computer vision applications. As opposed to classical methods, which are based on rules and hand-designed features, neural networks are optimized and learned directly from a set of labeled training data specific to a given task. In practice, both obtaining sufficient labeled training data and interpreting network outputs can be problematic. Additionally, a neural network has to be retrained for new tasks or new sets of objects. Overall, while they perform very well, deep neural network approaches can be challenging to deploy. In this thesis, we propose strategies aimed at solving or working around these limitations for object detection. First, we propose a cascade approach in which a neural network is used as a prefilter to a template matching method, increasing performance while keeping the interpretability of the matching method. Second, we propose another cascade approach in which a weakly-supervised network generates object-specific heatmaps that can be used to infer each object's position in an image. This approach simplifies the training process and decreases the number of training images required to reach state-of-the-art performance. Finally, we propose a neural network architecture and a training procedure allowing detection of objects that were not seen during training, thus removing the need to retrain networks for new objects.
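The first cascade described above (a network acting as a prefilter ahead of interpretable template matching) can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the sliding-window grid, the `keep` ratio, and the `prefilter` argument (here an arbitrary scoring function standing in for a trained CNN) are all assumptions of this sketch.

```python
import numpy as np

def ncc(patch, template):
    # Zero-mean normalized cross-correlation between a window and the template.
    p = patch - patch.mean()
    t = template - template.mean()
    denom = np.linalg.norm(p) * np.linalg.norm(t)
    return float((p * t).sum() / denom) if denom > 0 else 0.0

def cascade_detect(image, template, prefilter, stride=4, keep=0.5):
    """Stage 1: a cheap prefilter scores every sliding window and only the
    top-scoring fraction survives.  Stage 2: interpretable template matching
    (NCC) is run on the survivors only, and the best window wins."""
    th, tw = template.shape
    windows = [(y, x)
               for y in range(0, image.shape[0] - th + 1, stride)
               for x in range(0, image.shape[1] - tw + 1, stride)]
    scored = sorted(windows,
                    key=lambda w: prefilter(image[w[0]:w[0]+th, w[1]:w[1]+tw]),
                    reverse=True)
    survivors = scored[: max(1, int(len(scored) * keep))]
    return max(survivors,
               key=lambda w: ncc(image[w[0]:w[0]+th, w[1]:w[1]+tw], template))
```

The point of the cascade is that the expensive-but-interpretable matcher only ever sees windows the prefilter already considers plausible, which is how performance improves without losing the matcher's interpretability.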
Challenges for Monocular 6D Object Pose Estimation in Robotics
Object pose estimation is a core perception task that enables, for example,
object grasping and scene understanding. The widely available, inexpensive and
high-resolution RGB sensors and CNNs that allow for fast inference based on
this modality make monocular approaches especially well suited for robotics
applications. We observe that previous surveys on object pose estimation
establish the state of the art for varying modalities, single- and multi-view
settings, and datasets and metrics that consider a multitude of applications.
We argue, however, that those works' broad scope hinders the identification of
open challenges that are specific to monocular approaches and the derivation of
promising future challenges for their application in robotics. By providing a
unified view on recent publications from both robotics and computer vision, we
find that occlusion handling, novel pose representations, and formalizing and
improving category-level pose estimation are still fundamental challenges that
are highly relevant for robotics. Moreover, to further improve robotic
performance, large object sets, novel objects, refractive materials, and
uncertainty estimates are central, largely unsolved open challenges. In order
to address them, ontological reasoning, deformability handling, scene-level
reasoning, realistic datasets, and the ecological footprint of algorithms need
to be improved.
Comment: arXiv admin note: substantial text overlap with arXiv:2302.1182
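For background on the survey above: a 6D object pose combines a 3D rotation with a 3D translation, conventionally packed into a 4x4 homogeneous transform so that model points can be mapped into the camera frame. The sketch below only illustrates that standard representation; the function names are our own, not the survey's.

```python
import numpy as np

def pose_matrix(rotation, translation):
    """Compose a 6D pose (3x3 rotation, 3-vector translation) into the
    4x4 homogeneous transform that pose estimators ultimately output."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = translation
    return T

def transform_points(T, pts):
    # Apply the pose to Nx3 model points, e.g. when computing an
    # ADD-style average-distance error between two pose hypotheses.
    homo = np.hstack([pts, np.ones((pts.shape[0], 1))])
    return (T @ homo.T).T[:, :3]
```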
PourIt!: Weakly-supervised Liquid Perception from a Single Image for Visual Closed-Loop Robotic Pouring
Liquid perception is critical for robotic pouring tasks. It usually requires
the robust visual detection of flowing liquid. However, while recent works have
shown promising results in liquid perception, they typically require labeled
data for model training, a process that is both time-consuming and reliant on
human labor. To this end, this paper proposes a simple yet effective framework
PourIt!, to serve as a tool for robotic pouring tasks. We design a simple data
collection pipeline that only needs image-level labels to reduce the reliance
on tedious pixel-wise annotations. Then, a binary classification model is
trained to generate Class Activation Map (CAM) that focuses on the visual
difference between these two kinds of collected data, i.e., the existence of
liquid drop or not. We also devise a feature contrast strategy to improve the
quality of the CAM, thus entirely and tightly covering the actual liquid
regions. Then, the container pose is further utilized to facilitate the 3D
point cloud recovery of the detected liquid region. Finally, the
liquid-to-container distance is calculated for visual closed-loop control of
the physical robot. To validate the effectiveness of our proposed method, we
also contribute a novel dataset for our task and name it PourIt! dataset.
Extensive results on this dataset and physical Franka robot have shown the
utility and effectiveness of our method in the robotic pouring tasks. Our
dataset, code and pre-trained models will be available on the project page.
Comment: ICCV202
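The CAM step described above can be illustrated with a minimal sketch: given the final convolutional feature maps and the binary classifier's weights, the activation map for a class is a weighted sum over feature channels, rectified and normalized. This is a generic CAM computation in NumPy, not the authors' PourIt! code, and it omits their feature-contrast refinement.

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx):
    """Weight each of the C spatial feature maps (features: C x H x W) by
    the classifier weight for the chosen class and sum them, highlighting
    the regions that drove the 'liquid present' decision."""
    cam = np.tensordot(fc_weights[class_idx], features, axes=1)  # -> H x W
    cam = np.maximum(cam, 0)          # keep only positive evidence
    if cam.max() > 0:
        cam = cam / cam.max()         # normalize to [0, 1]
    return cam
```

Because only an image-level "liquid / no liquid" label is needed to train the classifier, the resulting map localizes liquid without any pixel-wise annotation, which is the labor saving the abstract emphasizes.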
The e-Bike Motor Assembly: Towards Advanced Robotic Manipulation for Flexible Manufacturing
Robotic manipulation is currently undergoing a profound paradigm shift due to
the increasing needs for flexible manufacturing systems, and at the same time,
because of the advances in enabling technologies such as sensing, learning,
optimization, and hardware. This demands robots that can observe and reason
about their workspace and that are skillful enough to complete various
assembly processes in weakly-structured settings. Moreover, it remains a great
challenge to enable operators to teach robots on-site while managing the
inherent complexity of perception, control, motion planning and reaction to
unexpected situations. Motivated by real-world industrial applications, this
paper demonstrates the potential of such a paradigm shift in robotics on the
industrial case of an e-Bike motor assembly. The paper presents a concept for
teaching and programming adaptive robots on-site and demonstrates their
potential for the named applications. The framework includes: (i) a method to
teach perception systems on-site in a self-supervised manner, (ii) a general
representation of object-centric motion skills and force-sensitive assembly
skills, both learned from demonstration, (iii) a sequencing approach that
exploits a human-designed plan to perform complex tasks, and (iv) a system
solution for adapting and optimizing skills online. The aforementioned
components are interfaced through a four-layer software architecture that makes
our framework a tangible industrial technology. To demonstrate the generality
of the proposed framework, we provide, in addition to the motivating e-Bike
motor assembly, a further case study on dense box packing for logistics
automation.
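Component (iii) above, sequencing skills along a human-designed plan, can be sketched as a simple executor. The `Skill` record, the retry policy, and `run_plan` are illustrative assumptions of this sketch, not the paper's four-layer architecture.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Skill:
    # One object-centric motion skill or force-sensitive assembly skill.
    name: str
    execute: Callable[[], bool]   # returns True on success
    max_retries: int = 2

def run_plan(plan: List[Skill]):
    """Execute a human-designed plan as an ordered skill sequence,
    retrying each failed skill a bounded number of times before
    aborting -- a minimal stand-in for a sequencing layer that also
    has to react to unexpected situations."""
    log = []
    for skill in plan:
        for attempt in range(skill.max_retries + 1):
            if skill.execute():
                log.append((skill.name, "ok", attempt))
                break
        else:
            log.append((skill.name, "failed", skill.max_retries))
            return log  # abort on unrecoverable failure
    return log
```

In a real system each `execute` would wrap a learned-from-demonstration skill with its own perception and control loop; here it is reduced to a success flag to keep the sequencing logic visible.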