Benchmark Concept for Industrial Pick&Place Applications
Robotic grasping and manipulation is a highly active research field. Typical solutions are composed of several modules, e.g. object detection, grasp selection and motion planning. However, from an industrial point of view, it is not clear which solutions can be readily used and how the individual components affect each other. Benchmarks used in research are often designed with simplified settings for a very specific scenario, disregarding the peculiarities of the industrial environment. Performance in real-world applications is therefore likely to differ from benchmark results. In this paper, we present a concept for the design of general Pick&Place benchmarks, which helps practitioners evaluate a system and its components for an industrial scenario. The user specifies the workspace (obstacles, movable objects) and the robot (kinematics, etc.) and chooses from a set of methods to realize the desired task. Our proposed framework executes the workflow in a physics simulation to determine a range of system-level performance measures. Furthermore, it provides introspective insights into the performance of individual components.
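To make the described workflow concrete, the following minimal sketch shows how such a benchmark might be driven from user code; the package, class, and method names (pnp_benchmark, Workspace, Robot, Benchmark) are hypothetical placeholders rather than the framework's actual API.

```python
# Hypothetical usage sketch of a Pick&Place benchmark framework.
# All names are illustrative placeholders, not the framework's real API.
from pnp_benchmark import Workspace, Robot, Benchmark  # hypothetical package

# 1. Describe the industrial scene: static obstacles and movable objects.
workspace = Workspace(
    obstacles=["bin_walls.stl", "fixture.stl"],
    movable_objects=["gear.stl", "shaft.stl"],
)

# 2. Describe the robot (kinematics, gripper).
robot = Robot(urdf="ur10_with_gripper.urdf")

# 3. Choose one implementation per pipeline module.
benchmark = Benchmark(
    workspace=workspace,
    robot=robot,
    object_detector="pose_cnn",
    grasp_selector="antipodal_sampler",
    motion_planner="rrt_connect",
)

# 4. Execute the full workflow in physics simulation and collect
#    system-level and per-component performance measures.
report = benchmark.run(num_trials=100)
print(report.success_rate, report.cycle_time)
print(report.per_component)  # introspective, per-module statistics
```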
Dienstebasierte Objekterkennung und Posenschätzung zur Automatisierung industrieller Produktionsprozesse (Service-Based Object Detection and Pose Estimation for the Automation of Industrial Production Processes)
The growing trend towards high-mix and low-volume production demands more flexible and reconfigurable control for assembly systems. In unstructured or less structured environments, object detection and pose estimation is a key capability to enable industrial robotics applications such as grasping, handling and assembling. The integration and interconnectivity of such automation functions is fostered by Industry 4.0 through the adoption of service-based ecosystems.
The main objective of this thesis is to create a service-based framework for robust object detection and pose estimation in manufacturing environments. Such a framework could provide a viable alternative to traditional machine vision systems such as smart cameras and embedded PCs, which struggle with the high diversity and fast pace of progress in the field of object detection and pose estimation.
We approach this problem in three steps. In the first step, we propose a framework and demonstrate that it is realizable. It exposes a REST/gRPC interface through which all detection methods are handled uniformly. A virtualization strategy enables scaling and easy deployment, and the new OPC UA vision specification is used to integrate the detector services into the production environment.
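As an illustration of such a uniform detection interface, the following minimal client sketch posts an image to a hypothetical REST endpoint; the service URL, route, and JSON fields are assumptions made for illustration and are not the interface defined in the thesis.

```python
# Hypothetical client for a detection/pose-estimation service behind a
# uniform REST interface. Endpoint, routes, and fields are illustrative only.
import base64
import requests

SERVICE_URL = "http://vision-services.local:8080"  # assumed deployment address

def detect(image_bytes: bytes, detector: str, object_id: str) -> list[dict]:
    """Send an image to a detector service and return 6-DoF pose hypotheses."""
    payload = {
        "detector": detector,   # e.g. "template_matching" or "dl_pose"
        "object": object_id,    # identifier of the object / CAD model
        "image": base64.b64encode(image_bytes).decode("ascii"),
    }
    response = requests.post(f"{SERVICE_URL}/detect", json=payload, timeout=5.0)
    response.raise_for_status()
    # Each hypothesis: translation, rotation (quaternion), confidence score.
    return response.json()["poses"]

with open("frame_0001.png", "rb") as f:
    poses = detect(f.read(), detector="template_matching", object_id="gear_42")
print(poses[0])
```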
In the second step, we examine three exemplary object detection and pose estimation methods and show that they can be integrated into the framework. The key requirements are automatic training from a CAD model and parameterization without expert knowledge, which two of the three methods satisfy. For the third method, a Deep Learning approach, we demonstrate that synthetic training images can be generated from the CAD model, but further measures are required to reach the desired pose accuracy.
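A common way to generate such synthetic training images is to render the CAD model from many viewpoints. The sketch below uses trimesh and pyrender as one possible toolchain; this choice, and the simplified camera sampling, are assumptions for illustration and not necessarily the pipeline used in the thesis.

```python
# Sketch: rendering synthetic training views of a CAD model.
# trimesh/pyrender are one possible toolchain, chosen here for illustration.
import numpy as np
import trimesh
import pyrender

mesh = trimesh.load("gear_42.stl")           # CAD model of the part
render_mesh = pyrender.Mesh.from_trimesh(mesh)

renderer = pyrender.OffscreenRenderer(viewport_width=640, viewport_height=480)
camera = pyrender.PerspectiveCamera(yfov=np.deg2rad(60.0))
light = pyrender.DirectionalLight(intensity=3.0)

for i in range(1000):
    scene = pyrender.Scene(bg_color=[0, 0, 0, 0])
    scene.add(render_mesh)

    # Sample a camera pose around the object (simplified: random rotation,
    # fixed distance along the camera's viewing axis).
    pose = np.eye(4)
    pose[:3, :3] = trimesh.transformations.random_rotation_matrix()[:3, :3]
    pose[:3, 3] = pose[:3, :3] @ np.array([0.0, 0.0, 0.5])  # 0.5 m away
    scene.add(camera, pose=pose)
    scene.add(light, pose=pose)

    color, depth = renderer.render(scene)
    # color plus the known object-to-camera pose form one labeled sample.
```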
Finally, in the third step, we characterize the framework to identify its strengths and weaknesses compared to conventional machine vision systems. We perform a scenario-based analysis of selected quality attributes and find that both system types have their merits. The proposed service-based framework enables more efficient resource utilization and offers better configurability, maintainability and availability. Conventional systems, on the other hand, have better timing behavior and do not require such elaborate security measures. Timing, resource utilization and reusability are moreover strongly affected by the chosen detection method. Given a particular application, our characterization helps identify the most suitable system type.
Altogether, in this work we have contributed a novel type of vision system and demonstrated that it is a viable alternative for object detection and pose estimation applications. The framework structure as well as the identified architectural trade-offs can furthermore be generalized to other machine vision and automation tasks. Promising future research directions include facilitating the training of Deep Learning methods, quantifying the architectural trade-offs in case studies, and integrating other vision applications to create an ecosystem of vision services.
A Framework for Joint Grasp and Motion Planning in Confined Spaces
Robotic grasping is a fundamental skill across all domains of robot applications. There is a large body of research on grasping objects in table-top scenarios, where finding suitable grasps is the main challenge. In this work, we are interested in scenarios where the objects are in confined spaces and hence particularly difficult to reach. Planning how the robot approaches the object becomes a major part of the challenge, giving rise to methods for joint grasp and motion planning. The framework proposed in this paper provides 20 benchmark scenarios with systematically increasing difficulty, realistic objects with precomputed grasp annotations, and tools to create and share more scenarios. We further provide two baseline planners and evaluate them on the scenarios, demonstrating that the proposed difficulty levels indeed offer a meaningful progression. We make all components publicly available as open source and invite the research community to build upon this framework.
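As a rough illustration of how such a benchmark might be used, the sketch below loads each scenario, retrieves the precomputed grasp annotations, and runs a baseline planner; the package and API names (grasp_motion_benchmark, load_scenario, SequentialBaseline) are hypothetical and not the framework's actual interface.

```python
# Hypothetical usage sketch for a joint grasp and motion planning benchmark.
# Module, class, and attribute names are illustrative, not the real API.
from grasp_motion_benchmark import load_scenario, planners  # hypothetical package

results = []
for level in range(1, 21):                      # 20 scenarios of increasing difficulty
    scenario = load_scenario(level)             # robot, confined workspace, target object
    grasps = scenario.target.grasp_annotations  # precomputed grasp poses for the object

    # Baseline: try annotated grasps in sequence and plan a collision-free
    # approach trajectory for each until one succeeds.
    planner = planners.SequentialBaseline(scenario.robot, scenario.workspace)
    plan = planner.solve(grasps, timeout=30.0)

    results.append({"level": level,
                    "solved": plan is not None,
                    "time": planner.last_planning_time})

solved = sum(r["solved"] for r in results)
print(f"solved {solved}/20 scenarios")
```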
End-to-end learning to grasp via sampling from object point clouds
The ability to grasp objects is an essential skill that enables many robotic manipulation tasks. Recent works have studied point cloud-based methods for object grasping, starting from simulated datasets, and have shown promising performance in real-world scenarios. Nevertheless, many of them still rely on ad-hoc geometric heuristics to generate grasp candidates, which fail to generalize to objects with shapes significantly different from those observed during training. Several approaches exploit complex multi-stage learning strategies and local neighborhood feature extraction while ignoring semantic global information. Furthermore, they are inefficient in terms of the number of training samples and the time required for inference. In this letter, we propose an end-to-end learning solution to generate 6-DOF parallel-jaw grasps starting from a 3D partial view of the object. Our Learning to Grasp (L2G) method gathers information from the input point cloud through a new procedure that combines a differentiable sampling strategy, which identifies the visible contact points, with a feature encoder that leverages local and global cues. Overall, L2G is guided by a multi-task objective that generates a diverse set of grasps by optimizing contact point sampling, grasp regression, and grasp classification. With a thorough experimental analysis, we show the effectiveness of L2G as well as its robustness and generalization abilities.
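To illustrate how such a multi-task objective can be assembled, here is a minimal PyTorch-style sketch combining the three loss terms; the specific loss functions, weights, and tensor layouts are assumptions for illustration and not the exact formulation used by L2G.

```python
# Sketch of a multi-task grasp-learning objective combining contact point
# sampling, grasp regression, and grasp classification losses.
# Loss choices and weights are illustrative assumptions, not L2G's exact ones.
import torch
import torch.nn.functional as F

def multi_task_loss(pred, target, w_sample=1.0, w_reg=1.0, w_cls=1.0):
    # Contact sampling: sampled contact points should lie close to the
    # ground-truth contacts (Chamfer-style nearest-neighbour distances).
    d = torch.cdist(pred["contacts"], target["contacts"])       # (B, N, M)
    loss_sample = d.min(dim=2).values.mean() + d.min(dim=1).values.mean()

    # Grasp regression: predicted grasp parameters (e.g. second contact,
    # approach direction, width) regressed against ground truth.
    loss_reg = F.smooth_l1_loss(pred["grasp_params"], target["grasp_params"])

    # Grasp classification: score each candidate as feasible / infeasible.
    loss_cls = F.binary_cross_entropy_with_logits(pred["scores"], target["labels"])

    return w_sample * loss_sample + w_reg * loss_reg + w_cls * loss_cls
```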