Fully Convolutional Networks for Semantic Segmentation from RGB-D images

Harich, Nicolai

Publication date: 1 January 2016

Abstract

In recent years, trends such as Industry 4.0 have boosted research and development in autonomous systems and robotics. Robots collaborate with humans and even take over complete tasks from them. Such a high degree of automation, however, requires high reliability even in complex and changing environments, which makes it hard to rely on static models of the real world. In addition to adaptable maps, mobile robots need a local and up-to-date understanding of the scene. The Bosch Start-Up Company is developing robots for intralogistics systems that could benefit greatly from such a detailed scene understanding. The aim of this work is to research and develop such a system for warehouse environments. Although the range of possible applications is very broad, this work focuses on detecting and localizing warehouse-specific objects such as pallets. To provide a meaningful perception of the surroundings, an RGB-D camera is used. A pre-trained convolutional network derives a scene understanding in the form of pixel-wise class labels. Because this network is the core of the application, this work focuses on different network set-ups and learning strategies. One difficulty was the lack of annotated training data. Since creating densely labeled images is very time-consuming, it was important to work out good alternatives. One interesting finding was that much of what similar models learned from thousands of RGB images can be transferred. This is done through selective interventions on the network parameters. With a good initialization, a well-performing model can be trained within a few iterations, which makes it possible to train even branched networks in a single run. Including certain normalization steps achieves the same. Another important aspect was finding a suitable way to incorporate depth information, that is, how to fuse depth into the existing model.
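
The abstract does not specify which interventions on the network parameters were made. As an illustration only: if an extra depth-derived channel were stacked onto the RGB input (early fusion, an assumption here), one common heuristic for reusing an RGB-pretrained first convolution is to initialize the new channel's kernels with the mean of the existing RGB kernels. The sketch below assumes first-layer weights stored as nested Python lists of shape [out][in][kh][kw]; the function name `add_input_channel` is hypothetical.

```python
from typing import List

# Nested-list weight tensor with shape [out_channels][in_channels][kh][kw].
Tensor4 = List[List[List[List[float]]]]


def add_input_channel(weights: Tensor4) -> Tensor4:
    """Append one extra input channel to a pre-trained first-layer filter bank.

    The new channel's kernel is the element-wise mean of the existing
    input-channel kernels (a common heuristic, assumed here; not confirmed
    as the thesis' method). The input tensor is not modified.
    """
    extended: Tensor4 = []
    for filt in weights:  # one filter per output channel
        n_in = len(filt)
        kh, kw = len(filt[0]), len(filt[0][0])
        # Average the kernels of all existing input channels, position by position.
        mean_kernel = [
            [sum(filt[c][i][j] for c in range(n_in)) / n_in for j in range(kw)]
            for i in range(kh)
        ]
        # Deep-copy the original kernels, then append the new one.
        copied = [[row[:] for row in kernel] for kernel in filt]
        extended.append(copied + [mean_kernel])
    return extended


if __name__ == "__main__":
    rgb_filters = [[[[1.0]], [[2.0]], [[3.0]]]]  # 1 output, 3 inputs, 1x1 kernels
    rgbh = add_input_channel(rgb_filters)
    print(len(rgbh[0]))  # 4 input channels after extension
    print(rgbh[0][3])    # [[2.0]], the mean of 1.0, 2.0 and 3.0
```

Whether the kernels should additionally be rescaled depends on how the new channel is normalized; this sketch leaves the original RGB kernels unchanged so the pre-trained behavior on RGB input is preserved.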
Providing the height above ground as an additional feature further improved segmentation accuracy while keeping the extra computational cost low. Finally, the segmentation maps are refined by a conditional random field (CRF). Training both parts jointly yields accurate object segmentations comparable to recently published state-of-the-art models.

Current topics such as Industry 4.0 have driven progress in autonomous systems and robotics. Robots collaborate with workers or take over complete work steps. This high degree of automation requires such systems to work with high reliability and safety, even in complex situations and environments. Static models that abstract the environment are insufficient. Besides dynamic localization maps, mobile robots ideally also need an understanding of their surroundings. Bosch Start-Up GmbH is developing robots intended for future use in warehouses, which would benefit from such an understanding. The goal was to transfer current research findings on semantic segmentation with deep learning techniques into a prototype application. Although the developed application is universally applicable in principle, this work focuses on segmenting objects found in a typical warehouse (e.g. pallets). The segmentation is based on images from an RGB-D camera and at the same time enables spatial localization of objects. A specialized deep neural network (FCN) performs the complete segmentation. The work concentrates on adapting and training such a network. Providing annotated data is extremely laborious, so suitable techniques were used to keep the amount of required data low. To this end, model parameters of freely available networks were transferred to ensure the best possible initialization. In addition, normalization steps were introduced into the network architecture so that branched structures can also be trained in a single training run. Another important aspect is incorporating depth information into the segmentation process. Besides RGB data, the final network also takes height information into account, which improved the segmentation quality at only a small additional computational cost. A conditional random field was also used to refine the segmentation iteratively. Training both components, FCN and CRF, jointly helped bring the quality of the results into the range of current research.
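
The abstract states that height above ground is supplied to the network as an additional feature but does not describe how it is computed. A minimal sketch, assuming a pinhole camera with intrinsics `fx`, `fy`, `cx`, `cy`, metric depth measured along the optical axis, and a known ground plane `n·p + offset = 0` in camera coordinates (with unit normal `n` pointing up), back-projects each pixel and takes its signed distance to that plane. The function and parameter names are illustrative, not taken from the thesis.

```python
import math
from typing import List, Sequence


def height_above_ground(
    depth: List[List[float]],
    fx: float, fy: float, cx: float, cy: float,
    normal: Sequence[float], offset: float,
) -> List[List[float]]:
    """Per-pixel height above an assumed ground plane.

    depth:  metric depth image (z along the optical axis); values <= 0 mark
            missing measurements.
    normal, offset: plane n.p + offset = 0 in camera coordinates, where n is
            a unit vector pointing away from the floor (upward).
    Returns NaN wherever depth is missing.
    """
    nx, ny, nz = normal
    heights: List[List[float]] = []
    for v, row in enumerate(depth):  # v = image row index
        out_row: List[float] = []
        for u, z in enumerate(row):  # u = image column index
            if z <= 0:
                out_row.append(math.nan)
                continue
            # Back-project pixel (u, v) into 3D with the pinhole model.
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            # Signed distance of the 3D point to the ground plane.
            out_row.append(nx * x + ny * y + nz * z + offset)
        heights.append(out_row)
    return heights
```

With an OpenCV-style camera frame (y pointing down) mounted level 1 m above the floor, the upward normal is (0, -1, 0) and the offset is 1.0, so floor pixels map to 0 and pixels at camera height map to 1. In practice the plane might come from the known camera mounting or from a plane fit to the point cloud; the abstract does not say which was used.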

Available Versions

Hochschulschriftenserver der Hochschule der Medien Stuttgart

oai:hdms.bsz-bw.de:4880

Last updated on 18/04/2019