152 research outputs found

Relate to Predict: Towards Task-Independent Knowledge Representations for Reinforcement Learning

Author: Gross Horst-Michael
Probst Malte
Schnürer Thomas
Publication date: 10/12/2022

Reinforcement Learning (RL) can enable agents to learn complex tasks. However, it is difficult to interpret the learned knowledge and reuse it across tasks. Inductive biases can address such issues by explicitly providing a generic yet useful decomposition that is otherwise difficult or expensive to learn implicitly. For example, object-centered approaches decompose a high-dimensional observation into individual objects. Expanding on this, we utilize an inductive bias for explicit object-centered knowledge separation that provides further decomposition into semantic representations and dynamics knowledge. For this, we introduce a semantic module that predicts an object's semantic state based on its context. The resulting affordance-like object state can then be used to enrich perceptual object representations. With a minimal setup and an environment that enables puzzle-like tasks, we demonstrate the feasibility and benefits of this approach. Specifically, we compare three different methods of integrating semantic representations into a model-based RL architecture. Our experiments show that the degree of explicitness in knowledge separation correlates with faster learning, better accuracy, better generalization, and better interpretability. Comment: submitted to IJCNN 202

arXiv.org e-Print Archive

Complexer-YOLO: Real-Time 3D Object Detection and Tracking on Semantic Point Clouds

Author: Amende Karl
Gross Horst-Michael
Honer Jens
Kaulbersch Hauke
Kraus Andrea
Milz Stefan
Simon Martin
Sämann Timo
Publication date: 16/04/2019

Accurate detection of 3D objects is a fundamental problem in computer vision and has an enormous impact on autonomous cars, augmented/virtual reality and many applications in robotics. In this work we present a novel fusion of a neural-network-based state-of-the-art 3D detector and visual semantic segmentation in the context of autonomous driving. Additionally, we introduce the Scale-Rotation-Translation score (SRTs), a fast and highly parameterizable evaluation metric for comparison of object detections, which speeds up our inference by up to 20% and halves training time. On top of that, we apply state-of-the-art online multi-target feature tracking on the object measurements to further increase accuracy and robustness utilizing temporal information. Our experiments on KITTI show that we achieve the same results as the state of the art in all related categories, while maintaining the performance and accuracy trade-off and still running in real time. Furthermore, our model is the first one that fuses visual semantics with 3D object detection.

arXiv.org e-Print Archive

Crossref

Fusing Hand and Body Skeletons for Human Action Recognition in Assembly

Author: Aganian Dustin
Eisenbach Markus
Gross Horst-Michael
Köhler Mona
Stephan Benedict
Publication date: 18/07/2023

As collaborative robots (cobots) continue to gain popularity in industrial manufacturing, effective human-robot collaboration becomes crucial. Cobots should be able to recognize human actions to assist with assembly tasks and act autonomously. To achieve this, skeleton-based approaches are often used due to their ability to generalize across various people and environments. Although body skeleton approaches are widely used for action recognition, they may not be accurate enough for assembly actions where the worker's fingers and hands play a significant role. To address this limitation, we propose a method in which less detailed body skeletons are combined with highly detailed hand skeletons. We investigate CNNs and transformers, the latter of which are particularly adept at extracting and combining important information from both skeleton types using attention. This paper demonstrates the effectiveness of our proposed approach in enhancing action recognition in assembly scenarios. Comment: International Conference on Artificial Neural Networks (ICANN) 202

arXiv.org e-Print Archive

How Object Information Improves Skeleton-based Human Action Recognition in Assembly Tasks

Author: Aganian Dustin
Baake Sebastian
Eisenbach Markus
Gross Horst-Michael
Köhler Mona
Publication date: 09/06/2023

As the use of collaborative robots (cobots) in industrial manufacturing continues to grow, human action recognition for effective human-robot collaboration becomes increasingly important. This ability is crucial for cobots to act autonomously and assist in assembly tasks. Recently, skeleton-based approaches have often been used, as they tend to generalize better to different people and environments. However, when processing skeletons alone, information about the objects a human interacts with is lost. Therefore, we present a novel approach of integrating object information into skeleton-based action recognition. We enhance two state-of-the-art methods by treating object centers as further skeleton joints. Our experiments on the assembly dataset IKEA ASM show that our approach improves the performance of these state-of-the-art methods to a large extent when combining skeleton joints with objects predicted by a state-of-the-art instance segmentation model. Our research sheds light on the benefits of combining skeleton joints with object information for human action recognition in assembly tasks. We analyze the effect of the object detector on the combination for action classification and discuss the important factors that must be taken into account. Comment: IEEE International Joint Conference on Neural Networks (IJCNN) 202

arXiv.org e-Print Archive
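
The idea in the abstract above of "treating object centers as further skeleton joints" can be pictured with a minimal sketch. Everything here is an illustrative assumption and not the authors' implementation: the function name `append_object_centers`, the use of 2D bounding boxes as the object representation, the fixed `max_objects` slot count, and the zero padding for unused slots are all hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed format


def append_object_centers(joints: List[Point], boxes: List[Box],
                          max_objects: int) -> List[Point]:
    """Return the skeleton joints followed by up to `max_objects` box centers.

    Unused object slots are zero-padded so every frame has the same number
    of "joints" (an assumption made here to keep the input size fixed).
    """
    centers = [((x0 + x1) / 2.0, (y0 + y1) / 2.0)
               for x0, y0, x1, y1 in boxes[:max_objects]]
    padding = [(0.0, 0.0)] * (max_objects - len(centers))
    return list(joints) + centers + padding


# Hypothetical frame: two body joints and one detected object.
skeleton = [(0.0, 0.0), (1.0, 2.0)]
detections = [(2.0, 2.0, 4.0, 6.0)]
extended = append_object_centers(skeleton, detections, max_objects=2)
```

In this sketch the extended list could be fed to a skeleton-based model exactly like an ordinary joint list; how the original work orders, normalizes, or pads object joints is not stated in the abstract.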

Efficient Multi-Task Scene Analysis with RGB-D Transformers

Author: Fischedick Söhnke Benedikt
Gross Horst-Michael
Rabes Leonard
Schmidt Robin
Seichter Daniel
Publication date: 08/06/2023

Scene analysis is essential for enabling autonomous systems, such as mobile robots, to operate in real-world environments. However, obtaining a comprehensive understanding of the scene requires solving multiple tasks, such as panoptic segmentation, instance orientation estimation, and scene classification. Solving these tasks given limited computing and battery capabilities on mobile platforms is challenging. To address this challenge, we introduce an efficient multi-task scene analysis approach, called EMSAFormer, that uses an RGB-D Transformer-based encoder to simultaneously perform the aforementioned tasks. Our approach builds upon the previously published EMSANet. However, we show that the dual CNN-based encoder of EMSANet can be replaced with a single Transformer-based encoder. To achieve this, we investigate how information from both RGB and depth data can be effectively incorporated in a single encoder. To accelerate inference on robotic hardware, we provide a custom NVIDIA TensorRT extension enabling a high degree of optimization for our EMSAFormer approach. Through extensive experiments on the commonly used indoor datasets NYUv2, SUNRGB-D, and ScanNet, we show that our approach achieves state-of-the-art performance while still enabling inference with up to 39.1 FPS on an NVIDIA Jetson AGX Orin 32 GB. Comment: To be published in IEEE International Joint Conference on Neural Networks (IJCNN) 202

arXiv.org e-Print Archive

PanopticNDT: Efficient and Robust Panoptic Mapping

Author: Fischedick Söhnke Benedikt
Gross Horst-Michael
Müller Steffen
Rabes Leonard
Seichter Daniel
Stephan Benedict
Publication date: 24/09/2023

As the application scenarios of mobile robots are getting more complex and challenging, scene understanding becomes increasingly crucial. A mobile robot that is supposed to operate autonomously in indoor environments must have precise knowledge about what objects are present, where they are, what their spatial extent is, and how they can be reached; i.e., information about free space is also crucial. Panoptic mapping is a powerful instrument providing such information. However, building 3D panoptic maps with high spatial resolution is challenging on mobile robots, given their limited computing capabilities. In this paper, we propose PanopticNDT, an efficient and robust panoptic mapping approach based on occupancy normal distribution transform (NDT) mapping. We evaluate our approach on the publicly available datasets Hypersim and ScanNetV2. The results reveal that our approach can represent panoptic information at a higher level of detail than other state-of-the-art approaches while enabling real-time panoptic mapping on mobile robots. Finally, we prove the real-world applicability of PanopticNDT with qualitative results in a domestic application. Comment: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 202

arXiv.org e-Print Archive

A multi-modal person perception framework for socially interactive mobile service robots

Author: Aganian Dustin
Eisenbach Markus
Gross Horst-Michael
Müller Steffen
Trinh Thanh Quang
Wengefeld Tim
Publication venue: 'MDPI AG'
Publication date: 28/01/2020

In order to meet the increasing demands of mobile service robot applications, a dedicated perception module is an essential requirement for the interaction with users in real-world scenarios. In particular, multi-sensor fusion and human re-identification are recognized as active research fronts. Through this paper we contribute to the topic and present a modular detection and tracking system that models position and additional properties of persons in the surroundings of a mobile robot. The proposed system introduces a probability-based data association method that, besides the position, can incorporate face and color-based appearance features in order to realize a re-identification of persons when tracking gets interrupted. The system combines the results of various state-of-the-art image-based detection systems for person recognition, person identification and attribute estimation. This allows a stable estimate of a mobile robot’s user, even in complex, cluttered environments with long-lasting occlusions. In our benchmark, we introduce a new measure for tracking consistency and show the improvements when face and appearance-based re-identification are combined. The tracking system was applied in a real-world application with a mobile rehabilitation assistant robot in a public hospital. The estimated states of persons are used for the user-centered navigation behaviors, e.g., guiding or approaching a person, but also for realizing a socially acceptable navigation in public environments.

Multidisciplinary Digital Publishing Institute

Digitale Bibliothek Thüringen

Can communication technologies reduce loneliness and social isolation in older people?: a scoping review of reviews

Author: Brandenburg Karlheinz
Broll Wolfgang
Conde Melisa
Döring Nicola
Gross Horst-Michael
Raake Alexander
Werner Stephan
Publication venue: 'MDPI AG'
Publication date: 08/09/2022

Background: Loneliness and social isolation in older age are considered major public health concerns and research on technology-based solutions is growing rapidly. This scoping review of reviews aims to summarize the communication technologies (CTs) (review question RQ1), theoretical frameworks (RQ2), study designs (RQ3), and positive effects of technology use (RQ4) present in the research field. Methods: A comprehensive multi-disciplinary, multi-database literature search was conducted. Identified reviews were analyzed according to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) framework. A total of N = 28 research reviews that cover 248 primary studies spanning 50 years were included. Results: The majority of the included reviews addressed general internet and computer use (82% each) (RQ1). Of the 28 reviews, only one (4%) worked with a theoretical framework (RQ2) and 26 (93%) covered primary studies with quantitative-experimental designs (RQ3). The positive effects of technology use were shown in 55% of the outcome measures for loneliness and 44% of the outcome measures for social isolation (RQ4). Conclusion: While research reviews show that CTs can reduce loneliness and social isolation in older people, causal evidence is limited and insights on innovative technologies such as augmented reality systems are scarce.

PubMed Central

Digitale Bibliothek Thüringen
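
The review-level percentages quoted in the scoping-review abstract above (one of 28 reviews reported as 4%, 26 of 28 as 93%) can be checked with a short calculation. This is only a sanity check on the reported figures; the helper name `share_percent` is hypothetical, and rounding to the nearest whole percent is an assumption about how the abstract's numbers were derived.

```python
def share_percent(count: int, total: int) -> int:
    """Share of `count` in `total`, rounded to the nearest whole percent."""
    return round(100 * count / total)


TOTAL_REVIEWS = 28  # N = 28 reviews, as stated in the abstract

theory_share = share_percent(1, TOTAL_REVIEWS)         # reviews using a theoretical framework
quantitative_share = share_percent(26, TOTAL_REVIEWS)  # reviews covering quantitative-experimental designs
```

Under that rounding assumption, both values agree with the percentages given in the abstract.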