Relate to Predict: Towards Task-Independent Knowledge Representations for Reinforcement Learning
Reinforcement Learning (RL) can enable agents to learn complex tasks.
However, it is difficult to interpret the knowledge and reuse it across tasks.
Inductive biases can address such issues by explicitly providing a generic yet
useful decomposition that is otherwise difficult or expensive to learn
implicitly. For example, object-centered approaches decompose a high-dimensional
observation into individual objects. Expanding on this, we utilize
an inductive bias for explicit object-centered knowledge separation that
provides further decomposition into semantic representations and dynamics
knowledge. For this, we introduce a semantic module that predicts an object's
semantic state based on its context. The resulting affordance-like object state
can then be used to enrich perceptual object representations. With a minimal
setup and an environment that enables puzzle-like tasks, we demonstrate the
feasibility and benefits of this approach. Specifically, we compare three
different methods of integrating semantic representations into a model-based RL
architecture. Our experiments show that the degree of explicitness in knowledge
separation correlates with faster learning, better accuracy, better
generalization, and better interpretability.
Comment: submitted to IJCNN 202
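A minimal sketch of the knowledge-separation idea described above, assuming a PyTorch setup with illustrative dimensions: a small attention-based semantic module predicts an affordance-like state for an object from its context objects, and the result is concatenated with the perceptual object representation. This is not the authors' implementation.

```python
# Minimal sketch (not the authors' implementation): a "semantic module" that
# predicts an affordance-like state for one object from the representations of
# the other objects in the scene (its context), and enriches the perceptual
# object representation by concatenation. Dimensions and architecture are
# illustrative assumptions.
import torch
import torch.nn as nn

class SemanticModule(nn.Module):
    def __init__(self, obj_dim: int = 32, sem_dim: int = 8):
        super().__init__()
        # Context is aggregated with attention; the semantic state is decoded
        # from the attended context vector.
        self.attn = nn.MultiheadAttention(obj_dim, num_heads=4, batch_first=True)
        self.decode = nn.Sequential(nn.Linear(obj_dim, 64), nn.ReLU(),
                                    nn.Linear(64, sem_dim))

    def forward(self, target: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # target: (B, obj_dim), context: (B, N, obj_dim)
        ctx, _ = self.attn(target.unsqueeze(1), context, context)
        sem_state = self.decode(ctx.squeeze(1))          # (B, sem_dim)
        return torch.cat([target, sem_state], dim=-1)    # enriched representation

# Usage on random data
module = SemanticModule()
enriched = module(torch.randn(2, 32), torch.randn(2, 5, 32))
print(enriched.shape)  # torch.Size([2, 40])
```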
Complexer-YOLO: Real-Time 3D Object Detection and Tracking on Semantic Point Clouds
Accurate detection of 3D objects is a fundamental problem in computer vision
and has an enormous impact on autonomous cars, augmented/virtual reality and
many applications in robotics. In this work we present a novel fusion of a
state-of-the-art neural-network-based 3D detector and visual semantic segmentation in
the context of autonomous driving. Additionally, we introduce
Scale-Rotation-Translation score (SRTs), a fast and highly parameterizable
evaluation metric for comparing object detections, which speeds up our
inference by up to 20% and halves training time. In addition, we apply
state-of-the-art online multi-target feature tracking to the object
measurements to further increase accuracy and robustness utilizing temporal
information. Our experiments on KITTI show that we achieve results on par with
the state of the art in all related categories, while maintaining the
performance-accuracy trade-off and still running in real time. Furthermore,
our model is the first to fuse visual semantics with 3D object detection.
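To make the notion of a "semantic point cloud" concrete, here is a generic sketch (not the paper's pipeline) that projects LiDAR points into a semantic segmentation image and appends the looked-up class to each point; the projection matrix, image size, and point distribution are placeholder assumptions.

```python
# Illustrative sketch of one way to build a "semantic point cloud": project
# points into a semantic segmentation image and append the class label to each
# point. This is a generic fusion recipe, not the paper's exact pipeline; the
# camera matrix and image are placeholder assumptions.
import numpy as np

def paint_points(points_xyz: np.ndarray, seg_map: np.ndarray, P: np.ndarray) -> np.ndarray:
    """points_xyz: (N, 3) in camera coordinates, seg_map: (H, W) class ids,
    P: (3, 4) projection matrix. Returns (M, 4): xyz + semantic class."""
    N = points_xyz.shape[0]
    pts_h = np.hstack([points_xyz, np.ones((N, 1))])          # homogeneous coords
    uvw = (P @ pts_h.T).T                                      # project to image plane
    uv = uvw[:, :2] / uvw[:, 2:3]                              # perspective division
    h, w = seg_map.shape
    in_img = (uvw[:, 2] > 0) & (uv[:, 0] >= 0) & (uv[:, 0] < w) \
             & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    uv_int = uv[in_img].astype(int)
    labels = seg_map[uv_int[:, 1], uv_int[:, 0]]               # look up class per point
    return np.hstack([points_xyz[in_img], labels[:, None]])

# Toy example: 100 random points, a 2-class segmentation map, a made-up projection
pts = np.random.uniform([-5, -2, 1], [5, 2, 20], size=(100, 3))
seg = np.random.randint(0, 2, size=(375, 1242))
P = np.array([[700.0, 0, 621, 0], [0, 700.0, 187, 0], [0, 0, 1, 0]])
print(paint_points(pts, seg, P).shape)
```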
Fusing Hand and Body Skeletons for Human Action Recognition in Assembly
As collaborative robots (cobots) continue to gain popularity in industrial
manufacturing, effective human-robot collaboration becomes crucial. Cobots
should be able to recognize human actions to assist with assembly tasks and act
autonomously. To achieve this, skeleton-based approaches are often used due to
their ability to generalize across various people and environments. Although
body skeleton approaches are widely used for action recognition, they may not
be accurate enough for assembly actions where the worker's fingers and hands
play a significant role. To address this limitation, we propose a method in
which less detailed body skeletons are combined with highly detailed hand
skeletons. We investigate CNNs and transformers, the latter of which are
particularly adept at extracting and combining important information from both
skeleton types using attention. This paper demonstrates the effectiveness of
our proposed approach in enhancing action recognition in assembly scenarios.
Comment: International Conference on Artificial Neural Networks (ICANN) 202
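A hedged sketch of the fusion idea, assuming PyTorch and made-up joint counts and class numbers: body and hand joints are concatenated into one token sequence for a standard transformer encoder, whose self-attention can weight the two skeleton types against each other. The paper's actual architectures are not reproduced here.

```python
# Minimal sketch, not the paper's architecture: body and hand joints are
# concatenated into one token sequence and processed by a standard transformer
# encoder. Joint counts, feature dimension, and layer sizes are assumptions.
import torch
import torch.nn as nn

class SkeletonFusionTransformer(nn.Module):
    def __init__(self, n_body=17, n_hands=2 * 21, d_model=64, n_classes=33):
        super().__init__()
        n_joints = n_body + n_hands
        self.embed = nn.Linear(3, d_model)                           # (x, y, z) -> token
        self.pos = nn.Parameter(torch.zeros(1, n_joints, d_model))   # joint-identity embedding
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, body: torch.Tensor, hands: torch.Tensor) -> torch.Tensor:
        # body: (B, 17, 3), hands: (B, 42, 3) -> action logits (B, n_classes)
        tokens = self.embed(torch.cat([body, hands], dim=1)) + self.pos
        feats = self.encoder(tokens)
        return self.head(feats.mean(dim=1))                          # pool over joints

model = SkeletonFusionTransformer()
logits = model(torch.randn(4, 17, 3), torch.randn(4, 42, 3))
print(logits.shape)  # torch.Size([4, 33])
```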
How Object Information Improves Skeleton-based Human Action Recognition in Assembly Tasks
As the use of collaborative robots (cobots) in industrial manufacturing
continues to grow, human action recognition for effective human-robot
collaboration becomes increasingly important. This ability is crucial for
cobots to act autonomously and assist in assembly tasks. Recently,
skeleton-based approaches have often been used, as they tend to generalize better to
different people and environments. However, when processing skeletons alone,
information about the objects a human interacts with is lost. Therefore, we
present a novel approach for integrating object information into skeleton-based
action recognition. We enhance two state-of-the-art methods by treating object
centers as further skeleton joints. Our experiments on the assembly dataset
IKEA ASM show that our approach considerably improves the performance of these
state-of-the-art methods when combining skeleton joints with
objects predicted by a state-of-the-art instance segmentation model. Our
research sheds light on the benefits of combining skeleton joints with object
information for human action recognition in assembly tasks. We analyze the
effect of the object detector on the combination for action classification and
discuss the important factors that must be taken into account.
Comment: IEEE International Joint Conference on Neural Networks (IJCNN) 202
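The core idea of treating object centers as further skeleton joints can be illustrated with a short NumPy sketch; the array shapes and the fixed number of object slots are assumptions, not the authors' exact preprocessing.

```python
# Sketch of the core idea (object centers as extra skeleton joints), using made-up
# array shapes: detected object centers are appended to the human joints so a
# skeleton-based recognizer sees them as additional "joints".
import numpy as np

def add_object_joints(skeleton: np.ndarray, object_centers: np.ndarray,
                      max_objects: int = 3) -> np.ndarray:
    """skeleton: (T, J, 2) joint coordinates over T frames,
    object_centers: (T, K, 2) per-frame object centers from an instance
    segmentation model. Returns (T, J + max_objects, 2)."""
    T, _, _ = skeleton.shape
    padded = np.zeros((T, max_objects, 2))            # fixed slot count per frame
    k = min(object_centers.shape[1], max_objects)
    padded[:, :k] = object_centers[:, :k]             # missing objects stay at (0, 0)
    return np.concatenate([skeleton, padded], axis=1)

# Toy example: 30 frames, 17 body joints, 2 detected objects per frame
aug = add_object_joints(np.random.rand(30, 17, 2), np.random.rand(30, 2, 2))
print(aug.shape)  # (30, 20, 2)
```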
Efficient Multi-Task Scene Analysis with RGB-D Transformers
Scene analysis is essential for enabling autonomous systems, such as mobile
robots, to operate in real-world environments. However, obtaining a
comprehensive understanding of the scene requires solving multiple tasks, such
as panoptic segmentation, instance orientation estimation, and scene
classification. Solving these tasks given limited computing and battery
capabilities on mobile platforms is challenging. To address this challenge, we
introduce an efficient multi-task scene analysis approach, called EMSAFormer,
that uses an RGB-D Transformer-based encoder to simultaneously perform the
aforementioned tasks. Our approach builds upon the previously published
EMSANet. However, we show that the dual CNN-based encoder of EMSANet can be
replaced with a single Transformer-based encoder. To achieve this, we
investigate how information from both RGB and depth data can be effectively
incorporated in a single encoder. To accelerate inference on robotic hardware,
we provide a custom NVIDIA TensorRT extension enabling highly optimized inference for
our EMSAFormer approach. Through extensive experiments on the commonly used
indoor datasets NYUv2, SUNRGB-D, and ScanNet, we show that our approach
achieves state-of-the-art performance while still enabling inference with up to
39.1 FPS on an NVIDIA Jetson AGX Orin 32 GB.
Comment: To be published in IEEE International Joint Conference on Neural Networks (IJCNN) 202
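As a rough illustration of the single-encoder idea (not the actual EMSAFormer encoder), one simple way to feed RGB and depth to a single Transformer-based encoder is early fusion: stack depth as a fourth input channel before patch embedding. Patch size and embedding dimension below are assumptions.

```python
# Hedged sketch of early RGB-D fusion for a single Transformer encoder: depth is
# stacked as a fourth channel before patch embedding. The real EMSAFormer encoder
# and decoders are more involved; this only illustrates the single-encoder idea.
import torch
import torch.nn as nn

class RGBDPatchEmbed(nn.Module):
    def __init__(self, patch=16, d_model=96):
        super().__init__()
        # 4 input channels: RGB + depth, embedded jointly in one projection
        self.proj = nn.Conv2d(4, d_model, kernel_size=patch, stride=patch)

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        x = torch.cat([rgb, depth], dim=1)        # (B, 4, H, W)
        tokens = self.proj(x)                     # (B, d_model, H/16, W/16)
        return tokens.flatten(2).transpose(1, 2)  # (B, N_patches, d_model)

embed = RGBDPatchEmbed()
tokens = embed(torch.randn(1, 3, 480, 640), torch.randn(1, 1, 480, 640))
print(tokens.shape)  # torch.Size([1, 1200, 96])
```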
PanopticNDT: Efficient and Robust Panoptic Mapping
As the application scenarios of mobile robots are getting more complex and
challenging, scene understanding becomes increasingly crucial. A mobile robot
that is supposed to operate autonomously in indoor environments must have
precise knowledge about what objects are present, where they are, what their
spatial extent is, and how they can be reached; i.e., information about free
space is also crucial. Panoptic mapping is a powerful instrument providing such
information. However, building 3D panoptic maps with high spatial resolution is
challenging on mobile robots, given their limited computing capabilities. In
this paper, we propose PanopticNDT - an efficient and robust panoptic mapping
approach based on occupancy normal distribution transform (NDT) mapping. We
evaluate our approach on the publicly available datasets Hypersim and
ScanNetV2. The results reveal that our approach can represent panoptic
information at a higher level of detail than other state-of-the-art approaches
while enabling real-time panoptic mapping on mobile robots. Finally, we prove
the real-world applicability of PanopticNDT with qualitative results in a
domestic application.
Comment: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 202
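The kind of information a panoptic NDT map stores per cell can be sketched as follows; this toy cell, which keeps a Gaussian over inserted points plus a histogram of panoptic labels, is an assumption-laden illustration rather than the paper's implementation.

```python
# Illustrative sketch (not the paper's implementation) of an NDT-style map cell
# that keeps a Gaussian over the points falling into it plus a per-cell histogram
# of panoptic labels.
import numpy as np
from collections import Counter

class PanopticNDTCell:
    def __init__(self):
        self.points = []          # accumulated 3D points in this cell
        self.labels = Counter()   # counts of panoptic instance/class ids

    def insert(self, point, panoptic_id):
        self.points.append(np.asarray(point, dtype=float))
        self.labels[panoptic_id] += 1

    def distribution(self):
        """Mean and covariance of the points (the normal distribution transform)."""
        pts = np.stack(self.points)
        return pts.mean(axis=0), np.cov(pts, rowvar=False)

    def dominant_label(self):
        return self.labels.most_common(1)[0][0]

cell = PanopticNDTCell()
for p in np.random.randn(50, 3) * 0.1:
    cell.insert(p, panoptic_id=7)
mean, cov = cell.distribution()
print(mean.shape, cov.shape, cell.dominant_label())  # (3,) (3, 3) 7
```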
A multi-modal person perception framework for socially interactive mobile service robots
In order to meet the increasing demands of mobile service robot applications, a dedicated perception module is an essential requirement for interaction with users in real-world scenarios. In particular, multi-sensor fusion and human re-identification are recognized as active research fronts. Through this paper we contribute to the topic and present a modular detection and tracking system that models the position and additional properties of persons in the surroundings of a mobile robot. The proposed system introduces a probability-based data association method that, besides the position, can incorporate face and color-based appearance features in order to re-identify persons when tracking gets interrupted. The system combines the results of various state-of-the-art image-based detection systems for person recognition, person identification, and attribute estimation. This allows a stable estimate of a mobile robot's user, even in complex, cluttered environments with long-lasting occlusions. In our benchmark, we introduce a new measure for tracking consistency and show the improvements achieved when face- and appearance-based re-identification are combined. The tracking system was applied in a real-world application with a mobile rehabilitation assistant robot in a public hospital. The estimated states of persons are used for user-centered navigation behaviors, e.g., guiding or approaching a person, but also for realizing socially acceptable navigation in public environments.
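A minimal sketch of probability-based data association of the kind described above, with assumed cue weights and a simple Gaussian position model: position, face, and color-appearance similarities are fused into one association score per track/detection pair. The paper's system is considerably more elaborate.

```python
# Minimal sketch of probability-based data association that fuses a position cue
# with face- and color-appearance similarities into one association score per
# track/detection pair. The Gaussian position model and equal cue weighting are
# assumptions for illustration.
import numpy as np

def association_scores(track_pos, det_pos, face_sim, color_sim, sigma=0.5):
    """track_pos: (T, 2), det_pos: (D, 2), face_sim/color_sim: (T, D) in [0, 1].
    Returns a (T, D) matrix of fused association scores."""
    d = np.linalg.norm(track_pos[:, None, :] - det_pos[None, :, :], axis=-1)
    p_pos = np.exp(-0.5 * (d / sigma) ** 2)          # Gaussian position likelihood
    # Treat the cues as independent evidence and multiply them
    return p_pos * face_sim * color_sim

tracks = np.array([[0.0, 0.0], [3.0, 1.0]])
dets = np.array([[0.2, 0.1], [2.8, 1.1]])
face = np.array([[0.9, 0.1], [0.2, 0.8]])
color = np.array([[0.8, 0.3], [0.4, 0.7]])
scores = association_scores(tracks, dets, face, color)
print(scores.argmax(axis=1))  # greedy per-track match: [0 1]
```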
Can communication technologies reduce loneliness and social isolation in older people?: a scoping review of reviews
Background: Loneliness and social isolation in older age are considered major public health concerns and research on technology-based solutions is growing rapidly. This scoping review of reviews aims to summarize the communication technologies (CTs) (review question RQ1), theoretical frameworks (RQ2), study designs (RQ3), and positive effects of technology use (RQ4) present in the research field. Methods: A comprehensive multi-disciplinary, multi-database literature search was conducted. Identified reviews were analyzed according to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) framework. A total of N = 28 research reviews that cover 248 primary studies spanning 50 years were included. Results: The majority of the included reviews addressed general internet and computer use (82% each) (RQ1). Of the 28 reviews, only one (4%) worked with a theoretical framework (RQ2) and 26 (93%) covered primary studies with quantitative-experimental designs (RQ3). The positive effects of technology use were shown in 55% of the outcome measures for loneliness and 44% of the outcome measures for social isolation (RQ4). Conclusion: While research reviews show that CTs can reduce loneliness and social isolation in older people, causal evidence is limited and insights on innovative technologies such as augmented reality systems are scarce.