A Whole-Body Pose Taxonomy for Loco-Manipulation Tasks
Exploiting interaction with the environment is a promising and powerful way
to enhance the stability and robustness of humanoid robots while executing
locomotion and manipulation tasks. Recently, some works have started to show
advances in this direction by considering humanoid locomotion with multi-contacts,
but to be able to fully develop such abilities in a more autonomous way, we
need to first understand and classify the variety of possible poses a humanoid
robot can achieve to balance. To this end, we propose the adaptation of a
successful idea widely used in the field of robot grasping to the field of
humanoid balance with multi-contacts: a whole-body pose taxonomy classifying
the set of whole-body robot configurations that use the environment to enhance
stability. We have revised the classification criteria used to develop grasping
taxonomies, focusing on structuring and simplifying the large number of
possible poses the human body can adopt. We propose a taxonomy with 46 poses,
organised into three main categories, considering the number and type of supports as
well as possible transitions between poses. The taxonomy induces a
classification of motion primitives based on the pose used for support, and a
set of rules to store and generate new motions. We present preliminary results
that apply known segmentation techniques to motion data from the KIT whole-body
motion database. Using motion capture data with multi-contacts, we can identify
support poses providing a segmentation that can distinguish between locomotion
and manipulation parts of an action.
Comment: 8 pages, 7 figures, 1 table with a full-page figure in landscape; 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems
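The support-pose segmentation described in the abstract can be illustrated with a minimal sketch (the contact labels and frame data below are invented, not drawn from the KIT database): a new motion segment starts whenever the set of supports in contact with the environment changes.

```python
def segment_by_support(contacts_per_frame):
    """Split a motion into segments of constant support pose.

    contacts_per_frame: one set of contact labels per frame,
    e.g. {"left_foot", "right_hand"}.
    Returns (start_frame, end_frame, support_pose) tuples, end exclusive.
    """
    segments = []
    start = 0
    for i in range(1, len(contacts_per_frame)):
        if contacts_per_frame[i] != contacts_per_frame[start]:
            segments.append((start, i, frozenset(contacts_per_frame[start])))
            start = i
    if contacts_per_frame:
        segments.append((start, len(contacts_per_frame),
                         frozenset(contacts_per_frame[start])))
    return segments

# Double support -> single support (a step) -> double support -> added hand contact:
frames = [
    {"left_foot", "right_foot"},
    {"left_foot"},
    {"left_foot"},
    {"left_foot", "right_foot"},
    {"left_foot", "right_foot", "right_hand"},
]
print(segment_by_support(frames))
```

In this toy example, the segments consisting only of foot supports would correspond to locomotion, while the segment adding a hand contact would be a candidate for a manipulation or supported phase.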
Action-oriented Scene Understanding
In order to allow robots to act autonomously, it is crucial that they not only describe their environment accurately but also identify how to interact with their surroundings.
While we witnessed tremendous progress in descriptive computer vision, approaches that explicitly target action are scarcer.
This cumulative dissertation approaches the goal of interpreting visual scenes “in the wild” with respect to actions implied by the scene. We call this approach action-oriented scene understanding. It involves identifying and judging opportunities for interaction with constituents of the scene (e.g. objects and their parts) as well as understanding object functions and how interactions will impact the future. All of these aspects are addressed on three levels of abstraction: elements, perception and reasoning.
On the elementary level, we investigate semantic and functional grouping of objects by analyzing annotated natural image scenes. We compare object label-based and visual context definitions with respect to their suitability for generating meaningful object class representations. Our findings suggest that representations generated from visual context are on-par in terms of semantic quality with those generated from large quantities of text.
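As a rough sketch of the visual-context idea (the scenes and vocabulary below are toy data, not the annotated image corpus used in the thesis), each object class can be represented by its co-occurrence counts with other classes across scenes, and classes can then be compared by cosine similarity:

```python
from collections import Counter
from math import sqrt

def context_vectors(scenes, vocab):
    """Represent each object class by how often it co-occurs with other classes."""
    vecs = {c: Counter() for c in vocab}
    for objects in scenes:
        for c in set(objects):
            for other in set(objects):
                if other != c:
                    vecs[c][other] += 1
    return vecs

def cosine(u, v, vocab):
    dot = sum(u[c] * v[c] for c in vocab)
    nu = sqrt(sum(u[c] ** 2 for c in vocab))
    nv = sqrt(sum(v[c] ** 2 for c in vocab))
    return dot / (nu * nv) if nu and nv else 0.0

scenes = [
    ["cup", "table", "plate"],
    ["cup", "table", "chair"],
    ["car", "road", "tree"],
    ["plate", "table", "chair"],
]
vocab = {"cup", "table", "plate", "chair", "car", "road", "tree"}
vecs = context_vectors(scenes, vocab)
# Kitchen objects share visual context, so "cup" ends up closer to "plate" than to "car".
print(cosine(vecs["cup"], vecs["plate"], vocab) > cosine(vecs["cup"], vecs["car"], vocab))
```

This mirrors, in miniature, how context-based representations can group semantically related object classes without any text corpus.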
The perceptive level concerns action identification. We propose a system to identify possible interactions for robots and humans with the environment (affordances) at the pixel level using state-of-the-art machine learning methods. Pixel-wise part annotations of images are transformed into 12 affordance maps. Using these maps, a convolutional neural network is trained to densely predict affordance maps from unknown RGB images. In contrast to previous work, this approach operates exclusively on RGB images during both training and testing, and yet achieves state-of-the-art performance.
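The annotation-to-target step might be sketched as follows; the part names, affordance names, and part-to-affordance lookup are illustrative stand-ins, and the thesis uses 12 affordances rather than the four shown here:

```python
# Hypothetical part -> affordance lookup; the real pipeline maps part
# annotations to 12 affordances.
PART_TO_AFFORDANCES = {
    "handle": {"grasp"},
    "blade": {"cut"},
    "bowl": {"contain", "scoop"},
    "background": set(),
}
AFFORDANCES = ["grasp", "cut", "contain", "scoop"]

def parts_to_affordance_maps(part_map):
    """Turn an HxW grid of part labels into one binary HxW map per affordance."""
    h, w = len(part_map), len(part_map[0])
    maps = {a: [[0] * w for _ in range(h)] for a in AFFORDANCES}
    for y in range(h):
        for x in range(w):
            for a in PART_TO_AFFORDANCES[part_map[y][x]]:
                maps[a][y][x] = 1
    return maps

part_map = [
    ["handle", "handle", "blade"],
    ["background", "bowl", "bowl"],
]
maps = parts_to_affordance_maps(part_map)
print(maps["grasp"])    # [[1, 1, 0], [0, 0, 0]]
print(maps["contain"])  # [[0, 0, 0], [0, 1, 1]]
```

Maps like these serve as the dense per-pixel targets against which a segmentation network can be trained.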
At the reasoning level, we extend the question from asking what actions are possible to what actions are plausible. For this, we gathered a dataset of household images associated with human ratings of the likelihoods of eight different actions. Based on the judgement provided by the human raters, we train convolutional neural networks to generate plausibility scores from unseen images.
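The aggregation of human ratings into regression targets could look like the following sketch (the image ids, action names, and the [0, 1] rating scale are assumptions, and the actual dataset covers eight actions):

```python
from collections import defaultdict

def plausibility_targets(ratings):
    """Average per-rater scores into one target value per (image, action).

    ratings: list of (image_id, action, score) with scores in [0, 1].
    Returns {image_id: {action: mean_score}}.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for img, action, score in ratings:
        sums[img][action] += score
        counts[img][action] += 1
    return {
        img: {a: sums[img][a] / counts[img][a] for a in sums[img]}
        for img in sums
    }

ratings = [
    ("kitchen_01", "cook", 0.9),
    ("kitchen_01", "cook", 0.7),
    ("kitchen_01", "sleep", 0.1),
]
targets = plausibility_targets(ratings)
print(targets["kitchen_01"]["cook"])  # mean of the two "cook" ratings
```

The resulting per-action means are the kind of continuous targets a convolutional network can be trained to regress from the raw image.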
Furthermore, while the thesis so far considered only static scenes, we propose a system that takes video input and predicts plausible future actions. Since this requires careful identification of relevant features in the video sequence, we analyze this particular aspect in detail using a synthetic dataset for several state-of-the-art video models. We identify feature learning as a major obstacle for anticipation in natural video data.
The presented projects analyze the role of action in scene understanding from various angles and in multiple settings while highlighting the advantages of assuming an action-oriented perspective.
We conclude that action-oriented scene understanding can augment classic computer vision in many real-life applications, in particular robotics.
Reasoning and understanding grasp affordances for robot manipulation
This doctoral research focuses on developing new methods that enable an artificial agent
to grasp and manipulate objects autonomously. More specifically, we are using the concept
of affordances to learn and generalise robot grasping and manipulation techniques. [75] defined affordances as the ability of an agent to perform a certain action with an object in a
given environment. In robotics, affordances define the possibility for an agent to perform
actions with an object. Therefore, by understanding the relation between actions, objects
and the effect of these actions, the agent understands the task at hand, providing the robot
with the potential to bridge perception to action. The significance of affordances in robotics
has been studied from various perspectives, including psychology and cognitive science.
Many efforts have been made to pragmatically employ the concept of affordances as it
provides the potential for an artificial agent to perform tasks autonomously. We start by reviewing and finding common ground amongst different strategies that use affordances for
robotic tasks. We build on this common ground to provide guidance on using the concept of affordances as a medium to boost the autonomy of an artificial agent. To this end, we
outline common design choices for building an affordance relation, and their implications for
the generalisation capabilities of the agent when facing previously unseen scenarios. Based
on our exhaustive review, we conclude that prior research on object affordance detection
is effective; however, among other issues, it has the following technical gaps: (i) the methods are
limited to a single object ↔ affordance hypothesis, and (ii) they cannot guarantee task completion or any level of performance for the manipulation task when the robot acts alone, nor (iii) when it collaborates
with other agents. In this research thesis, we propose solutions to these technical challenges.
In an incremental fashion, we start by addressing the limited generalisation capabilities
of the then state-of-the-art methods by strengthening the perception-to-action connection through the construction of a Knowledge Base (KB). We then leverage the information
encapsulated in the KB to design and implement a reasoning and understanding method
based on a statistical relational learner (SRL) that allows us to cope with uncertainty in testing
environments and, thus, improve generalisation capabilities in affordance-aware manipulation tasks. The KB, in conjunction with our SRL, forms the basis for our designed solutions
that guarantee task completion when the robot is performing a task alone as well as when in
collaboration with other agents. We finally expose and discuss a range of interesting avenues
that have the potential to advance the capabilities of a robotic agent through the use of the
concept of affordances for manipulation tasks. A summary of the contributions of this thesis
can be found at: https://bit.ly/grasp_affordance_reasonin
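As a loose illustration of the KB-plus-SRL idea (all objects, affordances, rules, and weights below are invented, and a real statistical relational learner performs far richer probabilistic inference), one can store object-affordance facts, attach weights to affordance-to-action rules, and score candidate actions under perceptual uncertainty:

```python
# Toy knowledge base of (object, affordance) facts and weighted
# affordance -> action rules; a stand-in for the thesis's KB + SRL,
# not its actual implementation.
KB_FACTS = {
    ("mug", "graspable"),
    ("mug", "containable"),
    ("knife", "graspable"),
    ("knife", "cuttable-with"),
}
WEIGHTED_RULES = [
    # (required affordance, action, rule weight)
    ("graspable", "pick-up", 1.0),
    ("containable", "pour", 0.8),
    ("cuttable-with", "slice", 0.9),
]

def score_actions(obj, detection_confidence):
    """Score candidate actions for an object, discounted by perception confidence."""
    affs = {a for (o, a) in KB_FACTS if o == obj}
    scores = {}
    for aff, action, weight in WEIGHTED_RULES:
        if aff in affs:
            scores[action] = weight * detection_confidence
    return scores

print(score_actions("mug", 0.9))   # "pick-up" and "pour", discounted by 0.9
print(score_actions("knife", 0.6))
```

Even this toy version shows the shape of the approach: the KB supplies the facts, the weighted rules encode uncertain knowledge, and the scores let the agent rank actions rather than commit to a single object-affordance hypothesis.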