Hierarchical Policy Learning for Mechanical Search
Retrieving objects from clutter is a complex task that requires multiple
interactions with the environment until the target object can be extracted.
These interactions involve executing action primitives like grasping or pushing
as well as setting priorities for the objects to manipulate and the actions to
execute. Mechanical Search (MS) is a framework for object retrieval, which uses
a heuristic algorithm for pushing and rule-based algorithms for high-level
planning. While rule-based policies benefit from human intuition, they often
perform sub-optimally. Deep reinforcement learning (RL) has shown strong
performance in complex tasks such as making decisions directly from pixels,
which makes it suitable for training policies for object retrieval. In this
work, we first formulate the MS problem in a principled way as a hierarchical
POMDP. Based on
this formulation, we propose a hierarchical policy learning approach for the MS
problem. For demonstration, we present two main parameterized sub-policies: a
push policy and an action selection policy. When integrated into the
hierarchical POMDP's policy, our proposed sub-policies increase the success
rate of retrieving the target object from less than 32% to nearly 80%, while
reducing the computation time for push actions from multiple seconds to less
than 10 milliseconds.
Comment: ICRA 202
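The two-level structure described in this abstract (an action selection policy that chooses a primitive, and a push policy that parameterizes the push) can be illustrated with a toy sketch. Everything below is a hypothetical stand-in: the `Observation` fields, the visibility threshold, and the random push direction are assumptions made for illustration, whereas the paper learns both sub-policies from data.

```python
import random
from dataclasses import dataclass


@dataclass
class Observation:
    # Hypothetical scene summary; not the paper's observation space.
    target_visible_fraction: float  # 0.0 = fully occluded, 1.0 = fully visible


def action_selection_policy(obs, grasp_threshold=0.7):
    """High-level policy: pick the primitive to execute next.

    Hand-written rule used as a placeholder for a learned policy.
    """
    if obs.target_visible_fraction >= grasp_threshold:
        return "grasp"
    return "push"


def push_policy(obs, rng):
    """Low-level push policy stub: returns a push direction in degrees.

    A learned policy would condition on the observation; this stub ignores it.
    """
    return rng.uniform(0.0, 360.0)


def hierarchical_step(obs, rng):
    """Run one decision step: choose a primitive, then its parameters."""
    primitive = action_selection_policy(obs)
    if primitive == "push":
        return primitive, push_policy(obs, rng)
    return primitive, None
```

Calling `hierarchical_step(Observation(0.9), random.Random(0))` selects a grasp, while a mostly occluded target triggers a push with a sampled direction.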
A Covariance Matrix Adaptation Evolution Strategy for Direct Policy Search in Reproducing Kernel Hilbert Space
The covariance matrix adaptation evolution strategy (CMA-ES) is an efficient derivative-free optimization algorithm. It optimizes a black-box objective function over a well-defined parameter space. In some problems, such parameter spaces are defined using function approximation in which feature functions are manually defined. The performance of those techniques therefore depends strongly on the quality of the chosen features. Hence, enabling CMA-ES to optimize over a more complex and general function class has long been desired. Specifically, we consider modeling the input space for black-box optimization in reproducing kernel Hilbert spaces (RKHS). This modeling leads to a functional optimization problem whose domain is a function space, which enables us to optimize over a very rich function class. In addition, we propose CMA-ES-RKHS, a generalized CMA-ES framework that performs black-box functional optimization in the RKHS. A search distribution, represented as a Gaussian process, is adapted by updating both its mean function and covariance operator. Adaptive representation of the function and covariance operator is achieved with sparsification techniques. We evaluate CMA-ES-RKHS on a simple functional optimization problem and benchmark reinforcement learning (RL) domains. For an application in RL, we model policies for MDPs in RKHS and express the cumulative return objective as a functional of RKHS policies, which can be optimized via CMA-ES-RKHS. This formulation results in a black-box functional policy search framework.
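To make the sample, rank, and adapt loop behind such derivative-free methods concrete, here is a much-simplified Gaussian evolution strategy over a finite-dimensional vector. It is not CMA-ES and not the CMA-ES-RKHS method of the abstract: it adapts only the mean and a per-coordinate standard deviation, with no full covariance matrix, no evolution paths, and no function-space representation. The function names, the `sphere` objective, and all hyperparameters are assumptions chosen for illustration.

```python
import random


def sphere(x):
    """Toy black-box objective: squared distance to the origin."""
    return sum(v * v for v in x)


def diagonal_es(objective, mean, sigma=1.0, population=20, elite=5,
                iterations=60, seed=0):
    """Minimize `objective` with a simplified Gaussian evolution strategy.

    Each iteration samples candidates from a diagonal Gaussian, keeps the
    best `elite` of them, and re-estimates the mean and per-coordinate
    standard deviation from those elites.
    """
    rng = random.Random(seed)
    dim = len(mean)
    mean = list(mean)
    std = [sigma] * dim
    for _ in range(iterations):
        samples = [[rng.gauss(mean[i], std[i]) for i in range(dim)]
                   for _ in range(population)]
        samples.sort(key=objective)  # lower objective value is better
        best = samples[:elite]
        new_mean = [sum(s[i] for s in best) / elite for i in range(dim)]
        # Spread is measured around the OLD mean, so a consistent shift of
        # the elites enlarges the step size instead of collapsing it.
        std = [max(1e-12,
                   (sum((s[i] - mean[i]) ** 2 for s in best) / elite) ** 0.5)
               for i in range(dim)]
        mean = new_mean
    return mean
```

For example, `diagonal_es(sphere, [3.0, -2.0])` returns a two-dimensional point whose objective value can be compared against that of the starting point.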
DMFC-GraspNet: Differentiable Multi-Fingered Robotic Grasp Generation in Cluttered Scenes
Robotic grasping is a fundamental skill required for object manipulation in
robotics. Multi-fingered robotic hands, which mimic the structure of the human
hand, can potentially perform complex object manipulation. Nevertheless,
current techniques for multi-fingered robotic grasping frequently predict only
a single grasp per inference pass, which limits their computational efficiency
and versatility, i.e., they yield a unimodal grasp distribution. This paper
proposes a
differentiable multi-fingered grasp generation network (DMFC-GraspNet) with
three main contributions to address this challenge. Firstly, a novel neural
grasp planner is proposed, which predicts a new grasp representation to enable
versatile and dense grasp predictions. Secondly, a scene creation and label
mapping method is developed for dense labeling of multi-fingered robotic hands,
which allows a dense association of ground truth grasps. Thirdly, we propose to
train DMFC-GraspNet end-to-end using a forward-backward automatic
differentiation approach with a supervised loss, a differentiable collision
loss, and a generalized Q1 grasp metric loss. The proposed approach
is evaluated using the Shadow Dexterous Hand in MuJoCo simulation and ablated
by different choices of loss functions. The results demonstrate the
effectiveness of the proposed approach in predicting versatile and dense
grasps, and in advancing the field of multi-fingered robotic grasping.
Comment: Submitted IROS 2023 workshop "Policy Learning in Geometric Spaces"
SA6D: Self-Adaptive Few-Shot 6D Pose Estimator for Novel and Occluded Objects
To enable meaningful robotic manipulation of objects in the real world, 6D
pose estimation is a critical capability. Most existing approaches have
difficulty extending predictions to scenarios where novel object instances
are continuously introduced, especially under heavy occlusion. In this work, we
propose a few-shot pose estimation (FSPE) approach called SA6D, which uses a
self-adaptive segmentation module to identify the novel target object and
construct a point cloud model of the target object using only a small number of
cluttered reference images. Unlike existing methods, SA6D does not require
object-centric reference images or any additional object information, making it
a more generalizable and scalable solution across categories. We evaluate SA6D
on real-world tabletop object datasets and demonstrate that SA6D outperforms
existing FSPE methods, particularly in cluttered scenes with occlusions, while
requiring fewer reference images.
FusionVAE: A Deep Hierarchical Variational Autoencoder for RGB Image Fusion
Sensor fusion can significantly improve the performance of many computer
vision tasks. However, traditional fusion approaches are either not data-driven,
and thus can neither exploit prior knowledge nor find regularities in a given
dataset, or they are restricted to a single application. We overcome this
shortcoming by
presenting a novel deep hierarchical variational autoencoder called FusionVAE
that can serve as a basis for many fusion tasks. Our approach is able to
generate diverse image samples that are conditioned on multiple noisy,
occluded, or only partially visible input images. We derive and optimize a
variational lower bound for the conditional log-likelihood of FusionVAE. In
order to assess the fusion capabilities of our model thoroughly, we created
three novel datasets for image fusion based on popular computer vision
datasets. In our experiments, we show that FusionVAE learns a representation of
aggregated information that is relevant to fusion tasks. The results
demonstrate that our approach outperforms traditional methods significantly.
Furthermore, we present the advantages and disadvantages of different design
choices.
Comment: Accepted at ECCV 202
Deep Energy Autoencoder for Noncoherent Multicarrier MU-SIMO Systems
We propose a novel deep energy autoencoder (EA) for noncoherent multicarrier
multiuser single-input multiple-output (MU-SIMO) systems under fading channels.
In particular, a single-user noncoherent EA-based (NC-EA) system, based on the
multicarrier SIMO framework, is first proposed, where both the transmitter and
receiver are represented by deep neural networks (DNNs), known as the encoder
and decoder of an EA. Unlike existing systems, the decoder of the NC-EA is fed
only with the energy combined from all receive antennas, while its encoder
outputs a real-valued vector whose elements stand for the subcarrier power
levels. Using the NC-EA, we then develop two novel DNN structures for both
uplink and downlink NC-EA multiple access (NC-EAMA) schemes, based on the
multicarrier MU-SIMO framework. Note that NC-EAMA allows multiple users to share
the same subcarriers, thus achieving higher performance gains than
noncoherent orthogonal counterparts. With proper training, the proposed NC-EA
and NC-EAMA can efficiently recover the transmitted data without any channel
state information estimation. Simulation results clearly show the superiority
of our schemes in terms of reliability, flexibility and complexity over
baseline schemes.
Comment: Accepted, IEEE TW
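The decoder input described in this abstract, energy combined from all receive antennas, can be sketched as follows. The sketch assumes that "energy combined" means summing the squared magnitudes of the received complex samples across antennas for each subcarrier; the function name and the `received[m][k]` layout are hypothetical, and the neural-network encoder and decoder themselves are not modeled.

```python
def combined_energy(received):
    """Return the per-subcarrier energy summed over all receive antennas.

    `received[m][k]` is the complex sample at antenna m on subcarrier k
    (a layout assumed here for illustration). No channel estimate is needed:
    only magnitudes are used, which is what makes the scheme noncoherent.
    """
    num_subcarriers = len(received[0])
    return [sum(abs(antenna[k]) ** 2 for antenna in received)
            for k in range(num_subcarriers)]
```

With two antennas and two subcarriers, `combined_energy([[3 + 4j, 0j], [0j, 1j]])` yields one real-valued energy per subcarrier, which is the kind of vector a noncoherent decoder would consume.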
Multi-Arm Bin-Picking in Real-Time: A Combined Task and Motion Planning Approach
Automated bin-picking is a prerequisite for fully automated manufacturing and
warehouses. To successfully pick an item from an unstructured bin, the robot
first needs to detect possible grasps for the objects, decide which object to
remove, and then plan and execute a feasible trajectory to retrieve the
chosen object. In recent years, significant progress has been made towards
solving these problems. However, when multiple robot arms cooperate, the
decision and planning problems become exponentially harder. We propose an
integrated multi-arm bin-picking pipeline (IMAPIP), and demonstrate that it is
able to reliably pick objects from a bin in real-time using multiple robot
arms. IMAPIP first solves the multi-arm bin-picking task at a high level using a
geometry-aware policy integrated in a combined task and motion planning
framework. We then plan motions consistent with this policy using the BIT*
algorithm on the motion planning level. We show that this integrated solution
enables robot arm cooperation. In our experiments, we show that the proposed
geometry-aware policy outperforms a baseline by improving bin-picking time by
28% using two robot arms. The policy is robust to changes in the position of
the bin and the number of objects. We also show that IMAPIP successfully scales
up to four robot arms working in close proximity.
Comment: 8 pages
What Matters for Meta-Learning Vision Regression Tasks?
Meta-learning is widely used in few-shot classification and function
regression due to its ability to quickly adapt to unseen tasks. However, it has
not yet been well explored on regression tasks with high dimensional inputs
such as images. This paper makes two main contributions that help understand
this barely explored area. First, we design two new types of
cross-category level vision regression tasks, namely object discovery and pose
estimation, of unprecedented complexity in the meta-learning domain for computer
vision. To this end, we (i) exhaustively evaluate common meta-learning
techniques on these tasks, and (ii) quantitatively analyze the effect of
various deep learning techniques commonly used in recent meta-learning
algorithms in order to strengthen the generalization capability: data
augmentation, domain randomization, task augmentation and meta-regularization.
Finally, we (iii) provide some insights and practical recommendations for
training meta-learning algorithms on vision regression tasks. Second, we
propose the addition of functional contrastive learning (FCL) over the task
representations in Conditional Neural Processes (CNPs) and train in an
end-to-end fashion. The experimental results show that the results of prior
work are misleading owing to a poor choice of loss function and
meta-training sets that are too small. Specifically, we find that CNPs
outperform MAML on most tasks without fine-tuning. Furthermore, we observe that
naive task augmentation without a tailored design results in underfitting.
Comment: Accepted at CVPR 202