Value Memory Graph: A Graph-Structured World Model for Offline Reinforcement Learning
Reinforcement Learning (RL) methods are typically applied directly in
environments to learn policies. In some complex environments with continuous
state-action spaces, sparse rewards, and/or long temporal horizons, learning a
good policy in the original environments can be difficult. Focusing on the
offline RL setting, we aim to build a simple and discrete world model that
abstracts the original environment. RL methods are applied to our world model
instead of the environment data for simplified policy learning. Our world
model, dubbed Value Memory Graph (VMG), is designed as a directed-graph-based
Markov decision process (MDP) whose vertices and directed edges represent
graph states and graph actions, respectively. As the state-action spaces of VMG
are finite and relatively small compared to the original environment, we can
directly apply the value iteration algorithm on VMG to estimate graph state
values and identify the best graph actions. VMG is trained on and built from
the offline RL dataset. Together with an action translator that converts the
abstract graph actions in VMG to real actions in the original environment, VMG
controls agents to maximize episode returns. Our experiments on the D4RL
benchmark show that VMG can outperform state-of-the-art offline RL methods in
several tasks, especially when environments have sparse rewards and long
temporal horizons. Code will be made publicly available.
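The abstract leaves the planning step at a high level, but value iteration over a finite graph MDP is compact enough to sketch. Below is a minimal illustration assuming deterministic graph transitions with rewards attached to directed edges; the data layout and function names are hypothetical, not the paper's code.

```python
# Minimal sketch of value iteration on a finite directed-graph MDP of
# the kind VMG describes: vertices are graph states, outgoing edges are
# graph actions. Deterministic edge transitions are an assumption here.

def value_iteration(edges, rewards, gamma=0.99, tol=1e-6):
    """edges[s] -> list of successor states; rewards[(s, s2)] -> float."""
    values = {s: 0.0 for s in edges}
    while True:
        delta = 0.0
        for s, successors in edges.items():
            if not successors:  # terminal vertex keeps value 0
                continue
            best = max(rewards[(s, s2)] + gamma * values[s2]
                       for s2 in successors)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    # The best graph action at s is the edge with the highest backup.
    policy = {s: max(succ, key=lambda s2: rewards[(s, s2)] + gamma * values[s2])
              for s, succ in edges.items() if succ}
    return values, policy
```

For example, with edges = {"a": ["b", "c"], "b": ["c"], "c": []} and rewards = {("a", "b"): 0.0, ("a", "c"): 1.0, ("b", "c"): 2.0}, the returned policy routes "a" through "b", since the discounted two-step return 0.99 * 2.0 beats the direct reward of 1.0.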
For Pre-Trained Vision Models in Motor Control, Not All Policy Learning Methods are Created Equal
In recent years, increasing attention has been directed to leveraging
pre-trained vision models for motor control. While existing works mainly
emphasize the importance of this pre-training phase, the arguably equally
important role played by downstream policy learning during control-specific
fine-tuning is often neglected. It thus remains unclear if pre-trained vision
models are consistent in their effectiveness under different control policies.
To bridge this gap in understanding, we conduct a comprehensive study on 14
pre-trained vision models using 3 distinct classes of policy learning methods,
including reinforcement learning (RL), imitation learning through behavior
cloning (BC), and imitation learning with a visual reward function (VRF). Our
study yields a series of intriguing results, including the discovery that the
effectiveness of pre-training is highly dependent on the choice of the
downstream policy learning algorithm. We show that the conventionally accepted
evaluation based on RL methods is highly variable and therefore unreliable, and
we further advocate for more robust methods such as VRF and BC. To facilitate
more universal evaluations of pre-trained models and their policy learning
methods in the future, we also release a benchmark of 21 tasks across 3
different environments alongside our work.
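As a concrete anchor for one of the three policy classes compared here, the sketch below shows behavior cloning on top of a frozen pre-trained visual encoder, the setting in which pre-trained features are typically evaluated. The encoder, action head, and data loader are placeholders, not the benchmark's actual interfaces.

```python
import torch
import torch.nn as nn

# Behavior cloning (BC) on frozen pre-trained visual features: only the
# small policy head is trained, so task success isolates the quality of
# the pre-trained representation. All names here are placeholders.

def train_bc(encoder, policy_head, loader, epochs=10, lr=3e-4):
    encoder.eval()  # freeze the pre-trained vision model
    for p in encoder.parameters():
        p.requires_grad_(False)
    opt = torch.optim.Adam(policy_head.parameters(), lr=lr)
    loss_fn = nn.MSELoss()  # regress continuous expert actions
    for _ in range(epochs):
        for images, expert_actions in loader:
            with torch.no_grad():
                feats = encoder(images)  # control-agnostic features
            loss = loss_fn(policy_head(feats), expert_actions)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return policy_head
```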
ImageCaptioner: Image Captioner for Image Captioning Bias Amplification Assessment
Most pre-trained learning systems are known to suffer from bias, which
typically emerges from the data, the model, or both. Measuring and quantifying
bias and its sources is a challenging task and has been extensively studied in
image captioning. Despite the significant effort in this direction, we observed
that existing metrics lack consistency in the inclusion of the visual signal.
In this paper, we introduce a new bias assessment metric, dubbed
ImageCaptioner, for image captioning. Instead of measuring the absolute
bias in the model or the data, ImageCaptioner pays more attention to the
bias introduced by the model w.r.t. the data bias, termed bias amplification.
Unlike existing methods, which evaluate image captioning algorithms based
only on the generated captions, ImageCaptioner incorporates the image while
measuring the bias. In addition, we design a
formulation for measuring the bias of generated captions as prompt-based image
captioning instead of using language classifiers. Finally, we apply our
metric across 11 different image captioning architectures on
three different datasets, i.e., MS-COCO caption dataset, Artemis V1, and
Artemis V2, and on three different protected attributes, i.e., gender, race,
and emotions. Furthermore, we verify the effectiveness of our
metric by proposing AnonymousBench, which is a novel human
evaluation paradigm for bias metrics. Our metric shows significant superiority
over the recent bias metric, LIC, in terms of human alignment, where the
correlation scores are 80% and 54% for our metric and LIC, respectively. The
code is available at https://eslambakr.github.io/imagecaptioner2.github.io/
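To make the quantity being measured concrete, the sketch below illustrates bias amplification in its generic form: compare the skew of a protected attribute in model-generated captions against the skew already present in the human-written captions. This is an illustrative simplification with hypothetical helpers, not the paper's prompt-based formulation.

```python
from collections import Counter

# Generic bias-amplification illustration: a positive value means the
# model's captions are more skewed toward one attribute value than the
# dataset's own captions are. Not the paper's prompt-based metric.

def attribute_skew(attribute_labels):
    """Max deviation of attribute frequencies from a uniform split."""
    counts = Counter(attribute_labels)
    total = sum(counts.values())
    uniform = 1.0 / len(counts)
    return max(abs(c / total - uniform) for c in counts.values())

def bias_amplification(data_attrs, model_attrs):
    """Skew of generated captions minus skew of the ground-truth data."""
    return attribute_skew(model_attrs) - attribute_skew(data_attrs)

# A balanced dataset ["f", "m", "f", "m"] captioned mostly as female,
# ["f", "f", "f", "m"], yields an amplification of 0.25.
```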
Analysis of a Cone-Based Distributed Topology Control Algorithm for Wireless Multi-hop Networks
The topology of a wireless multi-hop network can be controlled by varying the
transmission power at each node. In this paper, we give a detailed analysis of
a cone-based distributed topology control algorithm. This algorithm, introduced
in [16], does not assume that nodes have GPS information available; rather it
depends only on directional information. Roughly speaking, the basic idea of
the algorithm is that a node u transmits with the minimum power p_{u,α}
required to ensure that in every cone of degree α around u, there is some
node that u can reach with power p_{u,α}. We show that taking α = 5π/6 is a
necessary and sufficient condition to guarantee that network connectivity is
preserved. More precisely, if there is a path from s to t when every node
communicates at maximum power, then, if α ≤ 5π/6, there is still a path in
the smallest symmetric graph G_α containing all edges (u,v) such that u can
communicate with v using power p_{u,α}. On the other hand, if α > 5π/6,
connectivity is not necessarily preserved. We also propose a set of
optimizations that further reduce power consumption and prove that they retain
network connectivity. Dynamic reconfiguration in the presence of failures and
mobility is also discussed. Simulation results are presented to demonstrate the
effectiveness of the algorithm and the optimizations.
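The per-node test at the heart of the algorithm is easy to state in code. The sketch below checks cone coverage given only the directions of the neighbors reachable at the current power; it is an illustration of the stated idea, not the paper's implementation.

```python
import math

# CBTC cone-coverage test: a node raises its transmission power until
# every cone of degree alpha around it contains a reachable neighbor.
# With directional information only, this reduces to checking that the
# largest angular gap between consecutive neighbor directions is below
# alpha; alpha = 5*pi/6 is the tight threshold proved in the paper.

def cones_covered(neighbor_angles, alpha=5 * math.pi / 6):
    """neighbor_angles: directions (radians) of reachable neighbors."""
    if not neighbor_angles:
        return False
    angles = sorted(a % (2 * math.pi) for a in neighbor_angles)
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(angles[0] + 2 * math.pi - angles[-1])  # wrap-around gap
    return max(gaps) < alpha
```

A node would rerun this test after each power increase and stop growing its power once it returns True (or maximum power is reached).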