
    Value Memory Graph: A Graph-Structured World Model for Offline Reinforcement Learning

    Reinforcement Learning (RL) methods are typically applied directly in environments to learn policies. In some complex environments with continuous state-action spaces, sparse rewards, and/or long temporal horizons, learning a good policy in the original environment can be difficult. Focusing on the offline RL setting, we aim to build a simple and discrete world model that abstracts the original environment. RL methods are then applied to our world model, instead of the environment data, for simplified policy learning. Our world model, dubbed Value Memory Graph (VMG), is designed as a directed-graph-based Markov decision process (MDP) whose vertices and directed edges represent graph states and graph actions, respectively. As the state-action spaces of VMG are finite and relatively small compared to the original environment, we can directly apply the value iteration algorithm on VMG to estimate graph state values and identify the best graph actions. VMG is trained from and built on the offline RL dataset. Together with an action translator that converts the abstract graph actions in VMG to real actions in the original environment, VMG controls agents to maximize episode returns. Our experiments on the D4RL benchmark show that VMG can outperform state-of-the-art offline RL methods in several tasks, especially when environments have sparse rewards and long temporal horizons. Code will be made publicly available.
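
    The policy-extraction step described above is ordinary value iteration on a finite graph. Below is a minimal sketch, assuming a hypothetical adjacency structure `graph[state][action] = (next_state, reward)`; the names and toy data are illustrative, not the paper's API:

```python
# Value iteration on a small directed-graph MDP (sketch, not the paper's code).
# graph[s] maps a graph action (a directed edge) to (next_state, reward).

def value_iteration(graph, gamma=0.99, tol=1e-6):
    """Estimate graph state values V(s) by exact dynamic programming."""
    V = {s: 0.0 for s in graph}
    while True:
        delta = 0.0
        for s, actions in graph.items():
            if not actions:                      # terminal vertex
                continue
            best = max(r + gamma * V[s2] for s2, r in actions.values())
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

def best_graph_action(graph, V, s, gamma=0.99):
    """Pick the graph action with the highest one-step backup value."""
    return max(graph[s], key=lambda a: graph[s][a][1] + gamma * V[graph[s][a][0]])

# Toy usage: a sparse reward is only obtained on the edge into 'goal'.
toy = {
    'start': {'e1': ('mid', 0.0)},
    'mid':   {'e2': ('goal', 1.0)},
    'goal':  {},
}
V = value_iteration(toy)
print(V['start'], best_graph_action(toy, V, 'start'))   # 0.99 e1
```

    Because VMG's state-action space is finite and small, exact dynamic programming like this is cheap; the paper's action translator (not modeled here) would then map the chosen graph action back to a real environment action.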

    For Pre-Trained Vision Models in Motor Control, Not All Policy Learning Methods are Created Equal

    In recent years, increasing attention has been directed to leveraging pre-trained vision models for motor control. While existing works mainly emphasize the importance of this pre-training phase, the arguably equally important role played by downstream policy learning during control-specific fine-tuning is often neglected. It thus remains unclear whether pre-trained vision models are consistent in their effectiveness under different control policies. To bridge this gap in understanding, we conduct a comprehensive study on 14 pre-trained vision models using 3 distinct classes of policy learning methods, including reinforcement learning (RL), imitation learning through behavior cloning (BC), and imitation learning with a visual reward function (VRF). Our study yields a series of intriguing results, including the discovery that the effectiveness of pre-training is highly dependent on the choice of the downstream policy learning algorithm. We show that conventionally accepted evaluation based on RL methods is highly variable and therefore unreliable, and further advocate for using more robust methods like VRF and BC. To facilitate more universal evaluations of pre-trained models and their policy learning methods in the future, we also release a benchmark of 21 tasks across 3 different environments alongside our work.
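
    As a concrete illustration of the downstream policy learning stage the study foregrounds, here is a minimal sketch of behavior cloning (BC) on top of a frozen pre-trained visual encoder. The encoder below is a random stand-in and all dimensions are invented; only the frozen-features-plus-trainable-head pattern reflects the evaluation setup:

```python
# Behavior cloning on frozen visual features (sketch; encoder is a stand-in).
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 512))  # placeholder for a pre-trained model
for p in encoder.parameters():
    p.requires_grad = False                      # keep pre-trained features fixed

policy_head = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 7))  # e.g., 7-DoF actions
opt = torch.optim.Adam(policy_head.parameters(), lr=3e-4)

def bc_step(images, expert_actions):
    """One supervised step: regress expert actions from frozen visual features."""
    with torch.no_grad():
        feats = encoder(images)
    loss = nn.functional.mse_loss(policy_head(feats), expert_actions)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Toy usage with random tensors standing in for demonstration data.
print(bc_step(torch.randn(8, 3, 64, 64), torch.randn(8, 7)))
```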

    ImageCaptioner^2: Image Captioner for Image Captioning Bias Amplification Assessment

    Most pre-trained learning systems are known to suffer from bias, which typically emerges from the data, the model, or both. Measuring and quantifying bias and its sources is a challenging task and has been extensively studied in image captioning. Despite the significant effort in this direction, we observed that existing metrics lack consistency in the inclusion of the visual signal. In this paper, we introduce a new bias assessment metric for image captioning, dubbed ImageCaptioner^2. Instead of measuring the absolute bias in the model or the data, ImageCaptioner^2 focuses on the bias introduced by the model with respect to the data bias, termed bias amplification. Unlike existing methods, which evaluate image captioning algorithms based only on the generated captions, ImageCaptioner^2 incorporates the image while measuring the bias. In addition, we design a formulation for measuring the bias of generated captions as prompt-based image captioning instead of using language classifiers. Finally, we apply our ImageCaptioner^2 metric across 11 different image captioning architectures on three different datasets, i.e., the MS-COCO caption dataset, Artemis V1, and Artemis V2, and on three different protected attributes, i.e., gender, race, and emotions. We further verify the effectiveness of our ImageCaptioner^2 metric by proposing AnonymousBench, a novel human evaluation paradigm for bias metrics. Our metric shows significant superiority over the recent bias metric LIC in terms of human alignment, with correlation scores of 80% and 54% for our metric and LIC, respectively. The code is available at https://eslambakr.github.io/imagecaptioner2.github.io/
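
    To make the amplification idea concrete, the sketch below contrasts how much of a protected attribute can be recovered from generated captions versus ground-truth captions; a positive score means the model leaks more than the data. The keyword-based `predict_attr` is purely a stand-in: the paper's formulation deliberately replaces such language classifiers with prompt-based image captioning:

```python
# Bias amplification as leakage(model captions) - leakage(data captions).
# (Sketch under simplified assumptions, not the paper's exact formulation.)

def leakage(captions, attributes, predict_attr):
    """Fraction of captions from which the protected attribute is recovered."""
    hits = sum(predict_attr(c) == a for c, a in zip(captions, attributes))
    return hits / len(captions)

def bias_amplification(gen_captions, gt_captions, attributes, predict_attr):
    """Positive values: the model leaks the attribute more than the data does."""
    return (leakage(gen_captions, attributes, predict_attr)
            - leakage(gt_captions, attributes, predict_attr))

# Toy usage with a keyword classifier standing in for a real predictor.
predict = lambda c: 'female' if 'woman' in c else 'male'
amp = bias_amplification(
    gen_captions=['a woman cooking', 'a man riding a horse'],
    gt_captions=['a person cooking', 'a man riding a horse'],
    attributes=['female', 'male'],
    predict_attr=predict)
print(amp)  # 0.5: the generated captions leak gender that the data did not
```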

    Analysis of a Cone-Based Distributed Topology Control Algorithm for Wireless Multi-hop Networks

    The topology of a wireless multi-hop network can be controlled by varying the transmission power at each node. In this paper, we give a detailed analysis of a cone-based distributed topology control algorithm. This algorithm, introduced in [16], does not assume that nodes have GPS information available; rather, it depends only on directional information. Roughly speaking, the basic idea of the algorithm is that a node $u$ transmits with the minimum power $p_{u,\alpha}$ required to ensure that in every cone of degree $\alpha$ around $u$, there is some node that $u$ can reach with power $p_{u,\alpha}$. We show that taking $\alpha = 5\pi/6$ is a necessary and sufficient condition to guarantee that network connectivity is preserved. More precisely, if there is a path from $s$ to $t$ when every node communicates at maximum power, then, if $\alpha \le 5\pi/6$, there is still a path in the smallest symmetric graph $G_\alpha$ containing all edges $(u,v)$ such that $u$ can communicate with $v$ using power $p_{u,\alpha}$. On the other hand, if $\alpha > 5\pi/6$, connectivity is not necessarily preserved. We also propose a set of optimizations that further reduce power consumption and prove that they retain network connectivity. Dynamic reconfiguration in the presence of failures and mobility is also discussed. Simulation results are presented to demonstrate the effectiveness of the algorithm and the optimizations.
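
    For intuition, here is a minimal sketch of the per-node power search under a unit-disk reachability assumption (a neighbor is reachable if it lies within the radius implied by the current power). The candidate power levels, names, and boundary handling of the gap test are illustrative, not the paper's algorithm verbatim:

```python
# Cone-based power selection for one node u (sketch, 2-D coordinates).
# Grow the power until no angular gap of alpha or more separates the
# directions of u's reachable neighbors, i.e., every cone of degree alpha
# around u contains at least one reachable node. Angles are in radians.
import math

def min_cone_power(u, others, alpha, power_levels):
    for p in sorted(power_levels):
        dirs = sorted(
            math.atan2(y - u[1], x - u[0])
            for (x, y) in others
            if math.hypot(x - u[0], y - u[1]) <= p)   # unit-disk reachability
        if not dirs:
            continue
        gaps = [b - a for a, b in zip(dirs, dirs[1:])]
        gaps.append(2 * math.pi - (dirs[-1] - dirs[0]))  # wrap-around gap
        if max(gaps) < alpha:
            return p
    return max(power_levels)   # no level suffices: fall back to maximum power

# Toy usage: three neighbors around the origin, alpha = 5*pi/6.
p = min_cone_power((0, 0), [(1, 0), (-1, 1), (0, -2)], 5 * math.pi / 6,
                   [1.5, 2.0, 2.5])
print(p)  # 2.0: only then does the neighbor at (0, -2) close the last gap
```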