
    Value Memory Graph: A Graph-Structured World Model for Offline Reinforcement Learning

    Reinforcement Learning (RL) methods are typically applied directly in environments to learn policies. In some complex environments with continuous state-action spaces, sparse rewards, and/or long temporal horizons, learning a good policy in the original environment can be difficult. Focusing on the offline RL setting, we aim to build a simple and discrete world model that abstracts the original environment. RL methods are then applied to our world model, instead of the environment data, for simplified policy learning. Our world model, dubbed Value Memory Graph (VMG), is designed as a directed-graph-based Markov decision process (MDP) whose vertices and directed edges represent graph states and graph actions, respectively. As the state-action spaces of VMG are finite and relatively small compared to the original environment, we can directly apply the value iteration algorithm on VMG to estimate graph state values and identify the best graph actions. VMG is trained from and built on the offline RL dataset. Together with an action translator that converts the abstract graph actions in VMG to real actions in the original environment, VMG controls agents to maximize episode returns. Our experiments on the D4RL benchmark show that VMG can outperform state-of-the-art offline RL methods in several tasks, especially when environments have sparse rewards and long temporal horizons. Code will be made publicly available.
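
    The policy-extraction step described above is ordinary value iteration on a finite graph. Below is a minimal sketch, assuming a hypothetical adjacency structure `graph[state][action] = (next_state, reward)`; the names and toy data are illustrative, not the paper's API:

```python
# Value iteration on a small directed-graph MDP (sketch, not the paper's code).
# graph[s] maps a graph action (a directed edge) to (next_state, reward).

def value_iteration(graph, gamma=0.99, tol=1e-6):
    """Estimate graph state values V(s) by exact dynamic programming."""
    V = {s: 0.0 for s in graph}
    while True:
        delta = 0.0
        for s, actions in graph.items():
            if not actions:                      # terminal vertex
                continue
            best = max(r + gamma * V[s2] for s2, r in actions.values())
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

def best_graph_action(graph, V, s, gamma=0.99):
    """Pick the graph action with the highest one-step backup value."""
    return max(graph[s], key=lambda a: graph[s][a][1] + gamma * V[graph[s][a][0]])

# Toy usage: a sparse reward is only obtained on the edge into 'goal'.
toy = {
    'start': {'e1': ('mid', 0.0)},
    'mid':   {'e2': ('goal', 1.0)},
    'goal':  {},
}
V = value_iteration(toy)
print(V['start'], best_graph_action(toy, V, 'start'))   # 0.99 e1
```

    Because VMG's state-action space is finite and small, exact dynamic programming like this is cheap; the paper's action translator (not modeled here) would then map the chosen graph action back to a real environment action.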

    For Pre-Trained Vision Models in Motor Control, Not All Policy Learning Methods are Created Equal

    In recent years, increasing attention has been directed to leveraging pre-trained vision models for motor control. While existing works mainly emphasize the importance of this pre-training phase, the arguably equally important role played by downstream policy learning during control-specific fine-tuning is often neglected. It thus remains unclear whether pre-trained vision models are consistent in their effectiveness under different control policies. To bridge this gap in understanding, we conduct a comprehensive study on 14 pre-trained vision models using 3 distinct classes of policy learning methods, including reinforcement learning (RL), imitation learning through behavior cloning (BC), and imitation learning with a visual reward function (VRF). Our study yields a series of intriguing results, including the discovery that the effectiveness of pre-training is highly dependent on the choice of the downstream policy learning algorithm. We show that conventionally accepted evaluation based on RL methods is highly variable and therefore unreliable, and further advocate for using more robust methods like VRF and BC. To facilitate more universal evaluations of pre-trained models and their policy learning methods in the future, we also release a benchmark of 21 tasks across 3 different environments alongside our work.
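
    As a concrete illustration of the downstream policy learning stage the study foregrounds, here is a minimal sketch of behavior cloning (BC) on top of a frozen pre-trained visual encoder. The encoder below is a random stand-in and all dimensions are invented; only the frozen-features-plus-trainable-head pattern reflects the evaluation setup:

```python
# Behavior cloning on frozen visual features (sketch; encoder is a stand-in).
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 512))  # placeholder for a pre-trained model
for p in encoder.parameters():
    p.requires_grad = False                      # keep pre-trained features fixed

policy_head = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 7))  # e.g., 7-DoF actions
opt = torch.optim.Adam(policy_head.parameters(), lr=3e-4)

def bc_step(images, expert_actions):
    """One supervised step: regress expert actions from frozen visual features."""
    with torch.no_grad():
        feats = encoder(images)
    loss = nn.functional.mse_loss(policy_head(feats), expert_actions)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Toy usage with random tensors standing in for demonstration data.
print(bc_step(torch.randn(8, 3, 64, 64), torch.randn(8, 7)))
```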

    ImageCaptioner^2: Image Captioner for Image Captioning Bias Amplification Assessment

    Most pre-trained learning systems are known to suffer from bias, which typically emerges from the data, the model, or both. Measuring and quantifying bias and its sources is a challenging task and has been extensively studied in image captioning. Despite the significant effort in this direction, we observed that existing metrics lack consistency in the inclusion of the visual signal. In this paper, we introduce a new bias assessment metric for image captioning, dubbed ImageCaptioner^2. Instead of measuring the absolute bias in the model or the data, ImageCaptioner^2 focuses on the bias introduced by the model with respect to the data bias, termed bias amplification. Unlike existing methods, which evaluate image captioning algorithms based only on the generated captions, ImageCaptioner^2 incorporates the image while measuring the bias. In addition, we design a formulation for measuring the bias of generated captions as prompt-based image captioning instead of using language classifiers. Finally, we apply our ImageCaptioner^2 metric across 11 different image captioning architectures on three different datasets, i.e., the MS-COCO caption dataset, Artemis V1, and Artemis V2, and on three different protected attributes, i.e., gender, race, and emotions. We further verify the effectiveness of our ImageCaptioner^2 metric by proposing AnonymousBench, a novel human evaluation paradigm for bias metrics. Our metric shows significant superiority over the recent bias metric LIC in terms of human alignment, with correlation scores of 80% and 54% for our metric and LIC, respectively. The code is available at https://eslambakr.github.io/imagecaptioner2.github.io/
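
    To make the amplification idea concrete, the sketch below contrasts how much of a protected attribute can be recovered from generated captions versus ground-truth captions; a positive score means the model leaks more than the data. The keyword-based `predict_attr` is purely a stand-in: the paper's formulation deliberately replaces such language classifiers with prompt-based image captioning:

```python
# Bias amplification as leakage(model captions) - leakage(data captions).
# (Sketch under simplified assumptions, not the paper's exact formulation.)

def leakage(captions, attributes, predict_attr):
    """Fraction of captions from which the protected attribute is recovered."""
    hits = sum(predict_attr(c) == a for c, a in zip(captions, attributes))
    return hits / len(captions)

def bias_amplification(gen_captions, gt_captions, attributes, predict_attr):
    """Positive values: the model leaks the attribute more than the data does."""
    return (leakage(gen_captions, attributes, predict_attr)
            - leakage(gt_captions, attributes, predict_attr))

# Toy usage with a keyword classifier standing in for a real predictor.
predict = lambda c: 'female' if 'woman' in c else 'male'
amp = bias_amplification(
    gen_captions=['a woman cooking', 'a man riding a horse'],
    gt_captions=['a person cooking', 'a man riding a horse'],
    attributes=['female', 'male'],
    predict_attr=predict)
print(amp)  # 0.5: the generated captions leak gender that the data did not
```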

    Analysis of a Cone-Based Distributed Topology Control Algorithm for Wireless Multi-hop Networks

    The topology of a wireless multi-hop network can be controlled by varying the transmission power at each node. In this paper, we give a detailed analysis of a cone-based distributed topology control algorithm. This algorithm, introduced in [16], does not assume that nodes have GPS information available; rather, it depends only on directional information. Roughly speaking, the basic idea of the algorithm is that a node $u$ transmits with the minimum power $p_{u,\alpha}$ required to ensure that in every cone of degree $\alpha$ around $u$, there is some node that $u$ can reach with power $p_{u,\alpha}$. We show that taking $\alpha = 5\pi/6$ is a necessary and sufficient condition to guarantee that network connectivity is preserved. More precisely, if there is a path from $s$ to $t$ when every node communicates at maximum power, then, if $\alpha \le 5\pi/6$, there is still a path in the smallest symmetric graph $G_\alpha$ containing all edges $(u,v)$ such that $u$ can communicate with $v$ using power $p_{u,\alpha}$. On the other hand, if $\alpha > 5\pi/6$, connectivity is not necessarily preserved. We also propose a set of optimizations that further reduce power consumption and prove that they retain network connectivity. Dynamic reconfiguration in the presence of failures and mobility is also discussed. Simulation results are presented to demonstrate the effectiveness of the algorithm and the optimizations.
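
    For intuition, here is a minimal sketch of the per-node power search under a unit-disk reachability assumption (a neighbor is reachable if it lies within the radius implied by the current power). The candidate power levels, names, and boundary handling of the gap test are illustrative, not the paper's algorithm verbatim:

```python
# Cone-based power selection for one node u (sketch, 2-D coordinates).
# Grow the power until no angular gap of alpha or more separates the
# directions of u's reachable neighbors, i.e., every cone of degree alpha
# around u contains at least one reachable node. Angles are in radians.
import math

def min_cone_power(u, others, alpha, power_levels):
    for p in sorted(power_levels):
        dirs = sorted(
            math.atan2(y - u[1], x - u[0])
            for (x, y) in others
            if math.hypot(x - u[0], y - u[1]) <= p)   # unit-disk reachability
        if not dirs:
            continue
        gaps = [b - a for a, b in zip(dirs, dirs[1:])]
        gaps.append(2 * math.pi - (dirs[-1] - dirs[0]))  # wrap-around gap
        if max(gaps) < alpha:
            return p
    return max(power_levels)   # no level suffices: fall back to maximum power

# Toy usage: three neighbors around the origin, alpha = 5*pi/6.
p = min_cone_power((0, 0), [(1, 0), (-1, 1), (0, -2)], 5 * math.pi / 6,
                   [1.5, 2.0, 2.5])
print(p)  # 2.0: only then does the neighbor at (0, -2) close the last gap
```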