199,143 research outputs found

    HIQL: Offline Goal-Conditioned RL with Latent States as Actions

    Full text link
    Unsupervised pre-training has recently become the bedrock for computer vision and natural language processing. In reinforcement learning (RL), goal-conditioned RL can potentially provide an analogous self-supervised approach for making use of large quantities of unlabeled (reward-free) data. However, building effective algorithms for goal-conditioned RL that can learn directly from diverse offline data is challenging, because it is hard to accurately estimate the exact value function for faraway goals. Nonetheless, goal-reaching problems exhibit structure, such that reaching distant goals entails first passing through closer subgoals. This structure can be very useful, as assessing the quality of actions for nearby goals is typically easier than for more distant goals. Based on this idea, we propose a hierarchical algorithm for goal-conditioned RL from offline data. Using one action-free value function, we learn two policies that allow us to exploit this structure: a high-level policy that treats states as actions and predicts (a latent representation of) a subgoal and a low-level policy that predicts the action for reaching this subgoal. Through analysis and didactic examples, we show how this hierarchical decomposition makes our method robust to noise in the estimated value function. We then apply our method to offline goal-reaching benchmarks, showing that our method can solve long-horizon tasks that stymie prior methods, can scale to high-dimensional image observations, and can readily make use of action-free data. Our code is available at https://seohong.me/projects/hiql

    Integer programming based solution approaches for the train dispatching problem

    Get PDF
    Railroads face the challenge of competing with the trucking industry in a fastpaced environment. In this respect, they are working toward running freight trains on schedule and reducing travel times. The planned train schedules consist of departure and arrival times at main stations on the rail network. A detailed timetable, on the other hand, consists of the departure and arrival times of each train in each track section of its route. The train dispatching problem aims to determine detailed timetables over a rail network in order to minimize deviations from the planned schedule. We provide a new integer programming formulation for this problem based on a spacetime network; we propose heuristic algorithms to solve it and present computational results of these algorithms. Our approach includes some realistic constraints that have not been previously considered as well as all the assumptions and practical issues considered by the earlier works
    corecore