17 research outputs found

    Replay across Experiments: A Natural Extension of Off-Policy RL

    Full text link
    Replaying data is a principal mechanism underlying the stability and data efficiency of off-policy reinforcement learning (RL). We present an effective yet simple framework to extend the use of replays across multiple experiments, minimally adapting the RL workflow for sizeable improvements in controller performance and research iteration times. At its core, Replay Across Experiments (RaE) involves reusing experience from previous experiments to improve exploration and bootstrap learning while reducing required changes to a minimum in comparison to prior work. We empirically show benefits across a number of RL algorithms and challenging control domains spanning both locomotion and manipulation, including hard exploration tasks from egocentric vision. Through comprehensive ablations, we demonstrate robustness to the quality and amount of data available and various hyperparameter choices. Finally, we discuss how our approach can be applied more broadly across research life cycles and can increase resilience by reloading data across random seeds or hyperparameter variations.
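The idea of reusing experience from earlier experiments can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `sample_mixed_batch`, the `prior_fraction` parameter, and the fixed mixing ratio are assumptions made for illustration, since the abstract does not specify how old and new data are combined.

```python
import random


def sample_mixed_batch(prior_data, online_buffer, batch_size,
                       prior_fraction=0.5, rng=random):
    """Draw a training batch that mixes transitions saved from earlier
    experiments (prior_data) with transitions collected in the current
    run (online_buffer).

    prior_fraction is a hypothetical knob, not a value from the paper.
    """
    if not prior_data and not online_buffer:
        return []
    if not online_buffer:
        # Nothing collected yet: bootstrap entirely from old experience.
        n_prior = batch_size
    elif not prior_data:
        # No earlier experiments available: ordinary off-policy replay.
        n_prior = 0
    else:
        n_prior = round(batch_size * prior_fraction)
    n_online = batch_size - n_prior

    # Sample with replacement from each source, then shuffle together.
    batch = [rng.choice(prior_data) for _ in range(n_prior)]
    batch += [rng.choice(online_buffer) for _ in range(n_online)]
    rng.shuffle(batch)
    return batch


# Toy transitions tagged by origin so the mix is easy to inspect.
prior = [("prior", i) for i in range(10)]
online = [("online", i) for i in range(10)]
batch = sample_mixed_batch(prior, online, batch_size=8)
```

Under these assumptions, the learner itself is unchanged; only the sampling step sees data from more than one experiment.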

    Gradual (In)Compatibility of Fairness Criteria

    No full text
    Impossibility results show that important fairness measures (independence, separation, sufficiency) cannot be satisfied at the same time under reasonable assumptions. This paper explores whether we can satisfy and/or improve these fairness measures simultaneously to a certain degree. We introduce information-theoretic formulations of the fairness measures and define degrees of fairness based on these formulations. The information-theoretic formulations suggest unexplored theoretical relations between the three fairness measures. In the experimental part, we use the information-theoretic expressions as regularizers to obtain fairness-regularized predictors for three standard datasets. Our experiments show that a) fairness regularization directly increases fairness measures, in line with existing work, and b) some fairness regularizations indirectly increase other fairness measures, as suggested by our theoretical findings. This establishes that it is possible to increase the degree to which some fairness measures are satisfied at the same time -- some fairness measures are gradually compatible.
    Comment: Code available on GitHub: https://github.com/hcorinna/gradual-compatibility, extended version of paper accepted to AAAI'2
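For orientation, the three criteria are commonly written as (conditional) independence statements, each equivalent to a vanishing mutual information. The notation below is an assumption, since the abstract does not fix symbols: $A$ is the sensitive attribute, $Y$ the target, and $R$ the prediction. The paper's exact formulations and its definition of degrees of fairness may differ.

```latex
\begin{align*}
\text{Independence:} \quad & R \perp A          &&\Longleftrightarrow\quad I(R;A) = 0 \\
\text{Separation:}   \quad & R \perp A \mid Y   &&\Longleftrightarrow\quad I(R;A \mid Y) = 0 \\
\text{Sufficiency:}  \quad & Y \perp A \mid R   &&\Longleftrightarrow\quad I(Y;A \mid R) = 0
\end{align*}
% Expanding I(A; R, Y) with the chain rule in two orders links the three quantities:
\begin{equation*}
I(R;A) + I(Y;A \mid R) \;=\; I(Y;A) + I(R;A \mid Y)
\end{equation*}
```

The last identity is a standard consequence of the chain rule for mutual information. It is shown here only as one example of how the three quantities can be related, not as the specific relation derived in the paper.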

    Like two peas in a pod – organic and digital transformation (extended abstract)

    No full text
    Transforming our food system is important to achieving global climate neutrality and food security. Germany has set a national target of reaching a 30% share in organic farming to support the goal. When looking at the transformation process from conventional to organic farming, it becomes apparent that measures need to be taken to reach this anticipated goal. A particular emphasis of this work is placed on finding a digital solution and process improvements to ensure longevity and efficiency. Interviews with actors along the farm-to-fork value chain were conducted to identify central barriers and drivers of organic transformation. The results of the interviews show, firstly, that three subsystems need to be distinguished when talking about the farm-to-fork value chain: (1) farmers, (2) intermediaries, and (3) the canteen system. Although all three subsystems can be combined to form a coherent value chain, they rarely act and communicate beyond the boundaries of their subsystem. Secondly, we were able to allocate primary barriers and drivers to each of the subsystems, highlighting the need to include all three in the transformation process and aim for a comprehensive digital solution. This work explores the potential of a network-based platform to improve the current practice of rigid and strictly hierarchical value chains. We focus on deriving user requirements from the interviews to describe the necessary functionality of the platform to address the identified barriers and exploit existing drivers.