    Reinforcement Learning in the Era of LLMs: What is Essential? What is needed? An RL Perspective on RLHF, Prompting, and Beyond

    Recent advancements in Large Language Models (LLMs) have garnered wide attention and led to successful products such as ChatGPT and GPT-4. Their proficiency in adhering to instructions and delivering harmless, helpful, and honest (3H) responses can largely be attributed to the technique of Reinforcement Learning from Human Feedback (RLHF). In this paper, we aim to link the research in conventional RL to the RL techniques used in LLM research, and to demystify RLHF by discussing why, when, and how RL excels. Furthermore, we explore potential future avenues that could either benefit from or contribute to RLHF research. Highlighted Takeaways: 1. RLHF is Online Inverse RL with Offline Demonstration Data. 2. RLHF >> SFT because Imitation Learning (and Inverse RL) >> Behavior Cloning (BC) by alleviating the problem of compounding error. 3. The RM step in RLHF generates a proxy of the expensive human feedback; this insight can be generalized to other LLM tasks, such as prompting evaluation and optimization, where feedback is also expensive. 4. The policy learning in RLHF is more challenging than conventional problems studied in IRL due to its high action dimensionality and feedback sparsity. 5. The main superiority of PPO over off-policy value-based methods is its stability, gained from (almost) on-policy data and conservative policy updates.

    Regimes and Resilience in the Modern Global Food System

    Much public discourse surrounding the modern global food system operates on the assumption of the primary agency of individual consumers in ensuring an equitable and sustainable food supply. However, this approach fails to account for the larger structural forces of the system which frame the limits of how we interact with and are affected by our food system. Taking a closer look at the global economic, political, cultural, and environmental forces that have collectively shaped historical food regimes reveals the deeper structural patterns that currently determine how we produce, distribute, and consume food around the world. Due to the underlying structural processes of increasing distancing and standardization, we have become highly disembedded from our food system and will need to look for clues from past periods of transition between food regimes to better position ourselves to work towards a global restructuring of, and human reembedding in, the modern global food system.

    Spectral Flow in 3D Flat Spacetimes

    In this paper we investigate spectral flow symmetry in asymptotically flat spacetimes from both a gravity and a putative dual quantum field theory perspective. On the gravity side we consider models in Einstein gravity and supergravity as well as their "reloaded" versions, present suitable boundary conditions, and determine the respective asymptotic symmetry algebras and the thermal entropy of cosmological solutions in each of these models. On the quantum field theory side we identify the spectral flow symmetry as automorphisms of the underlying symmetry algebra of the theory. Using spectral flow invariance we then determine the thermal entropy of these quantum field theories and find perfect agreement with the results from the gravity side. In addition we determine logarithmic corrections to the thermal entropy. Comment: 42 pages; v2: added minor clarifications, matches published version

    An Empowerment-based Solution to Robotic Manipulation Tasks with Sparse Rewards

    In order to provide adaptive and user-friendly solutions to robotic manipulation, it is important that the agent can learn to accomplish tasks even if it is only provided with very sparse instruction signals. To address the issues reinforcement learning algorithms face when task rewards are sparse, this paper proposes an intrinsic motivation approach that can be easily integrated into any standard reinforcement learning algorithm and can allow robotic manipulators to learn useful manipulation skills with only sparse extrinsic rewards. By integrating and balancing empowerment and curiosity, this approach shows superior performance compared to other state-of-the-art intrinsic exploration approaches during extensive empirical testing. Qualitative analysis also shows that when combined with diversity-driven intrinsic motivations, this approach can help manipulators learn a set of diverse skills which could potentially be applied to other, more complicated manipulation tasks and accelerate their learning process.

    Revisiting unresolved questions: land, food and agriculture

    This article explores three articles from the perspective of 2011. They are Makhosazane Gcabashe and Alan Mabin’s ‘Preparing to negotiate the land question’ (Transformation 11), Tom Bennett’s ‘Human rights and the African cultural tradition’ (Transformation 22) and Henry Bernstein’s ‘Food security in a democratic South Africa’ (Transformation 24). The author focuses on four themes: the politics of negotiations; the location of ‘rights’ in land and to custom; the political economy of agrarian change; and the multiple facets of the ‘land question’. In conclusion, it draws attention to enduring questions about how to confront agrarian dualism, dynamics of changing and deepening inequality in the countryside, tensions between the logic underpinning land and agricultural policies, and the need to recast agrarian change in a wider frame, in recognition of the profound ways in which what happens in South Africa’s rural areas is part of regional and global dynamics. International Bibliography of Social Science