Reinforcement Learning in the Era of LLMs: What is Essential? What is needed? An RL Perspective on RLHF, Prompting, and Beyond
Recent advancements in Large Language Models (LLMs) have garnered wide
attention and led to successful products such as ChatGPT and GPT-4. Their
proficiency in adhering to instructions and delivering harmless, helpful, and
honest (3H) responses can largely be attributed to the technique of
Reinforcement Learning from Human Feedback (RLHF). In this paper, we aim to
link the research in conventional RL to RL techniques used in LLM research, and
to demystify this technique by discussing why, when, and how RL excels.
Furthermore, we explore potential future avenues that could either benefit from
or contribute to RLHF research.
Highlighted Takeaways:
1. RLHF is Online Inverse RL with Offline Demonstration Data.
2. RLHF > SFT because Imitation Learning (and Inverse RL) > Behavior
Cloning (BC) by alleviating the problem of compounding error.
3. The RM step in RLHF generates a proxy of the expensive human feedback;
this insight can be generalized to other LLM tasks, such as prompting
evaluation and optimization, where feedback is also expensive.
4. The policy learning in RLHF is more challenging than conventional problems
studied in IRL due to its high action dimensionality and feedback sparsity.
5. The main superiority of PPO over off-policy value-based methods is its
stability gained from (almost) on-policy data and conservative policy updates.
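As an illustration of what "conservative policy updates" in takeaway 5 can mean, the sketch below computes the clipped surrogate term from the standard PPO formulation for a single sample. It is not taken from this abstract: the function name `ppo_clipped_objective` and the default `clip_eps=0.2` are assumptions chosen for the example.

```python
def ppo_clipped_objective(ratio, advantage, clip_eps=0.2):
    """Per-sample PPO clipped surrogate (to be maximized).

    ratio     : pi_new(a|s) / pi_old(a|s) for the sampled action
    advantage : advantage estimate for that action
    clip_eps  : clipping range (0.2 is an assumed, commonly used default)
    """
    # Keep the probability ratio inside [1 - eps, 1 + eps].
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Taking the minimum removes any gain from moving the policy
    # far away from the one that collected the data.
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # A large ratio with a positive advantage is capped at (1 + eps) * A.
    print(ppo_clipped_objective(1.5, 1.0))
    # A small ratio with a negative advantage is capped at (1 - eps) * A.
    print(ppo_clipped_objective(0.5, -1.0))
```

Once the ratio leaves the clipping range in the direction that would improve the objective, the term becomes constant in the ratio, so the gradient offers no incentive to move further from the data-collecting policy.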
Regimes and Resilience in the Modern Global Food System
Much public discourse surrounding the modern global food system operates on the assumption of the primary agency of individual consumers in ensuring an equitable and sustainable food supply. However, this approach fails to account for the larger structural forces of the system which frame the limits of how we interact with and are affected by our food system. Taking a closer look at the global economic, political, cultural, and environmental forces that have collectively shaped historical food regimes reveals the deeper structural patterns that currently determine how we produce, distribute, and consume food around the world. Due to the underlying structural processes of increasing distancing and standardization, we have become highly disembedded from our food system and will need to look for clues from past periods of transition between food regimes to better position ourselves to work towards a global restructuring of, and human reembedding in, the modern global food system.
Spectral Flow in 3D Flat Spacetimes
In this paper we investigate spectral flow symmetry in asymptotically flat
spacetimes both from a gravity as well as a putative dual quantum field theory
perspective. On the gravity side we consider models in Einstein gravity and
supergravity as well as their "reloaded" versions, present suitable boundary
conditions, determine the respective asymptotic symmetry algebras and the
thermal entropy of cosmological solutions in each of these models. On the
quantum field theory side we identify the spectral flow symmetry as
automorphisms of the underlying symmetry algebra of the theory. Using spectral
flow invariance we then determine the thermal entropy of these quantum field
theories and find perfect agreement with the results from the gravity side. In
addition we determine logarithmic corrections to the thermal entropy.
Comment: 42 pages; v2: added minor clarifications, matches published version
An Empowerment-based Solution to Robotic Manipulation Tasks with Sparse Rewards
In order to provide adaptive and user-friendly solutions to robotic
manipulation, it is important that the agent can learn to accomplish tasks even
if it is only provided with very sparse instruction signals. To address the
issues reinforcement learning algorithms face when task rewards are sparse,
this paper proposes an intrinsic motivation approach that can be easily
integrated into any standard reinforcement learning algorithm and can allow
robotic manipulators to learn useful manipulation skills with only sparse
extrinsic rewards. Through integrating and balancing empowerment and curiosity,
this approach shows superior performance compared to other state-of-the-art
intrinsic exploration approaches during extensive empirical testing.
Qualitative analysis also shows that when combined with diversity-driven
intrinsic motivations, this approach can help manipulators learn a set of
diverse skills which could potentially be applied to other more complicated
manipulation tasks and accelerate their learning process.
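One way to picture adding a balanced intrinsic bonus to a sparse extrinsic reward is sketched below. This is a hypothetical illustration, not the paper's method: the abstract does not specify how empowerment and curiosity are computed or balanced, so the function `shaped_reward`, the convex `balance` weight, and the `scale` factor are all assumptions.

```python
def shaped_reward(extrinsic, empowerment, curiosity, balance=0.5, scale=0.1):
    """Combine a sparse task reward with a weighted intrinsic bonus.

    extrinsic   : task reward from the environment (often 0 when sparse)
    empowerment : precomputed empowerment estimate for this step (assumed given)
    curiosity   : precomputed curiosity estimate for this step (assumed given)
    balance     : hypothetical weight in [0, 1] trading off the two signals
    scale       : hypothetical factor keeping the bonus small relative to the task
    """
    if not 0.0 <= balance <= 1.0:
        raise ValueError("balance must lie in [0, 1]")
    # Convex combination of the two intrinsic signals.
    intrinsic = balance * empowerment + (1.0 - balance) * curiosity
    return extrinsic + scale * intrinsic


if __name__ == "__main__":
    # With no task reward, the agent still receives a learning signal.
    print(shaped_reward(0.0, empowerment=2.0, curiosity=4.0))
```

Because the result is a scalar reward per step, a function of this shape could be placed in front of any standard reinforcement learning algorithm without changing the algorithm itself.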
Revisiting unresolved questions: land, food and agriculture
This article revisits three articles from the perspective of 2011. They are Makhosazane Gcabashe and Alan Mabin’s ‘Preparing to negotiate
the land question’ (Transformation 11), Tom Bennett’s ‘Human rights and
the African cultural tradition’ (Transformation 22) and Henry Bernstein’s
‘Food security in a democratic South Africa’ (Transformation 24).
The author focuses on four themes: the politics of negotiations; the location of ‘rights’
in land and to custom; the political economy of agrarian change; and the
multiple facets of the ‘land question’. In conclusion, it draws attention to
enduring questions about how to confront agrarian dualism, dynamics of
changing and deepening inequality in the countryside, tensions between
the logic underpinning land and agricultural policies, and the need to recast
agrarian change in a wider frame, in recognition of the profound ways in
which what happens in South Africa’s rural areas is part of regional and
global dynamics.
International Bibliography of Social Science