    The History and Risks of Reinforcement Learning and Human Feedback

    Reinforcement learning from human feedback (RLHF) has emerged as a powerful technique to make large language models (LLMs) easier to use and more effective. A core piece of the RLHF process is the training and utilization of a model of human preferences that acts as a reward function for optimization. This approach, which operates at the intersection of many stakeholders and academic disciplines, remains poorly understood. RLHF reward models are often cited as being central to achieving performance, yet very few descriptions of their capabilities, evaluations, training methods, or open-source implementations exist. Given this lack of information, further study of and transparency around learned RLHF reward models are needed. In this paper, we illustrate the complex history of optimizing preferences, and articulate lines of inquiry to understand the sociotechnical context of reward models. In particular, we highlight the ontological differences between costs, rewards, and preferences at stake in RLHF's foundations, related methodological tensions, and possible research directions to improve general understanding of how reward models function. Comment: 14 pages, 3 figures
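    The core mechanism the abstract describes, a learned model of human preferences acting as a reward function, is commonly fit with a Bradley-Terry pairwise loss over (chosen, rejected) response pairs. The following is a minimal sketch of that idea, not the paper's method: it uses a toy linear reward model over hypothetical response features and plain gradient descent, where all data and parameters are illustrative.

    ```python
    import numpy as np

    def preference_loss(r_chosen, r_rejected):
        """Bradley-Terry pairwise loss: -log sigmoid(r_chosen - r_rejected)."""
        return -np.log(1.0 / (1.0 + np.exp(-(r_chosen - r_rejected))))

    # Toy linear reward model r(x) = w . x over hypothetical response features.
    rng = np.random.default_rng(0)
    w = np.zeros(4)
    chosen = rng.normal(1.0, 1.0, size=(32, 4))    # features of preferred responses
    rejected = rng.normal(0.0, 1.0, size=(32, 4))  # features of rejected responses

    for _ in range(200):  # gradient descent on the mean pairwise loss
        margin = chosen @ w - rejected @ w
        p = 1.0 / (1.0 + np.exp(-margin))          # model's P(chosen preferred)
        grad = ((p - 1.0)[:, None] * (chosen - rejected)).mean(axis=0)
        w -= 0.5 * grad

    final_loss = preference_loss(chosen @ w, rejected @ w).mean()
    ```

    After training, the scalar `chosen @ w` can be used as the reward signal in a downstream policy-optimization step; the opacity of what such a learned scalar actually encodes is exactly the transparency gap the paper raises.
    
    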

    The Making of Modern America: Quantifying Chaos

    As we begin to explore the Gilded Age (1870-1900), the era in American history sandwiched between the Civil War and Reconstruction and the Progressive Era that led to the Great War, we want students to grasp the enormity of the changes impacting the lives of Americans, who had largely been engaged in farming, in many cases not so differently than their ancestors had for several hundred years. Technological changes in the first half of the 19th century contributed to some mechanization and manufacturing, but the scale of the Civil War and the acquisition of the entire continental territory in the 1850s accelerated changes in the production of goods, in the development of communication and transportation, in the growth of cities, in the opportunities for immigrants, in participation in politics, and in the reach of the government. In this lesson, students will dip into the many changes over the decades from 1860 to 1900 by searching for information on a variety of topics, including: Banking or Finance, Demographics, Government, Industrialization, Immigration, Middle Class Angst, Military, Natural Resources, Politics, Racism, Robber Barons/Captains of Industry, Technological Innovations, Transportation, Urbanization, Voter Turnout, and Xenophobia.

    Reward Reports for Reinforcement Learning

    The desire to build good systems in the face of complex societal effects requires a dynamic approach towards equity and access. Recent approaches to machine learning (ML) documentation have demonstrated the promise of discursive frameworks for deliberation about these complexities. However, these developments have been grounded in a static ML paradigm, leaving the role of feedback and post-deployment performance unexamined. Meanwhile, recent work in reinforcement learning design has shown that the effects of optimization objectives on the resultant system behavior can be wide-ranging and unpredictable. In this paper, we sketch a framework for documenting deployed learning systems, which we call Reward Reports. Taking inspiration from various contributions to the technical literature on reinforcement learning, we outline Reward Reports as living documents that track updates to design choices and assumptions behind what a particular automated system is optimizing for. They are intended to track dynamic phenomena arising from system deployment, rather than merely static properties of models or data. After presenting the elements of a Reward Report, we provide three examples: DeepMind's MuZero, MovieLens, and a hypothetical deployment of a Project Flow traffic control policy.