The History and Risks of Reinforcement Learning and Human Feedback

Author: Gilbert Thomas Krendl
Lambert Nathan
Zick Tom
Publication date: 28/11/2023

Reinforcement learning from human feedback (RLHF) has emerged as a powerful technique to make large language models (LLMs) easier to use and more effective. A core piece of the RLHF process is the training and utilization of a model of human preferences that acts as a reward function for optimization. This approach, which operates at the intersection of many stakeholders and academic disciplines, remains poorly understood. RLHF reward models are often cited as being central to achieving performance, yet very few descriptors of capabilities, evaluations, training methods, or open-source models exist. Given this lack of information, further study and transparency are needed for learned RLHF reward models. In this paper, we illustrate the complex history of optimizing preferences, and articulate lines of inquiry to understand the sociotechnical context of reward models. In particular, we highlight the ontological differences between costs, rewards, and preferences at stake in RLHF's foundations, related methodological tensions, and possible research directions to improve general understanding of how reward models function. Comment: 14 pages, 3 figures

arXiv.org e-Print Archive

Food Support Networks and their Relationship to Food Insecurity in Colorado Counties

Author: Zick-smith Nathan
Publication venue: CU Scholar
Publication date: 01/01/2015

Food insecurity has reemerged as a significant social problem in the United States, despite the fact that we produce more than enough food as a nation to feed all of our citizens. Since the economic recession in 2007, food insecurity has increased, and in recent years has remained at 14.3%. Many strategies have been adopted to address food insecurity in the U.S., some of which are sponsored by the federal government, such as SNAP and the School Lunch Program, while others are donation-driven non-profit organizations, such as food pantries. While there are a number of food support networks that have been established with the intent of decreasing food insecurity, there are still gaps in the food system in which food is wasted and people are hungry. This study explores contemporary food insecurity within Colorado counties, specifically the effectiveness of existing food support networks, the drivers of food insecurity (aside from the factors that are used to calculate the county food insecurity rate), and how effective two local non-profit organizations, Boulder Food Rescue and Denver Food Rescue, have been at addressing hunger and food insecurity in the communities in which they operate. In this study I used both quantitative analyses of Colorado counties as well as qualitative interviews with key players addressing food insecurity. Results demonstrated that the number of food pantries in a county and the presence of a food rescue organization are both positively related to the county's food insecurity rate. As the literature suggests, this indicates that food pantries and food rescue organizations are more likely to locate in areas of high food insecurity.
The most statistically significant drivers of food insecurity are the percentage of individuals with a high school diploma (the higher the percentage, the lower the rate of food insecurity) and the number of individuals for whom English is not a first language (the higher the percentage, the higher the rate of food insecurity). Lastly, both Boulder and Denver Food Rescue have filled an interesting gap in the food system, as both organizations are helping supplement a growing trend of providing fresh and nutritious fruits and vegetables to food-insecure individuals. Furthermore, both non-profits have succeeded at reaching several key, traditionally unreachable food-insecure populations, such as the elderly and people for whom English is a second language.

CU Scholar Institutional Repository

The Making of Modern America: Quantifying Chaos

Author: Evensen David
Glade Mary E.
Koenig Dylan
Lee-Benton Olivia
Nelson Cassandra
Peterson Kayla
Pulkrabek Payton
Szymanski Nickolas
Voigt Alex
Zick Nathan
Publication venue: The Repository at St. Cloud State
Publication date: 08/01/2016

As we begin to explore the Gilded Age (1870-1900), the era in American history sandwiched between the Civil War and Reconstruction on one side and the Progressive Era and the Great War on the other, we want students to grasp the enormity of the changes affecting the lives of Americans, most of whom had been engaged in farming in ways not so different from those of their ancestors over several hundred years. Technological changes in the first half of the 19th century contributed to some mechanization and manufacturing, but the enormity of the Civil War and the acquisition of the entire continental territory in the 1850s accelerated changes in the production of goods, in the development of communication and transportation, in the growth of cities, in the opportunities for immigrants, in participation in politics, and in the reach of the government. In this lesson, students will dip into the many changes over the decades from 1860 to 1900 by searching for information on a variety of topics, including: Banking or Finance, Demographics, Government, Industrialization, Immigration, Middle Class Angst, Military, Natural Resources, Politics, Racism, Robber Barons/Captains of Industry, Technological Innovations, Transportation, Urbanization, Voter Turnout, and Xenophobia.

St. Cloud State University

Comparison of glycaemic control in patients with Type 2 diabetes on basal insulin and fixed combination oral antidiabetic treatment: results of a pilot study

Author: A. Moretti
CD Saudek
DCCT/EDIC
DE Goldstein
DM Nathan
FJ Service
G. De Mattia
IB Hirsch
J Rosenstock
JA Davidson
JM Chehade
L Monnier
M Lepore
MC Riddle
O. Laurenti
P Gaede
P Mullins
R Zick
SD Luzio
SF Praet
XL Wang
Publication venue: Springer Science and Business Media LLC

Crossref

Reward Reports for Reinforcement Learning

Author: Dean Sarah
Gilbert Thomas
Lambert Nathan
Snoswell Aaron
Zick Tom
Publication venue: Cornell University Library / arXiv
Publication date: 22/04/2022

The desire to build good systems in the face of complex societal effects requires a dynamic approach towards equity and access. Recent approaches to machine learning (ML) documentation have demonstrated the promise of discursive frameworks for deliberation about these complexities. However, these developments have been grounded in a static ML paradigm, leaving the role of feedback and post-deployment performance unexamined. Meanwhile, recent work in reinforcement learning design has shown that the effects of optimization objectives on the resultant system behavior can be wide-ranging and unpredictable. In this paper we sketch a framework for documenting deployed learning systems, which we call Reward Reports. Taking inspiration from various contributions to the technical literature on reinforcement learning, we outline Reward Reports as living documents that track updates to design choices and assumptions behind what a particular automated system is optimizing for. They are intended to track dynamic phenomena arising from system deployment, rather than merely static properties of models or data. After presenting the elements of a Reward Report, we provide three examples: DeepMind's MuZero, MovieLens, and a hypothetical deployment of a Project Flow traffic control policy.

arXiv.org e-Print Archive

Queensland University of Technology ePrints Archive