The ethical ambiguity of AI data enrichment: Measuring gaps in research ethics norms and practices
The technical progression of artificial intelligence (AI) research has been
built on breakthroughs in fields such as computer science, statistics, and
mathematics. However, in the past decade AI researchers have increasingly
looked to the social sciences, turning to human interactions to solve the
challenges of model development. Paying crowdsourcing workers to generate or
curate data, or data enrichment, has become indispensable for many areas of AI
research, from natural language processing to reinforcement learning from human
feedback (RLHF). Other fields that routinely interact with crowdsourcing
workers, such as Psychology, have developed common governance requirements and
norms to ensure research is undertaken ethically. This study explores how, and
to what extent, comparable research ethics requirements and norms have
developed for AI research and data enrichment. We focus on the approach taken
by two leading conferences: ICLR and NeurIPS, and journal publisher Springer.
In a longitudinal study of accepted papers, and via a comparison with
Psychology and CHI papers, this work finds that leading AI venues have begun to
establish protocols for human data collection, but these are inconsistently
followed by authors. Whilst Psychology papers engaging with crowdsourcing
workers frequently disclose ethics reviews, payment data, demographic data and
other information, similar disclosures are far less common in leading AI venues
despite similar guidance. The work concludes with hypotheses to explain these
gaps in research ethics practices and considerations of their implications.
Comment: 10 pages