6 research outputs found
Worker Retention, Response Quality, and Diversity in Microtask Crowdsourcing: An Experimental Investigation of the Potential for Priming Effects to Promote Project Goals
Online microtask crowdsourcing platforms act as efficient resources for delegating small units of work, gathering data, generating ideas, and more. Members of research and business communities have incorporated crowdsourcing into problem-solving processes. When human workers contribute to a crowdsourcing task, they are subject to various stimuli as a result of task design. Inter-task priming effects - through which work is nonconsciously, yet significantly, influenced by exposure to certain stimuli - have been shown to affect microtask crowdsourcing responses in a variety of ways. Instead of simply being wary of the potential for priming effects to skew results, task administrators can apply proven priming procedures to promote project goals. In a series of three experiments conducted on Amazon's Mechanical Turk, we investigated the effects of proposed priming treatments on worker retention, response quality, and response diversity. In our first two experiments, we studied the effect of initial response freedom on sustained worker participation and response quality. We expected that workers who were granted greater levels of freedom in an initial response would be stimulated to complete more work and deliver higher quality work than workers originally constrained in their initial response possibilities. We found no significant relationship between the initial response freedom granted to workers and the amount of optional work they completed. The degree of initial response freedom also did not have a significant impact on subsequent response quality. However, the influence of inter-task effects was evident in response tendencies for different question types. We found evidence that consistency in task structure may play a stronger role in promoting response quality than proposed priming procedures. In our final experiment, we studied the influence of a group-level priming treatment on response diversity.
Instead of varying task structure for different workers, we varied the degree of overlap in question content distributed to different workers in a group. We expected groups of workers that were exposed to more diverse preliminary question sets to offer greater diversity in response to a subsequent question. Although differences in response diversity were revealed, no consistent trend between question content overlap and response diversity emerged. Nevertheless, combining consistent task structure with crowd-level priming procedures - to encourage diversity in inter-task effects across the crowd - offers an exciting path for future study.
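The abstract does not say how response diversity was scored; one common choice for quantifying it (an illustrative assumption here, not the paper's metric) is the Shannon entropy of the distribution of distinct responses within a group:

```python
from collections import Counter
from math import log2

def response_diversity(responses):
    """Shannon entropy (in bits) of the distribution of distinct responses.

    Higher values indicate a more evenly spread, more diverse set of
    answers; 0 means every worker in the group gave the same response.
    """
    counts = Counter(responses)
    total = len(responses)
    return -sum((c / total) * log2(c / total) for c in counts.values())

# A group whose workers all answer identically has zero diversity,
# while four evenly split answers reach the maximum of log2(4) = 2 bits.
identical_group = ["red", "red", "red", "red"]
uniform_group = ["red", "blue", "green", "yellow"]
```

Comparing this score across groups that saw different preliminary question sets is one way to test for a crowd-level priming effect on diversity.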
The quality of data collected online: An investigation of careless responding in a crowdsourced sample
Despite recent concerns about data quality, various academic fields rely increasingly on crowdsourced samples. Thus, the goal of this study was to systematically assess carelessness in a crowdsourced sample (N = 394) by applying various measures and detection methods. A Latent Profile Analysis revealed that 45.9% of the participants showed some form of careless behavior. Excluding these participants increased the effect size in an experiment included in the survey. Based on our findings, several recommendations of easy-to-apply measures for assessing data quality are given.
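Two screening measures commonly used in the careless-responding literature, straightlining (longstring) and intra-individual response variability, can be sketched as follows; the cutoff values are illustrative assumptions, not the study's calibrated thresholds:

```python
import statistics

def longstring(answers):
    """Length of the longest run of identical consecutive answers
    (straightlining indicator)."""
    longest = run = 1
    for prev, cur in zip(answers, answers[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest

def irv(answers):
    """Intra-individual response variability: the standard deviation of
    one respondent's answers. Very low values suggest uniform,
    possibly careless responding."""
    return statistics.pstdev(answers)

def flag_careless(answers, max_run=8, min_irv=0.5):
    # Illustrative cutoffs; real studies calibrate them per instrument.
    return longstring(answers) >= max_run or irv(answers) < min_irv
```

In practice such indicators are combined (as in the Latent Profile Analysis described above) rather than applied as single hard filters.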
Quality Control in Crowdsourcing: A Survey of Quality Attributes, Assessment Techniques and Assurance Actions
Crowdsourcing enables one to leverage the intelligence and wisdom of potentially large groups of individuals toward solving problems. Common problems approached with crowdsourcing are labeling images, translating or transcribing text, providing opinions or ideas, and similar - all tasks that computers are not good at or where they may even fail altogether. The introduction of humans into computations and/or everyday work, however, also poses critical, novel challenges in terms of quality control, as the crowd is typically composed of people with unknown and very diverse abilities, skills, interests, personal objectives, and technological resources. This survey studies quality in the context of crowdsourcing along several dimensions, so as to define and characterize it and to understand the current state of the art. Specifically, this survey derives a quality model for crowdsourcing tasks, identifies the methods and techniques that can be used to assess the attributes of the model, and the actions and strategies that help prevent and mitigate quality problems. An analysis of how these features are supported by the state of the art further identifies open issues and informs an outlook on hot future research directions.
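Two widely used assurance actions of the kind such surveys cover are gold (known-answer) questions for assessing worker accuracy and redundant labeling with majority voting. A minimal sketch, with illustrative names, data, and threshold:

```python
from collections import Counter

def worker_accuracy(worker_answers, gold):
    """Share of gold (known-answer) questions a worker answered correctly."""
    hits = sum(worker_answers.get(q) == label for q, label in gold.items())
    return hits / len(gold)

def majority_vote(labels_per_item, trusted_workers):
    """Aggregate redundant labels per item, counting only trusted workers."""
    result = {}
    for item, votes in labels_per_item.items():
        counted = [label for worker, label in votes if worker in trusted_workers]
        result[item] = Counter(counted).most_common(1)[0][0]
    return result

gold = {"g1": "cat", "g2": "dog"}
workers = {
    "w1": {"g1": "cat", "g2": "dog"},  # accurate on both gold questions
    "w2": {"g1": "cat", "g2": "cat"},  # misses one
    "w3": {"g1": "dog", "g2": "cat"},  # misses both
}
# Keep workers above an illustrative 0.5 accuracy threshold.
trusted = {w for w, a in workers.items() if worker_accuracy(a, gold) > 0.5}
```

Assessment (accuracy on gold) feeds the assurance action (filtering before aggregation), mirroring the attribute/assessment/action split the survey describes.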
Methods for detecting and mitigating linguistic bias in text corpora
As the web continues to spread into all aspects of daily life, bias in the form of prejudice and hidden opinions is becoming an increasingly challenging problem. A widespread manifestation is bias in text data. To counteract this, the online encyclopedia Wikipedia introduced the Neutral Point of View (NPOV) principle, which mandates the use of neutral language and the avoidance of one-sided or subjective formulations. While studies have shown that the quality of Wikipedia articles is comparable to that of articles in classical encyclopedias, research also shows that Wikipedia is susceptible to various types of NPOV violations. Identifying bias can be a challenging task, even for humans, and with millions of articles and a declining number of contributors, this task becomes increasingly difficult. If bias is not curbed, it can not only lead to polarization and conflict between opinion groups, but also negatively influence users in the free formation of their own opinions. In addition, bias in texts and in ground-truth data can adversely affect machine learning models trained on that data, which can lead to discriminatory model behavior.
In this thesis, we address bias by focusing on three central aspects: biased content in the form of written statements, bias of crowd workers during data annotation, and bias in word embedding representations. We present two approaches for identifying biased statements in text collections such as Wikipedia. Our feature-based approach uses bag-of-words features, including a list of bias words that we compiled by identifying clusters of bias words in the vector space of word embeddings. Our improved neural approach uses gated recurrent neural networks to capture context dependencies and further improve model performance. Our study on crowd worker bias reveals biased behavior by crowd workers with extreme opinions on a given topic and shows that this behavior influences the resulting ground-truth labels, which in turn affects the creation of datasets for tasks such as bias identification or sentiment analysis. We present approaches for mitigating worker bias that raise awareness among workers and use the concept of social projection.
Finally, we address the problem of bias in word embeddings, focusing on the example of varying sentiment scores for names. We show that bias in the training data is captured by the embeddings and passed on to downstream models. In this context, we present a debiasing approach that reduces the bias effect and has a positive impact on the labels produced by a downstream sentiment classifier.
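The bias-lexicon step described above - compiling bias words by finding clusters of them in embedding space - can be illustrated with a toy sketch. The tiny 3-dimensional embeddings, seed words, and similarity threshold below are made-up stand-ins; a real setup would use pretrained vectors such as GloVe:

```python
import math

# Toy 3-d embeddings chosen so that the loaded words lie close together.
embeddings = {
    "notorious":     [0.9, 0.1, 0.0],
    "infamous":      [0.85, 0.15, 0.05],
    "controversial": [0.8, 0.2, 0.1],
    "table":         [0.0, 0.1, 0.9],
    "chair":         [0.05, 0.05, 0.95],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = lambda x: math.sqrt(sum(a * a for a in x))
    return dot / (norm(u) * norm(v))

def expand_lexicon(seeds, threshold=0.95):
    """Grow a bias-word list by pulling in vocabulary whose vectors lie
    close to any seed word, approximating a cluster of bias words in
    the embedding vector space."""
    lexicon = set(seeds)
    for word, vec in embeddings.items():
        if any(cosine(vec, embeddings[s]) >= threshold for s in seeds):
            lexicon.add(word)
    return lexicon
```

Starting from the seed "notorious", the neighboring loaded words join the lexicon while neutral words like "table" stay out; the resulting list can then feed the bag-of-words features mentioned in the abstract.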
Novel Methods for Designing Tasks in Crowdsourcing
Crowdsourcing is becoming more popular as a means for scalable data processing that requires human intelligence. The involvement of groups of people to accomplish tasks could be an effective success factor for data-driven businesses. Unlike in other technical systems, the quality of the results depends on human factors and on how well crowd workers understand the requirements of the task. Looking at previous studies in this area, we found that one of the main factors that affect workers' performance is the design of the crowdsourcing tasks. Previous studies of crowdsourcing task design covered a limited set of factors. The main contribution of this research is the focus on some of the less-studied technical factors, such as examining the effect of task ordering and class balance and measuring the consistency of the same task design over time and on different crowdsourcing platforms. Furthermore, this study ambitiously extends work towards understanding workers' point of view in terms of the quality of the task and the payment aspect by performing a qualitative study with crowd workers and shedding light on some of the ethical issues around payments for crowdsourcing tasks. To achieve our goal, we performed several crowdsourcing experiments on specific platforms and measured the factors that influenced the quality of the overall result.
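To make the task-ordering and class-balance factors concrete, here is a small sketch of how a class-balanced, order-controlled task batch might be assembled; the batching scheme is illustrative, not the paper's protocol:

```python
import random
from itertools import chain

def balanced_batch(positives, negatives, seed=0):
    """Interleave the two classes so a batch shows workers an exact
    50/50 class balance in alternating order; leftover items from the
    larger class are dropped to keep the balance exact."""
    rng = random.Random(seed)  # seeded for reproducible ordering
    pos, neg = positives[:], negatives[:]
    rng.shuffle(pos)
    rng.shuffle(neg)
    n = min(len(pos), len(neg))
    return list(chain.from_iterable(zip(pos[:n], neg[:n])))
```

Varying the balance ratio or the interleaving pattern across worker groups is one way to measure how such design choices affect result quality.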
Human-Centered Machine Learning: Algorithm Design and Human Behavior
Machine learning is increasingly engaged in a large number of important daily decisions and has great potential to reshape various sectors of our modern society. To fully realize this potential, it is important to understand the role that humans play in the design of machine learning algorithms and investigate the impacts of the algorithm on humans.
Towards the understanding of such interactions between humans and algorithms, this dissertation takes a human-centric perspective and focuses on investigating the interplay between human behavior and algorithm design. Accounting for the roles of humans in algorithm design creates unique challenges. For example, humans might be strategic or exhibit behavioral biases when generating data or responding to algorithms, violating the standard independence assumption in algorithm design. How do we design algorithms that take such human behavior into account? Moreover, humans possess various ethical values, e.g., humans want to be treated fairly and care about privacy. How do we design algorithms that align with human values? My dissertation addresses these challenges by combining both theoretical and empirical approaches. From the theoretical perspective, we explore how to design algorithms that account for human behavior and respect human values. In particular, we formulate models of human behavior in the data generation process and design algorithms that can leverage data with human biases. Moreover, we investigate the long-term impacts of algorithm decisions and design algorithms that mitigate the reinforcement of existing inequalities. From the empirical perspective, we have conducted behavioral experiments to understand human behavior in the context of data generation and information design. We have further developed more realistic human models based on empirical data and studied algorithm design building on the updated behavior models.