137 research outputs found

    The Importance of Worker Reputation Information in Microtask-Based Crowd Work Systems

    This paper presents the first systematic investigation of the potential performance gains in crowd work systems that derive from information available to the requester about individual worker reputation. In particular, we first formalize the optimal task assignment problem when workers’ reputation estimates are available, as the maximization of a monotone (submodular) function subject to matroid constraints. Since this optimization problem is NP-hard, we then propose a simple but efficient greedy heuristic task allocation algorithm. We also propose a simple “maximum a posteriori” decision rule and a decision algorithm based on message passing. Finally, we test and compare the different solutions, showing that system performance can greatly benefit from information about workers’ reputation. Our main findings are that: i) even largely inaccurate estimates of workers’ reputation can be effectively exploited in task assignment to greatly improve system performance; ii) the performance of the maximum a posteriori decision rule quickly degrades as worker reputation estimates become inaccurate; iii) when workers’ reputation estimates are significantly inaccurate, the best performance is obtained by combining our proposed task assignment algorithm with the message-passing decision algorithm.
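The greedy heuristic this abstract describes can be sketched generically. The following is a hypothetical illustration (the function names, the toy objective, and the per-worker capacity constraint are all assumptions, not the authors' code) of greedy maximization of a monotone submodular objective under a partition-matroid constraint, where each worker can take at most a fixed number of tasks:

```python
def greedy_assign(tasks, workers, capacity, marginal_gain):
    """Greedy heuristic for monotone submodular maximization under a
    partition-matroid constraint: worker w may take at most capacity[w]
    tasks.  At each step, add the feasible (task, worker) pair with the
    largest positive marginal gain; stop when no gain remains."""
    assignment = set()
    load = {w: 0 for w in workers}
    candidates = [(t, w) for t in tasks for w in workers]
    while True:
        feasible = [p for p in candidates
                    if p not in assignment and load[p[1]] < capacity[p[1]]]
        if not feasible:
            break
        best = max(feasible, key=lambda p: marginal_gain(assignment, p))
        if marginal_gain(assignment, best) <= 0:
            break
        assignment.add(best)
        load[best[1]] += 1
    return assignment


def make_gain(reputation):
    """Toy objective f(S) = sum over tasks of the best reputation among
    workers assigned to that task; this is monotone and submodular."""
    def marginal_gain(assignment, pair):
        task, worker = pair
        current = max((reputation[w] for t, w in assignment if t == task),
                      default=0.0)
        return max(0.0, reputation[worker] - current)
    return marginal_gain
```

For monotone submodular objectives under a matroid constraint, this greedy rule is known to achieve a 1/2-approximation, which is why it is a natural heuristic for the NP-hard assignment problem.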

    TurkScanner: Predicting the Hourly Wage of Microtasks

    Workers in crowd markets struggle to earn a living. One reason for this is that it is difficult for workers to accurately gauge the hourly wages of microtasks, and they consequently end up performing labor with little pay. In general, workers are provided with little information about tasks and are left to rely on noisy signals, such as the textual description of the task or the rating of the requester. This study explores various computational methods for predicting the working times (and thus hourly wages) required for tasks, based on data collected from other workers completing crowd work. We provide the following contributions: (i) a data collection method for gathering real-world training data on crowd-work tasks and the times required for workers to complete them; (ii) TurkScanner, a machine learning approach that predicts the working time necessary to complete a task (and can thus implicitly provide the expected hourly wage). We collected 9,155 data records using a web browser extension installed by 84 Amazon Mechanical Turk workers, and explored the challenge of accurately recording working times both automatically and by asking workers. TurkScanner was created using ~150 derived features, and was able to predict the hourly wages of 69.6% of all the tested microtasks within a 75% error. Directions for future research include observing the effects of tools on people's working practices, adapting this approach to a requester tool for better price setting, and predicting other elements of work (e.g., acceptance likelihood and worker task preferences). Comment: Proceedings of the 28th International Conference on World Wide Web (WWW '19), San Francisco, CA, USA, May 13-17, 2019.
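The pipeline TurkScanner implements (predict working time from task features, then derive the implied hourly wage) can be illustrated with a deliberately simplified stand-in; the real system learns from ~150 derived features, while everything below, names included, is a toy assumption:

```python
import math


def predict_seconds(features, history):
    """Toy stand-in for TurkScanner's learned regressor: predict a
    task's working time as that of its single nearest neighbour (by
    Euclidean distance on the feature vector) among previously
    observed (features, seconds) records."""
    nearest = min(history, key=lambda rec: math.dist(features, rec[0]))
    return nearest[1]


def hourly_wage(reward_usd, seconds):
    """Implied hourly wage for a microtask paying reward_usd and
    taking the predicted number of seconds to complete."""
    return reward_usd * 3600.0 / seconds
```

For example, a task paying $0.10 that is predicted to take 60 seconds implies a $6.00 hourly wage, which is the quantity a worker would want to see before accepting the task.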

    It's getting crowded!: Improving the effectiveness of microtask crowdsourcing

    [no abstract]

    On Classification in Human-driven and Data-driven Systems

    Classification systems are ubiquitous, and the design of effective classification algorithms has become an even more active area of research since the emergence of machine learning techniques. Despite the significant effort devoted to training and feature selection in classification systems, misclassifications do occur, and their effects can be critical in various applications. The central goal of this thesis is to analyze classification problems in human-driven and data-driven systems with potentially unreliable components, and to design effective strategies that ensure reliable classification in such systems. The components/agents in the system can be machines and/or humans. The system components can be unreliable for a variety of reasons, such as faulty machines, security attacks causing machines to send falsified information, unskilled human workers sending imperfect information, or human workers providing random responses. This thesis first quantifies the effect of such unreliable agents on the classification performance of the systems, and then designs schemes that mitigate misclassifications and their effects by adapting the behavior of the classifier to samples from machines and/or humans, ensuring effective and reliable overall classification. In the first part of this thesis, we study the case when only humans are present in the system, and consider crowdsourcing systems. Human workers in crowdsourcing systems observe the data and respond individually, providing label-related information to a fusion center in a distributed manner. In such systems, we consider the presence of unskilled human workers who have a reject option, so that they may choose not to provide information regarding the label of the data. To maximize the classification performance at the fusion center, an optimal aggregation rule is proposed to fuse the human workers' responses in a weighted majority voting manner. 
Next, the presence of unreliable human workers, referred to as spammers, is considered. Spammers are human workers who provide random guesses about the data label to the fusion center in crowdsourcing systems. The effect of spammers on the overall classification performance is characterized when the spammers can respond strategically to maximize their reward in reward-based crowdsourcing systems. For such systems, an optimal aggregation rule is proposed by adapting the classifier based on the responses from the workers. The next line of human-driven classification is considered in the context of social networks, where the problem is to classify whether or not a person is influential in propagating information. Since knowledge of the social network structure is not always available, the influential agent classification problem is studied without knowledge of the network structure. A multi-task low-rank linear influence model is proposed to exploit the relationships between different information topics. The proposed approach can simultaneously predict the volume of information diffusion for each topic and automatically classify the influential nodes for each topic. In the third part of the thesis, a data-driven decentralized classification framework is developed in which machines interact with each other to perform complex classification tasks. However, the machines in the system can be unreliable for a variety of reasons, such as noise, faults, and attacks. Erroneous updates lead the classification process in the wrong direction and degrade the performance of decentralized classification algorithms. First, the effect of erroneous updates on the convergence of the classification algorithm is analyzed, and it is shown that the algorithm converges linearly to a neighborhood of the optimal classification solution. Next, guidelines are provided for network design to achieve faster convergence. 
Finally, to mitigate the impact of unreliable machines, a robust variant of ADMM is proposed, and its resilience to unreliable machines is shown, with exact convergence to the optimal classification result. The final part of the research in this thesis considers machine-only, data-driven classification problems. First, the fundamentals of classification are studied in an information-theoretic framework. We investigate the nonparametric classification problem for arbitrary unknown composite distributions in the asymptotic regime where both the sample size and the number of classes grow exponentially large. The notion of discrimination capacity is introduced, which captures the largest exponential growth rate of the number of classes relative to the sample size such that there exists a test with asymptotically vanishing probability of error. An error exponent analysis using the maximum mean discrepancy is provided, and the discrimination rate, i.e., a lower bound on the discrimination capacity, is characterized. Furthermore, an upper bound on the discrimination capacity based on Fano's inequality is developed.
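The weighted majority voting fusion rule with a reject option described in the first part of this abstract can be sketched as follows; this is a minimal illustration with assumed names, not the thesis's actual rule, in which the weights would be derived from estimated worker reliability:

```python
def weighted_majority_vote(responses, weights):
    """Fuse workers' binary labels (+1 / -1) at the fusion center by
    weighted majority voting.  Workers who exercised the reject option
    (label None) simply contribute nothing to the score."""
    score = sum(weights[w] * label
                for w, label in responses.items()
                if label is not None)
    return 1 if score >= 0 else -1
```

A highly weighted (reliable) worker can thus outvote a less reliable one, while abstaining workers neither help nor hurt the decision.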

    Varieties of platform work: Platforms and social inequality in Germany and the United States

    The platform economy has been criticized for exacerbating social inequalities in various ways. This study draws on these discussions and examines the extent to which social inequalities are being reproduced, reduced, or even increased within platform work. The first central question is that of the precariousness of this form of work and the vulnerability of platform workers as a group. This is followed by a second question about the role of the classical inequality dimensions of education and gender within the group of platform workers. The study focuses on inequalities related to income, workload, and the subjective perception of platform work. It follows a comparative approach, building on institutionalist analyses developed in labor market and inequality research. The empirical analysis is based on case studies of 15 crowdwork platforms in the United States and Germany and on an online survey of crowdworkers in both countries. While platforms represent a global organizational model, they are embedded in different models of capitalism. The study shows that existing labor market segmentation and social welfare systems determine who works on platforms and to what extent. The weaker the social safety net, the more likely platform work is to be both a curse and a blessing: it offers a much-needed and flexible source of income, albeit under extremely precarious conditions. The stronger the social safety net, on the other hand, the greater the market power of workers vis-à-vis the platforms.

    In What Mood Are You Today?

    The mood of individuals in the workplace has been well studied because of its influence on task performance and work engagement. However, the effect of mood has not been studied in detail in the context of microtask crowdsourcing. In this paper, we investigate the influence of one's mood, a fundamental psychosomatic dimension of a worker's behaviour, on their interaction with tasks, task performance, and perceived engagement. To this end, we conducted two comprehensive studies: (i) a survey exploring the perception of crowd workers regarding the role of mood in shaping their work, and (ii) an experimental study to measure and analyze the actual impact of workers' moods in information finding microtasks. We found evidence of the impact of mood on a worker's perceived engagement through the feeling of reward or accomplishment, and we argue as to why the same impact is not perceived in the evaluation of task performance. Our findings have broad implications for the design and workflow of crowdsourcing systems.