806 research outputs found

    Partnering People with Deep Learning Systems: Human Cognitive Effects of Explanations

    Advances in “deep learning” algorithms have led to intelligent systems that provide automated classifications of unstructured data. Until recently these systems could not provide the reasons behind a classification. This lack of “explainability” has led to resistance to applying these systems in some contexts. An intensive research and development effort to make such systems more transparent and interpretable has proposed and developed multiple types of explanation to address this challenge. Relatively little research has been conducted into how humans process these explanations. Theories and measures were selected from research on social cognition: attribution of mental processes from intentional systems theory, measures of working memory demands from cognitive load theory, and self-efficacy from social cognitive theory. The task was crowdsourced natural disaster damage assessment of aerial images, guided by a written assessment guideline. The “Wizard of Oz” method was used to generate the damage assessment output of a simulated agent. The output and explanations contained errors consistent with transferring a deep learning system to a new disaster event. A between-subjects experiment was conducted in which three types of natural language explanations were manipulated between conditions. Counterfactual explanations increased intrinsic cognitive load and made participants more aware of the challenges of the task. Explanations that described boundary conditions and failure modes (“hedging explanations”) decreased agreement with erroneous agent ratings without a detectable effect on cognitive load. However, these effects were not large enough to counteract decreases in self-efficacy and increases in erroneous agreement that resulted from providing a causal explanation. The extraneous cognitive load generated by explanations had the strongest influence on self-efficacy in the task.
    Presenting all of the explanation types at the same time maximized cognitive load and agreement with erroneous simulated output. Perceived interdependence with the simulated agent was also associated with increases in self-efficacy; however, trust in the agent was not associated with differences in self-efficacy. These findings connect explanation design to research areas that have developed task-design methods which may increase the effectiveness of explanations.

    It's getting crowded!: Improving the effectiveness of microtask crowdsourcing

    [no abstract]

    Crowdsourcing Relevance: Two Studies on Assessment

    Crowdsourcing has become an alternative approach to collecting relevance judgments at large scale. In this thesis, we focus on specific aspects related to time, scale, and agreement. First, we address the time factor in gathering relevance labels: we study how much time judges need to assess documents. We conduct a series of four experiments which unexpectedly reveal that introducing time limitations benefits the quality of the results. Furthermore, we discuss strategies to determine the right amount of time to make available to workers for relevance assessment, in order to guarantee both the high quality of the gathered results and the saving of valuable time and money. Then we explore the application of magnitude estimation, a psychophysical scaling technique for the measurement of sensation, to relevance assessment. We conduct a large-scale user study across 18 TREC topics, collecting more than 50,000 magnitude estimation judgments, which prove to be overall rank-aligned with ordinal judgments made by expert relevance assessors. We discuss the benefits, the reliability of the judgments collected, and the competitiveness in terms of assessor cost. We also report preliminary results on the agreement among judges. Often, the results of crowdsourcing experiments are affected by noise that can be ascribed to a lack of agreement among workers. This aspect should be considered, as it can affect the reliability of the gathered relevance labels as well as the overall repeatability of the experiments.
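    The magnitude estimation judgments described above only become rank-comparable once each assessor's personal scale is accounted for. A minimal sketch of the standard psychophysics normalization step (per-worker geometric-mean scaling, then a per-document median) is shown below; the function names and the (worker, document, score) tuple format are illustrative assumptions, not the thesis's actual pipeline:

```python
import math
import statistics
from collections import defaultdict

def normalize_magnitudes(judgments):
    """Divide each worker's magnitude-estimation scores by that worker's
    geometric mean, so scores from assessors using different personal
    scales become comparable. `judgments` is a list of
    (worker, document, score) tuples with positive scores."""
    by_worker = defaultdict(list)
    for worker, _, score in judgments:
        by_worker[worker].append(score)
    gmean = {w: math.exp(sum(math.log(s) for s in scores) / len(scores))
             for w, scores in by_worker.items()}
    return [(w, doc, score / gmean[w]) for w, doc, score in judgments]

def aggregate_relevance(judgments):
    """Median of the normalized scores per document."""
    by_doc = defaultdict(list)
    for _, doc, score in normalize_magnitudes(judgments):
        by_doc[doc].append(score)
    return {doc: statistics.median(scores) for doc, scores in by_doc.items()}
```

    With this normalization, a worker who rates everything between 1 and 10 and one who uses 100 to 1000 contribute equally to the per-document median, and the induced ranking can then be compared against ordinal expert judgments.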

    Spam elimination and bias correction: ensuring label quality in crowdsourced tasks

    Crowdsourcing is proposed as a powerful mechanism for accomplishing large-scale tasks via anonymous workers online. It has been demonstrated as an effective and important approach for collecting labeled data in application domains which require human intelligence, such as image labeling, video annotation, and natural language processing. Despite its promise, one big challenge remains in crowdsourcing systems: the difficulty of controlling the quality of crowds. Workers usually have diverse education levels, personal preferences, and motivations, leading to unknown work performance while completing a crowdsourced task. Some are reliable, and some might provide noisy feedback. It is therefore natural to apply worker-filtering approaches, which recognize and handle noisy workers, to crowdsourcing applications in order to obtain high-quality labels. This dissertation discusses this area of research and proposes efficient probabilistic worker-filtering models to distinguish varied types of poor-quality workers. Most of the existing work in the worker-filtering literature either concentrates only on binary labeling tasks, or fails to separate low-quality workers whose label errors can be corrected from other spam workers (whose label errors cannot be corrected). As such, we first propose a Spam Removing and De-biasing Framework (SRDF) to deal with worker filtering in labeling tasks with numerical label scales. The developed framework can detect spam workers and biased workers separately. Biased workers are defined as those who tend to provide higher (or lower) labels than the truths, and their errors can be corrected. To tackle the biasing problem, an iterative bias detection approach is introduced to recognize the biased workers.
    The spam filtering algorithm eliminates three types of spam workers: random spammers who provide random labels, uniform spammers who give the same labels for most of the items, and sloppy workers who offer low-accuracy labels. Integrating the spam filtering and bias detection approaches into aggregating algorithms, which infer truths from labels obtained from crowds, can lead to high-quality consensus results. The common characteristic of random spammers and uniform spammers is that they provide useless feedback without making an effort on the labeling task, so it is not necessary to distinguish them separately. In addition, the removal of sloppy workers has a great impact on the detection of biased workers within the SRDF framework. To address these problems, a different way of classifying workers is presented in this dissertation. In particular, biased workers are classified as a subcategory of sloppy workers. Finally, an ITerative Self Correcting - Truth Discovery (ITSC-TD) framework is proposed, which can reliably recognize biased workers in ordinal labeling tasks, based on a probabilistic bias-detection model. ITSC-TD estimates true labels through an optimization-based truth discovery method, which minimizes overall label errors by assigning different weights to workers. The typical tasks posted on popular crowdsourcing platforms, such as MTurk, are simple tasks, which are low in complexity, independent, and require little time to complete. Complex tasks, however, in many cases require crowd workers to possess specialized skills in the task domain. As a result, this type of task is more prone to poor-quality feedback from crowds than simple tasks. We therefore propose a multiple-views approach for obtaining high-quality consensus labels in complex labeling tasks.
    In this approach, each view is defined as a labeling critique or rubric, which aims to make workers aware of the desirable work characteristics or goals. Combining the view labels produces the overall estimated label for each item. The multiple-views approach is developed under the hypothesis that workers' performance might differ from one view to another. Varied weights are then assigned to different views for each worker. Additionally, the ITSC-TD framework is integrated into the multiple-views model to achieve high-quality estimated truths for each view. Next, we propose a Semi-supervised Worker Filtering (SWF) model to eliminate spam workers who assign random labels to each item. The SWF approach conducts worker filtering with a limited set of gold truths available a priori. Each worker is associated with a spammer score, estimated via the developed semi-supervised model, and low-quality workers are efficiently detected by comparing the spammer score with a predefined threshold value. The efficiency of all the developed frameworks and models is demonstrated on simulated and real-world data sets. Compared to a set of state-of-the-art methodologies in the crowdsourcing domain, such as expectation-maximization-based aggregating algorithms, GLAD, and an optimization-based truth discovery approach, up to 28.0% improvement can be obtained in the accuracy of true label estimation
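    The optimization-based truth discovery idea mentioned above — estimate truths as weighted averages of worker labels, then reward workers whose labels sit close to those truths — can be sketched in a few lines. This is a generic CRH-style iteration under an assumed (worker, item, value) data shape, not the dissertation's ITSC-TD model:

```python
import math

def truth_discovery(labels, iters=10):
    """Jointly estimate item truths and worker weights for numerical labels.
    `labels` is a list of (worker, item, value) tuples. Truths are
    weighted means of worker labels; a worker's weight grows as their
    total squared error against the current truths shrinks."""
    workers = {w for w, _, _ in labels}
    weights = {w: 1.0 for w in workers}
    truths = {}
    for _ in range(iters):
        # Truth update: weighted average of the labels for each item.
        sums, wsums = {}, {}
        for w, item, y in labels:
            sums[item] = sums.get(item, 0.0) + weights[w] * y
            wsums[item] = wsums.get(item, 0.0) + weights[w]
        truths = {item: sums[item] / wsums[item] for item in sums}
        # Weight update: log of total error over this worker's error.
        errors = {w: 1e-9 for w in workers}
        for w, item, y in labels:
            errors[w] += (y - truths[item]) ** 2
        total = sum(errors.values())
        weights = {w: max(math.log(total / e), 1e-6) for w, e in errors.items()}
    return truths, weights
```

    In a toy run, two agreeing workers quickly dominate a third who labels far from the consensus, pulling the estimated truths toward the majority while the outlier's weight collapses.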

    First experiences with a novel farmer citizen science approach: crowdsourcing participatory variety selection through on-farm triadic comparisons of technologies (TRICOT)

    Rapid climatic and socio-economic changes challenge current agricultural R&D capacity. The necessary quantum leap in knowledge generation should build on the innovation capacity of farmers themselves. A novel citizen science methodology, triadic comparisons of technologies or tricot, was implemented in pilot studies in India, East Africa, and Central America. The methodology involves distributing a pool of agricultural technologies in different combinations of three to individual farmers who observe these technologies under farm conditions and compare their performance. Since the combinations of three technologies overlap, statistical methods can piece together the overall performance ranking of the complete pool of technologies. The tricot approach affords wide scaling, as the distribution of trial packages and instruction sessions is relatively easy to execute, farmers do not need to be organized in collaborative groups, and feedback is easy to collect, even by phone. The tricot approach provides interpretable, meaningful results and was widely accepted by farmers. The methodology underwent improvement in data input formats. A number of methodological issues remain: integrating environmental analysis, capturing gender-specific differences, stimulating farmers' motivation, and supporting implementation with an integrated digital platform. Future studies should apply the tricot approach to a wider range of technologies, quantify its potential contribution to climate adaptation, and embed the approach in appropriate institutions and business models, empowering participants and democratizing science
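    The key statistical idea above — overlapping triads imply pairwise outcomes that can be stitched into one overall ranking of the full technology pool — can be sketched as follows. This uses simple pairwise win counts as an illustration; published tricot analyses fit Plackett-Luce-type ranking models instead:

```python
from collections import defaultdict
from itertools import combinations

def rank_from_triads(triad_rankings):
    """Aggregate overlapping three-way rankings into one overall order.
    Each element of `triad_rankings` is a list of three technology
    names ordered best-first, as reported by one farmer."""
    wins = defaultdict(int)
    seen = set()
    for ranking in triad_rankings:
        seen.update(ranking)
        # Every ranked triad implies three pairwise outcomes.
        for better, worse in combinations(ranking, 2):
            wins[better] += 1
    return sorted(seen, key=lambda t: wins[t], reverse=True)
```

    Because every farmer sees only three technologies, no single trial covers the pool, yet the shared members across triads let the aggregate ranking emerge — the same overlap property the statistical models exploit.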

    EXAMINING THE UTILITY OF BEHAVIORAL ECONOMIC DEMAND IN ADDICTION SCIENCE

    The marriage of perspectives from behavioral economic theory and learning theory has the potential to advance an understanding of substance use and substance use disorder. Behavioral economic demand is a central concept to this interdisciplinary approach. Evaluating demand in the laboratory and clinic can improve previous research on the relative reinforcing effects of drugs by accounting for the multi-dimensional nature of reinforcement rather than viewing reinforcement as a unitary construct. Recent advances in the commodity purchase task methodology have further simplified the measurement of demand values in human participants. This dissertation project presents a programmatic series of studies designed to demonstrate the utility of using a behavioral economic demand framework and the purchase task methodology for understanding substance use disorder through basic and applied science research. Experiments are presented spanning a continuum from theoretical and methodological development to longitudinal work and clinical application. These experiments demonstrate three key conclusions regarding behavioral economic demand. First, behavioral economic demand provides a reliable and valid measure of drug valuation that is applicable to varied drug types and participant populations. Second, behavioral economic demand is a stimulus-selective measure specifically reflecting valuation for the commodity under study. Third, behavioral economic demand provides incremental information about substance use in the laboratory and clinical setting above and beyond traditional measures of reinforcer valuation and other behavioral economic variables. These findings collectively highlight the benefits of behavioral economic demand and provide an important platform for future work in addiction science
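    Purchase-task data like those described above are typically summarized with a demand equation. A common choice is the exponentiated demand equation of Koffarnus and colleagues, sketched below; the parameter values used in the example are illustrative assumptions, not results from the dissertation:

```python
import math

def exponentiated_demand(price, q0, alpha, k=2.0):
    """Exponentiated demand equation (Koffarnus et al., 2015):
        Q = Q0 * 10 ** (k * (exp(-alpha * Q0 * C) - 1))
    q0:    intensity, i.e. consumption when the commodity is free
    alpha: rate of change in elasticity (higher = demand falls faster)
    k:     span of consumption in log10 units"""
    return q0 * 10 ** (k * (math.exp(-alpha * q0 * price) - 1))
```

    Consumption equals q0 at zero price and declines smoothly as price rises; alpha serves as the usual single-parameter index of how readily a participant forgoes the commodity.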

    Designing for quality in real-world mobile crowdsourcing systems

    PhD Thesis. Crowdsourcing has emerged as a popular means to collect and analyse data at scale for problems that require human intelligence to resolve. Its prompt response and low cost have made it attractive to businesses and academic institutions. In response, various online crowdsourcing platforms, such as Amazon MTurk, Figure Eight and Prolific, have successfully emerged to facilitate the entire crowdsourcing process. However, the quality of results has been a major concern in the crowdsourcing literature. Previous work has identified various key factors that contribute to quality issues and need to be addressed in order to produce high-quality results. Crowd task design, in particular, is a major factor that impacts the efficiency and effectiveness of crowd workers as well as the entire crowdsourcing process. This research investigates crowdsourcing task designs to collect and analyse two distinct types of data, and examines the value of creating high-quality crowdwork activities on new crowdsource-enabled systems for end-users. The main contributions of this research include 1) a set of guidelines for designing crowdsourcing tasks that support quality collection, analysis and translation of speech and eye-tracking data in real-world scenarios; and 2) crowdsourcing applications that capture real-world data and coordinate the entire crowdsourcing process to analyse and feed quality results back. Furthermore, this research proposes a new quality control method based on workers' trust and self-verification. To achieve this, the research follows a case study approach with a focus on two real-world data collection and analysis case studies. The first case study, Speeching, explores real-world speech data collection, analysis, and feedback for people with speech disorders, particularly Parkinson's.
    The second case study, CrowdEyes, examines the development and use of a hybrid system combining crowdsourcing with low-cost DIY mobile eye trackers for real-world visual data collection, analysis, and feedback. Both case studies established the capability of crowdsourcing to obtain high-quality responses comparable to those of an expert. The Speeching app, and the provision of feedback in particular, were well received by participants, which opens up new opportunities in digital health and wellbeing. In addition, the proposed crowd-powered eye tracker is fully functional in real-world settings. The results showed how this approach outperforms all current state-of-the-art algorithms under all conditions, which opens up the technology for a wide variety of eye-tracking applications in real-world settings

    Novel Methods for Designing Tasks in Crowdsourcing

    Crowdsourcing is becoming more popular as a means of scalable data processing that requires human intelligence. The involvement of groups of people to accomplish tasks can be an effective success factor for data-driven businesses. Unlike in other technical systems, the quality of the results depends on human factors and on how well crowd workers understand the requirements of the task. Looking at previous studies in this area, we found that one of the main factors affecting workers' performance is the design of the crowdsourcing tasks, yet previous studies of crowdsourcing task design have covered only a limited set of factors. The main contribution of this research is its focus on some of the less-studied technical factors, such as examining the effect of task ordering and class balance and measuring the consistency of the same task design over time and on different crowdsourcing platforms. Furthermore, this study extends work towards understanding workers' point of view on task quality and payment by performing a qualitative study with crowd workers and shedding light on some of the ethical issues around payment for crowdsourcing tasks. To achieve our goal, we performed several crowdsourcing experiments on specific platforms and measured the factors that influenced the quality of the overall result


    Sharing Economy Last Mile Delivery: Three Essays Addressing Operational Challenges, Customer Expectations, and Supply Uncertainty

    Get PDF
    Last mile delivery has become a critical competitive dimension facing retail supply chains. At the same time, the emergence of sharing economy platforms has introduced unique operational challenges and benefits that both enable and inhibit retailers' last mile delivery goals. This dissertation investigates key challenges faced by crowdshipping platforms used in last mile delivery, related to crowdsourced delivery drivers, driver-customer interaction, and customer expectations. We investigate the research questions of this dissertation through a multi-method design approach, complementing a rich archival dataset comprising several million orders retrieved from a Fortune 100 retail crowdshipping platform with scenario-based experiments. Specifically, the first study analyzes the impact of delivery task remuneration and of the operational characteristics that affect drivers' pre-task, task, and post-task behaviors. We found that monetary incentives are not the sole factor influencing drivers' behaviors: drivers also consider the operational characteristics of the task when accepting, performing, and evaluating a delivery task. The second study examines a driver's learning experience relative to a delivery task and the context where it takes place. Results show the positive impact of driver familiarity on delivery time performance, and that learning enhances this positive effect. Finally, the third study focuses on how delivery performance shapes customers' experience and future engagement with the retailer, examining important contingency factors in these relationships. Findings support the notion that consumers' time-related expectations of the last mile delivery service influence their perceptions of delivery performance and their repurchase behaviors. Overall, this dissertation provides new insights in this emerging field that advance theory and practice