
    It's getting crowded! : improving the effectiveness of microtask crowdsourcing

    [no abstract]

    The Use of Online Panel Data in Management Research: A Review and Recommendations

    Management scholars have long depended on convenience samples to conduct research involving human participants. However, the past decade has seen the emergence of a new convenience sample: online panels and online panel participants. The data these participants provide—online panel data (OPD)—has been embraced by many management scholars owing to the numerous benefits it provides over “traditional” convenience samples. Despite those advantages, OPD has not been warmly received by all, and there is currently a divide in the field over the appropriateness of OPD in management scholarship. Our review addresses this divide with the goal of providing a common understanding of OPD and its utility, along with recommendations on when and how to use OPD and how and where to publish it. To accomplish these goals, we inventoried and reviewed OPD use across 13 management journals spanning 2006 to 2017. Our search resulted in 804 OPD-based studies across 439 articles. Notably, our search also identified 26 online panel platforms (“brokers”) used to connect researchers with online panel participants. Importantly, we offer specific guidance to authors, reviewers, and editors, with implications for both micro and macro management scholars.

    Partnering People with Deep Learning Systems: Human Cognitive Effects of Explanations

    Advances in “deep learning” algorithms have led to intelligent systems that provide automated classifications of unstructured data. Until recently, these systems could not provide the reasons behind a classification, and this lack of “explainability” has led to resistance to applying them in some contexts. An intensive research and development effort to make such systems more transparent and interpretable has proposed and developed multiple types of explanation to address this challenge; however, relatively little research has examined how humans process these explanations. Theories and measures from social cognition research were selected: attribution of mental processes from intentional systems theory, working memory demands from cognitive load theory, and self-efficacy from social cognitive theory. The task was crowdsourced natural disaster damage assessment of aerial images, guided by a written assessment guideline. The “Wizard of Oz” method was used to generate the damage assessment output of a simulated agent; the output and explanations contained errors consistent with transferring a deep learning system to a new disaster event. A between-subjects experiment manipulated three types of natural language explanations between conditions. Counterfactual explanations increased intrinsic cognitive load and made participants more aware of the challenges of the task. Explanations that described boundary conditions and failure modes (“hedging explanations”) decreased agreement with erroneous agent ratings without a detectable effect on cognitive load. However, these effects were not large enough to counteract the decreases in self-efficacy and increases in erroneous agreement caused by providing a causal explanation. The extraneous cognitive load generated by explanations had the strongest influence on self-efficacy in the task. Presenting all of the explanation types at the same time maximized cognitive load and agreement with erroneous simulated output. Perceived interdependence with the simulated agent was also associated with increases in self-efficacy; trust in the agent, however, was not associated with differences in self-efficacy. These findings identify effects tied to research areas that have developed methods for designing tasks, which may increase the effectiveness of explanations.

    Understanding and improving subjective measures in human-computer interaction

    In Human-Computer Interaction (HCI), research has shifted from a focus on usability and performance towards the holistic notion of User Experience (UX). Research into UX places special emphasis on concepts from psychology, such as emotion, trust, and motivation. Under this paradigm, elaborate methods are needed to capture the richness and diversity of subjective experiences. Although psychology offers a long-standing tradition of developing self-reported scales, it is currently undergoing radical changes in research and reporting practice. Hence, UX research faces several challenges, such as the widespread use of ad-hoc questionnaires with unknown or unsatisfactory psychometric properties, and a lack of replication and transparency. This thesis therefore addresses several of these gaps by developing and validating self-reported scales in the domains of user motivation (manuscript 1), perceived user interface language quality (manuscript 2), and user trust (manuscript 3). Furthermore, issues of online research and practical considerations for ensuring data quality are empirically examined (manuscript 4). Overall, this thesis provides well-documented templates for scale development and may help improve scientific rigor in HCI.
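    Scale validation of the kind this thesis reports typically rests on psychometric checks such as internal consistency. As a minimal illustrative sketch (not code from the thesis), Cronbach's alpha for a multi-item self-reported scale can be computed as follows:

        import numpy as np

        def cronbach_alpha(items: np.ndarray) -> float:
            """Internal-consistency estimate for a multi-item scale.

            items: array of shape (n_respondents, n_items),
                   one column per questionnaire item.
            """
            k = items.shape[1]                         # number of items
            item_vars = items.var(axis=0, ddof=1)      # per-item variances
            total_var = items.sum(axis=1).var(ddof=1)  # variance of the sum score
            return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

        # Example: five respondents answering a four-item Likert scale
        responses = np.array([
            [4, 5, 4, 5],
            [2, 2, 3, 2],
            [5, 4, 5, 5],
            [3, 3, 2, 3],
            [1, 2, 1, 2],
        ])
        print(f"alpha = {cronbach_alpha(responses):.2f}")  # ~0.96 here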

    Hot Topics Surrounding Acceptability Judgement Tasks

    This paper discusses various "hot topics" concerning methodological issues in experimental syntax, with a focus on acceptability judgement tasks. We first review the literature on the question of whether formal methods are necessary at all and argue that this is indeed the case. We then address questions concerning running experiments, with a focus on running experiments via the internet and dealing with non-cooperative behaviour. We review strategies to fend off and to detect non-cooperative behaviour. Strategies based on response times can be used effectively to do so, even during the actual experiment: quick clicking through an experiment can be prevented by issuing a warning when response times fall below a predefined threshold. Because participants sometimes counterbalance extremely short response times by pausing, median response times rather than mean response times should be used for excluding participants post-experiment. In the final section, we present some thoughts on gradience and argue that recent findings make a case that the observed gradience is not just a by-product but comes from the grammar itself and should be modelled as such.
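    As a concrete illustration of the two response-time safeguards described above, here is a minimal sketch; the threshold value and function names are illustrative assumptions, not taken from the paper:

        from statistics import median

        MIN_RT_SECONDS = 2.0  # illustrative threshold; tune per task

        def should_warn(response_time: float) -> bool:
            """During the experiment: warn when a trial is answered too quickly."""
            return response_time < MIN_RT_SECONDS

        def exclude_participant(response_times: list[float]) -> bool:
            """Post-experiment exclusion based on the MEDIAN response time.

            The median is robust to the pattern described in the paper,
            where a few long pauses counterbalance many rushed trials
            and inflate the mean.
            """
            return median(response_times) < MIN_RT_SECONDS

        # A participant who rushes most trials but pauses twice:
        rts = [0.8, 0.9, 1.1, 45.0, 0.7, 1.0, 60.0, 0.9]
        print(sum(rts) / len(rts))       # mean ~13.8 s -- looks acceptable
        print(exclude_participant(rts))  # True: the 0.95 s median gives them away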

    Designing AI Support for Human Involvement in AI-assisted Decision Making: A Taxonomy of Human-AI Interactions from a Systematic Review

    Efforts to leverage Artificial Intelligence (AI) in decision support systems have disproportionately focused on technological advancements, often overlooking the alignment between algorithmic outputs and human expectations. To address this, explainable AI promotes AI development from a more human-centered perspective. Determining what information AI should provide to aid humans is vital; however, how the information is presented, e.g., the sequence of recommendations and the solicitation of interpretations, is equally crucial. This motivates the need to study Human-AI interaction more precisely as a pivotal component of AI-based decision support. While several empirical studies have evaluated Human-AI interactions in multiple application domains, in which interactions can take many forms, there is not yet a common vocabulary to describe human-AI interaction protocols. To address this gap, we describe the results of a systematic review of the AI-assisted decision making literature, analyzing 105 selected articles, which grounds the introduction of a taxonomy of interaction patterns delineating various modes of human-AI interactivity. We find that current interactions are dominated by simplistic collaboration paradigms and report comparatively little support for truly interactive functionality. Our taxonomy serves as a valuable tool for understanding how interactivity with AI is currently supported in decision-making contexts and fosters deliberate choices of interaction designs.

    Spam elimination and bias correction : ensuring label quality in crowdsourced tasks.

    Crowdsourcing has been proposed as a powerful mechanism for accomplishing large-scale tasks via anonymous workers online. It has been demonstrated as an effective and important approach for collecting labeled data in application domains that require human intelligence, such as image labeling, video annotation, and natural language processing. Despite these promises, one big challenge remains in crowdsourcing systems: the difficulty of controlling the quality of crowds. Workers usually have diverse education levels, personal preferences, and motivations, leading to unknown work performance while completing a crowdsourced task; some are reliable, and some might provide noisy feedback. It is therefore natural to apply worker filtering approaches to crowdsourcing applications, which recognize and handle noisy workers in order to obtain high-quality labels. The work presented in this dissertation discusses this area of research and proposes efficient probabilistic worker filtering models to distinguish various types of poor-quality workers. Most of the existing worker filtering literature either concentrates only on binary labeling tasks or fails to separate low-quality workers whose label errors can be corrected from other spam workers (whose label errors cannot be corrected). As such, we first propose a Spam Removing and De-biasing Framework (SRDF) to handle worker filtering in labeling tasks with numerical label scales. The framework detects spam workers and biased workers separately. Biased workers are defined as those who tend to provide higher (or lower) labels than the truths, and their errors can be corrected. To tackle the biasing problem, an iterative bias detection approach is introduced to recognize biased workers. The spam filtering algorithm eliminates three types of spam workers: random spammers who provide random labels, uniform spammers who give the same label for most items, and sloppy workers who offer low-accuracy labels. Integrating the spam filtering and bias detection approaches into aggregating algorithms, which infer truths from labels obtained from crowds, leads to high-quality consensus results. The common characteristic of random spammers and uniform spammers is that they provide useless feedback without making an effort on the labeling task, so it is not necessary to distinguish them separately. In addition, within the SRDF framework, the removal of sloppy workers has a great impact on the detection of biased workers. To combat these problems, a different worker classification is presented in this dissertation, in which biased workers are classified as a subcategory of sloppy workers. An ITerative Self Correcting - Truth Discovery (ITSC-TD) framework is then proposed, which can reliably recognize biased workers in ordinal labeling tasks based on a probabilistic bias detection model. ITSC-TD estimates true labels by applying an optimization-based truth discovery method, which minimizes overall label errors by assigning different weights to workers. The typical tasks posted on popular crowdsourcing platforms, such as MTurk, are simple tasks: low in complexity, independent, and requiring little time to complete. Complex tasks, however, often require crowd workers to possess specialized skills in task domains, and are therefore more prone to poor-quality feedback from crowds. We thus propose a multiple-views approach for obtaining high-quality consensus labels in complex labeling tasks. In this approach, each view is defined as a labeling critique or rubric, which guides workers to become aware of the desirable work characteristics or goals; combining the view labels yields the overall estimated label for each item. The multiple-views approach is developed under the hypothesis that workers' performance may differ from one view to another, so varied weights are assigned to different views for each worker. Additionally, the ITSC-TD framework is integrated into the multiple-views model to achieve high-quality estimated truths for each view. Next, we propose a Semi-supervised Worker Filtering (SWF) model to eliminate spam workers who assign random labels to each item. The SWF approach conducts worker filtering with a limited set of gold truths available a priori. Each worker is associated with a spammer score, estimated via the developed semi-supervised model, and low-quality workers are efficiently detected by comparing the spammer score with a predefined threshold value. The efficiency of all the developed frameworks and models is demonstrated on simulated and real-world data sets. By comparing the proposed frameworks to a set of state-of-the-art methodologies in the crowdsourcing domain, such as the expectation-maximization-based aggregating algorithm, GLAD, and an optimization-based truth discovery approach, up to 28.0% improvement can be obtained in the accuracy of true label estimation.
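    To make the optimization-based truth discovery step concrete, the following is a minimal generic sketch of the alternating loop such methods use (estimate truths as weighted averages, then reweight workers inversely to their error). It is an illustrative reconstruction under stated assumptions, not the dissertation's ITSC-TD implementation:

        import numpy as np

        def truth_discovery(labels: np.ndarray, n_iters: int = 20):
            """Generic optimization-based truth discovery for numerical labels.

            labels: shape (n_workers, n_items); NaN marks unanswered items.
            Assumes every item receives at least one label.
            Returns (estimated_truths, worker_weights).
            """
            mask = ~np.isnan(labels)
            filled = np.where(mask, labels, 0.0)
            weights = np.ones(labels.shape[0])  # start by trusting all workers equally

            for _ in range(n_iters):
                # Truth step: per-item weighted average of the workers' labels.
                w = weights[:, None] * mask
                truths = (w * filled).sum(axis=0) / w.sum(axis=0)

                # Weight step: workers with larger total squared error get
                # smaller weight (log-ratio form used by CRH-style methods).
                errors = (mask * (filled - truths) ** 2).sum(axis=1) + 1e-9
                weights = -np.log(errors / errors.sum())

            return truths, weights

        # Three workers label four items; the third is consistently biased upward.
        labels = np.array([
            [3.0, 4.0, 2.0, 5.0],
            [3.0, 4.0, 2.0, np.nan],
            [5.0, 5.0, 4.0, 5.0],
        ])
        truths, weights = truth_discovery(labels)
        print(truths, weights)  # truths pulled toward the two reliable workers

    The spammer-score thresholding in the SWF model follows the same general pattern: compute a per-worker score, then exclude workers whose score crosses a predefined cut-off.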

    When in doubt ask the crowd : leveraging collective intelligence for improving event detection and machine learning

    [no abstract]