454 research outputs found

    Does Confidence Reporting from the Crowd Benefit Crowdsourcing Performance?

    Full text link
    We explore the design of an effective crowdsourcing system for an M-ary classification task. Crowd workers complete simple binary microtasks whose results are aggregated to give the final classification decision. We consider the scenario where the workers have a reject option, so that they may skip microtasks when they are unable or unwilling to respond. Additionally, the workers report quantized confidence levels when they do submit definitive answers. We present an aggregation approach using a weighted majority voting rule, where each worker's response is assigned an optimized weight to maximize the crowd's classification performance. We obtain the counterintuitive result that classification performance does not benefit from workers reporting quantized confidence. Therefore, the crowdsourcing system designer should employ the reject option without requiring confidence reporting. Comment: 6 pages, 4 figures, SocialSens 2017. arXiv admin note: text overlap with arXiv:1602.0057
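    The weighted majority voting rule with a reject option is straightforward to state in code. Below is a minimal sketch for a single binary microtask, assuming answers encoded as +1/-1 with 0 for a skipped task; the weight values are illustrative stand-ins, not the paper's optimized weights.

```python
import numpy as np

def weighted_majority_vote(responses, weights):
    """Aggregate binary microtask answers by weighted majority voting.

    responses: entries in {+1, -1} for the two binary answers, or 0 when
    the worker exercised the reject option (skipped the microtask).
    weights: per-worker weights (e.g., derived from estimated reliability).
    Skipped answers contribute nothing to the vote.
    """
    responses = np.asarray(responses, dtype=float)
    weights = np.asarray(weights, dtype=float)
    score = np.sum(weights * responses)  # rejected answers (0) drop out
    return 1 if score >= 0 else -1

# Toy example: three workers answer one microtask, one skips.
answers = [1, -1, 0]          # worker 3 used the reject option
weights = [0.9, 0.4, 0.7]     # hypothetical weights
print(weighted_majority_vote(answers, weights))  # -> 1
```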

    Improve learning combining crowdsourced labels by weighting Areas Under the Margin

    Full text link
    In supervised learning -- for instance in image classification -- modern massive datasets are commonly labeled by a crowd of workers. The labels obtained in this crowdsourcing setting are then aggregated for training. The aggregation step generally leverages a per-worker trust score. Yet such worker-centric approaches discard each task's ambiguity. Some intrinsically ambiguous tasks might even fool expert workers, which could eventually be harmful to the learning step. In a standard supervised learning setting -- with one label per task and balanced classes -- the Area Under the Margin (AUM) statistic is tailored to identify mislabeled data. We adapt the AUM to identify ambiguous tasks in crowdsourced learning scenarios, introducing the Weighted AUM (WAUM). The WAUM is an average of AUMs weighted by worker- and task-dependent scores. We show that the WAUM can help discard ambiguous tasks from the training set, leading to better generalization or calibration performance. We report improvements over feature-blind aggregation strategies both in simulated settings and on the CIFAR-10H crowdsourced dataset.
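    For concreteness, here is a minimal sketch of the AUM statistic and a WAUM-style weighted average, assuming per-epoch logit vectors have been recorded during training; the worker-and-task dependent scores are taken as given inputs rather than computed as in the paper.

```python
import numpy as np

def aum(logits_over_epochs, label):
    """Area Under the Margin for one task: average over training epochs of
    (logit of the assigned label) - (largest logit among the other classes)."""
    margins = []
    for logits in logits_over_epochs:          # one logit vector per epoch
        logits = np.asarray(logits)
        assigned = logits[label]
        others = np.delete(logits, label)
        margins.append(assigned - others.max())
    return float(np.mean(margins))

def waum(logits_over_epochs, worker_labels, worker_scores):
    """WAUM-style statistic: AUMs computed under each worker's label,
    averaged with worker-and-task dependent scores (normalized to sum to 1)."""
    scores = np.asarray(worker_scores, dtype=float)
    scores = scores / scores.sum()
    aums = np.array([aum(logits_over_epochs, y) for y in worker_labels])
    return float(np.dot(scores, aums))
```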

    When in doubt ask the crowd: leveraging collective intelligence for improving event detection and machine learning

    Get PDF
    [no abstract]

    The AI Neuropsychologist: Automatic scoring of memory deficits with deep learning

    Full text link
    Memory deficits are a hallmark of many neurological and psychiatric conditions. The Rey-Osterrieth complex figure (ROCF) is the state-of-the-art assessment tool used by neuropsychologists across the globe to assess the degree of non-verbal visual memory deterioration. To obtain a score, a trained clinician inspects a patient's ROCF drawing and quantifies deviations from the original figure. This manual procedure is time-consuming, and scores vary depending on the clinician's experience, motivation, and tiredness. Here, we leverage novel deep learning architectures to automate the rating of memory deficits. To this end, a multi-head convolutional neural network was trained on 20,225 ROCF drawings; unbiased ground-truth ROCF scores were obtained from crowdsourced human intelligence. The neural network outperforms both online raters and clinicians. Our AI-powered scoring system provides healthcare institutions worldwide with a digital tool to assess performance on the ROCF test from hand-drawn images objectively, reliably, and time-efficiently.
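    As an illustration of the multi-head architecture described above, the PyTorch sketch below uses a shared convolutional backbone with one small regression head per ROCF element (the standard ROCF scoring has 18 elements whose scores are summed); the channel widths, head design, and input size are placeholder assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class MultiHeadScorer(nn.Module):
    """Illustrative multi-head CNN: a shared convolutional backbone feeds
    one small regression head per figure element; per-element scores are
    summed into a total score. Architecture details are placeholders."""

    def __init__(self, n_elements=18):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.heads = nn.ModuleList(
            [nn.Linear(64, 1) for _ in range(n_elements)]
        )

    def forward(self, x):
        features = self.backbone(x)
        element_scores = torch.cat([h(features) for h in self.heads], dim=1)
        return element_scores.sum(dim=1), element_scores

model = MultiHeadScorer()
total, per_element = model(torch.randn(4, 1, 128, 128))  # batch of drawings
```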

    A Study of Crowdsourcing for Machine-Learning Dataset Collection for Activity Recognition

    Get PDF
    In this thesis, we propose novel methods to explore and improve crowdsourced data labeling for mobile activity recognition. The thesis concerns itself with the quality (i.e., the performance of a classification model), quantity (i.e., the amount of data collected), and motivation (i.e., the process that initiates and maintains goal-oriented behavior) of participant contributions in mobile activity data collection studies. We focus on achieving high-quality, consistent ground-truth labeling and, in particular, on the impact of user feedback under different conditions. Although prior work has used several techniques to improve activity recognition performance, our approach differs in its end goals, proposed methods, and implementation. Researchers commonly concentrate on post-collection processing to increase activity recognition accuracy, such as implementing advanced machine learning algorithms to improve data quality or exploring preprocessing methods to increase data quantity. However, exploiting post-collection results is difficult and time-consuming in most real-world situations because of dirty data. Unlike that common approach, in this thesis we aim to motivate and sustain user engagement during the ongoing self-labeling task itself in order to optimize activity recognition accuracy.

    The outline of the thesis is as follows. Chapters 1 and 2 briefly introduce the thesis work and review the literature. Chapter 3 introduces novel gamified active learning and inaccuracy detection for crowdsourced data labeling in an activity recognition system (CrowdAct) using mobile sensing. We exploit active learning to address the lack of accurate information, integrate gamification into active learning to overcome the lack of motivation and sustained engagement, and introduce an inaccuracy detection algorithm to minimize inaccurate data. Chapter 4 introduces a novel method that exploits on-device deep learning inference, using a long short-term memory (LSTM)-based approach, to reduce the labeling effort and ground-truth data collection in activity recognition systems based on smartphone sensors. The novel idea is that estimated activities are used as feedback to motivate users to provide accurate activity labels (a sketch of such a model follows this abstract). Chapter 5 introduces novel on-device personalization of data labeling for an activity recognition system using mobile sensing. The key idea is that estimated activities personalized to a specific user can be used as feedback to motivate user contribution and improve labeling quality. We exploit fine-tuning with a deep recurrent neural network (RNN) to address the lack of sufficient training data and minimize the need to train deep learning models on mobile devices from scratch, and we apply model pruning to reduce the computational cost of on-device personalization without affecting accuracy. Finally, we build a robust activity data labeling system by integrating the two techniques above, allowing the mobile application to create a personalized experience for the user.

    To demonstrate the proposed methods' capability and feasibility in realistic settings, we developed the systems and deployed them in real-world settings such as crowdsourcing. For data labeling, we examined online and self-labeling scenarios using inertial smartphone sensors such as accelerometers. We recruited diverse participants and conducted experiments both in a laboratory setting and in a semi-natural setting, applying both manual labeling and semi-automated labeling assistance. Additionally, we gathered a large body of labeled activity recognition training data from smartphone sensors together with other information such as user demographics and engagement. Chapter 6 offers a brief discussion of the thesis, and Chapter 7 concludes it and outlines future work.

    We empirically evaluated these methods against various study goals using machine learning and descriptive and inferential statistics. Our results indicate that this work enables effective collection of crowdsourced activity data, and it reveals clear opportunities and challenges in combining human and mobile-phone-based sensing for researchers interested in studying human behavior in situ. Researchers and practitioners can apply our findings to improve recognition accuracy, reduce unreliable labels from human users, increase the total number of collected responses, and enhance participant motivation for activity data collection.

    Kyushu Institute of Technology doctoral dissertation. Degree number: 工博甲第526号. Degree conferred: June 28, 2021. Contents: 1 Introduction | 2 Related work | 3 Achieving High-Quality Crowdsourced Datasets in Mobile Activity Recognition | 4 On-Device Deep Learning Inference for Activity Data Collection | 5 On-Device Deep Personalization for Activity Data Collection | 6 Discussion | 7 Conclusion
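    As a rough illustration of the on-device LSTM inference idea from Chapter 4, the sketch below classifies one window of accelerometer samples; the window length, hidden size, and number of activity classes are placeholder assumptions, not the thesis's configuration.

```python
import torch
import torch.nn as nn

class ActivityLSTM(nn.Module):
    """Illustrative LSTM classifier over windows of accelerometer samples,
    in the spirit of the on-device inference described above."""

    def __init__(self, n_channels=3, hidden=64, n_activities=6):
        super().__init__()
        self.lstm = nn.LSTM(n_channels, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, n_activities)

    def forward(self, x):                 # x: (batch, time, channels)
        _, (h_n, _) = self.lstm(x)
        return self.classifier(h_n[-1])   # logits over activities

model = ActivityLSTM()
window = torch.randn(1, 128, 3)           # 128 accelerometer samples (x, y, z)
predicted = model(window).argmax(dim=1)   # estimated activity, shown to the
                                          # user as feedback during labeling
```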

    Spam elimination and bias correction: ensuring label quality in crowdsourced tasks

    Get PDF
    Crowdsourcing is a powerful mechanism for accomplishing large-scale tasks via anonymous workers online. It has been demonstrated as an effective and important approach for collecting labeled data in application domains that require human intelligence, such as image labeling, video annotation, and natural language processing. Despite its promise, one big challenge remains in crowdsourcing systems: the difficulty of controlling the quality of the crowd. Workers have diverse education levels, personal preferences, and motivations, leading to unknown performance on a crowdsourced task: some are reliable, while others provide noisy feedback. It is therefore natural to apply worker filtering, which recognizes and handles noisy workers, to crowdsourcing applications in order to obtain high-quality labels. This dissertation discusses this area of research and proposes efficient probabilistic worker-filtering models that distinguish different types of poor-quality workers.

    Most of the existing literature on worker filtering either concentrates only on binary labeling tasks or fails to separate the low-quality workers whose label errors can be corrected from the other spam workers (whose label errors cannot). We therefore first propose a Spam Removing and De-biasing Framework (SRDF) for worker filtering in labeling tasks with numerical label scales. The framework detects spam workers and biased workers separately. Biased workers are those who tend to provide higher (or lower) labels than the truth; their errors can be corrected. To tackle the biasing problem, an iterative bias detection approach is introduced to recognize biased workers. The spam filtering algorithm eliminates three types of spam workers: random spammers, who provide random labels; uniform spammers, who give the same label for most items; and sloppy workers, who offer low-accuracy labels. Integrating the spam filtering and bias detection approaches into aggregating algorithms, which infer truths from the labels obtained from the crowd, leads to high-quality consensus results.

    Random spammers and uniform spammers share a common characteristic: they provide useless feedback without making an effort on the labeling task, so it is not necessary to distinguish between them. In addition, within the SRDF framework the removal of sloppy workers has a great impact on the detection of biased workers. To address these problems, a different worker classification is presented in this dissertation, in which biased workers are treated as a subcategory of sloppy workers. An Iterative Self-Correcting Truth Discovery (ITSC-TD) framework is then proposed, which reliably recognizes biased workers in ordinal labeling tasks based on a probabilistic bias detection model. ITSC-TD estimates true labels through an optimization-based truth discovery method, which minimizes the overall label error by assigning different weights to workers (a minimal sketch of this style of truth discovery follows the abstract).

    The typical tasks posted on popular crowdsourcing platforms, such as MTurk, are simple tasks: low in complexity, independent, and quick to complete. Complex tasks, by contrast, often require crowd workers to possess specialized skills in the task domain, so they are more prone to poor-quality feedback from the crowd. We therefore propose a multiple-views approach for obtaining high-quality consensus labels in complex labeling tasks. Each view is defined as a labeling critique or rubric that guides workers toward the desirable work characteristics or goals, and combining the view labels yields the overall estimated label for each item. The multiple-views approach rests on the hypothesis that a worker's performance may differ from one view to another, so different weights are assigned to different views for each worker. The ITSC-TD framework is integrated into the multiple-views model to achieve high-quality estimated truths for each view.

    Next, we propose a Semi-supervised Worker Filtering (SWF) model to eliminate spam workers who assign random labels to each item. The SWF approach conducts worker filtering with a limited set of gold truths available a priori. Each worker is associated with a spammer score, estimated via the semi-supervised model, and low-quality workers are efficiently detected by comparing the spammer score against a predefined threshold.

    The efficiency of all the developed frameworks and models is demonstrated on simulated and real-world data sets. Compared with state-of-the-art methodologies in crowdsourcing, such as the expectation-maximization-based aggregating algorithm, GLAD, and the optimization-based truth discovery approach, improvements of up to 28.0% are obtained in the accuracy of true label estimation.
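    As an illustration of optimization-based truth discovery, the sketch below alternates between estimating truths as weight-averaged labels and re-weighting workers inversely to their error, in the style of CRH-type methods; it is not the dissertation's ITSC-TD algorithm, which additionally models and corrects worker bias.

```python
import numpy as np

def truth_discovery(labels, n_iters=20, eps=1e-8):
    """Minimal CRH-style truth discovery for numerical labels.

    Alternates between (1) estimating each item's truth as the
    weight-averaged label and (2) re-weighting each worker inversely
    to their total squared error.

    labels: array of shape (n_workers, n_items).
    """
    n_workers, _ = labels.shape
    weights = np.ones(n_workers)
    for _ in range(n_iters):
        truths = weights @ labels / weights.sum()        # step 1: truths
        errors = ((labels - truths) ** 2).sum(axis=1)    # per-worker loss
        weights = -np.log(errors / errors.sum() + eps)   # step 2: weights
    return truths, weights

labels = np.array([[3., 4., 2., 5.],   # reliable worker
                   [3., 4., 2., 4.],   # reliable worker
                   [5., 1., 5., 1.]])  # noisy worker
truths, weights = truth_discovery(labels)  # noisy worker gets a low weight
```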