
    A small note on variation in segmentation annotations

    We report on the results of a small crowdsourcing experiment conducted at a workshop on machine learning for segmentation held at the Danish Bio Imaging network meeting 2020. During the workshop we asked participants to manually segment mitochondria in three 2D patches. The aim of the experiment was to illustrate that manual annotations should not be seen as the ground truth, but as a reference standard that is subject to substantial variation. In this note we show how the large variation we observed in the segmentations can be reduced by removing the annotators with the worst pair-wise agreement. Having removed these annotators, we illustrate that the remaining variance is semantically meaningful and can be exploited to obtain segmentations of the cell boundary and the cell interior.
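    The filtering idea above can be illustrated with a short sketch: score each annotator by their mean pair-wise agreement with the others and drop the lowest-scoring ones. The note does not prescribe a metric, so the use of IoU here, and all names, are assumptions for illustration.

```python
# Minimal sketch (not the authors' code): rank annotators by mean pairwise IoU
# agreement and drop the worst ones.
import numpy as np

def iou(a, b):
    """Intersection over union of two boolean masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union > 0 else 1.0

def drop_worst_annotators(masks, n_drop=1):
    """masks: dict annotator_id -> boolean mask (same shape).
    Returns the annotator ids kept after removing the n_drop least-agreeing ones."""
    ids = list(masks)
    mean_agreement = {
        i: np.mean([iou(masks[i], masks[j]) for j in ids if j != i])
        for i in ids
    }
    ranked = sorted(ids, key=mean_agreement.get)  # worst agreement first
    return ranked[n_drop:]
```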

    Crowdsourcing of Histological Image Labeling and Object Delineation by Medical Students

    Crowdsourcing in pathology has been performed on tasks that are assumed to be manageable by nonexperts. Demand remains high for annotations of more complex elements in digital microscopic images, such as anatomical structures. Therefore, this work investigates conditions to enable crowdsourced annotations of high-level image objects, a complex task considered to require expert knowledge. Seventy-six medical students without specific domain knowledge who voluntarily participated in three experiments solved two relevant annotation tasks on histopathological images: (1) labeling of images showing tissue regions, and (2) delineation of morphologically defined image objects. We focus on methods to ensure sufficient annotation quality, including several tests on the required number of participants and on the correlation of participants' performance between tasks. In a setup simulating annotation of images with limited ground truth, we validated the feasibility of a confidence score using full ground truth. For this, we computed a majority vote using weighting factors based on an individual assessment of each contributor against a scattered gold standard annotated by pathologists. In conclusion, we provide guidance for task design and quality control to enable a crowdsourced approach to obtaining accurate annotations required in the era of digital pathology.
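    As a rough illustration of the weighted majority vote described above, the sketch below weights each contributor by their accuracy on the scattered gold-standard images and then aggregates labels per image. The exact weighting scheme used in the study is not reproduced here; the functions and names are assumptions.

```python
# Hedged sketch of a gold-standard-weighted majority vote for categorical labels.
import numpy as np

def contributor_weights(gold_labels, contributor_labels):
    """gold_labels: dict image_id -> expert label.
    contributor_labels: dict contributor_id -> dict image_id -> label.
    Weight = fraction of gold images the contributor labeled correctly."""
    weights = {}
    for c, labels in contributor_labels.items():
        scored = [labels[i] == gold_labels[i] for i in gold_labels if i in labels]
        weights[c] = np.mean(scored) if scored else 0.0
    return weights

def weighted_majority_vote(labels_for_image, weights):
    """labels_for_image: dict contributor_id -> label for one image.
    Returns the label with the highest total contributor weight."""
    totals = {}
    for c, label in labels_for_image.items():
        totals[label] = totals.get(label, 0.0) + weights.get(c, 0.0)
    return max(totals, key=totals.get)
```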

    A crowdsourcing semi-automatic image segmentation platform for cell biology

    State-of-the-art computer-vision algorithms rely on large, accurately annotated datasets, which are expensive, laborious and time-consuming to generate. This task is even more challenging when it comes to microbiological images, because they require specialized expertise for accurate annotation. Previous studies show that crowdsourcing and assistive-annotation tools are two potential solutions to address this challenge. In this work, we have developed a web-based platform to enable crowdsourced annotation of image data; the platform is powered by a semi-automated assistive tool that supports non-expert annotators and improves annotation efficiency. The behavior of annotators with and without the assistive tool is analyzed, using biological images of different complexity. More specifically, non-experts have been asked to use the platform to annotate microbiological images of gut parasites, and their annotations are compared with annotations by experts. A quantitative evaluation is carried out on the results, confirming that the assistive tool can noticeably decrease the non-experts' annotation cost (time, clicks, interactions, etc.) while preserving or even improving annotation quality. The annotation quality of non-experts has been investigated using IoU (intersection over union), precision and recall; based on this analysis we propose some ideas on how to better design similar crowdsourcing and assistive platforms.
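    For reference, the evaluation metrics mentioned above (IoU, precision and recall) can be computed on binary segmentation masks as in the following sketch; the helper name and the binary-mask assumption are illustrative, not taken from the paper.

```python
# Minimal sketch: IoU, precision and recall for a predicted vs. reference mask.
import numpy as np

def segmentation_metrics(pred, truth):
    """pred, truth: boolean masks of identical shape."""
    tp = np.logical_and(pred, truth).sum()   # true positives
    fp = np.logical_and(pred, ~truth).sum()  # false positives
    fn = np.logical_and(~pred, truth).sum()  # false negatives
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 1.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"iou": iou, "precision": precision, "recall": recall}
```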

    Accurate and budget-efficient text, image, and video analysis systems powered by the crowd

    Crowdsourcing systems empower individuals and companies to outsource labor-intensive tasks that cannot currently be solved by automated methods and are expensive to tackle by domain experts. Crowdsourcing platforms are traditionally used to provide training labels for supervised machine learning algorithms. Crowdsourced tasks are distributed among internet workers who typically have a range of skills and knowledge, differing previous exposure to the task at hand, and biases that may influence their work. This inhomogeneity of the workforce makes the design of accurate and efficient crowdsourcing systems challenging. This dissertation presents solutions to improve existing crowdsourcing systems in terms of accuracy and efficiency. It explores crowdsourcing tasks in two application areas: political discourse and annotation of biomedical and everyday images. The first part of the dissertation investigates how workers' behavioral factors and their unfamiliarity with data can be leveraged by crowdsourcing systems to control quality. Through studies that involve familiar and unfamiliar image content, the thesis demonstrates the benefit of explicitly accounting for a worker's familiarity with the data when designing annotation systems powered by the crowd. The thesis next presents Crowd-O-Meter, a system that automatically predicts the vulnerability of crowd workers to believing "fake news" in text and video. The second part of the dissertation explores the reversed relationship between machine learning and crowdsourcing by incorporating machine learning techniques for quality control of crowdsourced end products. In particular, it investigates whether machine learning can be used to improve the quality of crowdsourced results while also respecting budget constraints. The thesis proposes an image analysis system called ICORD that utilizes behavioral cues of the crowd worker, augmented by automated evaluation of image features, to dynamically infer the quality of a worker-drawn outline of a cell in a microscope image. ICORD determines the need to seek additional annotations from other workers in a budget-efficient manner. Next, the thesis proposes a budget-efficient machine learning system that uses fewer workers to analyze easy-to-label data and more workers for data that require extra scrutiny. The system learns a mapping from data features to the number of allocated crowd workers for two case studies: sentiment analysis of Twitter messages and segmentation of biomedical images. Finally, the thesis uncovers the potential for the design of hybrid crowd-algorithm methods by describing an interactive system for cell tracking in time-lapse microscopy videos, based on a prediction model that determines when automated cell tracking algorithms fail and human interaction is needed to ensure accurate tracking.
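    The budget-allocation idea described above (fewer workers for easy items, more for items that need extra scrutiny) could be sketched as a learned regression from item features to a worker count. The model choice and all names below are assumptions for illustration, not the dissertation's implementation.

```python
# Hedged sketch: learn a mapping from item features to number of crowd workers,
# then clip predictions to a budget range when allocating new items.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def fit_allocation_model(features, workers_needed):
    """features: (n_items, n_features) array; workers_needed: how many workers
    were required to reach the target quality on previously labeled items."""
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(features, workers_needed)
    return model

def allocate_workers(model, new_features, min_workers=1, max_workers=9):
    """Predict a per-item worker allocation, clipped to the allowed range."""
    pred = np.rint(model.predict(new_features)).astype(int)
    return np.clip(pred, min_workers, max_workers)
```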

    Investigating the Influence of Data Familiarity to Improve the Design of a Crowdsourcing Image Annotation System

    Crowdsourced demarcations of object boundaries in images (segmentations) are important for many vision-based applications. A commonly reported challenge is that a large percentage of crowd results are discarded due to concerns about quality. We conducted three studies to examine (1) how does the quality of crowdsourced segmentations differ for familiar everyday images versus unfamiliar biomedical images?, (2) how does making familiar images less recognizable (rotating images upside down) influence crowd work with respect to the quality of results, segmentation time, and segmentation detail?, and (3) how does crowd workers’ judgment of the ambiguity of the segmentation task, collected by voting, differ for familiar everyday images and unfamiliar biomedical images? We analyzed a total of 2,525 segmentations collected from 121 crowd workers and 1,850 votes from 55 crowd workers. Our results illustrate the potential benefit of explicitly accounting for human familiarity with the data when designing computer interfaces for human interaction.

    Assessing emphysema in CT scans of the lungs: Using machine learning, crowdsourcing and visual similarity


    Eliciting and Leveraging Input Diversity in Crowd-Powered Intelligent Systems

    Collecting high quality annotations plays a crucial role in supporting machine learning algorithms, and thus, the creation of intelligent systems. Over the past decade, crowdsourcing has become a widely adopted means of manually creating annotations for various intelligent tasks, spanning from object boundary detection in images to sentiment understanding in text. This thesis presents new crowdsourcing workflows and answer aggregation algorithms that can effectively and efficiently improve collective annotation quality from crowd workers. While conventional microtask crowdsourcing approaches generally focus on improving annotation quality by promoting consensus among workers, this thesis proposes a novel concept of a diversity-driven approach. We show that leveraging diversity in workers' responses is effective in improving the accuracy of aggregate annotations because it compensates for biases or uncertainty caused by the system, tool, or the data. We then present techniques that elicit the diversity in workers' responses. These techniques are orthogonal to other quality control methods, such as filtering, training or incentives, which means they can be used in combination with existing methods. The crowd-powered intelligent systems presented in this thesis are evaluated through visual perception tasks in order to demonstrate the effectiveness of our proposed approach. The advantage of our approach is an improvement in collective quality even in settings where worker skill may vary widely, potentially lowering barriers to entry for novice workers and making it easier for requesters to find workers who can make productive contributions. This thesis demonstrates that crowd workers' input diversity can be a useful property that yields better aggregate performance than any homogeneous set of input.
    PhD thesis, Electrical Engineering: Systems, University of Michigan, Horace H. Rackham School of Graduate Studies.
    https://deepblue.lib.umich.edu/bitstream/2027.42/153428/1/jyskwon_1.pd
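    To make the aggregation theme of the abstract above concrete, the minimal sketch below soft-votes boolean masks collected under deliberately varied elicitation conditions. It only illustrates why diverse, less-correlated responses can aggregate well; it is not the thesis's algorithm, and the names and threshold are assumptions.

```python
# Minimal sketch: pixel-wise soft vote over boolean masks from diverse workers.
import numpy as np

def aggregate_masks(masks, threshold=0.5):
    """masks: list of boolean arrays of identical shape, ideally produced by
    workers primed or tooled differently so their errors are less correlated.
    Returns the pixels marked foreground by at least `threshold` of the masks."""
    stack = np.stack([m.astype(float) for m in masks], axis=0)
    return stack.mean(axis=0) >= threshold
```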