12 research outputs found

    Lean Multiclass Crowdsourcing

    We introduce a method for efficiently crowdsourcing multiclass annotations in challenging, real-world image datasets. Our method is designed to minimize the number of human annotations needed to achieve a desired level of confidence in the class labels. It is based on combining models of worker behavior with computer vision. Our method is general: it can handle a large number of classes, worker labels that come from a taxonomy rather than a flat list, and dependence among labels when workers can see a history of previous annotations. It may be used as a drop-in replacement for the majority-vote algorithms used by online crowdsourcing services to aggregate multiple human annotations into a final consolidated label. In experiments on two real-life applications, we find that our method can reduce the number of required annotations by as much as a factor of 5.4 and the residual annotation error by up to 90% compared with majority voting. Furthermore, the models' online risk estimates can be used to sort the annotated collection and minimize subsequent expert review effort.
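    The aggregation step can be illustrated with a minimal sketch, not the authors' full model (which also folds in computer vision predictions, label taxonomies, and annotation history): keep a posterior over classes for each image, fold in each worker label through an assumed per-worker confusion matrix, and stop requesting labels once the desired confidence is reached. The function names and the simple confusion-matrix worker model below are illustrative assumptions, with plain majority voting shown for comparison.

        import numpy as np

        def aggregate_labels(labels, confusions, prior, threshold=0.95):
            """Sequentially fold worker labels into a class posterior (illustrative sketch).

            labels     -- list of (worker_id, observed_class) pairs for one image
            confusions -- dict mapping worker_id to a C x C matrix with
                          confusions[w][true, observed] = P(observed | true)
            prior      -- length-C prior over classes (e.g. from a vision model)
            threshold  -- stop once the top class reaches this posterior mass
            """
            posterior = np.asarray(prior, dtype=float)
            for n, (worker, observed) in enumerate(labels, start=1):
                likelihood = confusions[worker][:, observed]   # P(observed | each true class)
                posterior = posterior * likelihood
                posterior = posterior / posterior.sum()
                if posterior.max() >= threshold:               # confident enough: stop early
                    return int(posterior.argmax()), posterior, n
            return int(posterior.argmax()), posterior, len(labels)

        def majority_vote(labels, num_classes):
            """Baseline: ignores worker skill and offers no confidence estimate."""
            counts = np.bincount([obs for _, obs in labels], minlength=num_classes)
            return int(counts.argmax())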

    Recognition in Terra Incognita

    It is desirable for detection and classification algorithms to generalize to unfamiliar environments, but suitable benchmarks for quantitatively studying this phenomenon are not yet available. We present a dataset designed to measure recognition generalization to novel environments. The images in our dataset are harvested from twenty camera traps deployed to monitor animal populations. Camera traps are fixed at one location, hence the background changes little across images; capture is triggered automatically, hence there is no human bias. The challenge is to learn recognition from a handful of locations and to generalize animal detection and classification to new locations for which no training data are available. In our experiments, state-of-the-art algorithms show excellent performance when tested at the same locations where they were trained. However, we find that generalization to new locations is poor, especially for classification systems. (The dataset is available at https://beerys.github.io/CaltechCameraTraps/.)
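    The evaluation protocol amounts to splitting by camera-trap location rather than by image, so that test locations are never seen during training. Below is a minimal sketch of such a location-disjoint split; the record fields and function name are assumptions for illustration, not the dataset's actual schema.

        import random

        def split_by_location(images, train_fraction=0.5, seed=0):
            """Split image records into train/test sets with disjoint camera-trap locations.

            images -- list of dicts, each assumed to carry a 'location' key
            Returns (train, test); no location contributes images to both sides.
            """
            locations = sorted({img["location"] for img in images})
            random.Random(seed).shuffle(locations)
            n_train = int(len(locations) * train_fraction)
            train_locations = set(locations[:n_train])
            train = [img for img in images if img["location"] in train_locations]
            test = [img for img in images if img["location"] not in train_locations]
            return train, test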

    Learning From Noisy Labels By Regularized Estimation Of Annotator Confusion

    The predictive performance of supervised learning algorithms depends on the quality of labels. In a typical label collection process, multiple annotators provide subjective, noisy estimates of the "truth" under the influence of their varying skill levels and biases. Blindly treating these noisy labels as the ground truth limits the accuracy of learning algorithms in the presence of strong disagreement. This problem is critical for applications in domains such as medical imaging, where both the annotation cost and inter-observer variability are high. In this work, we present a method for simultaneously learning the individual annotator model and the underlying true label distribution using only noisy observations. Each annotator is modeled by a confusion matrix that is jointly estimated along with the classifier predictions. We propose adding a regularization term to the loss function that encourages convergence to the true annotator confusion matrices. We provide a theoretical argument for why the regularization is essential to our approach in both the single-annotator and multiple-annotator cases. Despite the simplicity of the idea, experiments on image classification tasks with both simulated and real labels show that our method either outperforms or performs on par with state-of-the-art methods and is capable of estimating the skills of annotators even with a single label available per image. (Comment: CVPR 2019; code snippets included.)
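    A minimal sketch of the joint model is given below, assuming that each annotator's confusion matrix corrupts the classifier's predicted class distribution and that the regularization term penalizes the trace of the confusion matrices; the parameter names, the row-wise softmax parameterization, and the near-identity initialization are illustrative choices rather than the paper's exact implementation.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class NoisyAnnotatorModel(nn.Module):
            """Classifier plus one learnable confusion matrix per annotator (sketch)."""

            def __init__(self, backbone, num_classes, num_annotators):
                super().__init__()
                self.backbone = backbone  # any module mapping inputs to class logits
                # Unconstrained parameters; a row-wise softmax keeps each matrix row-stochastic.
                self.confusion_logits = nn.Parameter(
                    torch.stack([6.0 * torch.eye(num_classes) for _ in range(num_annotators)])
                )

            def confusion_matrices(self):
                return F.softmax(self.confusion_logits, dim=-1)   # [R, C, C]

            def forward(self, images):
                return F.softmax(self.backbone(images), dim=-1)   # estimated true-label distribution

        def annotator_loss(model, images, noisy_labels, annotator_ids, trace_weight=0.01):
            """Cross-entropy on annotator-corrupted predictions plus a trace penalty."""
            p_true = model(images)                                   # [B, C]
            A = model.confusion_matrices()[annotator_ids]            # [B, C, C]
            p_noisy = torch.bmm(p_true.unsqueeze(1), A).squeeze(1)   # p_noisy[j] = sum_i p_true[i] * A[i, j]
            nll = F.nll_loss(torch.log(p_noisy + 1e-8), noisy_labels)
            trace_penalty = model.confusion_matrices().diagonal(dim1=-2, dim2=-1).sum()
            return nll + trace_weight * trace_penalty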

    Disentangling Human Error from the Ground Truth in Segmentation of Medical Images

    Recent years have seen increasing use of supervised learning methods for segmentation tasks. However, the predictive performance of these algorithms depends on the quality of labels. This problem is particularly pertinent in the medical image domain, where both the annotation cost and inter-observer variability are high. In a typical label acquisition process, different human experts provide their estimates of the 'true' segmentation labels under the influence of their own biases and competence levels. Treating these noisy labels blindly as the ground truth limits the performance that automatic segmentation algorithms can achieve. In this work, we present a method for jointly learning, from noisy observations alone, the reliability of individual annotators and the true segmentation label distributions, using two coupled CNNs. The separation of the two is achieved by encouraging the estimated annotators to be maximally unreliable while achieving high fidelity to the noisy training data. We first define a toy segmentation dataset based on MNIST and study the properties of the proposed algorithm. We then demonstrate the utility of the method on three public medical imaging segmentation datasets with simulated (when necessary) and real diverse annotations: 1) MSLSC (multiple-sclerosis lesions); 2) BraTS (brain tumours); 3) LIDC-IDRI (lung abnormalities). In all cases, our method outperforms competing methods and relevant baselines, particularly when the number of annotations is small and the amount of disagreement is large. The experiments also show a strong ability to capture the complex spatial characteristics of annotators' mistakes.
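    The same confusion-matrix idea carries over to segmentation by letting a second network predict a confusion matrix per pixel and per annotator. The sketch below only shows how such per-pixel matrices would corrupt the estimated segmentation probabilities; the tensor shapes and names are assumptions, not the paper's implementation.

        import torch

        def corrupt_with_pixelwise_confusion(p_seg, confusion):
            """Apply per-pixel annotator confusion matrices to segmentation probabilities.

            p_seg     -- [B, C, H, W] estimated true-label probabilities from the segmentation CNN
            confusion -- [B, C, C, H, W] row-stochastic matrices from the annotator CNN,
                         confusion[b, i, j, h, w] = P(annotator says j | true class i)
            Returns the predicted noisy (annotator) label probabilities, [B, C, H, W].
            """
            # Per pixel: p_noisy[j] = sum_i p_seg[i] * confusion[i, j]
            return torch.einsum("bihw,bijhw->bjhw", p_seg, confusion)

        # Training would pair a cross-entropy loss between this corrupted prediction and each
        # annotator's noisy mask with a penalty that keeps the estimated annotators no more
        # reliable than the data requires.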

    "Флора России" на платформе iNaturalist: большие данные о биоразнообразии большой страны

    Проект "Флора России" на международной платформе iNaturalist объединил профессиональных ученых и любителей природы со всей страны. Это третий по объему массив открытых пространственных данных о биоразнообразии страны (и второй по распространению растений), ведущий источник данных по современному состоянию флор

    Fast and Reliable Inference Algorithms for Crowdsourcing Systems

    Doctoral dissertation -- Seoul National University Graduate School, Department of Electrical and Computer Engineering, College of Engineering, February 2021. Advisor: 정교민. As the need for large-scale labeled data grows in various fields, the appearance of web-based crowdsourcing systems offers a promising way to exploit the wisdom of crowds efficiently, in a short time and on a relatively low budget. Despite their efficiency, crowdsourcing systems have an inherent problem: responses from workers can be unreliable, since workers are low-paid and bear little responsibility. Although simple majority voting is a natural solution, many studies have sought better ways to aggregate noisy responses into more reliable results. In this dissertation, we propose novel iterative message-passing-style algorithms that infer the ground truths from noisy answers and can be applied directly to real crowdsourcing systems. While EM-based algorithms have drawn attention in crowdsourcing for their useful inference techniques, our proposed algorithms produce faster and more reliable answers through an iterative scheme based on low-rank matrix approximation. We show that the performance of the proposed iterative algorithms is order-optimal, outperforming both majority voting and EM-based algorithms. Unlike prior work that addresses only simple binary-choice (yes/no) questions, our studies cover more complex task types, including multiple-choice questions, short-answer questions, K-approval voting, and real-valued vector regression.
    Contents:
    1 Introduction
    2 Background
      2.1 Crowdsourcing Systems for Binary-choice Questions
        2.1.1 Majority Voting
        2.1.2 Expectation Maximization
        2.1.3 Message Passing
    3 Crowdsourcing Systems for Multiple-choice Questions
      3.1 Related Work
      3.2 Problem Setup
      3.3 Inference Algorithm
        3.3.1 Task Allocation
        3.3.2 Multiple Iterative Algorithm
        3.3.3 Task Allocation for General Setting
      3.4 Applications
      3.5 Analysis of Algorithms
        3.5.1 Quality of Workers
        3.5.2 Bound on the Average Error Probability
        3.5.3 Proof of the Error Bounds
        3.5.4 Proof of Sub-Gaussianity
      3.6 Experimental Results
        3.6.1 Comparison with Other Algorithms
        3.6.2 Adaptive Scenario
        3.6.3 Simulations on a Set of Various D Values
      3.7 Conclusion
    4 Crowdsourcing Systems for Multiple-choice Questions with K-Approval Voting
      4.1 Related Work
      4.2 Problem Setup
        4.2.1 Problem Definition
        4.2.2 Worker Model for Various (D, K)
      4.3 Inference Algorithm
      4.4 Analysis of Algorithms
        4.4.1 Worker Model
        4.4.2 Quality of Workers
        4.4.3 Bound on the Average Error Probability
        4.4.4 Proof of the Error Bounds
        4.4.5 Proof of Sub-Gaussianity
        4.4.6 Phase Transition
      4.5 Experimental Results
        4.5.1 Performance on the Average Error with q and l
        4.5.2 Relationship between Reliability and y-message
        4.5.3 Performance on the Average Error with Various (D, K) Pairs
      4.6 Conclusion
    5 Crowdsourcing Systems for Real-valued Vector Regression
      5.1 Related Work
      5.2 Problem Setup
      5.3 Inference Algorithm
        5.3.1 Task Message
        5.3.2 Worker Message
      5.4 Analysis of Algorithms
        5.4.1 Worker Model
        5.4.2 Oracle Estimator
        5.4.3 Bound on the Average Error Probability
      5.5 Experimental Results
        5.5.1 Real Crowdsourcing Data
        5.5.2 Verification of the Error Bounds with Synthetic Data
      5.6 Conclusion
    6 Conclusions
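    For the binary-choice setting reviewed in the Background chapter, the iterative message-passing idea can be sketched as follows, in the spirit of the low-rank/spectral approach of Karger, Oh, and Shah; the multiclass, K-approval, and vector-regression algorithms developed in the dissertation follow the same task-message/worker-message pattern. Variable names and the didactic (non-optimized) loops are illustrative.

        import numpy as np

        def message_passing_binary(answers, num_iters=20, seed=0):
            """Iterative message passing for binary crowdsourcing (didactic sketch).

            answers -- dict {(task, worker): label in {+1, -1}} for the assigned pairs
            Returns a dict mapping each task to its estimated label in {+1, -1}.
            """
            rng = np.random.default_rng(seed)
            # y[(w, t)]: worker-to-task reliability message, x[(t, w)]: task-to-worker message
            y = {(w, t): rng.normal(1.0, 1.0) for (t, w) in answers}
            x = {}
            for _ in range(num_iters):
                for (t, w) in answers:   # task update: other workers' answers, weighted by reliability
                    x[(t, w)] = sum(answers[(t, w2)] * y[(w2, t)]
                                    for (t2, w2) in answers if t2 == t and w2 != w)
                for (t, w) in answers:   # worker update: agreement with the other tasks' estimates
                    y[(w, t)] = sum(answers[(t2, w)] * x[(t2, w)]
                                    for (t2, w2) in answers if w2 == w and t2 != t)
            estimates = {}
            for t in {t for (t, _) in answers}:
                score = sum(answers[(t, w)] * y[(w, t)] for (t2, w) in answers if t2 == t)
                estimates[t] = 1 if score >= 0 else -1
            return estimates

    If all reliability messages are held at one, the final step reduces exactly to majority voting; running the updates lets answers from workers who agree with the consensus on other tasks carry more weight.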