
    Permutation-Invariant Consensus over Crowdsourced Labels

    This Major Qualifying Project introduces a novel crowdsourcing consensus model and inference algorithm, called PICA (Permutation-Invariant Crowdsourcing Aggregation), designed to recover the ground-truth labels of a dataset while remaining invariant to the class permutations enacted by different annotators. The PICA model endows each annotator with a doubly-stochastic matrix (DSM) that models the probability that the annotator perceives one class and transcribes it as another. We conduct simulations and experiments that show the advantage of PICA over similar models on three different clustering/labeling tasks, including aggregating dense image segmentations and clustering text passages. Our work was published in HCOMP 2018.
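    The abstract does not spell out the PICA inference algorithm, so the minimal Python sketch below only makes the DSM idea concrete. It assumes that rows index the true class and columns index the transcribed class. It uses Sinkhorn-Knopp row/column normalization, one common way to turn a positive matrix into a doubly-stochastic one, and it shows that a pure class relabeling is itself a DSM. The helper names `sinkhorn`, `is_doubly_stochastic`, and `co_association` are hypothetical. The co-association matrix appears only to illustrate what permutation invariance means: renaming the classes leaves it unchanged.

```python
import numpy as np


def sinkhorn(m: np.ndarray, n_iters: int = 500, tol: float = 1e-10) -> np.ndarray:
    """Rescale a strictly positive square matrix into a doubly-stochastic matrix
    by alternately normalizing its rows and columns (Sinkhorn-Knopp)."""
    p = np.asarray(m, dtype=float).copy()
    for _ in range(n_iters):
        p /= p.sum(axis=1, keepdims=True)  # every row sums to 1
        p /= p.sum(axis=0, keepdims=True)  # every column sums to 1
        if np.allclose(p.sum(axis=1), 1.0, rtol=0.0, atol=tol):
            break  # rows still sum to 1 after the column step: converged
    return p


def is_doubly_stochastic(p: np.ndarray, atol: float = 1e-8) -> bool:
    """Non-negative entries, and every row and every column sums to 1."""
    p = np.asarray(p, dtype=float)
    return bool(
        np.all(p >= 0)
        and np.allclose(p.sum(axis=0), 1.0, rtol=0.0, atol=atol)
        and np.allclose(p.sum(axis=1), 1.0, rtol=0.0, atol=atol)
    )


def co_association(labels: np.ndarray) -> np.ndarray:
    """1 where two items received the same label, else 0.
    Renaming the classes (any permutation) leaves this matrix unchanged."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(int)


# Rows = true class, columns = transcribed class (an assumption of this sketch).
# A pure relabeling 0 -> 2, 1 -> 0, 2 -> 1 is a permutation matrix, hence a DSM.
relabel = np.eye(3)[[2, 0, 1]]
# A noisy annotator: the same relabeling plus uniform confusion, renormalized.
noisy = sinkhorn(relabel + 0.1)
```

The noisy annotator still maps each true class mostly to its permuted label. A permutation-invariant aggregator should therefore treat this annotator as informative rather than as wrong.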

    On the Impossibility of Convex Inference in Human Computation

    Human computation or crowdsourcing involves jointly inferring the ground-truth answers and the worker abilities by optimizing an objective function, for instance by maximizing the data likelihood under an assumed underlying model. A variety of methods have been proposed in the literature to address this inference problem. As far as we know, none of the objective functions in existing methods is convex. In machine learning and applied statistics, a convex objective function, such as that of support vector machines (SVMs), is generally preferred, since it can leverage the high-performance algorithms and rigorous guarantees established in the extensive literature on convex optimization. One may thus wonder whether there exists a meaningful convex objective function for the inference problem in human computation. In this paper, we investigate this convexity issue. We take an axiomatic approach, formulating a set of axioms that impose two mild and natural assumptions on the objective function used for inference. Under these axioms, we show that it is unfortunately impossible to ensure convexity of the inference problem. On the other hand, we show that, interestingly, in the absence of a requirement to model "spammers", one can construct reasonable objective functions for crowdsourcing that guarantee convex inference. Comment: AAAI 201

    Fast and Reliable Inference Algorithms for Crowdsourcing Systems

    Ph.D. dissertation, Seoul National University Graduate School, College of Engineering, Department of Electrical and Computer Engineering, February 2021. Advisor: 정교민 (Kyomin Jung). As the need for large-scale labeled data grows across many fields, web-based crowdsourcing systems offer a promising way to exploit the wisdom of crowds efficiently, in a short time and on a relatively low budget. Despite their efficiency, crowdsourcing systems have an inherent problem: responses from workers can be unreliable, because workers are low-paid and have little sense of responsibility. Although simple majority voting is a natural solution, various studies have sought to aggregate noisy responses more reliably. In this dissertation, we propose novel iterative message-passing-style algorithms that infer the ground truths from noisy answers and can be applied directly to real crowdsourcing systems. While EM-based algorithms have received the most attention in crowdsourcing because of their useful inference techniques, our proposed algorithms obtain faster and more reliable answers through an iterative scheme based on low-rank matrix approximation. We show, through theoretical proofs and experimental results, that the performance of our iterative algorithms is order-optimal and exceeds that of majority voting and EM-based algorithms. Unlike other studies, which address simple binary-choice (yes/no) questions, our work covers more complex task types: multiple-choice questions, short-answer questions, K-approval voting, and real-valued vector regression.
    Contents:
    1 Introduction
    2 Background: 2.1 Crowdsourcing Systems for Binary-choice Questions (2.1.1 Majority Voting; 2.1.2 Expectation Maximization; 2.1.3 Message Passing)
    3 Crowdsourcing Systems for Multiple-choice Questions: 3.1 Related Work; 3.2 Problem Setup; 3.3 Inference Algorithm (3.3.1 Task Allocation; 3.3.2 Multiple Iterative Algorithm; 3.3.3 Task Allocation for General Setting); 3.4 Applications; 3.5 Analysis of Algorithms (3.5.1 Quality of Workers; 3.5.2 Bound on the Average Error Probability; 3.5.3 Proof of the Error Bounds; 3.5.4 Proof of Sub-Gaussianity); 3.6 Experimental Results (3.6.1 Comparison with Other Algorithms; 3.6.2 Adaptive Scenario; 3.6.3 Simulations on a Set of Various D Values); 3.7 Conclusion
    4 Crowdsourcing Systems for Multiple-choice Questions with K-Approval Voting: 4.1 Related Work; 4.2 Problem Setup (4.2.1 Problem Definition; 4.2.2 Worker Model for Various (D, K)); 4.3 Inference Algorithm; 4.4 Analysis of Algorithms (4.4.1 Worker Model; 4.4.2 Quality of Workers; 4.4.3 Bound on the Average Error Probability; 4.4.4 Proof of the Error Bounds; 4.4.5 Proof of Sub-Gaussianity; 4.4.6 Phase Transition); 4.5 Experimental Results (4.5.1 Performance on the Average Error with q and l; 4.5.2 Relationship between Reliability and y-message; 4.5.3 Performance on the Average Error with Various (D, K) Pairs); 4.6 Conclusion
    5 Crowdsourcing Systems for Real-valued Vector Regression: 5.1 Related Work; 5.2 Problem Setup; 5.3 Inference Algorithm (5.3.1 Task Message; 5.3.2 Worker Message); 5.4 Analysis of Algorithms (5.4.1 Worker Model; 5.4.2 Oracle Estimator; 5.4.3 Bound on the Average Error Probability); 5.5 Experimental Results (5.5.1 Real Crowdsourcing Data; 5.5.2 Verification of the Error Bounds with Synthetic Data); 5.6 Conclusion
    6 Conclusions
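    This dissertation's own multiple-choice, K-approval, and vector-regression algorithms are not reproduced here. For orientation, a well-known binary message-passing scheme of the kind it builds on is the iterative algorithm of Karger, Oh, and Shah. In that scheme, tasks and workers exchange real-valued messages, and the final label is the sign of a reliability-weighted vote. The minimal Python sketch below makes several illustrative assumptions: answers take values in {+1, -1} and sit in a dense task-by-worker matrix with 0 for unassigned pairs, the function is named `kos_iterative`, and messages are rescaled after each iteration to prevent overflow without changing any sign.

```python
import numpy as np


def kos_iterative(answers: np.ndarray, n_iters: int = 20, seed: int = 0) -> np.ndarray:
    """Binary crowdsourcing inference by iterative message passing (KOS-style).

    answers[i, j] is +1 or -1 if worker j labeled task i, and 0 otherwise.
    Returns the estimated label (+1 / -1) of every task.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(answers, dtype=float)
    mask = a != 0
    # Worker-to-task messages y[i, j], initialized around 1
    # (workers are assumed to be better than random on average).
    y = np.where(mask, rng.normal(1.0, 1.0, size=a.shape), 0.0)
    for _ in range(n_iters):
        # Task-to-worker: x[i, j] = sum over other workers j' of a[i, j'] * y[i, j'].
        row_total = (a * y).sum(axis=1, keepdims=True)
        x = np.where(mask, row_total - a * y, 0.0)
        # Worker-to-task: y[i, j] = sum over other tasks i' of a[i', j] * x[i', j].
        col_total = (a * x).sum(axis=0, keepdims=True)
        y = np.where(mask, col_total - a * x, 0.0)
        # Rescale to avoid overflow; a positive factor does not change any sign.
        scale = np.abs(y).max()
        if scale > 0:
            y = y / scale
    # Final decision: reliability-weighted vote over all assigned workers.
    return np.sign((a * y).sum(axis=1))
```

Unlike majority voting, each vote is weighted by a message that grows with how often that worker agrees with the estimates formed from other workers. This is what lets consistently random workers (spammers) be down-weighted.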

    A Study on Overcoming Data Noise and Interference through Parameter Learning

    Ph.D. dissertation, Seoul National University Graduate School, College of Engineering, Department of Electrical and Computer Engineering, February 2021. Advisor: 정교민 (Kyomin Jung). Data-driven approaches based on neural networks have emerged as a new paradigm for solving problems in computer vision and natural language processing. These approaches achieve better performance than existing human-designed (heuristic) approaches; however, their performance gains depend heavily on a large amount of high-quality labeled data. Accordingly, to train a model well, it is important both to collect a large amount of data and to improve its quality by analyzing the factors that degrade it. In this dissertation, I propose iterative algorithms that reduce noise in labeled data collected through crowdsourcing systems, and a meta architecture that alleviates interference among labeled data in continual learning scenarios. Researchers generally collect data using crowdsourcing systems that rely on human evaluations. However, human annotators' decisions may vary significantly owing to misconceptions about task instructions, a lack of responsibility, and inherent noise. Web-based crowdsourcing platforms are widely used to collect large amounts of labeled data, but because workers are low-paid and responses are inherently noisy, the quality of the acquired data can easily degrade. To reduce the noise in responses from crowd annotators, I propose novel inference algorithms for discrete multiple-choice and real-valued vector regression tasks. The proposed algorithms model the crowdsourcing system as a graphical model and overcome the noise by iteratively exchanging two types of messages between tasks and workers, thereby estimating the true answer of each task and the reliability of each worker. To guarantee performance, the average performance of the algorithms is analyzed theoretically under a probabilistic crowd model. Interestingly, the performance bounds depend on the number of queries per task and the average quality of the workers. When the average worker quality exceeds a certain level, the average performance of each algorithm approaches that of an oracle estimator that knows the reliability of every worker (the theoretical upper bound). Extensive experiments on both real-world and synthetic datasets verify the practical performance of the algorithms: they outperform other state-of-the-art algorithms. Second, when a model learns a sequence of tasks one by one (continual learning), previously learned knowledge may conflict with new knowledge. This well-known phenomenon is called "catastrophic forgetting" or "semantic drift". In this dissertation, I call it "interference", since it occurs between knowledge acquired from labeled data separated in time. Controlling the amount of noise and interference is essential for training neural networks well. In the second part of the dissertation, to resolve interference among labeled data from consecutive tasks in continual learning, a homeostasis-inspired meta learning architecture (HM) is proposed. HM automatically controls the intensity of regularization (IoR) by capturing the parameters that were important for previous tasks together with the current learning direction. By adjusting the IoR, a learner can balance the amount of interference against the degrees of freedom available for its current learning. Experimental results on various types of continual learning tasks show that the proposed method notably outperforms conventional methods in terms of average accuracy and amount of interference. In experiments, I also verify that HM is relatively stable and robust compared with existing synaptic-plasticity-based methods. Interestingly, the IoR generated by HM appears to be proactively controlled within a certain range, which resembles the negative feedback mechanism of homeostasis in synapses.
    Contents:
    Abstract; Contents; List of Tables; List of Figures
    1 Introduction
    2 Reliable Multiple-choice Iterative Algorithm for Crowdsourcing Systems: 2.1 Setup; 2.2 Algorithm (2.2.1 Task Allocation; 2.2.2 Multiple Iterative Algorithm; 2.2.3 Task Allocation for General Setting); 2.3 Applications; 2.4 Analysis of Algorithms (2.4.1 Quality of Workers; 2.4.2 Bound on the Average Error Probability; 2.4.3 Proof of Theorem 1; 2.4.4 Proof of Sub-Gaussianity); 2.5 Experiments; 2.6 Related Literature
    3 Reliable Aggregation Method for Vector Regression in Crowdsourcing: 3.1 Preliminaries; 3.2 Inference Algorithm (3.2.1 Task Message; 3.2.2 Worker Message); 3.3 Experimental Results (3.3.1 Real Crowdsourcing Data); 3.4 Performance Analysis (3.4.1 Dirichlet Crowd Model; 3.4.2 Error Bound; 3.4.3 Optimality of Oracle Estimator; 3.4.4 Performance Proofs); 3.5 Related Literature
    4 Homeostasis-Inspired Meta Continual Learning: 4.1 Preliminaries (4.1.1 Continual Learning; 4.1.2 Meta Learning); 4.2 Homeostatic Meta-Model; 4.3 Preliminary Experiments and Findings (4.3.1 Block-wise Permutation; 4.3.2 Performance Metrics); 4.4 Experiment (4.4.1 Datasets; 4.4.2 Methods; 4.4.3 Overall Performance); 4.5 Related Literature; 4.6 Discussion
    5 Conclusion
    Abstract (in Korean)
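    The abstract does not give HM's regularizer or explain how its meta model sets the IoR. The Python sketch below is only a hedged illustration of the trade-off an intensity of regularization controls. It assumes an importance-weighted quadratic penalty in the style of elastic weight consolidation, (IoR / 2) * sum_k importance_k * (theta_k - theta_old_k)^2. The IoR is fixed by hand rather than learned, and the new task uses a toy quadratic loss. Names such as `fit_new_task` and `regularized_grad` are hypothetical.

```python
import numpy as np


def regularized_grad(theta, task_grad, theta_old, importance, ior):
    """Gradient of task_loss(theta) + (ior / 2) * sum(importance * (theta - theta_old) ** 2)."""
    return task_grad + ior * importance * (theta - theta_old)


def fit_new_task(theta_old, theta_new_opt, importance, ior, lr=0.1, steps=2000):
    """Gradient descent on a toy new-task loss 0.5 * ||theta - theta_new_opt||^2
    plus the importance-weighted penalty that anchors theta to theta_old.
    For this toy loss, convergence requires lr * (1 + ior * max(importance)) < 2."""
    theta_old = np.asarray(theta_old, dtype=float)
    theta_new_opt = np.asarray(theta_new_opt, dtype=float)
    importance = np.asarray(importance, dtype=float)
    theta = theta_old.copy()  # start from the parameters learned on previous tasks
    for _ in range(steps):
        task_grad = theta - theta_new_opt  # gradient of the toy new-task loss
        theta = theta - lr * regularized_grad(theta, task_grad, theta_old, importance, ior)
    return theta
```

Take theta_old = [1, 1], a new-task optimum of [0, 0], and importance [1, 0]. With an IoR of 0, the learner moves freely to [0, 0], which is maximal interference with the old solution. With an IoR of 9, the important parameter stays at 0.9 while the unimportant one still moves to 0. For this toy loss, each coordinate converges to (theta_new + IoR * importance * theta_old) / (1 + IoR * importance).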

    Machine learning from crowds: a systematic review of its applications

    Crowdsourcing opens the door to solving a wide variety of problems that were previously infeasible in the field of machine learning, allowing us to obtain relatively low-cost labeled data in a short amount of time. However, due to the uncertain quality of labelers, the data are sometimes unreliable, forcing practitioners to collect information redundantly, which poses new challenges in the field. Despite these difficulties, many applications of machine learning using crowdsourced data have recently been published that achieved state-of-the-art results on relevant problems. We have analyzed these applications following a systematic methodology, classifying them into different fields of study, highlighting several of their characteristics, and showing the recent interest in the use of crowdsourcing for machine learning. We also identify several exciting research lines, based on the problems that remain unsolved, to foster future research in this field.

    Considering Human Aspects on Strategies for Designing and Managing Distributed Human Computation

    A human computation system can be viewed as a distributed system in which the processors are humans, called workers. Such systems harness the cognitive power of a group of workers connected to the Internet to execute relatively simple tasks whose solutions, once combined, solve a problem that machine-only systems could not solve satisfactorily. Examples of such systems are Amazon Mechanical Turk and the Zooniverse platform. A human computation application comprises a group of tasks, each of which can be performed by one worker. Tasks might depend on one another. In this study, we propose a theoretical framework to analyze this type of application from a distributed systems point of view. Our framework is built on three dimensions that represent different perspectives from which human computation applications can be approached: quality-of-service requirements, design and management strategies, and human aspects. Using this framework, we review human computation from the perspective of programmers seeking to improve the design of human computation applications and of managers seeking to increase the effectiveness of human computation infrastructures in running such applications. In doing so, besides integrating and organizing what has been done in this direction, we also highlight the fact that the human aspects of the workers in such systems introduce new challenges in terms of, for example, task assignment, dependency management, and fault prevention and tolerance. We discuss how these challenges relate to distributed systems and other areas of knowledge. Comment: 3 figures, 1 table

    Enhancing the use of online 3d multimedia content through the analysis of user interactions

    Recent years have seen the development of interactive 3D graphics on the Web. The ability to visualize and manipulate 3D content in real time seems to be the next evolution of the Web for a wide range of application areas, such as e-commerce, education and training, architectural design, virtual museums, and virtual communities. In these application domains, online 3D graphics are not meant to replace traditional web content such as text, images, and videos, but rather to complement it. The Web is now a platform where hypertext, hypermedia, and 3D graphics are simultaneously available to users. This use of online 3D graphics, however, poses two main issues. First, 3D interactions are cumbersome because they involve numerous degrees of freedom, so 3D browsing may be inefficient. We tackle this problem by proposing a new crowdsourcing-based paradigm to ease online 3D interactions, which consists of analyzing 3D user interactions to identify Regions of Interest (ROIs) and generating recommendations for subsequent users.
The recommendations both reduce 3D browsing time and simplify 3D interactions. Second, 3D graphics convey rich, purely visual information about concepts, whereas traditional websites mainly contain descriptive information (text) with hyperlinks as the means of navigation. Viewing and interacting with websites that combine these two very different media (hypertext and 3D graphics) may therefore be complicated for users. To address this issue, we propose to use crowdsourcing to build semantic associations between texts and 3D visualizations. The resulting links are suggested to subsequent users so that they can readily locate the 3D visualization associated with a given textual content. We evaluate the proposed methods with experimental user studies. The evaluations show that the recommendations reduce 3D interaction time. Moreover, the user study shows that users appreciate the proposed semantic association: a majority of users reported that the recommendations were helpful, and they preferred browsing 3D objects using both mouse interactions and the proposed links over mouse interactions alone.
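    The abstract does not describe how ROIs are extracted from interaction logs. One simple, hypothetical way to pool interactions across users is to bin the 3D positions they focused on (for example, picked surface points) into a voxel grid and recommend the most frequently visited cells. The sketch below assumes these points are already available as an (n, 3) array. The names `top_regions` and `voxel_size` are illustrative and do not come from the thesis.

```python
from collections import Counter

import numpy as np


def top_regions(points, voxel_size: float, k: int = 3):
    """Pool 3D focus points from many users into a voxel grid and return the
    k most visited voxels as (center, visit_count) pairs, most visited first."""
    pts = np.asarray(points, dtype=float).reshape(-1, 3)
    cells = np.floor(pts / voxel_size).astype(int)  # integer voxel index per point
    counts = Counter(tuple(int(c) for c in cell) for cell in cells)
    return [
        ((np.array(cell, dtype=float) + 0.5) * voxel_size, n)
        for cell, n in counts.most_common(k)
    ]
```

The voxel size sets how coarse a region is. Recommending the centers of the top cells to later users is one way to turn pooled interactions into navigation shortcuts.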

    Essays In Algorithmic Market Design Under Social Constraints

    Rapid technological advances over the past few decades, in particular the rise of the internet, have significantly reshaped and expanded the meaning of our everyday social activities, including our interactions with our social circle, the media, and our political and economic activities. This dissertation aims to tackle some of the unique societal challenges underlying the design of automated online platforms that interact with people and organizations, namely those imposed by legal, ethical, and strategic considerations. I narrow my attention to fairness considerations, learning with repeated trials, and competition for market share. In each case, I investigate the broad issue in a particular context (i.e., an online market) and present the solution my research offers to the problem in that application. Addressing interdisciplinary problems such as these requires drawing ideas and techniques from various disciplines, including theoretical computer science, microeconomics, and applied statistics. The research presented here uses a combination of theoretical and data-analysis tools to shed light on some of the key challenges in designing algorithms for today's online markets, including crowdsourcing and labor markets, online advertising, and social networks, among others.