    Considering Human Aspects on Strategies for Designing and Managing Distributed Human Computation

    A human computation system can be viewed as a distributed system in which the processors are humans, called workers. Such systems harness the cognitive power of a group of workers connected to the Internet to execute relatively simple tasks whose solutions, once grouped, solve a problem that machine-only systems could not solve satisfactorily. Examples of such systems are Amazon Mechanical Turk and the Zooniverse platform. A human computation application comprises a group of tasks, each of which can be performed by one worker, and tasks may have dependencies among each other. In this study, we propose a theoretical framework to analyze this type of application from a distributed systems point of view. Our framework is established on three dimensions that represent different perspectives from which human computation applications can be approached: quality-of-service requirements, design and management strategies, and human aspects. Using this framework, we review human computation from the perspective of programmers seeking to improve the design of human computation applications and of managers seeking to increase the effectiveness of human computation infrastructures in running such applications. In doing so, besides integrating and organizing what has been done in this direction, we also put into perspective the fact that the human aspects of the workers in such systems introduce new challenges in terms of, for example, task assignment, dependency management, and fault prevention and tolerance, and we discuss how these challenges relate to distributed systems and other areas of knowledge. (Comment: 3 figures, 1 table.)
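    To make the framework's notion of task dependencies concrete, here is a minimal sketch (not code from the paper; the `Task` class and `ready_tasks` helper are illustrative) of a human computation application modeled as a directed acyclic graph of tasks, where a task may be assigned to a worker only once all of its prerequisites are complete.

```python
# Minimal sketch (not from the paper): a human computation application
# as a DAG of tasks; a task is assignable only when its dependencies are done.
from dataclasses import dataclass, field

@dataclass
class Task:
    task_id: str
    depends_on: set = field(default_factory=set)  # ids of prerequisite tasks
    done: bool = False

def ready_tasks(tasks: dict) -> list:
    """Return tasks that are not done and whose prerequisites are all done."""
    return [t for t in tasks.values()
            if not t.done and all(tasks[d].done for d in t.depends_on)]

# Example: "c" depends on "a" and "b"; it becomes assignable only after both finish.
tasks = {
    "a": Task("a"),
    "b": Task("b"),
    "c": Task("c", depends_on={"a", "b"}),
}
print([t.task_id for t in ready_tasks(tasks)])  # ['a', 'b']
tasks["a"].done = tasks["b"].done = True
print([t.task_id for t in ready_tasks(tasks)])  # ['c']
```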

    Eliciting and Leveraging Input Diversity in Crowd-Powered Intelligent Systems

    Collecting high-quality annotations plays a crucial role in supporting machine learning algorithms and, thus, the creation of intelligent systems. Over the past decade, crowdsourcing has become a widely adopted means of manually creating annotations for various intelligent tasks, spanning from object boundary detection in images to sentiment understanding in text. This thesis presents new crowdsourcing workflows and answer aggregation algorithms that can effectively and efficiently improve collective annotation quality from crowd workers. While conventional microtask crowdsourcing approaches generally focus on improving annotation quality by promoting consensus among workers, this thesis proposes the novel concept of a diversity-driven approach. We show that leveraging diversity in workers' responses is effective in improving the accuracy of aggregate annotations because it compensates for biases or uncertainty caused by the system, the tool, or the data. We then present techniques that elicit diversity in workers' responses. These techniques are orthogonal to other quality control methods, such as filtering, training, or incentives, which means they can be used in combination with existing methods. The crowd-powered intelligent systems presented in this thesis are evaluated on visual perception tasks in order to demonstrate the effectiveness of our proposed approach. The advantage of our approach is an improvement in collective quality even in settings where worker skill may vary widely, potentially lowering barriers to entry for novice workers and making it easier for requesters to find workers who can make productive contributions. This thesis demonstrates that crowd workers' input diversity can be a useful property that yields better aggregate performance than any homogeneous set of inputs.
    PhD thesis, Electrical Engineering: Systems, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/153428/1/jyskwon_1.pd
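    As an illustration of the diversity-driven idea (a toy model, not the thesis's actual workflows; the bias values and the `worker_response` function are assumptions), the sketch below shows how averaging responses from workers with varied systematic biases can cancel error that a homogeneous, consensus-leaning crowd would only reinforce.

```python
# Illustrative sketch (not the thesis's algorithm): with continuous annotations,
# averaging a *diverse* set of worker responses can cancel out systematic,
# tool- or perception-induced biases that a homogeneous crowd shares.
import random

random.seed(0)
true_boundary = 100.0  # ground-truth pixel position of an object boundary

def worker_response(bias: float, noise: float = 2.0) -> float:
    """One worker's annotation: ground truth plus a systematic bias plus noise."""
    return true_boundary + bias + random.gauss(0, noise)

# Homogeneous crowd: everyone shares the same systematic bias (+5 px).
homogeneous = [worker_response(bias=5.0) for _ in range(30)]

# Diverse crowd: elicitation spreads workers across conditions, so their
# systematic biases vary and tend to cancel in aggregate.
diverse = [worker_response(bias=random.choice([-5.0, 0.0, 5.0])) for _ in range(30)]

agg = lambda xs: sum(xs) / len(xs)
print(f"homogeneous aggregate error: {abs(agg(homogeneous) - true_boundary):.2f}")
print(f"diverse aggregate error:     {abs(agg(diverse) - true_boundary):.2f}")
```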

    Finish Them!: Pricing Algorithms for Human Computation

    Given a batch of human computation tasks, a commonly ignored aspect is how the price (i.e., the reward paid to human workers) of these tasks must be set or varied in order to meet latency or cost constraints. Often, the price is set up front and never modified, leading either to a much higher monetary cost than needed (if the price is set too high) or to a much larger latency than expected (if the price is set too low). Leveraging a pricing model from prior work, we develop algorithms to optimally set, and then vary, price over time in order to meet (a) a user-specified deadline while minimizing total monetary cost, or (b) a user-specified monetary budget constraint while minimizing total elapsed time. We leverage techniques from decision theory (specifically, Markov Decision Processes) for both problems, and demonstrate that our techniques lead to up to 30% reduction in cost over schemes proposed in prior work. Furthermore, we develop techniques to speed up the computation, enabling users to leverage the price-setting algorithms on the fly.
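    For intuition, here is a hedged sketch of the deadline-constrained variant as a finite-horizon Markov Decision Process (the paper's actual pricing model differs in its details; the price grid, completion probabilities, and penalty below are made-up values). The state is the number of steps left before the deadline, the action is the price offered in that step, and the dynamic program minimizes expected spend.

```python
# Hedged sketch: finite-horizon MDP for pricing one task under a deadline.
# State = steps remaining; action = price offered; a higher price raises the
# per-step completion probability. All numbers are illustrative assumptions.
PRICES = [0.05, 0.10, 0.25, 0.50]                        # candidate rewards
P_DONE = {0.05: 0.10, 0.10: 0.25, 0.25: 0.50, 0.50: 0.80}  # assumed completion probs
PENALTY = 10.0                                           # cost of missing the deadline
HORIZON = 10                                             # steps until the deadline

# V[t] = minimal expected cost with t steps left and the task still unfinished.
V = [0.0] * (HORIZON + 1)
V[0] = PENALTY
policy = [None] * (HORIZON + 1)
for t in range(1, HORIZON + 1):
    best = None
    for price in PRICES:
        p = P_DONE[price]
        # Pay `price` if a worker completes the task this step; otherwise recurse.
        cost = p * price + (1 - p) * V[t - 1]
        if best is None or cost < best:
            best, policy[t] = cost, price
    V[t] = best

print("price schedule (far from deadline -> close):",
      [policy[t] for t in range(HORIZON, 0, -1)])
```

    Solving the recursion backwards from the deadline yields the expected qualitative behavior: low prices while there is plenty of time, and higher prices as the deadline approaches.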

    Cybersecurity issues in software architectures for innovative services

    The recent advances in data center development have underpinned the widespread success of the cloud computing paradigm, which in turn underlies the "Everything as a Service" (XaaS) model for software-based applications and services. According to the XaaS model, services of any kind are deployed on demand as cloud-based applications, with a great degree of flexibility and a limited need for investment in dedicated hardware and/or software components. This approach opens up many opportunities, for instance giving small or emerging businesses access to complex and widely distributed applications whose cost and complexity represented a significant entry barrier in the past. Unfortunately, networking is now embedded in every service and application, raising several cybersecurity issues related to corruption and leakage of data, unauthorized access, and so on. In this context, new service-oriented architectures are emerging: the so-called service enabler architectures. The aim of these architectures is not only to expose resources to these types of services, but also to validate them. Validation covers numerous aspects, from legal to infrastructural ones, but above all cybersecurity threats. A solid threat analysis of the aforementioned architectures is therefore necessary, and this is the main goal of this thesis. This work investigates the security threats of the emerging service enabler architectures, providing proofs of concept for these issues, along with solutions, based on several use cases implemented in real-world scenarios.

    The Federal Big Data Research and Development Strategic Plan

    This document was developed through the contributions of the NITRD Big Data SSG members and staff. A special thanks and appreciation to the core team of editors, writers, and reviewers: Lida Beninson (NSF), Quincy Brown (NSF), Elizabeth Burrows (NSF), Dana Hunter (NSF), Craig Jolley (USAID), Meredith Lee (DHS), Nishal Mohan (NSF), Chloe Poston (NSF), Renata Rawlings-Goss (NSF), Carly Robinson (DOE Science), Alejandro Suarez (NSF), Martin Wiener (NSF), and Fen Zhao (NSF). A national Big Data innovation ecosystem is essential to enabling knowledge discovery from and confident action informed by the vast resource of new and diverse datasets that are rapidly becoming available in nearly every aspect of life. Big Data has the potential to radically improve the lives of all Americans. It is now possible to combine disparate, dynamic, and distributed datasets and enable everything from predicting the future behavior of complex systems to precise medical treatments, smart energy usage, and focused educational curricula. Government agency research and public-private partnerships, together with the education and training of future data scientists, will enable applications that directly benefit society and the economy of the Nation. To derive the greatest benefits from the many, rich sources of Big Data, the Administration announced a “Big Data Research and Development Initiative” on March 29, 2012. Dr. John P. Holdren, Assistant to the President for Science and Technology and Director of the Office of Science and Technology Policy, stated that the initiative “promises to transform our ability to use Big Data for scientific discovery, environmental and biomedical research, education, and national security.” The Federal Big Data Research and Development Strategic Plan (Plan) builds upon the promise and excitement of the myriad applications enabled by Big Data with the objective of guiding Federal agencies as they develop and expand their individual mission-driven programs and investments related to Big Data. The Plan is based on inputs from a series of Federal agency and public activities, and a shared vision: We envision a Big Data innovation ecosystem in which the ability to analyze, extract information from, and make decisions and discoveries based upon large, diverse, and real-time datasets enables new capabilities for Federal agencies and the Nation at large; accelerates the process of scientific discovery and innovation; leads to new fields of research and new areas of inquiry that would otherwise be impossible; educates the next generation of 21st century scientists and engineers; and promotes new economic growth. The Plan is built around seven strategies that represent key areas of importance for Big Data research and development (R&D). Priorities listed within each strategy highlight the intended outcomes that can be addressed by the missions and research funding of NITRD agencies. These include advancing human understanding in all branches of science, medicine, and security; ensuring the Nation’s continued leadership in research and development; and enhancing the Nation’s ability to address pressing societal and environmental issues facing the Nation and the world through research and development.

    Streamlining code smells: Using collective intelligence and visualization

    Context. Code smells are seen as a major source of technical debt and, as such, should be detected and removed. Code smells have long been catalogued together with corresponding mitigating solutions called refactoring operations. However, while the latter are supported in current IDEs (e.g., Eclipse), the scaffolding for code smell detection still has many limitations. Researchers argue that the subjectiveness of the code smell detection process is a major hindrance to mitigating the problem of smell-infected code. Objective. This thesis presents a new approach to code smell detection, which we have called CrowdSmelling, and the results of a validation experiment for this approach. The approach is based on supervised machine learning techniques, where the wisdom of the crowd (of software developers) is used to collectively calibrate code smell detection algorithms, thereby lessening the subjectivity issue. Method. In the context of three consecutive years of a Software Engineering course, a total "crowd" of around a hundred teams, with an average of three members each, classified the presence of three code smells (Long Method, God Class, and Feature Envy) in Java source code. These classifications were the basis of the oracles used for training six machine learning algorithms. Over one hundred models were generated and evaluated to determine which machine learning algorithms had the best performance in detecting each of the aforementioned code smells. Results. Good performances were obtained for God Class detection (ROC=0.896 for Naive Bayes) and Long Method detection (ROC=0.870 for AdaBoostM1), but much lower for Feature Envy (ROC=0.570 for Random Forest). Conclusions. The obtained results suggest that CrowdSmelling is a feasible approach for the detection of code smells, but further validation experiments are required to cover more code smells and to increase external validity.
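    The core of the CrowdSmelling pipeline can be pictured with a short sketch (the thesis used its own crowd-built oracles and six algorithms; the synthetic features, labels, and the use of scikit-learn here are assumptions): train classifiers on crowd-labelled code metrics and rank them by ROC AUC, as in the God Class and Long Method results above.

```python
# Illustrative sketch of the CrowdSmelling idea: classifiers trained on
# crowd-labelled code metrics, compared by ROC AUC. Data is synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
# Toy stand-in for per-class code metrics (e.g., LOC, WMC, coupling) ...
X = rng.normal(size=(300, 3))
# ... and majority-vote crowd labels for "God Class" (1 = smelly).
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=300) > 0).astype(int)

for model in (GaussianNB(), RandomForestClassifier(n_estimators=100, random_state=0)):
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{model.__class__.__name__}: mean ROC AUC = {auc:.3f}")
```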

    Advancements in the Elicitation and Aggregation of Private Information

    There are many situations where one might be interested in eliciting and aggregating the private information of a group of agents. For example, a recommendation system might suggest recommendations based on the aggregate opinions of a group of like-minded agents, or a decision maker might take a decision based on the aggregate forecasts from a group of experts. When agents are self-interested, they are not necessarily honest when reporting their private information. For example, agents who have a reputation to protect might tend to produce forecasts near the most likely group consensus, whereas agents who have a reputation to build might tend to overstate the probabilities of outcomes they feel will be understated in a possible consensus. Therefore, economic incentives are necessary to induce self-interested agents to report their private information honestly. Our first contribution in this thesis is a scoring method to induce honest reporting of an answer to a multiple-choice question. We formally show that, in the presence of social projection, one can induce honest reporting in this setting by comparing reported answers and rewarding agreements. Our experimental results show that encouraging honest reporting through the proposed scoring method results in more accurate answers than when agents have no direct incentives for expressing their true answers. Our second contribution concerns how to incentivize honest reporting when the private information consists of subjective probabilities (beliefs). Proper scoring rules are traditional scoring methods that incentivize honest reporting of subjective probabilities, in that the expected score received by an agent is maximized when that agent reports his true belief. An implicit assumption behind proper scoring rules is that agents are risk neutral. In an experiment involving proper scoring rules, we find that human beings fail to be risk neutral. We then discuss how to adapt proper scoring rules to cumulative prospect theory, a modern theory of choice under uncertainty. We explain why a property called comonotonicity is a sufficient condition for proper scoring rules to remain proper under cumulative prospect theory. Moreover, we show how to construct a comonotonic proper scoring rule from any traditional proper scoring rule. We also propose a new approach that uses non-deterministic payments based on proper scoring rules to elicit an agent's true belief when the components that drive the agent's attitude towards uncertainty are unknown. After agents report their private information, there is still the question of how to aggregate the reported information. Our third contribution in this thesis is an empirical study of the influence of the number of agents on the quality of the aggregate information in a crowdsourcing setting. We find that both the expected error in the aggregate information and the risk of a poor combination of agents decrease as the number of agents increases. Moreover, we find that the top-performing agents are consistent across multiple tasks, whereas the worst-performing agents tend to be inconsistent. Our final contribution in this thesis is a pooling method to aggregate reported beliefs. Intuitively, the proposed method works as if the agents were continuously updating their beliefs to accommodate the expertise of others. Each updated belief takes the form of a linear opinion pool in which the weight that an agent assigns to a peer's belief is inversely related to the distance between their beliefs; in other words, agents are assumed to prefer beliefs that are close to their own. We prove that such an updating process leads to consensus, i.e., the agents all converge towards the same belief. Further, we show that if risk-neutral agents are rewarded using the quadratic scoring rule, then the assumption that they prefer beliefs close to their own follows naturally. We empirically demonstrate the effectiveness of the proposed method using real-world data. In particular, the results of our experiment show that the proposed method outperforms the traditional unweighted average approach and another distance-based method, when measured in terms of both overall accuracy and absolute error.
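    A minimal sketch of the proposed pooling method follows (the exact weighting kernel is an assumption; the abstract only requires that the weight be inversely related to the distance between beliefs): each agent repeatedly replaces its belief with a linear opinion pool of all agents' beliefs, and the group converges to a consensus.

```python
# Minimal sketch of distance-based belief pooling: each agent's belief is
# repeatedly replaced by a linear opinion pool of all beliefs, where closer
# peers receive larger weights. The 1/(1+d) kernel is an illustrative choice.
import numpy as np

def update(beliefs: np.ndarray) -> np.ndarray:
    """One round: each agent's belief becomes a distance-weighted linear pool."""
    new = np.empty_like(beliefs)
    for i, b in enumerate(beliefs):
        d = np.linalg.norm(beliefs - b, axis=1)  # distance from agent i to each peer
        w = 1.0 / (1.0 + d)                      # closer beliefs get larger weight
        w /= w.sum()
        new[i] = w @ beliefs                     # linear opinion pool
    return new

# Three agents' probability distributions over two outcomes.
beliefs = np.array([[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]])
for _ in range(50):
    beliefs = update(beliefs)
print(np.round(beliefs, 3))  # rows are (nearly) identical: consensus
```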