46 research outputs found

    An Incentive Compatible Multi-Armed-Bandit Crowdsourcing Mechanism with Quality Assurance

    Full text link
    Consider a requester who wishes to crowdsource a series of identical binary labeling tasks to a pool of workers so as to achieve an assured accuracy for each task, in a cost optimal way. The workers are heterogeneous with unknown but fixed qualities and their costs are private. The problem is to select for each task an optimal subset of workers so that the outcome obtained from the selected workers guarantees a target accuracy level. The problem is a challenging one even in a non strategic setting since the accuracy of aggregated label depends on unknown qualities. We develop a novel multi-armed bandit (MAB) mechanism for solving this problem. First, we propose a framework, Assured Accuracy Bandit (AAB), which leads to an MAB algorithm, Constrained Confidence Bound for a Non Strategic setting (CCB-NS). We derive an upper bound on the number of time steps the algorithm chooses a sub-optimal set that depends on the target accuracy level and true qualities. A more challenging situation arises when the requester not only has to learn the qualities of the workers but also elicit their true costs. We modify the CCB-NS algorithm to obtain an adaptive exploration separated algorithm which we call { \em Constrained Confidence Bound for a Strategic setting (CCB-S)}. CCB-S algorithm produces an ex-post monotone allocation rule and thus can be transformed into an ex-post incentive compatible and ex-post individually rational mechanism that learns the qualities of the workers and guarantees a given target accuracy level in a cost optimal way. We provide a lower bound on the number of times any algorithm should select a sub-optimal set and we see that the lower bound matches our upper bound upto a constant factor. We provide insights on the practical implementation of this framework through an illustrative example and we show the efficacy of our algorithms through simulations

    A Framework for Quality Assurance in Crowdsourcing

    Get PDF
    The emergence of online paid micro-crowdsourcing platforms, such as Amazon Mechanical Turk (AMT), allows on-demand and at scale distribution of tasks to human workers around the world. In such settings, online workers come and complete small tasks posted by a company, working for as long or as little as they wish. Such temporary employer-employee relationships give rise to adverse selection, moral hazard, and many other challenges. How can we ensure that the submitted work is accurate, especially when the verification cost is comparable to the cost of performing the task? How can we estimate the exhibited quality of the workers? What pricing strategies should be used to induce the effort of workers with varying ability levels? We develop a comprehensive framework for managing the quality in such micro crowdsourcing settings: First, we describe an algorithm for estimating the error rates of the participating workers, and show how to separate systematic worker biases from unrecoverable errors and generate an unbiased “worker quality” measurement. Next, we present a selective repeated-labeling algorithm that acquires labels in a way so that quality requirements can be met at minimum cost. Then, we propose a quality-adjusted pricing scheme that adjusts the payment level according to the contributed value by each worker. We test our compensation scheme in a principal-agent setting in which workers respond to incentives by varying their effort. Our simulation results demonstrate that the proposed pricing scheme is able to induce workers to exert higher levels of effort and yield larger profits for employers compared to the commonly adopted uniform pricing schemes. We also describe strategies that build on our quality control and pricing framework, to tackle crowdsourced tasks of increasingly higher complexity, while still maintaining a tight quality control of the process

    Mining the Medical and Patent Literature to Support Healthcare and Pharmacovigilance

    Get PDF
    Recent advancements in healthcare practices and the increasing use of information technology in the medical domain has lead to the rapid generation of free-text data in forms of scientific articles, e-health records, patents, and document inventories. This has urged the development of sophisticated information retrieval and information extraction technologies. A fundamental requirement for the automatic processing of biomedical text is the identification of information carrying units such as the concepts or named entities. In this context, this work focuses on the identification of medical disorders (such as diseases and adverse effects) which denote an important category of concepts in the medical text. Two methodologies were investigated in this regard and they are dictionary-based and machine learning-based approaches. Futhermore, the capabilities of the concept recognition techniques were systematically exploited to build a semantic search platform for the retrieval of e-health records and patents. The system facilitates conventional text search as well as semantic and ontological searches. Performance of the adapted retrieval platform for e-health records and patents was evaluated within open assessment challenges (i.e. TRECMED and TRECCHEM respectively) wherein the system was best rated in comparison to several other competing information retrieval platforms. Finally, from the medico-pharma perspective, a strategy for the identification of adverse drug events from medical case reports was developed. Qualitative evaluation as well as an expert validation of the developed system's performance showed robust results. In conclusion, this thesis presents approaches for efficient information retrieval and information extraction from various biomedical literature sources in the support of healthcare and pharmacovigilance. The applied strategies have potential to enhance the literature-searches performed by biomedical, healthcare, and patent professionals. The applied strategies have potential to enhance the literature-searches performed by biomedical, healthcare, and patent professionals. This can promote the literature-based knowledge discovery, improve the safety and effectiveness of medical practices, and drive the research and development in medical and healthcare arena

    Information retrieval and text mining technologies for chemistry

    Get PDF
    Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.A.V. and M.K. acknowledge funding from the European Community’s Horizon 2020 Program (project reference: 654021 - OpenMinted). M.K. additionally acknowledges the Encomienda MINETAD-CNIO as part of the Plan for the Advancement of Language Technology. O.R. and J.O. thank the Foundation for Applied Medical Research (FIMA), University of Navarra (Pamplona, Spain). This work was partially funded by Consellería de Cultura, Educación e Ordenación Universitaria (Xunta de Galicia), and FEDER (European Union), and the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of UID/BIO/04469/2013 unit and COMPETE 2020 (POCI-01-0145-FEDER-006684). We thank Iñigo Garciá -Yoldi for useful feedback and discussions during the preparation of the manuscript.info:eu-repo/semantics/publishedVersio

    Proceedings of the Eighth Italian Conference on Computational Linguistics CliC-it 2021

    Get PDF
    The eighth edition of the Italian Conference on Computational Linguistics (CLiC-it 2021) was held at UniversitĂ  degli Studi di Milano-Bicocca from 26th to 28th January 2022. After the edition of 2020, which was held in fully virtual mode due to the health emergency related to Covid-19, CLiC-it 2021 represented the first moment for the Italian research community of Computational Linguistics to meet in person after more than one year of full/partial lockdown

    Machine Learning Methods with Noisy, Incomplete or Small Datasets

    Get PDF
    In many machine learning applications, available datasets are sometimes incomplete, noisy or affected by artifacts. In supervised scenarios, it could happen that label information has low quality, which might include unbalanced training sets, noisy labels and other problems. Moreover, in practice, it is very common that available data samples are not enough to derive useful supervised or unsupervised classifiers. All these issues are commonly referred to as the low-quality data problem. This book collects novel contributions on machine learning methods for low-quality datasets, to contribute to the dissemination of new ideas to solve this challenging problem, and to provide clear examples of application in real scenarios

    Data Mining

    Get PDF
    The availability of big data due to computerization and automation has generated an urgent need for new techniques to analyze and convert big data into useful information and knowledge. Data mining is a promising and leading-edge technology for mining large volumes of data, looking for hidden information, and aiding knowledge discovery. It can be used for characterization, classification, discrimination, anomaly detection, association, clustering, trend or evolution prediction, and much more in fields such as science, medicine, economics, engineering, computers, and even business analytics. This book presents basic concepts, ideas, and research in data mining
    corecore