
    Empirical Methodology for Crowdsourcing Ground Truth

    The process of gathering ground truth data through human annotation is a major bottleneck in the use of information extraction methods for populating the Semantic Web. Crowdsourcing-based approaches are gaining popularity in the attempt to solve the issues related to volume of data and lack of annotators. Typically these practices use inter-annotator agreement as a measure of quality. However, in many domains, such as event detection, there is ambiguity in the data, as well as a multitude of perspectives of the information examples. We present an empirically derived methodology for efficiently gathering ground truth data in a diverse set of use cases covering a variety of domains and annotation tasks. Central to our approach is the use of CrowdTruth metrics that capture inter-annotator disagreement. We show that measuring disagreement is essential for acquiring a high-quality ground truth. We achieve this by comparing the quality of the data aggregated with CrowdTruth metrics against data aggregated with majority vote, over a set of diverse crowdsourcing tasks: Medical Relation Extraction, Twitter Event Identification, News Event Extraction and Sound Interpretation. We also show that an increased number of crowd workers leads to growth and stabilization in the quality of annotations, going against the usual practice of employing a small number of annotators. Comment: in publication at the Semantic Web Journal
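To make the contrast between majority vote and a disagreement-aware score concrete, here is a minimal Python sketch. It is an illustration only, not the CrowdTruth implementation: the function names `majority_vote` and `label_scores` are hypothetical, each worker is assumed to pick exactly one label, and the score is a simplified cosine between the aggregated annotation vector and each label's unit vector. The example labels are made up for the demonstration.

```python
from collections import Counter
from math import sqrt


def majority_vote(worker_labels):
    """Return the single label chosen by the most workers.

    Any disagreement among workers is discarded.
    """
    counts = Counter(worker_labels)
    return counts.most_common(1)[0][0]


def label_scores(worker_labels, label_set):
    """Score every label by the cosine between the summed annotation
    vector and that label's unit vector (a simplified, assumed metric).

    A score near 1 means workers agree on the label; intermediate
    scores signal that the example is ambiguous.
    """
    counts = Counter(worker_labels)
    norm = sqrt(sum(counts[label] ** 2 for label in label_set))
    if norm == 0:
        return {label: 0.0 for label in label_set}
    return {label: counts[label] / norm for label in label_set}


# Hypothetical annotations from ten workers for one sentence.
annotations = ["cause"] * 6 + ["treat"] * 4
winner = majority_vote(annotations)
scores = label_scores(annotations, ["cause", "treat", "none"])
print(winner)
print(scores)
```

Majority vote reports only the winning label, while the per-label scores keep the information that a sizeable minority of workers chose a different reading.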

    Citizen Science 2.0 : Data Management Principles to Harness the Power of the Crowd

    Citizen science refers to voluntary participation by the general public in scientific endeavors. Although citizen science has a long tradition, the rise of online communities and user-generated web content has the potential to greatly expand its scope and contributions. Citizens spread across a large area will collect more information than an individual researcher can. Because citizen scientists tend to make observations about areas they know well, data are likely to be very detailed. Although the potential for engaging citizen scientists is extensive, there are challenges as well. In this paper we consider one such challenge – creating an environment in which non-experts in a scientific domain can provide appropriate and accurate data regarding their observations. We describe the problem in the context of a research project that includes the development of a website to collect citizen-generated data on the distribution of plants and animals in a geographic region. We propose an approach that can improve the quantity and quality of data collected in such projects by organizing data using instance-based data structures. Potential implications of this approach are discussed and plans for future research to validate the design are described.

    Online algorithms for POMDPs with continuous state, action, and observation spaces

    Online solvers for partially observable Markov decision processes have been applied to problems with large discrete state spaces, but continuous state, action, and observation spaces remain a challenge. This paper begins by investigating double progressive widening (DPW) as a solution to this challenge. However, we prove that this modification alone is not sufficient because the belief representations in the search tree collapse to a single particle, causing the algorithm to converge to a policy that is suboptimal regardless of the computation time. This paper proposes and evaluates two new algorithms, POMCPOW and PFT-DPW, that overcome this deficiency by using weighted particle filtering. Simulation results show that these modifications allow the algorithms to be successful where previous approaches fail. Comment: Added Multilane section
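The weighted particle filtering mentioned above can be illustrated with a short Python sketch of a single belief update. This is not POMCPOW or PFT-DPW; it only shows the general idea of reweighting particles by observation likelihood instead of discarding those that do not match an observation exactly. The one-dimensional model (additive Gaussian process noise, Gaussian observation noise) and all names and parameters are assumptions chosen for the example.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution, used as the observation likelihood."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def update_weighted_belief(particles, weights, action, observation, rng,
                           process_std=0.1, obs_std=0.5):
    """One weighted particle filter step for an assumed 1-D toy model.

    Each particle is propagated through s' = s + action + noise, then its
    weight is multiplied by the likelihood of the observation given s'.
    """
    new_particles = []
    new_weights = []
    for state, weight in zip(particles, weights):
        next_state = state + action + rng.gauss(0.0, process_std)
        new_particles.append(next_state)
        new_weights.append(weight * gaussian_pdf(observation, next_state, obs_std))

    total = sum(new_weights)
    if total == 0.0:
        # All likelihoods underflowed; fall back to uniform weights.
        n = len(new_particles)
        return new_particles, [1.0 / n] * n
    return new_particles, [w / total for w in new_weights]


def belief_mean(particles, weights):
    """Weighted mean of the particle belief."""
    return sum(p * w for p, w in zip(particles, weights))


rng = random.Random(0)
particles = [rng.uniform(-5.0, 5.0) for _ in range(200)]
weights = [1.0 / len(particles)] * len(particles)
particles, weights = update_weighted_belief(
    particles, weights, action=1.0, observation=2.0, rng=rng
)
print(belief_mean(particles, weights))
```

Because every particle keeps a nonzero weight, a continuous observation does not reduce the belief to a single matching particle, which is the failure mode the abstract attributes to progressive widening alone.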

    Meeting of the MINDS: an information retrieval research agenda

    Since its inception in the late 1950s, the field of Information Retrieval (IR) has developed tools that help people find, organize, and analyze information. The key early influences on the field are well-known. Among them are H. P. Luhn's pioneering work, the development of the vector space retrieval model by Salton and his students, Cleverdon's development of the Cranfield experimental methodology, Spärck Jones' development of idf, and a series of probabilistic retrieval models by Robertson and Croft. Until the development of the World Wide Web (Web), IR was of greatest interest to professional information analysts such as librarians, intelligence analysts, the legal community, and the pharmaceutical industry.