    Crowdsourced Rumour Identification During Emergencies

    When a significant event occurs, many social media users leverage platforms such as Twitter to track that event. Moreover, emergency response agencies are increasingly looking to social media as a source of real-time information about such events. However, false information and rumours are often spread during such events, which can influence public opinion and limit the usefulness of social media for emergency management. In this paper, we present an initial study into rumour identification during emergencies using crowdsourcing. In particular, through an analysis of three tweet datasets relating to emergency events from 2014, we propose a taxonomy of tweets relating to rumours. We then perform a crowdsourced labeling experiment to determine whether crowd assessors can identify rumour-related tweets and where such labeling can fail. Our results show that overall agreement over the tweet labels produced was high (Fleiss' kappa of 0.7634), indicating that crowd-based rumour labeling is possible. However, not all tweets are equally difficult to assess. Indeed, we show that tweets containing disputed or controversial information tend to be among the most difficult to identify.
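
    As a minimal sketch of the agreement statistic reported above (not the authors' actual analysis code), Fleiss' kappa can be computed from a per-tweet count of how many assessors chose each label; the label categories and example counts below are hypothetical.

        # Minimal sketch of Fleiss' kappa over crowd labels.
        # counts[i][j] = number of assessors who assigned category j to tweet i;
        # every tweet is assumed to be judged by the same number of assessors.
        def fleiss_kappa(counts):
            n_items = len(counts)
            n_raters = sum(counts[0])          # assessors per tweet (constant)
            n_categories = len(counts[0])

            # Proportion of all assignments falling into each category.
            p_j = [sum(row[j] for row in counts) / (n_items * n_raters)
                   for j in range(n_categories)]

            # Per-tweet agreement: fraction of assessor pairs that agree.
            P_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
                   for row in counts]

            P_bar = sum(P_i) / n_items         # observed agreement
            P_e = sum(p * p for p in p_j)      # agreement expected by chance
            return (P_bar - P_e) / (1 - P_e)

        # Hypothetical example: 3 tweets, 5 assessors, labels rumour / non-rumour / unsure.
        print(fleiss_kappa([[5, 0, 0], [3, 2, 0], [1, 3, 1]]))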

    A Study of Realtime Summarization Metrics

    Unexpected news events, such as natural disasters or other human tragedies, create a large volume of dynamic text data from official news media as well as less formal social media. Automatic real-time text summarization has become an important tool for quickly transforming this overabundance of text into clear, useful information for end-users, including affected individuals, crisis responders, and interested third parties. Despite the importance of real-time summarization systems, their evaluation is not well understood, as classic methods for evaluating text summarization are inappropriate for real-time and streaming conditions. The TREC 2013-2015 Temporal Summarization (TREC-TS) track was one of the first evaluation campaigns to tackle the challenges of real-time summarization evaluation, introducing new metrics, a ground-truth generation methodology, and datasets. In this paper, we present a study of the TREC-TS track evaluation methodology, with the aim of documenting its design, analyzing its effectiveness, and identifying improvements and best practices for the evaluation of temporal summarization systems.
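
    To illustrate the kind of metric at stake, the sketch below shows a latency-discounted gain for streaming summary updates. This is only an illustrative assumption about how such metrics behave, not the exact TREC-TS formulation; the half-life parameter and the example data are hypothetical.

        import math

        # Illustrative latency-discounted gain for streaming summaries:
        # an update earns the relevance ("gain") of the information nuggets it
        # matches, scaled down the later it is emitted after each nugget
        # first became known.
        def latency_discount(delay_seconds, half_life=6 * 3600):
            """Exponential decay: gain is halved for every half_life seconds of delay."""
            return math.exp(-math.log(2) * max(delay_seconds, 0) / half_life)

        def discounted_gain(update_time, matched_nuggets):
            """matched_nuggets: list of (nugget_gain, nugget_first_seen_time) pairs."""
            return sum(g * latency_discount(update_time - t) for g, t in matched_nuggets)

        # Hypothetical example: an update posted 3 hours after two nuggets appeared.
        print(discounted_gain(3 * 3600, [(1.0, 0), (0.5, 0)]))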

    Creation of Reliable Relevance Judgments in Information Retrieval Systems Evaluation Experimentation through Crowdsourcing: A Review

    Test collections are used to evaluate information retrieval systems in laboratory-based evaluation experiments. In the classic setting, generating relevance judgments involves human assessors and is a costly and time-consuming task. Researchers and practitioners are still challenged to perform reliable and low-cost evaluations of retrieval systems. Crowdsourcing, as a novel method of data acquisition, is broadly used in many research fields, and it has proven to be an inexpensive and quick solution as well as a reliable alternative for creating relevance judgments. One application of crowdsourcing in IR is judging the relevance of query-document pairs. For a crowdsourcing experiment to succeed, the relevance judgment tasks should be designed carefully, with an emphasis on quality control. This paper explores the factors that influence the accuracy of relevance judgments produced by workers and how to improve the reliability of judgments in crowdsourcing experiments.
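
    The sketch below illustrates two widely used quality-control steps for crowdsourced relevance judgments: filtering workers against gold (known-answer) query-document pairs, and aggregating the remaining labels by majority vote. The data structures are hypothetical and not tied to any particular crowdsourcing platform.

        from collections import Counter, defaultdict

        def trusted_workers(labels, gold, min_accuracy=0.8):
            """labels: {(worker, query, doc): label}; gold: {(query, doc): true_label}.
            Keep only workers who answer enough gold pairs correctly."""
            correct, total = Counter(), Counter()
            for (worker, query, doc), label in labels.items():
                if (query, doc) in gold:
                    total[worker] += 1
                    correct[worker] += int(label == gold[(query, doc)])
            return {w for w in total if correct[w] / total[w] >= min_accuracy}

        def majority_vote(labels, workers):
            """Return one aggregated label per (query, doc) pair from trusted workers."""
            votes = defaultdict(Counter)
            for (worker, query, doc), label in labels.items():
                if worker in workers:
                    votes[(query, doc)][label] += 1
            return {pair: counts.most_common(1)[0][0] for pair, counts in votes.items()}

        # Hypothetical usage with three workers judging one query-document pair.
        labels = {("w1", "q1", "d1"): "relevant", ("w2", "q1", "d1"): "relevant",
                  ("w3", "q1", "d1"): "not relevant"}
        gold = {("q1", "d1"): "relevant"}
        print(majority_vote(labels, trusted_workers(labels, gold)))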

    Crowd-sourced Text Analysis: Reproducible and Agile Production of Political Data

    Empirical social science often relies on data that are not observed in the field, but are transformed into quantitative variables by expert researchers who analyze and interpret qualitative raw sources. While generally considered the most valid way to produce data, this expert-driven process is inherently difficult to replicate or to assess on grounds of reliability. Using crowd-sourcing to distribute text for reading and interpretation by massive numbers of nonexperts, we generate results comparable to those using experts to read and interpret the same texts, but do so far more quickly and flexibly. Crucially, the data we collect can be reproduced and extended transparently, making crowd-sourced datasets intrinsically reproducible. This focuses researchers’ attention on the fundamental scientific objective of specifying reliable and replicable methods for collecting the data needed, rather than on the content of any particular dataset. We also show that our approach works straightforwardly with different types of political text, written in different languages. While findings reported here concern text analysis, they have far-reaching implications for expert-generated data in the social sciences.
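
    As a minimal illustration (not the authors' actual instrument), many nonexpert readings of the same text units can be turned into a quantitative variable by averaging; the coding scale and example data below are hypothetical.

        from statistics import mean

        # Each crowd worker codes each sentence on a simple ordinal scale
        # (hypothetically -1 = left, 0 = neutral, +1 = right); sentence scores
        # are averaged across workers, and a document score is the mean of its sentences.
        def document_score(codes_by_sentence):
            """codes_by_sentence: one inner list of worker codes per sentence."""
            return mean(mean(codes) for codes in codes_by_sentence)

        # Hypothetical manifesto with three sentences, each coded by three workers.
        print(document_score([[1, 1, 0], [0, -1, 0], [1, 0, 1]]))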

    Crowdsourcing solutions to 2D irregular strip packing problems from Internet workers

    Many industrial processes require the nesting of 2D profiles prior to the cutting, or stamping, of components from raw sheet material. Despite decades of sustained academic effort, algorithmic solutions are still sub-optimal and produce results that can frequently be improved by manual inspection. However, the Internet offers the prospect of novel ‘human-in-the-loop’ approaches to nesting problems that use online workers to produce packing efficiencies beyond the reach of current CAM packages. To investigate the feasibility of such an approach, this paper reports on the speed and efficiency of online workers engaged in the interactive nesting of six standard benchmark datasets. To ensure the results accurately characterise the diverse educational and social backgrounds of the many different labour forces available online, the study was conducted with subjects based in both Indian IT service centres (i.e. rural BPOs) and a network of homeworkers in northern Scotland. The results (i.e. time and packing efficiency) of the human workers are contrasted with both the baseline performance of a commercial CAM package and recent research results. The paper concludes that online workers could consistently achieve packing efficiencies roughly 4% higher than the commercial baseline established by the project. Beyond characterising the abilities of online workers to nest components, the results also contribute to the development of algorithmic solutions by reporting new solutions to the benchmark problems and demonstrating methods for assessing the packing strategy employed by the best workers.
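
    As a rough sketch of how such packing efficiencies can be computed (the paper does not spell out its exact measure here), the usual figure is the total area of the placed parts divided by the area of strip actually used; the part geometry and strip width below are hypothetical.

        # Packing efficiency for strip nesting: total part area divided by the
        # used strip area (strip width times the length up to the furthest-right part).
        # Each placed part is reduced to an (area, rightmost_x) pair for illustration.
        def packing_efficiency(parts, strip_width):
            """parts: list of (area, rightmost_x) for each placed part."""
            used_length = max(x for _, x in parts)
            total_part_area = sum(a for a, _ in parts)
            return total_part_area / (strip_width * used_length)

        # Hypothetical layout: three parts nested on a strip 100 units wide.
        parts = [(2500.0, 60.0), (1800.0, 95.0), (3200.0, 120.0)]
        print(f"efficiency = {packing_efficiency(parts, strip_width=100.0):.1%}")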

    Personalized diversification of search results

    CrowdTrusting: Case Studies in Crowdsourcing Projects

    Crowdsourcing has gained popularity over the past few years as a way for library and archive professionals to supplement and enhance the description of their collections. This paper provides case studies of four community archiving projects, focusing on the crowdsourcing techniques they used to describe or enlarge their collections. The studies were conducted to determine the kinds of techniques used in community archives, and the potential benefits and barriers the projects faced in developing and using those techniques. Analysis of the projects indicated that the up-front investment in developing crowdsourcing tools may be prohibitive for community archiving projects. However, the results also indicated that digitization projects were still of value.
    Master of Science in Information Science