Search CORE

2,571 research outputs found

Engineering Crowdsourced Stream Processing Systems

Author: Carlos Castillo
Crp Henri Tudor
Ioanna Lykourentzou
Muhammad Imran
Yannick Naudet
Publication venue
Publication date: 04/08/2014
Field of study

A crowdsourced stream processing system (CSP) is a system that incorporates crowdsourced tasks in the processing of a data stream. This can be seen as enabling crowdsourcing work to be applied on a sample of large-scale data at high speed, or equivalently, enabling stream processing to employ human intelligence. It also leads to a substantial expansion of the capabilities of data processing systems. Engineering a CSP system requires the combination of human and machine computation elements. From a general systems theory perspective, this means taking into account inherited as well as emerging properties from both these elements. In this paper, we position CSP systems within a broader taxonomy, outline a series of design principles and evaluation metrics, present an extensible framework for their design, and describe several design patterns. We showcase the capabilities of CSP systems by performing a case study that applies our proposed framework to the design and analysis of a real system (AIDR) that classifies social media messages during time-critical crisis events. Results show that compared to a pure stream processing system, AIDR can achieve a higher data classification accuracy, while compared to a pure crowdsourcing solution, the system makes better use of human workers by requiring much less manual work effort

arXiv.org e-Print Archive

CiteSeerX

A survey of the use of crowdsourcing in software engineering

Author: Capra L
Harman M
Jia Y
Mao K
Publication venue
Publication date: 01/01/2015
Field of study

The term 'crowdsourcing' was initially introduced in 2006 to describe an emerging distributed problem-solving model by online workers. Since then it has been widely studied and practiced to support software engineering. In this paper we provide a comprehensive survey of the use of crowdsourcing in software engineering, seeking to cover all literature on this topic. We first review the definitions of crowdsourcing and derive our definition of Crowdsourcing Software Engineering together with its taxonomy. Then we summarise industrial crowdsourcing practice in software engineering and corresponding case studies. We further analyse the software engineering domains, tasks and applications for crowdsourcing and the platforms and stakeholders involved in realising Crowdsourced Software Engineering solutions. We conclude by exposing trends, open issues and opportunities for future research on Crowdsourced Software Engineering

CiteSeerX

UCL Discovery

Automatic acoustic detection of birds through deep learning : the first bird audio detection challenge

Author: Abrol
Adavanne
Aide
Benetos
Breiman
Cakir
Cakir
Colonna
Digby
Fawcett
Frommolt
Furnas
Gashchak
Goëau
Grill
Hill
Hutto
Johnston
Johnston
Joppa
Kamp
Knight
Kong
LeCun
Mac Aodha
Marques
Matsubayashi
Morfi
Morfi
Niculescu-Mizil
Pamuła
Pellegrini
Salamon
Salamon
Sandsten
Stowell
Stowell
Stowell
Stowell
Sugiyama
Thakur
Tibshirani
Towsey
Urbano
Walters
Witten
Wood
Publication venue: 'Wiley'
Publication date: 26/11/2018
Field of study

Assessing the presence and abundance of birds is important for monitoring specific species as well as overall ecosystem health. Many birds are most readily detected by their sounds, and thus passive acoustic monitoring is highly appropriate. Yet acoustic monitoring is often held back by practical limitations such as the need for manual configuration, reliance on example sound libraries, low accuracy, low robustness, and limited ability to generalise to novel acoustic conditions. Here we report outcomes from a collaborative data challenge. We present new acoustic monitoring datasets, summarise the machine learning techniques proposed by challenge teams, conduct detailed performance evaluation, and discuss how such approaches to detection can be integrated into remote monitoring projects. Multiple methods were able to attain performance of around 88% AUC (area under the ROC curve), much higher performance than previous general‐purpose methods. With modern machine learning including deep learning, general‐purpose acoustic bird detection can achieve very high retrieval rates in remote monitoring data ̶ with no manual recalibration, and no pre‐training of the detector for the target species or the acoustic conditions in the target environment.</ol

University of Salford Institutional Repository

Crossref

Queen Mary Research Online

A Framework for Exploring and Evaluating Mechanics in Human Computation Games

Author: Krause Markus
Lee Jong-Chuan
Siu Kristin
Tuite Kathleen
Publication venue
Publication date: 11/06/2017
Field of study

Human computation games (HCGs) are a crowdsourcing approach to solving computationally-intractable tasks using games. In this paper, we describe the need for generalizable HCG design knowledge that accommodates the needs of both players and tasks. We propose a formal representation of the mechanics in HCGs, providing a structural breakdown to visualize, compare, and explore the space of HCG mechanics. We present a methodology based on small-scale design experiments using fixed tasks while varying game elements to observe effects on both the player experience and the human computation task completion. Finally we discuss applications of our framework using comparisons of prior HCGs and recent design experiments. Ultimately, we wish to enable easier exploration and development of HCGs, helping these games provide meaningful player experiences while solving difficult problems.Comment: 11 pages, 5 figure

arXiv.org e-Print Archive

Crossref

Domain Agnostic Real-Valued Specificity Prediction

Author: Durrett Greg
Ko Wei-Jen
Li Junyi Jessy
Publication venue
Publication date: 14/11/2018
Field of study

Sentence specificity quantifies the level of detail in a sentence, characterizing the organization of information in discourse. While this information is useful for many downstream applications, specificity prediction systems predict very coarse labels (binary or ternary) and are trained on and tailored toward specific domains (e.g., news). The goal of this work is to generalize specificity prediction to domains where no labeled data is available and output more nuanced real-valued specificity ratings. We present an unsupervised domain adaptation system for sentence specificity prediction, specifically designed to output real-valued estimates from binary training labels. To calibrate the values of these predictions appropriately, we regularize the posterior distribution of the labels towards a reference distribution. We show that our framework generalizes well to three different domains with 50%~68% mean absolute error reduction than the current state-of-the-art system trained for news sentence specificity. We also demonstrate the potential of our work in improving the quality and informativeness of dialogue generation systems.Comment: AAAI 2019 camera read

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Explicit diversification of event aspects for temporal summarization

Author: Macdonald Craig
McCreadie Richard
Ounis Iadh
Santos Rodrygo L.T.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 02/02/2018
Field of study

During major events, such as emergencies and disasters, a large volume of information is reported on newswire and social media platforms. Temporal summarization (TS) approaches are used to automatically produce concise overviews of such events by extracting text snippets from related articles over time. Current TS approaches rely on a combination of event relevance and textual novelty for snippet selection. However, for events that span multiple days, textual novelty is often a poor criterion for selecting snippets, since many snippets are textually unique but are semantically redundant or non-informative. In this article, we propose a framework for the diversification of snippets using explicit event aspects, building on recent works in search result diversification. In particular, we first propose two techniques to identify explicit aspects that a user might want to see covered in a summary for different types of event. We then extend a state-of-the-art explicit diversification framework to maximize the coverage of these aspects when selecting summary snippets for unseen events. Through experimentation over the TREC TS 2013, 2014, and 2015 datasets, we show that explicit diversification for temporal summarization significantly outperforms classical novelty-based diversification, as the use of explicit event aspects reduces the amount of redundant and off-topic snippets returned, while also increasing summary timeliness

Crossref

Enlighten