2,570 research outputs found
Engineering Crowdsourced Stream Processing Systems
A crowdsourced stream processing system (CSP) is a system that incorporates
crowdsourced tasks in the processing of a data stream. This can be seen as
enabling crowdsourcing work to be applied on a sample of large-scale data at
high speed, or equivalently, enabling stream processing to employ human
intelligence. It also leads to a substantial expansion of the capabilities of
data processing systems. Engineering a CSP system requires the combination of
human and machine computation elements. From a general systems theory
perspective, this means taking into account inherited as well as emerging
properties from both these elements. In this paper, we position CSP systems
within a broader taxonomy, outline a series of design principles and evaluation
metrics, present an extensible framework for their design, and describe several
design patterns. We showcase the capabilities of CSP systems by performing a
case study that applies our proposed framework to the design and analysis of a
real system (AIDR) that classifies social media messages during time-critical
crisis events. Results show that compared to a pure stream processing system,
AIDR can achieve a higher data classification accuracy, while compared to a
pure crowdsourcing solution, the system makes better use of human workers by
requiring much less manual work effort.
A survey of the use of crowdsourcing in software engineering
The term 'crowdsourcing' was initially introduced in 2006 to describe an emerging distributed problem-solving model by online workers. Since then it has been widely studied and practiced to support software engineering. In this paper we provide a comprehensive survey of the use of crowdsourcing in software engineering, seeking to cover all literature on this topic. We first review the definitions of crowdsourcing and derive our definition of Crowdsourcing Software Engineering together with its taxonomy. Then we summarise industrial crowdsourcing practice in software engineering and corresponding case studies. We further analyse the software engineering domains, tasks and applications for crowdsourcing and the platforms and stakeholders involved in realising Crowdsourced Software Engineering solutions. We conclude by exposing trends, open issues and opportunities for future research on Crowdsourced Software Engineering.
A Framework for Exploring and Evaluating Mechanics in Human Computation Games
Human computation games (HCGs) are a crowdsourcing approach to solving
computationally-intractable tasks using games. In this paper, we describe the
need for generalizable HCG design knowledge that accommodates the needs of both
players and tasks. We propose a formal representation of the mechanics in HCGs,
providing a structural breakdown to visualize, compare, and explore the space
of HCG mechanics. We present a methodology based on small-scale design
experiments using fixed tasks while varying game elements to observe effects on
both the player experience and the human computation task completion. Finally,
we discuss applications of our framework using comparisons of prior HCGs and
recent design experiments. Ultimately, we wish to enable easier exploration and
development of HCGs, helping these games provide meaningful player experiences
while solving difficult problems. Comment: 11 pages, 5 figures
Automatic acoustic detection of birds through deep learning : the first bird audio detection challenge
Assessing the presence and abundance of birds is important for monitoring specific species as well as overall ecosystem health. Many birds are most readily detected by their sounds, and thus passive acoustic monitoring is highly appropriate. Yet acoustic monitoring is often held back by practical limitations such as the need for manual configuration, reliance on example sound libraries, low accuracy, low robustness, and limited ability to generalise to novel acoustic conditions.
Here we report outcomes from a collaborative data challenge. We present new acoustic monitoring datasets, summarise the machine learning techniques proposed by challenge teams, conduct detailed performance evaluation, and discuss how such approaches to detection can be integrated into remote monitoring projects.
Multiple methods were able to attain performance of around 88% AUC (area under the ROC curve), much higher performance than previous general-purpose methods.
With modern machine learning including deep learning, general-purpose acoustic bird detection can achieve very high retrieval rates in remote monitoring data, with no manual recalibration, and no pre-training of the detector for the target species or the acoustic conditions in the target environment.
Domain Agnostic Real-Valued Specificity Prediction
Sentence specificity quantifies the level of detail in a sentence,
characterizing the organization of information in discourse. While this
information is useful for many downstream applications, specificity prediction
systems predict very coarse labels (binary or ternary) and are trained on and
tailored toward specific domains (e.g., news). The goal of this work is to
generalize specificity prediction to domains where no labeled data is available
and output more nuanced real-valued specificity ratings.
We present an unsupervised domain adaptation system for sentence specificity
prediction, specifically designed to output real-valued estimates from binary
training labels. To calibrate the values of these predictions appropriately, we
regularize the posterior distribution of the labels towards a reference
distribution. We show that our framework generalizes well to three different
domains, with a 50% to 68% reduction in mean absolute error relative to the current
state-of-the-art system trained for news sentence specificity. We also
demonstrate the potential of our work in improving the quality and
informativeness of dialogue generation systems. Comment: AAAI 2019 camera ready
Explicit diversification of event aspects for temporal summarization
During major events, such as emergencies and disasters, a large volume of information is reported on newswire and social media platforms. Temporal summarization (TS) approaches are used to automatically produce concise overviews of such events by extracting text snippets from related articles over time. Current TS approaches rely on a combination of event relevance and textual novelty for snippet selection. However, for events that span multiple days, textual novelty is often a poor criterion for selecting snippets, since many snippets are textually unique but are semantically redundant or non-informative. In this article, we propose a framework for the diversification of snippets using explicit event aspects, building on recent work in search result diversification. In particular, we first propose two techniques to identify explicit aspects that a user might want to see covered in a summary for different types of event. We then extend a state-of-the-art explicit diversification framework to maximize the coverage of these aspects when selecting summary snippets for unseen events. Through experimentation over the TREC TS 2013, 2014, and 2015 datasets, we show that explicit diversification for temporal summarization significantly outperforms classical novelty-based diversification, as the use of explicit event aspects reduces the amount of redundant and off-topic snippets returned, while also increasing summary timeliness.