2,938 research outputs found
Engineering Crowdsourced Stream Processing Systems
A crowdsourced stream processing system (CSP) is a system that incorporates
crowdsourced tasks in the processing of a data stream. This can be seen as
enabling crowdsourcing work to be applied on a sample of large-scale data at
high speed, or equivalently, enabling stream processing to employ human
intelligence. It also leads to a substantial expansion of the capabilities of
data processing systems. Engineering a CSP system requires the combination of
human and machine computation elements. From a general systems theory
perspective, this means taking into account inherited as well as emerging
properties from both these elements. In this paper, we position CSP systems
within a broader taxonomy, outline a series of design principles and evaluation
metrics, present an extensible framework for their design, and describe several
design patterns. We showcase the capabilities of CSP systems by performing a
case study that applies our proposed framework to the design and analysis of a
real system (AIDR) that classifies social media messages during time-critical
crisis events. Results show that compared to a pure stream processing system,
AIDR can achieve a higher data classification accuracy, while compared to a
pure crowdsourcing solution, the system makes better use of human workers by
requiring much less manual work effort
A-posteriori provenance-enabled linking of publications and datasets via crowdsourcing
This paper aims to share with the digital library community different opportunities to leverage crowdsourcing for a-posteriori capturing of dataset citation graphs. We describe a practical approach, which exploits one possible crowdsourcing technique to collect these graphs from domain experts and proposes their publication as Linked Data using the W3C PROV standard. Based on our findings from a study we ran during the USEWOD 2014 workshop, we propose a semi-automatic approach that generates metadata by leveraging information extraction as an additional step to crowdsourcing, to generate high-quality data citation graphs. Furthermore, we consider the design implications on our crowdsourcing approach when non-expert participants are involved in the process<br/
Revisiting Prompt Engineering via Declarative Crowdsourcing
Large language models (LLMs) are incredibly powerful at comprehending and
generating data in the form of text, but are brittle and error-prone. There has
been an advent of toolkits and recipes centered around so-called prompt
engineering-the process of asking an LLM to do something via a series of
prompts. However, for LLM-powered data processing workflows, in particular,
optimizing for quality, while keeping cost bounded, is a tedious, manual
process. We put forth a vision for declarative prompt engineering. We view LLMs
like crowd workers and leverage ideas from the declarative crowdsourcing
literature-including leveraging multiple prompting strategies, ensuring
internal consistency, and exploring hybrid-LLM-non-LLM approaches-to make
prompt engineering a more principled process. Preliminary case studies on
sorting, entity resolution, and imputation demonstrate the promise of our
approac
- …