3,161 research outputs found
T-Crowd: Effective Crowdsourcing for Tabular Data
Crowdsourcing employs human workers to solve computer-hard problems, such as
data cleaning, entity resolution, and sentiment analysis. When crowdsourcing
tabular data, e.g., the attribute values of an entity set, a worker's answers
on the different attributes (e.g., the nationality and age of a celebrity star)
are often treated independently. This assumption is not always true and can
lead to suboptimal crowdsourcing performance. In this paper, we present the
T-Crowd system, which takes into consideration the intricate relationships
among tasks, in order to converge faster to their true values. Particularly,
T-Crowd integrates each worker's answers on different attributes to effectively
learn his/her trustworthiness and the true data values. The attribute
relationship information is also used to guide task allocation to workers.
Finally, T-Crowd seamlessly supports categorical and continuous attributes,
which are the two main datatypes found in typical databases. Our extensive
experiments on real and synthetic datasets show that T-Crowd outperforms
state-of-the-art methods in terms of truth inference and reducing the cost of
crowdsourcing
Quality of Information in Mobile Crowdsensing: Survey and Research Challenges
Smartphones have become the most pervasive devices in people's lives, and are
clearly transforming the way we live and perceive technology. Today's
smartphones benefit from almost ubiquitous Internet connectivity and come
equipped with a plethora of inexpensive yet powerful embedded sensors, such as
accelerometer, gyroscope, microphone, and camera. This unique combination has
enabled revolutionary applications based on the mobile crowdsensing paradigm,
such as real-time road traffic monitoring, air and noise pollution, crime
control, and wildlife monitoring, just to name a few. Differently from prior
sensing paradigms, humans are now the primary actors of the sensing process,
since they become fundamental in retrieving reliable and up-to-date information
about the event being monitored. As humans may behave unreliably or
maliciously, assessing and guaranteeing Quality of Information (QoI) becomes
more important than ever. In this paper, we provide a new framework for
defining and enforcing the QoI in mobile crowdsensing, and analyze in depth the
current state-of-the-art on the topic. We also outline novel research
challenges, along with possible directions of future work.Comment: To appear in ACM Transactions on Sensor Networks (TOSN
Entities as topic labels: Improving topic interpretability and evaluability combining Entity Linking and Labeled LDA
In order to create a corpus exploration method providing topics that are
easier to interpret than standard LDA topic models, here we propose combining
two techniques called Entity linking and Labeled LDA. Our method identifies in
an ontology a series of descriptive labels for each document in a corpus. Then
it generates a specific topic for each label. Having a direct relation between
topics and labels makes interpretation easier; using an ontology as background
knowledge limits label ambiguity. As our topics are described with a limited
number of clear-cut labels, they promote interpretability, and this may help
quantitative evaluation. We illustrate the potential of the approach by
applying it in order to define the most relevant topics addressed by each party
in the European Parliament's fifth mandate (1999-2004).Comment: in Proceedings of Digital Humanities 2016, Krako
Engineering Crowdsourced Stream Processing Systems
A crowdsourced stream processing system (CSP) is a system that incorporates
crowdsourced tasks in the processing of a data stream. This can be seen as
enabling crowdsourcing work to be applied on a sample of large-scale data at
high speed, or equivalently, enabling stream processing to employ human
intelligence. It also leads to a substantial expansion of the capabilities of
data processing systems. Engineering a CSP system requires the combination of
human and machine computation elements. From a general systems theory
perspective, this means taking into account inherited as well as emerging
properties from both these elements. In this paper, we position CSP systems
within a broader taxonomy, outline a series of design principles and evaluation
metrics, present an extensible framework for their design, and describe several
design patterns. We showcase the capabilities of CSP systems by performing a
case study that applies our proposed framework to the design and analysis of a
real system (AIDR) that classifies social media messages during time-critical
crisis events. Results show that compared to a pure stream processing system,
AIDR can achieve a higher data classification accuracy, while compared to a
pure crowdsourcing solution, the system makes better use of human workers by
requiring much less manual work effort
- …