8,144 research outputs found
Towards a classification framework for social machines
The state of the art in human interaction with computational systems blurs the line between computations performed by machine logic and algorithms, and those that result from input by humans, arising from their own psychological processes and life experience. Current socio-technical systems, known as āsocial machinesā exploit the large-scale interaction of humans with machines. Interactions that are motivated by numerous goals and purposes including financial gain, charitable aid, and simply for fun. In this paper we explore the landscape of social machines, both past and present, with the aim of defining an initial classificatory framework. Through a number of knowledge elicitation and refinement exercises we have identified the polyarchical relationship between infrastructure, social machines, and large-scale social initiatives. Our initial framework describes classification constructs in the areas of contributions, participants, and motivation. We present an initial characterization of some of the most popular social machines, as demonstration of the use of the identified constructs. We believe that it is important to undertake an analysis of the behaviour and phenomenology of social machines, and of their growth and evolution over time. Our future work will seek to elicit additional opinions, classifications and validation from a wider audience, to produce a comprehensive framework for the description, analysis and comparison of social machines
Human Computation and Convergence
Humans are the most effective integrators and producers of information,
directly and through the use of information-processing inventions. As these
inventions become increasingly sophisticated, the substantive role of humans in
processing information will tend toward capabilities that derive from our most
complex cognitive processes, e.g., abstraction, creativity, and applied world
knowledge. Through the advancement of human computation - methods that leverage
the respective strengths of humans and machines in distributed
information-processing systems - formerly discrete processes will combine
synergistically into increasingly integrated and complex information processing
systems. These new, collective systems will exhibit an unprecedented degree of
predictive accuracy in modeling physical and techno-social processes, and may
ultimately coalesce into a single unified predictive organism, with the
capacity to address societies most wicked problems and achieve planetary
homeostasis.Comment: Pre-publication draft of chapter. 24 pages, 3 figures; added
references to page 1 and 3, and corrected typ
Crime applications and social machines: crowdsourcing sensitive data
The authors explore some issues with the United Kingdom (U.K.) crime reporting and recording systems which currently produce Open Crime Data. The availability of Open Crime Data seems to create a potential data ecosystem which would encourage crowdsourcing, or the creation of social machines, in order to counter some of these issues. While such solutions are enticing, we suggest that in fact the theoretical solution brings to light fairly compelling problems, which highlight some limitations of crowdsourcing as a means of addressing Berners-Leeās āsocial constraint.ā The authors present a thought experiment ā a Gendankenexperiment - in order to explore the implications, both good and bad, of a social machine in such a sensitive space and suggest a Web Science perspective to pick apart the ramifications of this thought experiment as a theoretical approach to the characterisation of social machine
Engineering Crowdsourced Stream Processing Systems
A crowdsourced stream processing system (CSP) is a system that incorporates
crowdsourced tasks in the processing of a data stream. This can be seen as
enabling crowdsourcing work to be applied on a sample of large-scale data at
high speed, or equivalently, enabling stream processing to employ human
intelligence. It also leads to a substantial expansion of the capabilities of
data processing systems. Engineering a CSP system requires the combination of
human and machine computation elements. From a general systems theory
perspective, this means taking into account inherited as well as emerging
properties from both these elements. In this paper, we position CSP systems
within a broader taxonomy, outline a series of design principles and evaluation
metrics, present an extensible framework for their design, and describe several
design patterns. We showcase the capabilities of CSP systems by performing a
case study that applies our proposed framework to the design and analysis of a
real system (AIDR) that classifies social media messages during time-critical
crisis events. Results show that compared to a pure stream processing system,
AIDR can achieve a higher data classification accuracy, while compared to a
pure crowdsourcing solution, the system makes better use of human workers by
requiring much less manual work effort
Gravity Spy: Integrating Advanced LIGO Detector Characterization, Machine Learning, and Citizen Science
(abridged for arXiv) With the first direct detection of gravitational waves,
the Advanced Laser Interferometer Gravitational-wave Observatory (LIGO) has
initiated a new field of astronomy by providing an alternate means of sensing
the universe. The extreme sensitivity required to make such detections is
achieved through exquisite isolation of all sensitive components of LIGO from
non-gravitational-wave disturbances. Nonetheless, LIGO is still susceptible to
a variety of instrumental and environmental sources of noise that contaminate
the data. Of particular concern are noise features known as glitches, which are
transient and non-Gaussian in their nature, and occur at a high enough rate so
that accidental coincidence between the two LIGO detectors is non-negligible.
In this paper we describe an innovative project that combines crowdsourcing
with machine learning to aid in the challenging task of categorizing all of the
glitches recorded by the LIGO detectors. Through the Zooniverse platform, we
engage and recruit volunteers from the public to categorize images of glitches
into pre-identified morphological classes and to discover new classes that
appear as the detectors evolve. In addition, machine learning algorithms are
used to categorize images after being trained on human-classified examples of
the morphological classes. Leveraging the strengths of both classification
methods, we create a combined method with the aim of improving the efficiency
and accuracy of each individual classifier. The resulting classification and
characterization should help LIGO scientists to identify causes of glitches and
subsequently eliminate them from the data or the detector entirely, thereby
improving the rate and accuracy of gravitational-wave observations. We
demonstrate these methods using a small subset of data from LIGO's first
observing run.Comment: 27 pages, 8 figures, 1 tabl
Data Cleaning for XML Electronic Dictionaries via Statistical Anomaly Detection
Many important forms of data are stored digitally in XML format. Errors can
occur in the textual content of the data in the fields of the XML. Fixing these
errors manually is time-consuming and expensive, especially for large amounts
of data. There is increasing interest in the research, development, and use of
automated techniques for assisting with data cleaning. Electronic dictionaries
are an important form of data frequently stored in XML format that frequently
have errors introduced through a mixture of manual typographical entry errors
and optical character recognition errors. In this paper we describe methods for
flagging statistical anomalies as likely errors in electronic dictionaries
stored in XML format. We describe six systems based on different sources of
information. The systems detect errors using various signals in the data
including uncommon characters, text length, character-based language models,
word-based language models, tied-field length ratios, and tied-field
transliteration models. Four of the systems detect errors based on expectations
automatically inferred from content within elements of a single field type. We
call these single-field systems. Two of the systems detect errors based on
correspondence expectations automatically inferred from content within elements
of multiple related field types. We call these tied-field systems. For each
system, we provide an intuitive analysis of the type of error that it is
successful at detecting. Finally, we describe two larger-scale evaluations
using crowdsourcing with Amazon's Mechanical Turk platform and using the
annotations of a domain expert. The evaluations consistently show that the
systems are useful for improving the efficiency with which errors in XML
electronic dictionaries can be detected.Comment: 8 pages, 4 figures, 5 tables; published in Proceedings of the 2016
IEEE Tenth International Conference on Semantic Computing (ICSC), Laguna
Hills, CA, USA, pages 79-86, February 201
Empirical Methodology for Crowdsourcing Ground Truth
The process of gathering ground truth data through human annotation is a
major bottleneck in the use of information extraction methods for populating
the Semantic Web. Crowdsourcing-based approaches are gaining popularity in the
attempt to solve the issues related to volume of data and lack of annotators.
Typically these practices use inter-annotator agreement as a measure of
quality. However, in many domains, such as event detection, there is ambiguity
in the data, as well as a multitude of perspectives of the information
examples. We present an empirically derived methodology for efficiently
gathering of ground truth data in a diverse set of use cases covering a variety
of domains and annotation tasks. Central to our approach is the use of
CrowdTruth metrics that capture inter-annotator disagreement. We show that
measuring disagreement is essential for acquiring a high quality ground truth.
We achieve this by comparing the quality of the data aggregated with CrowdTruth
metrics with majority vote, over a set of diverse crowdsourcing tasks: Medical
Relation Extraction, Twitter Event Identification, News Event Extraction and
Sound Interpretation. We also show that an increased number of crowd workers
leads to growth and stabilization in the quality of annotations, going against
the usual practice of employing a small number of annotators.Comment: in publication at the Semantic Web Journa
- ā¦