Engineering Crowdsourced Stream Processing Systems
A crowdsourced stream processing system (CSP) is a system that incorporates
crowdsourced tasks in the processing of a data stream. This can be seen as
enabling crowdsourcing work to be applied on a sample of large-scale data at
high speed, or equivalently, enabling stream processing to employ human
intelligence. It also leads to a substantial expansion of the capabilities of
data processing systems. Engineering a CSP system requires the combination of
human and machine computation elements. From a general systems theory
perspective, this means taking into account inherited as well as emerging
properties from both these elements. In this paper, we position CSP systems
within a broader taxonomy, outline a series of design principles and evaluation
metrics, present an extensible framework for their design, and describe several
design patterns. We showcase the capabilities of CSP systems by performing a
case study that applies our proposed framework to the design and analysis of a
real system (AIDR) that classifies social media messages during time-critical
crisis events. Results show that compared to a pure stream processing system,
AIDR can achieve a higher data classification accuracy, while compared to a
pure crowdsourcing solution, the system makes better use of human workers by
requiring much less manual effort.
TMB: Automatic Differentiation and Laplace Approximation
TMB is an open source R package that enables quick implementation of complex
nonlinear random effect (latent variable) models in a manner similar to the
established AD Model Builder package (ADMB, admb-project.org). In addition, it
offers easy access to parallel computations. The user defines the joint
likelihood for the data and the random effects as a C++ template function,
while all other operations (e.g., reading in the data) are done in R. The
package evaluates and maximizes the Laplace approximation of the marginal
likelihood where the random effects are automatically integrated out. This
approximation, and its derivatives, are obtained using automatic
differentiation (up to order three) of the joint likelihood. The computations
are designed to be fast for problems with many random effects (~10^6) and
parameters (~10^3). Computation times using ADMB and TMB are compared on a
suite of examples ranging from simple models to large spatial models where the
random effects are a Gaussian random field. Speedups ranging from 1.5 to about
100 are obtained with increasing gains for large problems. The package and
examples are available at http://tmb-project.org
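
For orientation, the Laplace approximation mentioned in this abstract can be written out as follows. This is a sketch in assumed notation that is not taken from the listing: f denotes the joint negative log-likelihood, u the random effects (of dimension n), and theta the parameters.

```latex
% Assumed notation (illustrative): f = joint negative log-likelihood,
% u = random effects in R^n, \theta = parameters.
\begin{align}
L(\theta) &= \int_{\mathbb{R}^n} \exp\bigl(-f(u,\theta)\bigr)\,du, \\
\hat{u}(\theta) &= \arg\min_{u} f(u,\theta), \qquad
H(\theta) = \left.\frac{\partial^2 f(u,\theta)}{\partial u\,\partial u^{\top}}\right|_{u=\hat{u}(\theta)}, \\
L^{*}(\theta) &= (2\pi)^{n/2}\,\det\bigl(H(\theta)\bigr)^{-1/2}\,
                 \exp\bigl(-f(\hat{u}(\theta),\theta)\bigr), \\
-\log L^{*}(\theta) &= -\tfrac{n}{2}\log(2\pi) + \tfrac{1}{2}\log\det H(\theta)
                      + f(\hat{u}(\theta),\theta).
\end{align}
```

Under this notation, differentiating the log-determinant term with respect to theta involves third-order derivatives of f, which is consistent with the abstract's remark that automatic differentiation is used up to order three.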
Non-intrusive on-the-fly data race detection using execution replay
This paper presents a practical solution for detecting data races in parallel
programs. The solution consists of a combination of execution replay (RecPlay)
with automatic on-the-fly data race detection. This combination enables us to
perform the data race detection on an unaltered execution (almost no probe
effect). Furthermore, the usage of multilevel bitmaps and snooped matrix clocks
limits the amount of memory used. As the record phase of RecPlay is highly
efficient, there is no need to switch it off, thereby eliminating the
possibility of Heisenbugs because tracing can be left on all the time.
Comment: In M. Ducasse (ed), proceedings of the Fourth International Workshop
on Automated Debugging (AAdebug 2000), August 2000, Munich. cs.SE/001003
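
To illustrate the general idea of clock-based race detection, here is a minimal Python sketch. It uses plain vector clocks, not the snooped matrix clocks or multilevel bitmaps that the abstract describes, and it is not RecPlay's implementation. The names `Access`, `happens_before`, and `find_races` are hypothetical and introduced only for this example.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Tuple


@dataclass(frozen=True)
class Access:
    """One recorded memory access (hypothetical record layout)."""
    thread: int
    clock: Tuple[int, ...]  # vector clock of the accessing thread at that point
    address: int
    is_write: bool


def happens_before(a: Tuple[int, ...], b: Tuple[int, ...]) -> bool:
    """True if clock a is component-wise <= clock b and the two differ."""
    return a != b and all(x <= y for x, y in zip(a, b))


def find_races(accesses: List[Access]) -> List[Tuple[Access, Access]]:
    """Return pairs of conflicting accesses that are not ordered by the clocks."""
    races = []
    for a, b in combinations(accesses, 2):
        if a.thread == b.thread or a.address != b.address:
            continue  # same thread or different location: no conflict
        if not (a.is_write or b.is_write):
            continue  # two reads never race
        if happens_before(a.clock, b.clock) or happens_before(b.clock, a.clock):
            continue  # ordered by synchronization
        races.append((a, b))
    return races


if __name__ == "__main__":
    w0 = Access(thread=0, clock=(1, 0), address=0x10, is_write=True)
    r1 = Access(thread=1, clock=(0, 1), address=0x10, is_write=False)
    # Thread 1 writes after synchronizing with thread 0, so its clock dominates w0's.
    w1 = Access(thread=1, clock=(1, 2), address=0x10, is_write=True)
    for a, b in find_races([w0, r1, w1]):
        print(f"race on {a.address:#x}: thread {a.thread} vs thread {b.thread}")
```

In this sketch only the unsynchronized write/read pair is reported. The write that follows a synchronization is ordered after the first write and is therefore not flagged.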