77 research outputs found
Accelerating sequential programs using FastFlow and self-offloading
FastFlow is a programming environment specifically targeting cache-coherent
shared-memory multi-cores. FastFlow is implemented as a stack of C++ template
libraries built on top of lock-free (fence-free) synchronization mechanisms. In
this paper we present a further evolution of FastFlow enabling programmers to
offload part of their workload onto a dynamically created software accelerator
running on unused CPUs. The offloaded function can be easily derived from
pre-existing sequential code. We emphasize in particular the effective
trade-off between human productivity and execution efficiency of the approach.
Comment: 17 pages + cove
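The offloading idea described in this abstract can be illustrated with a minimal sketch. It does not use the FastFlow API (FastFlow is a C++ library); the `Accelerator` class, the `EOS` marker, and the `square` task are all hypothetical names chosen for illustration. The sketch only shows the structure: the body of a sequential loop becomes the function run by worker threads, and the original loop pushes tasks to them instead of computing in place. In CPython the global interpreter lock prevents a real speedup for CPU-bound work, so this is a structural analogy only.

```python
import queue
import threading

EOS = object()  # end-of-stream marker (hypothetical name)


class Accelerator:
    """Toy software accelerator: worker threads fed through an input queue."""

    def __init__(self, func, n_workers=2):
        self._func = func
        self._inq = queue.Queue()
        self._outq = queue.Queue()
        self._workers = [
            threading.Thread(target=self._loop) for _ in range(n_workers)
        ]

    def _loop(self):
        # Each worker consumes tasks until it receives the end-of-stream marker.
        while True:
            task = self._inq.get()
            if task is EOS:
                break
            self._outq.put(self._func(task))

    def run(self):
        for worker in self._workers:
            worker.start()

    def offload(self, task):
        # Called by the sequential code in place of computing the task itself.
        self._inq.put(task)

    def wait(self):
        # Send one end-of-stream marker per worker, then collect all results.
        for _ in self._workers:
            self._inq.put(EOS)
        for worker in self._workers:
            worker.join()
        results = []
        while not self._outq.empty():
            results.append(self._outq.get())
        return results


def square(x):
    # Stands in for the loop body lifted out of the sequential code.
    return x * x


def run_offloaded(values):
    acc = Accelerator(square, n_workers=2)
    acc.run()
    for v in values:
        acc.offload(v)
    # Results arrive in completion order, so sort them for a stable answer.
    return sorted(acc.wait())
```

Calling `run_offloaded(range(5))` returns the squares of 0 to 4; the caller's loop is unchanged apart from replacing the computation with `offload`.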
FastFlow tutorial
FastFlow is a structured parallel programming framework targeting shared-memory
multicores. Its layered design, together with the optimized communication
mechanisms that implement the FastFlow streaming networks (exposed to the
application programmer as algorithmic skeletons), supports the development of
efficient fine-grained parallel applications. FastFlow is
available (open source) at SourceForge
(http://sourceforge.net/projects/mc-fastflow/). This work introduces FastFlow
programming techniques and points out the different ways used to parallelize
existing C/C++ code using FastFlow as a software accelerator. In short: this is
a kind of tutorial on FastFlow.
Comment: 49 pages + cove
StochKit-FF: Efficient Systems Biology on Multicore Architectures
The stochastic modelling of biological systems is an informative and, in some
cases, highly suitable technique, which may however be more expensive than
other modelling approaches, such as differential equations. We
present StochKit-FF, a parallel version of StochKit, a reference toolkit for
stochastic simulations. StochKit-FF is based on the FastFlow programming
toolkit for multicores and exploits the novel concept of selective memory. We
evaluate StochKit-FF on a model of HIV infection dynamics, with the aim of
extracting information from efficiently run experiments, here in terms of
average and variance and, in the longer term, of more structured data.
Comment: 14 pages + cover pag
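As a rough illustration of extracting average and variance from many stochastic runs, the sketch below simulates a simple birth-death process with Gillespie's algorithm and folds each run's final value into running statistics (Welford's method), so no trajectory has to be stored. This is not StochKit-FF code and not the HIV model: the birth-death process, its rate values, and the names `birth_death_ssa`, `RunningStats`, and `simulate_ensemble` are assumptions made for the example. Whether this online reduction corresponds to the paper's notion of selective memory is likewise an assumption.

```python
import random


def birth_death_ssa(birth, death, x0, t_end, rng):
    """Gillespie simulation of a birth-death process; returns the count at t_end."""
    t, x = 0.0, x0
    while True:
        a_birth = birth          # constant birth propensity
        a_death = death * x      # death propensity grows with the population
        a_total = a_birth + a_death
        if a_total == 0.0:
            return x             # no reaction can fire any more
        t += rng.expovariate(a_total)  # waiting time to the next reaction
        if t > t_end:
            return x
        if rng.random() * a_total < a_birth:
            x += 1
        else:
            x -= 1


class RunningStats:
    """Welford's online mean and sample variance."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0

    def push(self, value):
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self):
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


def simulate_ensemble(n_runs, seed=0):
    # Illustrative rates only; each finished run is reduced immediately.
    rng = random.Random(seed)
    stats = RunningStats()
    for _ in range(n_runs):
        stats.push(birth_death_ssa(10.0, 1.0, 0, 5.0, rng))
    return stats
```

In a parallel setting each run would execute on a separate worker and only the scalar result would travel to the collector that owns the `RunningStats` instance.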
Experimenting with Emerging ARM and RISC-V Systems for Decentralised Machine Learning
Decentralised Machine Learning (DML) enables collaborative machine learning
without centralised input data. Federated Learning (FL) and Edge Inference are
examples of DML. While tools for DML (especially FL) are starting to flourish,
many are not flexible and portable enough to experiment with novel systems
(e.g., RISC-V), non-fully connected topologies, and asynchronous collaboration
schemes. We overcome these limitations via a domain-specific language that maps
DML schemes onto an underlying middleware, i.e. the FastFlow parallel
programming library. We experiment with it by generating different working DML
schemes on two emerging architectures (ARM-v8, RISC-V) and the x86-64 platform.
We characterise the performance and energy efficiency of the presented schemes
and systems. As a byproduct, we introduce a RISC-V port of the PyTorch
framework, the first publicly available to our knowledge.
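To make the idea of a non-fully connected collaboration scheme concrete, the sketch below runs gossip-style parameter averaging on a ring: each node repeatedly averages its parameter vector with those of its two neighbours only. It is not the paper's domain-specific language and does not use FastFlow or PyTorch; the function names `ring` and `gossip_round`, the synchronous rounds, and the ring topology are assumptions made for illustration.

```python
def ring(n):
    """Neighbour map for a ring of n nodes: each node sees only two peers."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def gossip_round(params, neighbours):
    """One synchronous round: every node averages with its neighbours."""
    new_params = {}
    for node, vec in params.items():
        group = [vec] + [params[peer] for peer in neighbours[node]]
        # Element-wise mean over the node's own vector and its peers' vectors.
        new_params[node] = [sum(col) / len(group) for col in zip(*group)]
    return new_params


def run_gossip(params, neighbours, rounds):
    for _ in range(rounds):
        params = gossip_round(params, neighbours)
    return params
```

Because every node in the ring has the same number of neighbours, each round preserves the global mean, so the nodes converge towards the average of the initial vectors without any central aggregator.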
- …